CAN WE PREDICT
BACKORDER?
(Supply Chain Data)
Salil Chawla, Atharva Nandurdikar, Piyush Srivastava, Anil Thadani, Siddharaj Deshmukh
WHAT IS A BACKORDER?
• Sometimes when a customer places an order for a good, the company or store may have run out of stock in its inventory. A company that has run out of the demanded product but has re-ordered it will promise its customers that it will ship the goods when they become available. A customer who is willing to wait until the company has restocked the merchandise places a back order. A back order only exists if customers are willing to wait for the order.
• In short, a backorder is a customer order that cannot be filled when presented, and for which the customer is prepared to wait for some time. The percentage of items backordered and the number of backorder days are important measures of the quality of a company's customer service and the effectiveness of its inventory management.
ABOUT DATA
• We selected the Global Garment Supply Chain Data from datahub.
• In total, it consists of 23 attributes: 6 are binary, 1 is an identifier, and the other 16 are numeric.
• Among the attributes, some represent the status of products in transit, such as the lead time of the product (lead_time, in weeks) and the amount of product in transit from the source (in_transit_qty, in units).
• The data also represents product suppliers' historical performance. Some suppliers have not been scored and carry a dummy value of -99.
• A brief explanation and the data types of the 23 attributes follow.
ATTRIBUTES
DATA ISSUES
• From the data, only about 1% of the products actually went on backorder, which creates a large imbalance between the classes.
• This is a common problem for real-world datasets.
• Ideally, a classifier should provide a balanced degree of predictive accuracy for both the minority and the majority class. However, many standard learning algorithms tend to produce a severely imbalanced degree of accuracy, favouring the majority class.
Initial Dataset
First few rows (not normalized, but NA values filled using KNN imputation).
Note: the assumption is that the observed minimum and maximum values are the only ones possible, and they were therefore used for min-max normalization (a brief sketch of this step follows).
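A minimal sketch of this preparation step, assuming the raw data sits in an R data frame called backorder (hypothetical name) and that the DMwR package's knnImputation() is used for the KNN fill:

library(DMwR)                                    # provides knnImputation()

# Fill missing values in the numeric attributes with a KNN estimate (k = 10 is an assumed choice)
num_cols <- sapply(backorder, is.numeric)
backorder_filled <- backorder
backorder_filled[num_cols] <- knnImputation(backorder[num_cols], k = 10)

# Min-max normalization, under the stated assumption that the observed
# minimum and maximum are the only possible values
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
backorder_norm <- backorder_filled
backorder_norm[num_cols] <- lapply(backorder_filled[num_cols], min_max)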
Normalized dataset
Variable selection
• After cleaning the data, we started applying the models on different combinations of variables.
• The accuracy of the KNN model when applied over all the variables was 77%.
• We therefore need to select a particular combination of variables for better accuracy.
• Variable selection is an effective technique for finding the best set of variables for a model.
• The first step is to check which variables are correlated (a short sketch of this check follows).
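A minimal sketch of the correlation check, assuming the corrplot package and the normalized data frame backorder_norm from the previous sketch:

library(corrplot)

num_cols <- sapply(backorder_norm, is.numeric)
corr_mat <- cor(backorder_norm[, num_cols])
corrplot(corr_mat, method = "circle")   # dark circles on the diagonal are each variable's correlation with itself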
Correlation Among Variables
Findings of Correlation
• The dark blue circles forming the diagonal show that each variable is perfectly correlated with itself.
• However, there is high correlation within two sets of variables: the forecast variables (forecast_3, 6 & 9_month) and the sales variables (sales_1, 3, 6 & 9_month).
• Choosing one variable from each of these sets will reduce multicollinearity and in turn increase the accuracy of the model.
• If there are K potential independent variables, there are 2^K distinct subsets to be tested.
• How do we choose the best set of variables for a model?
• What are the different techniques for variable selection?
To be continued
C5.0 for Variable Selection
• C5.0 was implemented over the complete dataset to analyze decision trees.
• It measures predictor importance by determining the percentage of training-set samples that fall into all the terminal nodes after the split.
• The first two predictors, national_inv and forecast_3_month, used in the first split automatically bear 100% importance.
• Based on the best category contained in an attribute, the other variables are ranked as shown in the adjoining figure; a brief sketch of this step follows.
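A hedged sketch of the C5.0 step, assuming the C50 package and a training data frame train whose factor target is went_on_backorder (both names are assumptions):

library(C50)

c50_fit <- C5.0(went_on_backorder ~ ., data = train)   # target must be a factor
C5imp(c50_fit, metric = "usage")   # predictor importance as % of training samples reaching the splits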
Random Forest for Variable
Selection
• To get better accuracy, we applied a random forest, which decorrelates 200 trees (classifiers) grown on different bootstrapped samples from the training data.
• Averaging the trees helped us reduce variance and avoid overfitting.
• The out-of-bag (OOB) estimate was calculated to find the mean prediction error on each training sample.
• The graph shows the attributes in decreasing order of mean decrease in accuracy (sketched below).
• Hence, the final variable set selected for all further models is:
• 'national_inv', 'in_transit_qty', 'lead_time', 'forecast_3_month', 'sales_1_month', 'min_bank', 'perf_6_month_avg', 'local_bo_qty', 'stop_auto_buy', 'deck_risk', 'went_on_backorder'
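A sketch of the random-forest ranking step, assuming the randomForest package and the same hypothetical train data frame:

library(randomForest)

rf_vars <- randomForest(went_on_backorder ~ ., data = train,
                        ntree = 200, importance = TRUE)
importance(rf_vars, type = 1)   # mean decrease in accuracy per variable
varImpPlot(rf_vars)             # attributes plotted in decreasing order of importance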
Naive Bayes
• It calculates the posterior probability for each class.
• The class with the highest posterior probability is the outcome of the prediction.
Assumption:
It works on the frequency counts of the variables, so the variables are assumed to be independent of each other.
These quantities combine through Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x), where
P(c|x) is the posterior probability of class c (the target) given predictor x (the attributes),
P(c) is the prior probability of the class,
P(x|c) is the likelihood, i.e. the probability of the predictor given the class, and
P(x) is the prior probability of the predictor.
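A minimal Naive Bayes sketch with e1071, assuming train/test data frames restricted to the selected variables (names assumed):

library(e1071)
library(caret)                   # for confusionMatrix()

nb_fit  <- naiveBayes(went_on_backorder ~ ., data = train)
nb_pred <- predict(nb_fit, newdata = test)        # class with the highest posterior probability
confusionMatrix(nb_pred, test$went_on_backorder)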
Naïve Bayes Result
Confusion matrix and accuracy, shown for the highly imbalanced dataset and for the balanced dataset.
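The slides do not state how the balanced dataset was produced; one common option is simple downsampling of the majority class, sketched here purely as an assumption:

set.seed(42)
minority <- train[train$went_on_backorder == "Yes", ]
majority <- train[train$went_on_backorder == "No", ]
majority_down <- majority[sample(nrow(majority), nrow(minority)), ]
train_balanced <- rbind(minority, majority_down)
table(train_balanced$went_on_backorder)   # classes now roughly 50/50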
Random Forest
• We first applied it on the highly imbalanced data and then on a balanced version of it.
Results on the highly imbalanced data:
• We used random forest because it performs well on classification problems, does not require the data to be normalized, is not affected by correlation among the predictors, and works on binary splits like a decision tree.
• We also tried to overcome overfitting, if any.
• It randomly selects about 2/3 of the training dataset as training data and uses the remaining 1/3 internally as validation (out-of-bag) data for each tree.
• Random variable selection: the square root of the total number of predictors is considered at each split for classification.
• It picks the best split from those candidate variables and classifies the left-over (1/3) data on each tree.
• It chooses the classification having the most votes over all the trees in the forest.
• The minimum error rate is reached at around 400 trees, with an accuracy of 88.97% (see the sketch below).
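A sketch of the classifier itself, again assuming the randomForest package; ntree = 400 reflects the minimum-error point reported above:

library(randomForest)
library(caret)

rf_fit  <- randomForest(went_on_backorder ~ ., data = train, ntree = 400)
rf_pred <- predict(rf_fit, newdata = test)
confusionMatrix(rf_pred, test$went_on_backorder)
plot(rf_fit)   # OOB error rate against the number of trees ("error rate flow")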
Confusion Matrix
and Accuracy
ERROR RATE FLOW
Tuned Random Forest
• We tried to find the optimal mtry and ntree at which the out-of-bag error rate stabilizes and reaches its minimum.
• We got ntree = 500 and mtry = 6.
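A possible tuning sketch using randomForest::tuneRF, which searches over mtry by OOB error; ntreeTry = 500 matches the value reported above, and the other arguments are assumptions:

library(randomForest)

x <- train[, setdiff(names(train), "went_on_backorder")]
y <- train$went_on_backorder
tuned <- tuneRF(x, y, mtryStart = floor(sqrt(ncol(x))),
                ntreeTry = 500, stepFactor = 1.5, improve = 0.01, trace = TRUE)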
Applying Un-weighted kNN model
(81.6%)
Applying Weighted kNN model
(84.2%)
kNN Contd..
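A hedged sketch of both kNN variants, assuming class::knn for the un-weighted model and kknn::kknn for the distance-weighted one (k and the kernel are illustrative choices):

library(class)
library(kknn)

predictors <- setdiff(names(train), "went_on_backorder")

# Un-weighted kNN: every neighbour gets an equal vote
knn_pred <- knn(train = train[, predictors], test = test[, predictors],
                cl = train$went_on_backorder, k = 5)

# Weighted kNN: closer neighbours get a larger vote via the kernel
wknn_fit  <- kknn(went_on_backorder ~ ., train = train, test = test,
                  k = 5, kernel = "triangular")
wknn_pred <- fitted(wknn_fit)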
Applying Neural Network model
• Although it is not mandatory to normalize the data, normalizing helps the algorithm reach the global minimum with higher probability.
• Due to a lack of convergence for 3 or more neurons, we increased the stepmax and threshold values.
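A sketch of this step, assuming the neuralnet package; the target is assumed to be coded 0/1 (neuralnet needs a numeric response) and the layer sizes are illustrative:

library(neuralnet)

nn_fit <- neuralnet(went_on_backorder ~ national_inv + in_transit_qty + lead_time +
                      forecast_3_month + sales_1_month + min_bank,
                    data = train_norm,       # normalized training data (assumed name)
                    hidden = c(3, 2),        # hypothetical hidden-layer sizes
                    stepmax = 1e6,           # raised to allow convergence
                    threshold = 0.05,        # relaxed stopping threshold
                    linear.output = FALSE)   # classification, not regression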
Accuracy of Neural Network
(47-50%)
• The optimal number of hidden layers was found to be 2.
• Trying with un-normalized values made little change to the results: accuracy decreased.
Values of neurons:
Graph of Accuracy
Neural Network diagram
Why clustering may not be good
for our data!!
Support Vector Machine
What is a Support Vector Machine?
A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges. However, it is mostly used for classification problems.
In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate.
Then, we perform classification by finding the hyperplane that differentiates the two classes best.
Maximal-Margin Classifier (hypothetical classifier)
The numeric input variables (x) in your data (the columns) form an n-dimensional space. For example, if you had two input variables, this would form a two-dimensional space.
B0 + (B1 * X1) + (B2 * X2) = 0
where the coefficients B1 and B2 (which determine the slope of the line) and the intercept B0 are found by the learning algorithm, and X1 and X2 are the two input variables.
The distance between the line and the closest data points is referred to as the margin.
These closest points are called the support vectors; they support, or define, the hyperplane.
By plugging input values into the line equation, you can calculate whether a new point is above or below the line.
Above the line, the equation returns a value greater than 0 and the point belongs to the first class (class 0).
Below the line, the equation returns a value less than 0 and the point belongs to the second class (class 1).
A point close to the line returns a value close to zero and may be difficult to classify. A tiny numeric illustration follows.
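The rule with made-up coefficients and a hypothetical new point (all numbers are illustrative):

b0 <- -1; b1 <- 0.8; b2 <- 0.5          # made-up coefficients learned by the algorithm
x1 <- 2;  x2 <- 1                        # a hypothetical new point
score <- b0 + b1 * x1 + b2 * x2          # 1.1 here, i.e. above the line
if (score > 0) "class 0 (above the line)" else "class 1 (below the line)"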
Soft Margin Classifier / Regularization Parameter
This change allows some points in the training data to violate the separating line.
The C parameter defines the amount of margin violation that is allowed.
During the learning of the hyperplane from data, all training instances that lie within the distance of the margin affect the placement of the hyperplane and are referred to as support vectors.
And as C affects the number of instances that are allowed to fall within the margin, C influences the number of support vectors used by the model.
The smaller the value of C, the more violations are tolerated: the optimizer looks for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points (a less sensitive fit, with lower variance and higher bias).
The larger the value of C, the more closely the algorithm fits the training data: the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of classifying all the training points correctly (a more sensitive fit, with higher variance and lower bias).
Gamma
The gamma parameter defines how far the
influence of a single training example reaches,
with low values meaning far and
high values meaning close.
Kernel
A powerful insight is that the linear SVM can be rephrased using the inner product of any two given observations, rather than the observations themselves. The inner product between two vectors is the sum of the products of each pair of input values.
For example, the inner product of the vectors [2, 3] and [5, 6] is 2*5 + 3*6 = 28.
Radial Kernel SVM (Radial basis function)
K(x, xi) = exp(-gamma * sum((x - xi)^2))
where gamma is a parameter that must be specified to the learning algorithm. A good default value for gamma is 0.1, and gamma is often in the range 0 < gamma < 1. The radial kernel is very local and can create complex regions within the feature space, such as closed polygons in two-dimensional space.
Data Preparation for SVM
How to best prepare training data when learning an SVM model:
Numerical inputs: SVM assumes that our inputs are numeric. If we have categorical inputs, we may need to convert them to binary dummy variables (one variable for each category).
Binary classification: basic SVM is intended for binary (two-class) classification problems, although extensions have been developed for regression and multi-class classification.
R package: e1071 (a short sketch follows).
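A minimal radial-kernel SVM sketch with e1071, the package named above; cost corresponds to C and gamma to the RBF width discussed earlier (both values illustrative):

library(e1071)
library(caret)

svm_fit  <- svm(went_on_backorder ~ ., data = train,
                kernel = "radial", cost = 1, gamma = 0.1, scale = TRUE)
svm_pred <- predict(svm_fit, newdata = test)
confusionMatrix(svm_pred, test$went_on_backorder)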
Business Recommendations:
National Inventory: define minimum and maximum quantity levels at the national inventory level.
In-Transit Quantity: keep track of the amount of the product in transit from the source, as delays here affect the business negatively.
Lead Time: the transit time required from the source needs to be known well in advance, and guaranteed.
Forecast 3 Months: forecasting sales of the product for the next 3 months will help to prevent the product from going on backorder.
Sales 1 Month: the sales quantity for the prior 1 month gives a good estimate of how much quantity to expect in the current month.
Min_bank: a minimum recommended amount of stock needs to be defined.
Perf_6_month_avg: the source performance over the prior 6-month period, for the source from which the quantity is manufactured or bought, should be considered to avoid backordering.
Local Back Order Quantity: the quantity of the particular product that went on backorder in local markets should be studied.
Stop Auto Buy: automatic buying of products needs to be kept in check, keeping inventory in perspective.
Deck Risk: products lying on deck (in the shop or in shipping containers) need to be considered.
Algorithm Summary

Algorithm          Accuracy on Imbalanced Dataset    Accuracy on Balanced Dataset
Naïve Bayes        16.3%                             43.4%
Random Forest      98.9%                             89.5%
kNN (un-weighted)  98.9%                             81.6%
kNN (weighted)     98.9%                             84.2%
Neural Network     13.4%                             48%
SVM                62%                               65%
Conclusion
• We compared the accuracy of each algorithm.
• Random forest was the best algorithm, with an accuracy of 89.53% on the balanced dataset.
• More feature engineering or data balancing could probably have resulted in better accuracies.
Reference
 http://www.listendata.com/2014/11/random-forest-with-r.html
 http://projekter.aau.dk/projekter/files/262657498/master_thesis.pdf
 https://old.datahub.io/dataset/global-garment-supply-chain-data
Thank You!!
Any Questions??