1/21
Kürşat İnce
HAVELSAN A.Ş.
Yakup Genç
Gebze Teknik Üniversitesi
Data Analysis for Automobile
Brake Fluid Fill Process Leakage
Detection using Machine Learning
Methods
2/21
Agenda
• Leakage Detection in Automobiles
• Background Information
• Sensor Data and Dataset Preparation
• Methods and Evaluation Metrics
• Results and Discussion
• Conclusion
3/21
Leakage Detection in Automobiles
Filling Station: Brake fluid, power steering fluid, and coolant.
4/21
Filling Process
• Vacuum
  – Vacuum pressure drops to 3 mbar in about 50-60 seconds.
  – Wait 5 seconds.
  – Pressure should not increase by more than 0.5 mbar (a sketch of this check follows below).
• Fill
  – Filling fluid is pumped into the system: 600-700 ml of brake fluid, or 7-8 L of coolant.
  – If a test fails, the operator attempts a fix at the station.
  – If not fixable, the car is moved to the exit.
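The vacuum step doubles as the leak test. Below is a minimal sketch of that pass/fail rule, assuming readings arrive at the 2 Hz rate stated later in the deck; the function name and window logic are illustrative, not the plant's actual software:

```python
def vacuum_hold_passes(pressure_mbar, hold_seconds=5, rate_hz=2,
                       max_rise_mbar=0.5):
    """Check the 5-second vacuum hold: once the pump reaches ~3 mbar,
    pressure may rise by at most 0.5 mbar, otherwise a leak is suspected."""
    n_hold = hold_seconds * rate_hz              # readings in the hold window
    hold = pressure_mbar[-n_hold:]               # last 5 s of the vacuum cycle
    return max(hold) - hold[0] <= max_rise_mbar  # True -> proceed to fill

# Example: a 0.2 mbar rise over the hold window passes the test.
print(vacuum_hold_passes([3.0, 3.0, 3.0, 3.1, 3.1, 3.0, 3.1, 3.2, 3.2, 3.2]))
```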
5/21
Some Background
• Renault & Nissan, Alliance Vehicle Evaluation Standard (AVES), 2001.
• R. S. Peres et al., "Multistage quality control using machine learning in the automotive industry," IEEE Access, vol. 7, pp. 79908–79916, 2019.
• K. Chen et al., "Prediction of weld bead geometry of MAG welding based on XGBoost algorithm," The International Journal of Advanced Manufacturing Technology, vol. 101, pp. 2283–2295, Apr. 2019.
• E. Ardıç and Y. Genç, "Classification of 1D signals using Deep Neural Networks," in 2018 26th Signal Processing and Communications Applications Conference (SIU), May 2018.
6/21
Discussion of the Background
• Quality control using machine learning methods on the factory floor is an active research area.
• Gradient boosting and random forests provide better results than other classical machine learning methods.
• Deep Neural Networks (CNNs, etc.) are attracting increasing attention from researchers.
• K. Chen et al.: XGBoost-based models are more interpretable than black-box models such as CNNs.
7/21
Sensor Data
• Time series data:
  – operation: Type of operation (either vacuum or fill)
  – time: Timestamp for the sensor reading
  – machineID: Filling machine identifier
  – chassis: Automobile chassis number
  – vacuumpressure: Pressure sensor reading during vacuuming
  – fillamount: Volume of the filling fluid
  – fillpressure: Pressure sensor reading during filling
• 12,151,666 readings at 2 Hz.
• 51,250 unique car chassis.
8/21
Samples from the Raw Data
9/21
Samples from the Raw Data
10/21
Dataset Preparation
• Cycle boundaries are detected where the time between consecutive readings exceeds 2000 milliseconds.
• For each cycle, extract vacuumpressure, fillamount, and fillpressure, and assign a label.
• If more cycles are observed for the same chassis in the future, label that cycle as 1 (leakage/failed); otherwise label it as 0 (successful).
• Vectorize each cycle into 200 readings for vacuumpressure, 130 readings for fillamount, and 130 readings for fillpressure, constructing a 460-dimensional feature vector for each cycle (see the sketch after this list).
• The resulting dataset has 53,254 samples: 51,250 negative (96.23%) and 2,004 positive (3.77%).
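A minimal pandas/NumPy sketch of this pipeline, assuming the column names from the Sensor Data slide and a datetime "time" column. The slides do not say how cycles are brought to a fixed length, so the index-based resampling below is one plausible choice, not the authors' method:

```python
import numpy as np
import pandas as pd

def split_cycles(chassis_df, gap_ms=2000):
    """Split one chassis's readings into cycles wherever the gap between
    consecutive timestamps exceeds 2000 ms ('time' must be datetime64)."""
    chassis_df = chassis_df.sort_values("time")
    gap = chassis_df["time"].diff().dt.total_seconds() * 1000
    return [c for _, c in chassis_df.groupby((gap > gap_ms).cumsum())]

def vectorize(cycle):
    """Concatenate the three signals into the 460-dimensional vector
    (200 + 130 + 130), resampling each non-empty signal by index."""
    def fixed_length(series, n):
        x = series.dropna().to_numpy(dtype=float)
        idx = np.linspace(0, len(x) - 1, n).round().astype(int)
        return x[idx]
    return np.concatenate([fixed_length(cycle["vacuumpressure"], 200),
                           fixed_length(cycle["fillamount"], 130),
                           fixed_length(cycle["fillpressure"], 130)])

def label_cycles(cycles):
    """All but the last cycle of a chassis are re-fills, hence failures:
    label 1 (leakage/failed) if a later cycle exists, else 0 (successful)."""
    return [1 if i < len(cycles) - 1 else 0 for i in range(len(cycles))]
```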
11/21
Machine Learning Methods
• Random Forest Classifier
  – Ensemble learning method
  – Random selection of features
  – Implementation: Scikit-learn
• Gradient Boosting Classifier
  – Ensemble of weak prediction models
  – Fits pseudo-residuals
  – Implementations: XGBoost and CatBoost
• Gaussian Process Classifier
  – Based on stochastic processes
  – Non-parametric and expressive
  – Implementation: Scikit-learn
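A sketch of how these classifiers might be instantiated; the hyperparameter values are taken from the grid-search column of the Model Parameters slide later in the deck, while everything else (dict layout, verbose flag, default GPC kernel) is an assumption:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

models = {
    "random_forest": RandomForestClassifier(
        max_features="sqrt", n_estimators=500, bootstrap=False, max_depth=10),
    "xgboost": XGBClassifier(
        base_score=0.05, learning_rate=0.01, n_estimators=500),
    "catboost": CatBoostClassifier(
        learning_rate=0.01, iterations=2000, verbose=False),
    # Exact GP inference scales as O(n^3), hence the PCA-reduced
    # inputs used for the GPC in the results.
    "gpc": GaussianProcessClassifier(),
}
```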
12/21
Convolutional Neural Networks
• Convolutional Neural Networks:
  – Deep learning architecture
  – Convolutional layers followed by pooling layers
  – Extract features from raw data
  – Implementation: Keras deep learning library on top of the Theano backend
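The deck evaluates seven CNN variants (CNN_model1-7) without specifying their layers, so the following is only a representative 1-D CNN for the 460-length cycle vectors, with assumed filter counts and kernel sizes:

```python
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout

# Illustrative architecture; the deck's CNN_model1..7 are not specified.
model = Sequential([
    Conv1D(32, kernel_size=5, activation="relu", input_shape=(460, 1)),
    MaxPooling1D(pool_size=2),
    Conv1D(64, kernel_size=5, activation="relu"),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(64, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),  # leakage probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```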
13/21
Evaluation Metrics
• Accuracy (ACC): In an imbalanced dataset, using accuracy as the sole evaluation metric can be misleading.
  – A trivial classifier that predicts every sample as 0 (successful) achieves 96.23% accuracy.
• Area Under the ROC Curve (AUC)
14/21
Evaluation Metrics (continued)
• Matthews Correlation Coefficient (MCC): More informative than the individual confusion matrix counts, i.e. TP, TN, FP, FN.
• We report ACC, but AUC and MCC are the metrics to watch (see the illustration below).
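MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which is 0 for the trivial all-negative classifier. A small illustration of why ACC alone misleads here, reproducing the deck's class ratio on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Same class ratio as the dataset: 96.23% negative, 3.77% positive.
y_true = np.array([0] * 9623 + [1] * 377)
y_trivial = np.zeros_like(y_true)            # always predict 0 (successful)

print(accuracy_score(y_true, y_trivial))     # 0.9623 -- looks strong
print(matthews_corrcoef(y_true, y_trivial))  # 0.0    -- no skill at all
# AUC needs scores rather than hard labels, e.g.:
# roc_auc_score(y_true, clf.predict_proba(X)[:, 1])
```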
15/21
Experimentation
• Data preparation
• Use grid search to optimize model parameters
• Evaluate with 5-fold stratified cross-validation:
  – Train model
  – Evaluate model
  – Report the average over folds (see the sketch after this list)
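A sketch of this evaluation loop, assuming a feature matrix X and labels y from the dataset preparation step and any classifier clf from the previous slides; the shuffle and random_state settings are assumptions:

```python
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.metrics import make_scorer, matthews_corrcoef

scoring = {"acc": "accuracy", "auc": "roc_auc",
           "mcc": make_scorer(matthews_corrcoef)}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = cross_validate(clf, X, y, cv=cv, scoring=scoring)
for name in scoring:                           # report the fold average
    print(name, scores[f"test_{name}"].mean())
```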
16/21
Model Parameters
Model           Defaults                  Grid Search
Random Forest   max_features: auto,       max_features: sqrt,
                n_estimators: 10,         n_estimators: 500,
                bootstrap: True,          bootstrap: False,
                max_depth: None           max_depth: 10
XGBoost         base_score: 0.5,          base_score: 0.05,
                learning_rate: 0.1,       learning_rate: 0.01,
                n_estimators: 100         n_estimators: 500
CatBoost        learning_rate: 0.03,      learning_rate: 0.01,
                iterations: 1000          iterations: 2000
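The slides report only the chosen values, not the search space. The grid below is therefore a hypothetical example of how such a search could be run for the random forest, with an assumed scoring choice:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {                          # hypothetical search space
    "max_features": ["sqrt", None],
    "n_estimators": [10, 100, 500],
    "bootstrap": [True, False],
    "max_depth": [None, 10, 20],
}
search = GridSearchCV(RandomForestClassifier(), param_grid,
                      scoring="roc_auc", cv=5, n_jobs=-1)
search.fit(X, y)                        # X, y from dataset preparation
print(search.best_params_)              # cf. the Grid Search column above
```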
17/21
Classical ML Method Results
Method                        ACC     AUC     MCC
Random Forest (defaults)      0.9915  0.9354  0.8821
Random Forest (grid search)   0.9923  0.9476  0.8948
XGBoost (defaults)            0.9925  0.9501  0.8965
XGBoost (grid search)         0.9928  0.9497  0.9004
CatBoost (defaults)           0.9929  0.9522  0.9024
CatBoost (grid search)        0.9930  0.9522  0.9038
GPC w/ PCA (17 components)    0.9883  0.9503  0.8490
GPC w/ PCA (29 components)    0.9893  0.9392  0.8571
18/21
CNN Model Results
Method        ACC     AUC     MCC
CNN_model1    0.9915  0.9445  0.8834
CNN_model2    0.9927  0.9473  0.8983
CNN_model3    0.9925  0.9474  0.8957
CNN_model4    0.9916  0.9386  0.8828
CNN_model5    0.9867  0.9504  0.8394
CNN_model6    0.9906  0.9337  0.8702
CNN_model7    0.9915  0.9438  0.8835
19/21
Results and Discussion
• XGBoost and CatBoost show 5%-7% better performance than the other methods.
  – CatBoost is slightly better (~0.3%).
• Although the Gaussian process is a powerful technique, its efficiency and run-time performance are debatable:
  – O(n³) training time and heavy memory requirements.
• CNN architectures:
  – The current architectures are not deep enough to perform as well as the gradient boosting classifiers.
20/21
Conclusion
• Time series data from the automobile industry.
• Binary classification using random forest, gradient boosting, and Gaussian process classifiers, as well as CNN deep learning architectures.
• Gradient boosting methods give promising results.
• The CNNs did not learn to extract useful features; possible causes:
  – Quality of the time series data.
  – Number of samples.
21/21
Future Work
• Improve the performance of the DL methods:
  – Data augmentation
  – Deeper networks
  – LSTM models
  – CNN and LSTM combined in one model
22/21
Thank you