1/21
Kürşat İnce
HAVELSAN A.Ş.
Yakup Genç
Gebze Teknik Üniversitesi
Data Analysis for Automobile
Brake Fluid Fill Process Leakage
Detection using Machine Learning
Methods
2/21
Agenda
• Leakage Detection in Automobiles
• Background Information
• Sensor Data and Dataset Preparation
• Methods and Evaluation Metrics
• Results and Discussion
• Conclusion
3/21
Leakage Detection in Automobiles
Filling Station: Brake fluid, power steering fluid, and coolant.
4/21
Filling Process
• Vacuum
  – Vacuum pressure drops to 3 mbar in about 50-60 seconds.
  – Wait 5 seconds.
  – Pressure should not increase by more than 0.5 mbar (a sketch of this check follows below).
• Fill
  – Filling fluid is pumped into the system: 600-700 ml of brake fluid, or 7-8 L of coolant.
  – If a test fails, the operator attempts a fix at the station.
  – If not fixable, the car is moved to the exit.
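The vacuum step doubles as the leak test. Below is a minimal sketch of that pass/fail rule, assuming readings arrive at the 2 Hz rate stated later in the deck; the function name and window logic are illustrative, not the plant's actual software:

```python
def vacuum_hold_passes(pressure_mbar, hold_seconds=5, rate_hz=2,
                       max_rise_mbar=0.5):
    """Check the 5-second vacuum hold: once the pump reaches ~3 mbar,
    pressure may rise by at most 0.5 mbar, otherwise a leak is suspected."""
    n_hold = hold_seconds * rate_hz              # readings in the hold window
    hold = pressure_mbar[-n_hold:]               # last 5 s of the vacuum cycle
    return max(hold) - hold[0] <= max_rise_mbar  # True -> proceed to fill

# Example: a 0.2 mbar rise over the hold window passes the test.
print(vacuum_hold_passes([3.0, 3.0, 3.0, 3.1, 3.1, 3.0, 3.1, 3.2, 3.2, 3.2]))
```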
5/21
Some Background
• Renault & Nissan, Alliance Vehicle Evaluation Standard (AVES), 2001.
• R. S. Peres et al., "Multistage quality control using machine learning in the automotive industry," IEEE Access, vol. 7, pp. 79908–79916, 2019.
• K. Chen et al., "Prediction of weld bead geometry of MAG welding based on XGBoost algorithm," The International Journal of Advanced Manufacturing Technology, vol. 101, pp. 2283–2295, Apr. 2019.
• E. Ardıç and Y. Genç, "Classification of 1D signals using Deep Neural Networks," in 2018 26th Signal Processing and Communications Applications Conference (SIU), May 2018.
6/21
Discussion of the Background
• Quality control using machine learning methods on the factory floor is an active research area.
• Gradient boosting and random forests provide better results than other classical machine learning methods.
• Deep Neural Networks (CNNs, etc.) are attracting increasing attention from researchers.
• K. Chen et al.: XGBoost-based models are more interpretable than black-box models such as CNNs.
7/21
Sensor Data
• Time series data:
  – operation: Type of operation (either vacuum or fill)
  – time: Timestamp for the sensor reading
  – machineID: Filling machine identifier
  – chassis: Automobile chassis number
  – vacuumpressure: Pressure sensor reading during vacuuming
  – fillamount: Volume of the filling fluid
  – fillpressure: Pressure sensor reading during filling
• 12,151,666 readings at 2 Hz.
• 51,250 unique car chassis.
8/21
Samples from the Raw Data
9/21
Samples from the Raw Data
10/21
Dataset Preparation
• Cycle boundaries are detected where the time between consecutive readings exceeds 2000 milliseconds.
• For each cycle, extract vacuumpressure, fillamount, and fillpressure, and assign a label.
• If more cycles are observed for the same chassis in the future, label that cycle as 1 (leakage/failed); otherwise label it as 0 (successful).
• Vectorize each cycle into 200 readings for vacuumpressure, 130 readings for fillamount, and 130 readings for fillpressure, constructing a 460-dimensional feature vector for each cycle (see the sketch after this list).
• The resulting dataset has 53,254 samples: 51,250 negative (96.23%) and 2,004 positive (3.77%).
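A minimal pandas/NumPy sketch of this pipeline, assuming the column names from the Sensor Data slide and a datetime "time" column. The slides do not say how cycles are brought to a fixed length, so the index-based resampling below is one plausible choice, not the authors' method:

```python
import numpy as np
import pandas as pd

def split_cycles(chassis_df, gap_ms=2000):
    """Split one chassis's readings into cycles wherever the gap between
    consecutive timestamps exceeds 2000 ms ('time' must be datetime64)."""
    chassis_df = chassis_df.sort_values("time")
    gap = chassis_df["time"].diff().dt.total_seconds() * 1000
    return [c for _, c in chassis_df.groupby((gap > gap_ms).cumsum())]

def vectorize(cycle):
    """Concatenate the three signals into the 460-dimensional vector
    (200 + 130 + 130), resampling each non-empty signal by index."""
    def fixed_length(series, n):
        x = series.dropna().to_numpy(dtype=float)
        idx = np.linspace(0, len(x) - 1, n).round().astype(int)
        return x[idx]
    return np.concatenate([fixed_length(cycle["vacuumpressure"], 200),
                           fixed_length(cycle["fillamount"], 130),
                           fixed_length(cycle["fillpressure"], 130)])

def label_cycles(cycles):
    """All but the last cycle of a chassis are re-fills, hence failures:
    label 1 (leakage/failed) if a later cycle exists, else 0 (successful)."""
    return [1 if i < len(cycles) - 1 else 0 for i in range(len(cycles))]
```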
11/21
Machine Learning Methods
• Random Forest Classifier
  – Ensemble learning method
  – Random selection of features
  – Implementation: Scikit-learn
• Gradient Boosting Classifier
  – Ensemble of weak prediction models
  – Fits pseudo-residuals
  – Implementations: XGBoost and CatBoost
• Gaussian Process Classifier
  – Based on stochastic processes
  – Non-parametric and expressive
  – Implementation: Scikit-learn
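A sketch of how these classifiers might be instantiated; the hyperparameter values are taken from the grid-search column of the Model Parameters slide later in the deck, while everything else (dict layout, verbose flag, default GPC kernel) is an assumption:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

models = {
    "random_forest": RandomForestClassifier(
        max_features="sqrt", n_estimators=500, bootstrap=False, max_depth=10),
    "xgboost": XGBClassifier(
        base_score=0.05, learning_rate=0.01, n_estimators=500),
    "catboost": CatBoostClassifier(
        learning_rate=0.01, iterations=2000, verbose=False),
    # Exact GP inference scales as O(n^3), hence the PCA-reduced
    # inputs used for the GPC in the results.
    "gpc": GaussianProcessClassifier(),
}
```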
12/21
Convolutional Neural Networks
• Convolutional Neural Networks:
  – Deep learning architecture
  – Convolutional layers followed by pooling layers
  – Extract features from raw data
  – Implementation: Keras deep learning library on top of the Theano backend
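The deck evaluates seven CNN variants (CNN_model1-7) without specifying their layers, so the following is only a representative 1-D CNN for the 460-length cycle vectors, with assumed filter counts and kernel sizes:

```python
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout

# Illustrative architecture; the deck's CNN_model1..7 are not specified.
model = Sequential([
    Conv1D(32, kernel_size=5, activation="relu", input_shape=(460, 1)),
    MaxPooling1D(pool_size=2),
    Conv1D(64, kernel_size=5, activation="relu"),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(64, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),  # leakage probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```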
13/21
Evaluation Metrics
• Accuracy (ACC): In an imbalanced dataset, using accuracy as the sole evaluation metric can be misleading.
  – A trivial classifier that predicts every sample as 0 (successful) achieves 96.23% accuracy.
• Area Under the ROC Curve (AUC)
14/21
Evaluation Metrics (continued)
• Matthews Correlation Coefficient (MCC): More informative than the individual confusion matrix counts, i.e. TP, TN, FP, FN.
• We report ACC, but AUC and MCC are the metrics to watch (see the illustration below).
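MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which is 0 for the trivial all-negative classifier. A small illustration of why ACC alone misleads here, reproducing the deck's class ratio on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Same class ratio as the dataset: 96.23% negative, 3.77% positive.
y_true = np.array([0] * 9623 + [1] * 377)
y_trivial = np.zeros_like(y_true)            # always predict 0 (successful)

print(accuracy_score(y_true, y_trivial))     # 0.9623 -- looks strong
print(matthews_corrcoef(y_true, y_trivial))  # 0.0    -- no skill at all
# AUC needs scores rather than hard labels, e.g.:
# roc_auc_score(y_true, clf.predict_proba(X)[:, 1])
```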
15/21
Experimentation
• Data preparation
• Use grid search to optimize model parameters
• Evaluate with 5-fold stratified cross-validation:
  – Train model
  – Evaluate model
  – Report the average over folds (see the sketch after this list)
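A sketch of this evaluation loop, assuming a feature matrix X and labels y from the dataset preparation step and any classifier clf from the previous slides; the shuffle and random_state settings are assumptions:

```python
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.metrics import make_scorer, matthews_corrcoef

scoring = {"acc": "accuracy", "auc": "roc_auc",
           "mcc": make_scorer(matthews_corrcoef)}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = cross_validate(clf, X, y, cv=cv, scoring=scoring)
for name in scoring:                           # report the fold average
    print(name, scores[f"test_{name}"].mean())
```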
16/21
Model Parameters
Model           Defaults                  Grid Search
Random Forest   max_features: auto,       max_features: sqrt,
                n_estimators: 10,         n_estimators: 500,
                bootstrap: True,          bootstrap: False,
                max_depth: None           max_depth: 10
XGBoost         base_score: 0.5,          base_score: 0.05,
                learning_rate: 0.1,       learning_rate: 0.01,
                n_estimators: 100         n_estimators: 500
CatBoost        learning_rate: 0.03,      learning_rate: 0.01,
                iterations: 1000          iterations: 2000
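The slides report only the chosen values, not the search space. The grid below is therefore a hypothetical example of how such a search could be run for the random forest, with an assumed scoring choice:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {                          # hypothetical search space
    "max_features": ["sqrt", None],
    "n_estimators": [10, 100, 500],
    "bootstrap": [True, False],
    "max_depth": [None, 10, 20],
}
search = GridSearchCV(RandomForestClassifier(), param_grid,
                      scoring="roc_auc", cv=5, n_jobs=-1)
search.fit(X, y)                        # X, y from dataset preparation
print(search.best_params_)              # cf. the Grid Search column above
```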
17/21
Classical ML Method Results
Method                        ACC     AUC     MCC
Random Forest (defaults)      0.9915  0.9354  0.8821
Random Forest (grid search)   0.9923  0.9476  0.8948
XGBoost (defaults)            0.9925  0.9501  0.8965
XGBoost (grid search)         0.9928  0.9497  0.9004
CatBoost (defaults)           0.9929  0.9522  0.9024
CatBoost (grid search)        0.9930  0.9522  0.9038
GPC w/ PCA (17 components)    0.9883  0.9503  0.8490
GPC w/ PCA (29 components)    0.9893  0.9392  0.8571
18/21
CNN Model Results
Method        ACC     AUC     MCC
CNN_model1    0.9915  0.9445  0.8834
CNN_model2    0.9927  0.9473  0.8983
CNN_model3    0.9925  0.9474  0.8957
CNN_model4    0.9916  0.9386  0.8828
CNN_model5    0.9867  0.9504  0.8394
CNN_model6    0.9906  0.9337  0.8702
CNN_model7    0.9915  0.9438  0.8835
19/21
Results and Discussion
• XGBoost and CatBoost show 5%-7% better performance than the other methods.
  – CatBoost is slightly better (~0.3%).
• Although the Gaussian process is a powerful technique, its efficiency and run-time performance are debatable:
  – O(n³) training time and heavy memory requirements.
• CNN architectures:
  – The current architectures are not deep enough to perform as well as the gradient boosting classifiers.
20/21
Conclusion
• Time series data from the automobile industry.
• Binary classification using random forest, gradient boosting, and Gaussian process classifiers, as well as CNN deep learning architectures.
• Gradient boosting methods give promising results.
• The CNNs did not learn to extract useful features; possible causes:
  – Quality of the time series data.
  – Number of samples.
21/21
Future Work
• Improve the performance of the DL methods:
  – Data augmentation
  – Deeper networks
  – LSTM models
  – CNN and LSTM combined in one model
22/21
Thank you