Institute of Information Technology
University of Dhaka
SELECTION AND REPRESENTATION OF
ATTRIBUTES FOR SOFTWARE DEFECT PREDICTION
Supervised by
Dr. Mohammad Shoyaib
Associate Professor
Presented by
Sadia Sharmin
BSSE-0426

CONTENTS
 Background
 Motivation
 Problem Specification
 Objectives of Research
 Literature Review
 Methodology
 Result Analysis and Discussion
 Future Work
2 January 2016

BACKGROUND
 Software Defect
 Any flaw or imperfection in a software work product or software
process
 Software Defect Prediction
 An approach for identifying the defective parts of a software product
before it is tested or released

AN OVERVIEW OF SOFTWARE DEFECT PREDICTION PROCESS
[Figure: flow diagram connecting Data Set, Pre-processing, Attribute Selection, Training Data, Training, Prediction Model, Testing Data, and Prediction Result]

MOTIVATION
Identifying the software bugs in an early stage
Allocating the test resources efficiently
Minimizing the cost of software development
Improving the quality and productivity of software

WHY PRE-PROCESSING IS NEEDED
 Noisy Data
 Outliers
 Missing or conflicting values
 Inconsistency

WHY ATTRIBUTE SELECTION IS NEEDED
 Attributes are not equally important
 No standard set of attributes

OBJECTIVES OF RESEARCH
 To determine how existing pre-processing techniques can be combined
with attribute selection methods more effectively.
 To survey the existing methods and propose a suitable attribute
selection method.

A GENERAL SOFTWARE DEFECT-PRONENESS
PREDICTION FRAMEWORK [1]
 Defect prediction framework:
 Data pre-processor: Log-filtering
 Feature selector: Forward Selection, Backward Elimination
 Learning algorithms: Naïve Bayes, J48, OneR

A GENERAL SOFTWARE DEFECT-PRONENESS
PREDICTION FRAMEWORK [1]
 Small changes to data representation can have a major impact
 Feature selection one attribute at a time is not a practical solution for
large datasets
 Different learning schemes should be chosen carefully for different
datasets
 There is no clear indication about which combination should be used
for a particular dataset

HOW MANY SOFTWARE METRICS SHOULD BE SELECTED FOR
DEFECT PREDICTION?[2]
 Five filter-based feature ranking techniques
 Methodology
 Min-max normalization
 Pair each independent attribute with the class attribute
 Rank the attributes
 Subset selection (top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, and 20 attributes)
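
The min-max normalization step listed above can be illustrated with a short sketch. This is only an illustration and is not taken from [2]; the function name min_max_normalize, the sample values, and the handling of constant attributes are assumptions.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical metric values (for example, lines of code per module).
loc = [10, 20, 40, 110]
print(min_max_normalize(loc))  # [0.0, 0.1, 0.3, 1.0]
```

After this step every metric lies on the same scale, so ranking scores computed per attribute are not dominated by metrics with large raw values.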

HOW MANY SOFTWARE METRICS SHOULD BE SELECTED FOR
DEFECT PREDICTION?[2]
 Three metrics on average can be enough for building an effective
prediction model
 Eliminating 98.5% of the available metrics improves the result
 It is not confirmed that this holds for all datasets

CHOOSING SOFTWARE METRICS FOR DEFECT PREDICTION: AN
INVESTIGATION ON FEATURE SELECTION TECHNIQUES[3]
 Hybrid attribute selection approach
 Feature ranking
 Feature subset selection
 Removing 85% of the metrics can enhance the performance of the
prediction model

METHODOLOGY
SAL: Selection of Attribute with Log filtering
 Pre-process the data with a logarithmic filter
 Rank the attributes
 Select the best set of attributes
 Build the predictor

PRE-PROCESSING
 Each attribute value n is replaced by ln(n + 𝜖), where 𝜖 = 0.01
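
A minimal sketch of the logarithmic filter, assuming it is applied to every numeric attribute value. The function name log_filter and the sample values are hypothetical; only the formula ln(n + 𝜖) with 𝜖 = 0.01 comes from the slide.

```python
import math

EPSILON = 0.01  # value of 𝜖 given on the slide


def log_filter(values, epsilon=EPSILON):
    """Replace each value n by ln(n + epsilon).

    The small offset keeps the logarithm defined when n is 0.
    """
    return [math.log(v + epsilon) for v in values]


# Hypothetical raw metric values, including a zero.
raw = [0, 1, 10, 100]
print(log_filter(raw))
```

The transform compresses the long right tail that software metrics typically have, while the offset avoids ln(0).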

ATTRIBUTE RANKING
 Compute the individual balance value of each attribute A1, A2, …, An
 Form the pair-wise combinations of attributes (A1A2, A1A3, …, A3A1, A3A2, …, AmAn) and compute the balance value of each pair
 Compute the average balance value for each attribute:
 Average Balance Value = (Individual value + Average value of n pairs) / 2
 Sort the attributes by average balance value in decreasing order, for example:
 A5 0.887, A4 0.869, A10 0.765, A8 0.750, A9 0.696, …, An 0.523
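
The ranking rule, Average Balance Value = (Individual value + Average value of n pairs) / 2, can be sketched as follows. The sketch assumes the individual and pair-wise balance values have already been measured with a classifier, and that the pair average for an attribute is taken over every pair that contains it. The function names and the numbers in the example are hypothetical.

```python
def average_balance(individual, pairwise):
    """Combine each attribute's own balance with the mean balance of its pairs.

    individual: dict mapping attribute name -> balance value
    pairwise:   dict mapping (attr_a, attr_b) -> balance value of the pair
    """
    scores = {}
    for attr, own in individual.items():
        pair_values = [b for pair, b in pairwise.items() if attr in pair]
        # Assumption: with no pairs available, fall back to the individual value.
        pair_mean = sum(pair_values) / len(pair_values) if pair_values else own
        scores[attr] = (own + pair_mean) / 2
    return scores


def rank_attributes(individual, pairwise):
    """Return attribute names sorted by average balance, highest first."""
    scores = average_balance(individual, pairwise)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical balance values for three attributes.
individual = {"A1": 0.5, "A2": 0.7, "A3": 0.6}
pairwise = {("A1", "A2"): 0.8, ("A1", "A3"): 0.6, ("A2", "A3"): 0.9}
print(rank_attributes(individual, pairwise))  # ['A2', 'A3', 'A1']
```

In this example A2 ranks first because both its own balance and the balance of the pairs it takes part in are high.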

SELECT BEST SET OF ATTRIBUTES
 Start from the ranking of attributes: A5, A4, A10, A8, A9, …, An
 Take the 1st ranked attribute A5 (balance value 0.887) as the initial set
 Add the 2nd ranked attribute A4 and compute the combined balance value of A5A4:
 0.891 (new) > 0.887 (previous), so A4 is kept and the set becomes A5, A4
 Add the 3rd ranked attribute A10 and compute the combined balance value of A5A4A10:
 0.856 (new) < 0.891 (previous), so A10 is discarded
 Continue this process for the remaining attributes
 Best set of attributes in this example: A5, A4, A9, A12, A7
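
The selection walk-through above amounts to a greedy pass over the ranked list. The sketch below assumes an evaluate function that returns the balance value of a candidate set (in the thesis this would come from training the classifier; here it is a hypothetical lookup table filled with the numbers from the slides). Discarding an attribute on a tie is also an assumption, since the slides only show the strictly greater and strictly smaller cases.

```python
def select_best_set(ranked, evaluate):
    """Greedily grow the attribute set in rank order.

    ranked:   attribute names, best ranked first
    evaluate: callable taking a list of attributes and returning its balance
    """
    selected = [ranked[0]]
    best = evaluate(selected)
    for attr in ranked[1:]:
        candidate = selected + [attr]
        score = evaluate(candidate)
        if score > best:
            # New value > previous value: keep the attribute.
            selected, best = candidate, score
        # Otherwise the attribute is discarded and the set is unchanged.
    return selected, best


# Balance values taken from the slides, used as a stand-in for a real classifier.
table = {
    ("A5",): 0.887,
    ("A5", "A4"): 0.891,
    ("A5", "A4", "A10"): 0.856,
}
chosen, value = select_best_set(["A5", "A4", "A10"], lambda s: table[tuple(s)])
print(chosen, value)  # ['A5', 'A4'] 0.891
```

Each attribute is evaluated once, so the number of classifier runs in this pass grows linearly with the number of attributes.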

PERFORMANCE MEASUREMENT SCALES
Confusion Matrix
                  Predicted positive  Predicted negative
Actual positive   TP                  FN
Actual negative   FP                  TN
ROC curve: true positive rate (0 to 1) plotted against false positive rate (0 to 1)
Area Under the ROC curve (AUC)
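
The slides use Balance as a performance measure without stating its formula. The sketch below assumes the definition commonly used in the defect prediction literature, including [1]: one minus the normalized Euclidean distance from the point (pf, pd) to the ideal point (0, 1) in ROC space. The function names and the confusion-matrix counts are hypothetical.

```python
import math


def rates(tp, fn, fp, tn):
    """Return (pd, pf): true positive rate and false positive rate.

    Assumes both classes are present, so neither denominator is zero.
    """
    pd = tp / (tp + fn)
    pf = fp / (fp + tn)
    return pd, pf


def balance(pd, pf):
    """Assumed definition: 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2)."""
    return 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)


# Hypothetical confusion matrix: TP=40, FN=10, FP=20, TN=30.
pd, pf = rates(40, 10, 20, 30)
print(pd, pf, round(balance(pd, pf), 4))
```

A perfect predictor (pd = 1, pf = 0) has balance 1, and the worst point (pd = 0, pf = 1) has balance 0.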

RESULT AND DISCUSSIONS
 Datasets: NASA MDP repository and PROMISE repository
 Classifier: Naïve Bayes
 Performance metrics: Balance, AUC (Area Under the ROC Curve)
 Programming language: Java
 Machine learning tool: WEKA
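
For readers unfamiliar with AUC, the sketch below computes it directly from predicted scores, using the fact that AUC equals the probability that a randomly chosen defective module receives a higher score than a randomly chosen non-defective one (ties count as one half). It is an illustration in Python, not the WEKA or Java code used in the thesis; the function name and the data are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve from pairwise comparisons.

    labels: 1 for defective, 0 for non-defective
    scores: predicted defect-proneness, higher means more likely defective
    Assumes at least one module of each class.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for four modules.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise form costs time proportional to the product of the class sizes, which is acceptable for datasets of the size used here.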

Comparison of AUC values of different methods
Dataset  Wahono [4]  Abaei [5]  Ren [6] lowest  Ren [6] highest  Proposed (SAL)
CM1      0.702       0.723      0.550           0.724            0.7946
KC1      0.79        0.790      0.592           0.800            0.8006
KC2      -           -          0.591           0.796            0.8449
KC3      0.677       -          0.569           0.713            0.8322
KC4      -           -          -               -                0.8059
MC1      -           -          -               -                0.8110
MC2      0.739       -          -               -                0.7340
MW1      0.724       -          0.534           0.725            0.7340
PC1      0.799       -          0.692           0.882            0.8369
PC2      0.805       -          -               -                0.8668
PC3      0.78        0.795      -               -                0.8068
PC4      0.861       -          -               -                0.9049
PC5      -           -          -               -                0.9624
JM1      -           0.717      -               -                0.7167
AR1      -           -          -               -                0.8167
AR3      -           -          0.580           0.699            0.8590
AR4      -           -          0.555           0.671            0.8681
AR5      -           -          0.614           0.722            0.925
AR6      -           -          -               -                0.7566

Comparison of Balance values of different methods
Dataset  Song [1]  Wang [7]  Jobaer [8]  Proposed (SAL)
CM1      0.695     0.663     0.5500      0.680
JM1      0.585     0.678     -           0.6152
KC1      0.707     0.718     -           0.7244
KC2      -         0.753     -           0.7835
KC3      0.708     0.693     0.6037      0.7529
KC4      0.691     -         -           0.7036
MC1      0.793     -         -           0.6904
MC2      0.614     0.620     -           0.6847
MW1      0.661     0.636     0.7202      0.6577
PC1      0.668     0.688     0.5719      0.7040
PC2      -         -         0.7046      0.7468
PC3      0.711     0.749     0.7114      0.7232
PC4      0.821     0.854     0.7450      0.8272
PC5      0.904     -         -           0.9046
AR1      0.411     -         -           0.6651
AR3      0.661     -         -           0.8238
AR4      0.683     -         -           0.7051
AR6      0.492     -         -           0.5471

FUTURE WORK
 Cross-project defect prediction
 Using other publicly available datasets

REFERENCES
[1] Song, Qinbao, Zihan Jia, Martin Shepperd, Shi Ying, and Jin Liu. "A general
software defect-proneness prediction framework." Software Engineering, IEEE Transactions on
37, no. 3 (2011): 356-370.
[2] Wang, Huanjing, Taghi M. Khoshgoftaar, and Naeem Seliya. "How many software metrics
should be selected for defect prediction?" In FLAIRS Conference. 2011
[3] Gao, Kehan, Taghi M. Khoshgoftaar, and Huanjing Wang. "An empirical investigation of
filter attribute selection techniques for software quality classification." In Information Reuse &
Integration, 2009. IRI'09. IEEE International Conference on, pp. 272-277. IEEE, 2009.
[4] Wahono, Romi Satria, and Nanna Suryana Herman. "Genetic Feature Selection for
Software Defect Prediction." Advanced Science Letters 20, no. 1 (2014): 239-244.
[5] Abaei, Golnoush, and Ali Selamat. "A survey on software fault detection based on different
prediction approaches." Vietnam Journal of Computer Science 1, no. 2 (2014): 79-95.
[6] Ren, Jinsheng, Ke Qin, Ying Ma, and Guangchun Luo. "On software defect prediction using
machine learning." Journal of Applied Mathematics 2014 (2014).

[7] Wang, Shuo, and Xin Yao. "Using class imbalance learning for software defect prediction."
Reliability, IEEE Transactions on 62, no. 2 (2013): 434-443.
[8] Khan, Jobaer, Alim Ul Gias, Md Saeed Siddik, Md Hafizur Rahman, Shah Mostafa Khaled,
and Mohammad Shoyaib. "An attribute selection process for software defect prediction." In
Informatics, Electronics & Vision (ICIEV), 2014 International Conference on, pp. 1-4. IEEE,
2014