Institute of Information Technology
University of Dhaka
SELECTION AND REPRESENTATION OF
ATTRIBUTES FOR SOFTWARE DEFECT PREDICTION
Supervised by
Dr. Mohammad Shoyaib
Associate Professor
Presented by
Sadia Sharmin
BSSE-0426
CONTENTS
 Background
 Motivation
 Problem Specification
 Objectives of Research
 Literature Review
 Methodology
 Result Analysis and Discussion
 Future Work
2 January 2016
BACKGROUND
 Software Defect
 Any flaw or imperfection in a software work product or software
process
 Software Defect Prediction
 An approach to identifying the defect-prone parts of the software before
the product is tested or released
AN OVERVIEW OF SOFTWARE DEFECT PREDICTION PROCESS
[Figure: flow diagram with the stages Data Set, Pre-processing, Attribute Selection, Training Data, Training, Prediction Model, Testing Data, Prediction Result]
MOTIVATION
Identifying the software bugs in an early stage
Allocating the test resources efficiently
Minimizing the cost of software development
Improving the quality and productivity of software
WHY PRE-PROCESSING IS NEEDED
 Noisy data
 Outliers
 Missing or conflicting values
 Inconsistency
WHY ATTRIBUTE SELECTION IS NEEDED
 Attributes are not equally important
 No standard set of attributes
OBJECTIVES OF RESEARCH
 To determine how existing pre-processing techniques can be combined with
attribute selection methods more effectively.
 To survey the existing methods and propose a suitable attribute
selection method.
A GENERAL SOFTWARE DEFECT-PRONENESS
PREDICTION FRAMEWORK [1]
 Defect prediction framework:
 Data pre-processor: Log-filtering
 Feature selector: Forward Selection, Backward Elimination
 Learning algorithms: Naïve Bayes, J48, OneR
A GENERAL SOFTWARE DEFECT-PRONENESS
PREDICTION FRAMEWORK [1]
 Small changes to data representation can have a major impact
 Feature selection one attribute at a time is not a practical solution for
large datasets
 Different learning schemes should be chosen carefully for different
datasets
 There is no clear indication about which combination should be used
for a particular dataset
HOW MANY SOFTWARE METRICS SHOULD BE SELECTED FOR
DEFECT PREDICTION?[2]
 Five filter-based feature ranking techniques
 Methodology
 Min-max normalization
 Pair each independent attribute with the class attribute
 Rank the attributes
 Subset selection (subset sizes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, and 20)
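
The normalization and subset-selection steps above can be illustrated with a small Python sketch. It is not the code used in [2]: the function names, the attribute names, and the scores are hypothetical, and the scores stand in for the output of any filter-based ranker.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map it to all zeros (a design choice).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def top_k_attributes(scores, k):
    """Return the names of the k highest-scoring attributes.

    `scores` maps attribute name -> relevance score from a filter ranker.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical metric column and ranker scores, for illustration only.
loc = [10, 20, 40, 50]
print(min_max_normalize(loc))

scores = {"loc": 0.42, "cyclomatic": 0.31, "branch_count": 0.17}
for k in (1, 2, 3):
    print(k, top_k_attributes(scores, k))
```

Each subset size from the slide would correspond to one call of `top_k_attributes` with a different `k`, followed by training a model on the selected attributes.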
HOW MANY SOFTWARE METRICS SHOULD BE SELECTED FOR
DEFECT PREDICTION?[2]
 Three metrics on average can be enough for building an effective
prediction model
 Eliminating 98.5% of the available metrics improves the result
 It is not confirmed that this result holds for all datasets
CHOOSING SOFTWARE METRICS FOR DEFECT PREDICTION: AN
INVESTIGATION ON FEATURE SELECTION TECHNIQUES[3]
 Hybrid attribute selection approach
 Feature ranking
 Feature subset selection
 Removing 85% of the metrics can enhance the performance of the
prediction model
METHODOLOGY
SAL: Selection of Attribute with Log filtering
 Pre-process the data with a logarithmic filter
 Rank the attributes
 Select the best set of attributes
 Build the predictor
PRE-PROCESSING
 Logarithmic filter: $\ln(n + \varepsilon)$, where $\varepsilon = 0.01$
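
A minimal Python sketch of this logarithmic filter follows. It assumes the filter is applied to every numeric attribute value and that the values are non-negative, as software metrics usually are; the function name `log_filter` is hypothetical.

```python
import math

OFFSET = 0.01  # the constant from the slide; keeps the logarithm defined at 0


def log_filter(values, offset=OFFSET):
    """Replace each numeric value n with ln(n + offset).

    Assumes every n is non-negative, so n + offset is always positive.
    """
    return [math.log(n + offset) for n in values]


# Hypothetical metric values, for illustration only.
print(log_filter([0, 1, 10, 100]))
```

The small offset matters because many metrics are exactly 0 for some modules, and ln(0) is undefined.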
ATTRIBUTE RANKING
 Compute the individual Balance value of each attribute A1, A2, ..., An
 Form the pairwise combinations of the attributes (A1A2, A1A3, ..., A3A1, A3A2, ..., AmAn)
 Compute the Balance value of each pairwise combination
 Compute the average Balance value for each attribute:
 Average Balance Value = (Individual value + Average value of n pairs) / 2
 Sort the attributes by average Balance value in decreasing order

Illustrative values from the slides:
 Individual Balance value: A1 0.034, A2 0.034, A3 0.456, A4 0.348, A5 0.784, ..., An 0.789
 Pairwise Balance value: A1A2 0.896, A1A3 0.734, ..., A3A1 0.587, A3A2 0.669, ..., AmAn 0.897
 Average Balance value for each attribute: A1 0.765, A2 0.534, A3 0.679, A5 0.887, A4 0.869, ..., An 0.897
 Sorted Balance value in decreasing order: A5 0.887, A4 0.869, A10 0.765, A8 0.750, A9 0.696, ..., An 0.523
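
The ranking step can be sketched in Python as follows. This is an illustration, not the thesis implementation. It assumes that "Average value of n pairs" means the mean Balance over every pair that contains the attribute, and that the individual and pairwise Balance values have already been measured. The function names and the numbers in the example are hypothetical.

```python
from statistics import mean


def average_balance(individual, pairwise):
    """Combine individual and pairwise Balance values per attribute.

    individual: dict attribute -> Balance of a model using that attribute alone
    pairwise:   dict (attr_a, attr_b) -> Balance of a model using the pair
    Assumption: the pair average runs over every pair containing the attribute.
    """
    combined = {}
    for attr, own in individual.items():
        pair_values = [b for pair, b in pairwise.items() if attr in pair]
        # With no pairs available, fall back to the individual value.
        pair_avg = mean(pair_values) if pair_values else own
        combined[attr] = (own + pair_avg) / 2
    return combined


def rank_attributes(individual, pairwise):
    """Return attribute names sorted by average Balance, best first."""
    combined = average_balance(individual, pairwise)
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical Balance values for three attributes.
individual = {"A1": 0.5, "A2": 0.6, "A3": 0.7}
pairwise = {("A1", "A2"): 0.7, ("A1", "A3"): 0.9, ("A2", "A3"): 0.8}
print(rank_attributes(individual, pairwise))
```

With these numbers A3 ranks first, because both its individual value and the pairs it takes part in score highest.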
SELECT BEST SET OF ATTRIBUTES
 Ranking of Attributes: A5, A4, A10, A8, A9, ..., An
 Move the 1st ranked attribute, A5 (Balance value 0.887), into the Best Set of Attributes
 Take the 2nd ranked attribute, A4, and compute the combined Balance value of A5A4: 0.891 (new) against 0.887 (previous)
 new value > previous value, so A4 is added: Best Set of Attributes = A5, A4 (0.891)
 Take the 3rd ranked attribute, A10, and compute the combined Balance value of A5A4A10: 0.856 (new) against 0.891 (previous)
 new value < previous value, so A10 is discarded
 Continue this process with the remaining attributes (A8, A9, ..., An)
 Best Set of Attributes: A5, A4, A9, A12, A7
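
The selection procedure walked through on these slides can be sketched in Python as a greedy pass over the ranked attributes. This is an illustration under stated assumptions, not the thesis code. The `evaluate` callback is hypothetical and stands in for training and testing a classifier on the given attributes. The starting score is taken from evaluating the top-ranked attribute alone, and a tie is treated as no improvement.

```python
def select_best_set(ranked, evaluate):
    """Greedy selection over a ranked attribute list.

    ranked:   attribute names, best first
    evaluate: callable taking a list of attributes and returning its Balance
              (hypothetical; in practice it would build and test a predictor)
    """
    if not ranked:
        return [], 0.0
    best_set = [ranked[0]]
    best_score = evaluate(best_set)
    for attr in ranked[1:]:
        candidate = best_set + [attr]
        score = evaluate(candidate)
        if score > best_score:
            # The combined Balance improved, so the attribute is kept.
            best_set, best_score = candidate, score
        # Otherwise the attribute is discarded and the next one is tried.
    return best_set, best_score


# Balance values copied from the slide walk-through.
slide_values = {
    ("A5",): 0.887,
    ("A5", "A4"): 0.891,
    ("A5", "A4", "A10"): 0.856,
}
result = select_best_set(["A5", "A4", "A10"],
                         lambda attrs: slide_values[tuple(attrs)])
print(result)
```

On the slide values this keeps A5 and A4 and discards A10, matching the walk-through. Each attribute is evaluated once, so the pass needs one model evaluation per ranked attribute.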
PERFORMANCE MEASUREMENT SCALES
Confusion Matrix
                  Predicted positive  Predicted negative
Actual positive   TP                  FN
Actual negative   FP                  TN

Area Under the ROC curve (AUC)
[Figure: ROC curve; x-axis: False Positive rate (0 to 1), y-axis: True Positive rate (0 to 1)]
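
Both measures can be computed from first principles, as in the Python sketch below. The slides do not define Balance, so the sketch assumes the definition commonly used in the defect-prediction literature, including [1]: one minus the normalized Euclidean distance from the ideal ROC point (false positive rate 0, true positive rate 1). AUC is computed with the rank-based formulation rather than by integrating a plotted curve. The function names are hypothetical.

```python
import math


def rates(tp, fn, fp, tn):
    """Return (pd, pf): true positive rate and false positive rate."""
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return pd, pf


def balance(tp, fn, fp, tn):
    """Balance = 1 - distance to the ideal point (pf=0, pd=1) / sqrt(2)."""
    pd, pf = rates(tp, fn, fp, tn)
    return 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)


def auc(scores, labels):
    """AUC as the chance that a defective module outscores a clean one.

    scores: predicted defect scores; labels: 1 = defective, 0 = clean.
    Tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one module of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical confusion-matrix counts and scores, for illustration only.
print(balance(tp=40, fn=10, fp=20, tn=80))
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

A perfect predictor has Balance 1 and AUC 1. Balance depends on one decision threshold, while AUC summarizes the ranking over all thresholds.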
RESULT AND DISCUSSIONS
 Data sets: NASA MDP repository and PROMISE repository
 Classifier: Naïve Bayes
 Performance metrics: Balance, AUC (Area Under the ROC Curve)
 Programming language: Java
 Machine learning tool: WEKA
RESULT AND DISCUSSIONS
Comparison of AUC values of different methods

Data set  Wahono [4]  Abaei [5]  Ren [6] lowest  Ren [6] highest  SAL
CM1       0.702       0.723      0.550           0.724            0.7946
KC1       0.79        0.790      0.592           0.800            0.8006
KC2       -           -          0.591           0.796            0.8449
KC3       0.677       -          0.569           0.713            0.8322
KC4       -           -          -               -                0.8059
MC1       -           -          -               -                0.8110
MC2       0.739       -          -               -                0.7340
MW1       0.724       -          0.534           0.725            0.7340
PC1       0.799       -          0.692           0.882            0.8369
PC2       0.805       -          -               -                0.8668
PC3       0.78        0.795      -               -                0.8068
PC4       0.861       -          -               -                0.9049
PC5       -           -          -               -                0.9624
JM1       -           0.717      -               -                0.7167
AR1       -           -          -               -                0.8167
AR3       -           -          0.580           0.699            0.8590
AR4       -           -          0.555           0.671            0.8681
AR5       -           -          0.614           0.722            0.925
AR6       -           -          -               -                0.7566
RESULT AND DISCUSSIONS
Comparison of Balance values of different methods

Dataset  Song [1]  Wang [7]  Jobaer [8]  SAL
CM1      0.695     0.663     0.5500      0.680
JM1      0.585     0.678     -           0.6152
KC1      0.707     0.718     -           0.7244
KC2      -         0.753     -           0.7835
KC3      0.708     0.693     0.6037      0.7529
KC4      0.691     -         -           0.7036
MC1      0.793     -         -           0.6904
MC2      0.614     0.620     -           0.6847
MW1      0.661     0.636     0.7202      0.6577
PC1      0.668     0.688     0.5719      0.7040
PC2      -         -         0.7046      0.7468
PC3      0.711     0.749     0.7114      0.7232
PC4      0.821     0.854     0.7450      0.8272
PC5      0.904     -         -           0.9046
AR1      0.411     -         -           0.6651
AR3      0.661     -         -           0.8238
AR4      0.683     -         -           0.7051
AR6      0.492     -         -           0.5471
FUTURE WORK
 Cross-project defect prediction
 Using other publicly available datasets
REFERENCES
[1] Song, Qinbao, Zihan Jia, Martin Shepperd, Shi Ying, and Jin Liu. "A general software defect-proneness prediction framework." Software Engineering, IEEE Transactions on 37, no. 3 (2011): 356-370.
[2] Wang, Huanjing, Taghi M. Khoshgoftaar, and Naeem Seliya. "How many software metrics should be selected for defect prediction?" In FLAIRS Conference. 2011.
[3] Gao, Kehan, Taghi M. Khoshgoftaar, and Huanjing Wang. "An empirical investigation of filter attribute selection techniques for software quality classification." In Information Reuse & Integration, 2009. IRI'09. IEEE International Conference on, pp. 272-277. IEEE, 2009.
[4] Wahono, Romi Satria, and Nanna Suryana Herman. "Genetic Feature Selection for Software Defect Prediction." Advanced Science Letters 20, no. 1 (2014): 239-244.
[5] Abaei, Golnoush, and Ali Selamat. "A survey on software fault detection based on different prediction approaches." Vietnam Journal of Computer Science 1, no. 2 (2014): 79-95.
[6] Ren, Jinsheng, Ke Qin, Ying Ma, and Guangchun Luo. "On software defect prediction using machine learning." Journal of Applied Mathematics 2014 (2014).
[7] Wang, Shuo, and Xin Yao. "Using class imbalance learning for software defect prediction." Reliability, IEEE Transactions on 62, no. 2 (2013): 434-443.
[8] Khan, Jobaer, Alim Ul Gias, Md Saeed Siddik, Md Hafizur Rahman, Shah Mostafa Khaled, and Mohammad Shoyaib. "An attribute selection process for software defect prediction." In Informatics, Electronics & Vision (ICIEV), 2014 International Conference on, pp. 1-4. IEEE, 2014.

More Related Content

Similar to Thesis Final Presentation (20)

IRJET - Airplane Crash Analysis and Prediction using Machine Learning
IRJET - Airplane Crash Analysis and Prediction using Machine LearningIRJET - Airplane Crash Analysis and Prediction using Machine Learning
IRJET - Airplane Crash Analysis and Prediction using Machine Learning
IRJET Journal
MSA Attribute ARR Test
MSA  Attribute ARR TestMSA  Attribute ARR Test
MSA Attribute ARR Test
Matt Hansen
Driverless Machine Learning Web App
Driverless Machine Learning Web AppDriverless Machine Learning Web App
Driverless Machine Learning Web App
SayantanGhosh58
Thesis Final Report
Thesis Final ReportThesis Final Report
Thesis Final Report
Sadia Sharmin
IRJET-Attribute Reduction using Apache Spark
IRJET-Attribute Reduction using Apache SparkIRJET-Attribute Reduction using Apache Spark
IRJET-Attribute Reduction using Apache Spark
IRJET Journal
Employee Retention Prediction: A Data Science Project by Devangi Shukla
Employee Retention Prediction: A Data Science Project by Devangi ShuklaEmployee Retention Prediction: A Data Science Project by Devangi Shukla
Employee Retention Prediction: A Data Science Project by Devangi Shukla
Boston Institute of Analytics
Team 16_Report
Team 16_ReportTeam 16_Report
Team 16_Report
Rahul Garg, CSSGB
Team 16_Report
Team 16_ReportTeam 16_Report
Team 16_Report
Naman Kapoor
CSL0777-L07.pptx
CSL0777-L07.pptxCSL0777-L07.pptx
CSL0777-L07.pptx
KonkoboUlrichArthur
Less12 Proactive
Less12 ProactiveLess12 Proactive
Less12 Proactive
vivaankumar
R Tutorial For Beginners | R Programming Tutorial l R Language For Beginners ...
R Tutorial For Beginners | R Programming Tutorial l R Language For Beginners ...R Tutorial For Beginners | R Programming Tutorial l R Language For Beginners ...
R Tutorial For Beginners | R Programming Tutorial l R Language For Beginners ...
Edureka!
Managing Statistics for Optimal Query Performance
Managing Statistics for Optimal Query PerformanceManaging Statistics for Optimal Query Performance
Managing Statistics for Optimal Query Performance
Karen Morton
2020 Testing Trends: Top Predictions for QA Teams to Watch, Join, and Lead
2020 Testing Trends: Top Predictions for QA Teams to Watch, Join, and Lead2020 Testing Trends: Top Predictions for QA Teams to Watch, Join, and Lead
2020 Testing Trends: Top Predictions for QA Teams to Watch, Join, and Lead
DevOps.com
Attribute Reduction:An Implementation of Heuristic Algorithm using Apache Spark
Attribute Reduction:An Implementation of Heuristic Algorithm using Apache SparkAttribute Reduction:An Implementation of Heuristic Algorithm using Apache Spark
Attribute Reduction:An Implementation of Heuristic Algorithm using Apache Spark
IRJET Journal
Stages of FMEA in Total Quality Management
Stages of FMEA in Total Quality Management Stages of FMEA in Total Quality Management
Stages of FMEA in Total Quality Management
Dr.Raja R
Mysql Performance Schema - fossasia 2016
Mysql Performance Schema - fossasia 2016Mysql Performance Schema - fossasia 2016
Mysql Performance Schema - fossasia 2016
Mayank Prasad
Benchmarking_ML_Tools
Benchmarking_ML_ToolsBenchmarking_ML_Tools
Benchmarking_ML_Tools
Marc Borowczak
IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...
IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...
IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...
IRJET Journal
Optimization Technique for Feature Selection and Classification Using Support...
Optimization Technique for Feature Selection and Classification Using Support...Optimization Technique for Feature Selection and Classification Using Support...
Optimization Technique for Feature Selection and Classification Using Support...
IJTET Journal
7qc Tools 173
7qc Tools 1737qc Tools 173
7qc Tools 173
Mechsoft Technologies LLC
IRJET - Airplane Crash Analysis and Prediction using Machine Learning
IRJET - Airplane Crash Analysis and Prediction using Machine LearningIRJET - Airplane Crash Analysis and Prediction using Machine Learning
IRJET - Airplane Crash Analysis and Prediction using Machine Learning
IRJET Journal
MSA Attribute ARR Test
MSA  Attribute ARR TestMSA  Attribute ARR Test
MSA Attribute ARR Test
Matt Hansen
Driverless Machine Learning Web App
Driverless Machine Learning Web AppDriverless Machine Learning Web App
Driverless Machine Learning Web App
SayantanGhosh58
Thesis Final Report
Thesis Final ReportThesis Final Report
Thesis Final Report
Sadia Sharmin
IRJET-Attribute Reduction using Apache Spark
IRJET-Attribute Reduction using Apache SparkIRJET-Attribute Reduction using Apache Spark
IRJET-Attribute Reduction using Apache Spark
IRJET Journal
Employee Retention Prediction: A Data Science Project by Devangi Shukla
Employee Retention Prediction: A Data Science Project by Devangi ShuklaEmployee Retention Prediction: A Data Science Project by Devangi Shukla
Employee Retention Prediction: A Data Science Project by Devangi Shukla
Boston Institute of Analytics
Less12 Proactive
Less12 ProactiveLess12 Proactive
Less12 Proactive
vivaankumar
R Tutorial For Beginners | R Programming Tutorial l R Language For Beginners ...
R Tutorial For Beginners | R Programming Tutorial l R Language For Beginners ...R Tutorial For Beginners | R Programming Tutorial l R Language For Beginners ...
R Tutorial For Beginners | R Programming Tutorial l R Language For Beginners ...
Edureka!
Managing Statistics for Optimal Query Performance
Managing Statistics for Optimal Query PerformanceManaging Statistics for Optimal Query Performance
Managing Statistics for Optimal Query Performance
Karen Morton
2020 Testing Trends: Top Predictions for QA Teams to Watch, Join, and Lead
2020 Testing Trends: Top Predictions for QA Teams to Watch, Join, and Lead2020 Testing Trends: Top Predictions for QA Teams to Watch, Join, and Lead
2020 Testing Trends: Top Predictions for QA Teams to Watch, Join, and Lead
DevOps.com
Attribute Reduction:An Implementation of Heuristic Algorithm using Apache Spark
Attribute Reduction:An Implementation of Heuristic Algorithm using Apache SparkAttribute Reduction:An Implementation of Heuristic Algorithm using Apache Spark
Attribute Reduction:An Implementation of Heuristic Algorithm using Apache Spark
IRJET Journal
Stages of FMEA in Total Quality Management
Stages of FMEA in Total Quality Management Stages of FMEA in Total Quality Management
Stages of FMEA in Total Quality Management
Dr.Raja R
Mysql Performance Schema - fossasia 2016
Mysql Performance Schema - fossasia 2016Mysql Performance Schema - fossasia 2016
Mysql Performance Schema - fossasia 2016
Mayank Prasad
Benchmarking_ML_Tools
Benchmarking_ML_ToolsBenchmarking_ML_Tools
Benchmarking_ML_Tools
Marc Borowczak
IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...
IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...
IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...
IRJET Journal
Optimization Technique for Feature Selection and Classification Using Support...
Optimization Technique for Feature Selection and Classification Using Support...Optimization Technique for Feature Selection and Classification Using Support...
Optimization Technique for Feature Selection and Classification Using Support...
IJTET Journal

Recently uploaded (20)

DevOpsDays LA - Platform Engineers are Product Managers.pdf
DevOpsDays LA - Platform Engineers are Product Managers.pdfDevOpsDays LA - Platform Engineers are Product Managers.pdf
DevOpsDays LA - Platform Engineers are Product Managers.pdf
Justin Reock
salesforce development services - Alt digital
salesforce development services - Alt digitalsalesforce development services - Alt digital
salesforce development services - Alt digital
Alt Digital Technologies
Data Storytelling for Portfolio Leaders - Webinar
Data Storytelling for Portfolio Leaders - WebinarData Storytelling for Portfolio Leaders - Webinar
Data Storytelling for Portfolio Leaders - Webinar
OnePlan Solutions
OutSystems User Group Utrecht February 2025.pdf
OutSystems User Group Utrecht February 2025.pdfOutSystems User Group Utrecht February 2025.pdf
OutSystems User Group Utrecht February 2025.pdf
mail496323
AnyDesk Pro 3.7.0 Crack License Key Free Download 2025 [Latest]
AnyDesk Pro 3.7.0 Crack License Key Free Download 2025 [Latest]AnyDesk Pro 3.7.0 Crack License Key Free Download 2025 [Latest]
AnyDesk Pro 3.7.0 Crack License Key Free Download 2025 [Latest]
haroonsaeed605
Minitool Partition Wizard Crack Free Download
Minitool Partition Wizard Crack Free DownloadMinitool Partition Wizard Crack Free Download
Minitool Partition Wizard Crack Free Download
v3r2eptd2q
Why Every Cables and Wires Manufacturer Needs a Cloud-Based ERP Solutions
Why Every Cables and Wires Manufacturer Needs a Cloud-Based ERP SolutionsWhy Every Cables and Wires Manufacturer Needs a Cloud-Based ERP Solutions
Why Every Cables and Wires Manufacturer Needs a Cloud-Based ERP Solutions
Absolute ERP
Computer Architecture Patterson chapter 1 .ppt
Computer Architecture Patterson chapter 1 .pptComputer Architecture Patterson chapter 1 .ppt
Computer Architecture Patterson chapter 1 .ppt
jaysen110
Wondershare Filmora 14.3.2 Crack + License Key Free Download
Wondershare Filmora 14.3.2 Crack + License Key Free DownloadWondershare Filmora 14.3.2 Crack + License Key Free Download
Wondershare Filmora 14.3.2 Crack + License Key Free Download
arshadkhokher01
20 Excel Shortcuts That Will Instantly Save You Hours.pdf
20 Excel Shortcuts That Will Instantly Save You Hours.pdf20 Excel Shortcuts That Will Instantly Save You Hours.pdf
20 Excel Shortcuts That Will Instantly Save You Hours.pdf
mohammadasim74
SolidWorks 2025 Crack free Download updated
SolidWorks 2025 Crack  free Download updatedSolidWorks 2025 Crack  free Download updated
SolidWorks 2025 Crack free Download updated
sanasabaa73
Cybersecurity & Innovation: The Future of Mobile App Development
Cybersecurity & Innovation: The Future of Mobile App DevelopmentCybersecurity & Innovation: The Future of Mobile App Development
Cybersecurity & Innovation: The Future of Mobile App Development
iProgrammer Solutions Private Limited
ChatGPT and DeepSeek: Which AI Tool Delivers Better User Experience?
ChatGPT and DeepSeek: Which AI Tool Delivers Better User Experience?ChatGPT and DeepSeek: Which AI Tool Delivers Better User Experience?
ChatGPT and DeepSeek: Which AI Tool Delivers Better User Experience?
Ava Isley
Projects Panama, Valhalla, and Babylon: Java is the New Python v0.9
Projects Panama, Valhalla, and Babylon: Java is the New Python v0.9Projects Panama, Valhalla, and Babylon: Java is the New Python v0.9
Projects Panama, Valhalla, and Babylon: Java is the New Python v0.9
Yann-Ga谷l Gu辿h辿neuc
Online Software Testing Training Institute in Delhi Ncr
Online Software Testing Training Institute in Delhi NcrOnline Software Testing Training Institute in Delhi Ncr
Online Software Testing Training Institute in Delhi Ncr
Home
Wondershare Filmora Crack Free Download
Wondershare Filmora  Crack Free DownloadWondershare Filmora  Crack Free Download
Wondershare Filmora Crack Free Download
zqeevcqb3t
SketchUp Pro Crack [2025]-Free Download?
SketchUp Pro Crack [2025]-Free Download?SketchUp Pro Crack [2025]-Free Download?
SketchUp Pro Crack [2025]-Free Download?
kiran10101khan
Advance Website Helpdesk Customer Support Ticket Management Odoo
Advance Website Helpdesk Customer Support Ticket Management OdooAdvance Website Helpdesk Customer Support Ticket Management Odoo
Advance Website Helpdesk Customer Support Ticket Management Odoo
Aagam infotech
Douwan Preactivated Plus Crack 2025-Latest
Douwan Preactivated Plus Crack 2025-LatestDouwan Preactivated Plus Crack 2025-Latest
Douwan Preactivated Plus Crack 2025-Latest
mubeen010khan
Hire Odoo Developer OnestopDA Experts.
Hire Odoo Developer  OnestopDA Experts.Hire Odoo Developer  OnestopDA Experts.
Hire Odoo Developer OnestopDA Experts.
OnestopDA
DevOpsDays LA - Platform Engineers are Product Managers.pdf
DevOpsDays LA - Platform Engineers are Product Managers.pdfDevOpsDays LA - Platform Engineers are Product Managers.pdf
DevOpsDays LA - Platform Engineers are Product Managers.pdf
Justin Reock
salesforce development services - Alt digital
salesforce development services - Alt digitalsalesforce development services - Alt digital
salesforce development services - Alt digital
Alt Digital Technologies
Data Storytelling for Portfolio Leaders - Webinar
Data Storytelling for Portfolio Leaders - WebinarData Storytelling for Portfolio Leaders - Webinar
Data Storytelling for Portfolio Leaders - Webinar
OnePlan Solutions
OutSystems User Group Utrecht February 2025.pdf
OutSystems User Group Utrecht February 2025.pdfOutSystems User Group Utrecht February 2025.pdf
OutSystems User Group Utrecht February 2025.pdf
mail496323
AnyDesk Pro 3.7.0 Crack License Key Free Download 2025 [Latest]
AnyDesk Pro 3.7.0 Crack License Key Free Download 2025 [Latest]AnyDesk Pro 3.7.0 Crack License Key Free Download 2025 [Latest]
AnyDesk Pro 3.7.0 Crack License Key Free Download 2025 [Latest]
haroonsaeed605
Minitool Partition Wizard Crack Free Download
Minitool Partition Wizard Crack Free DownloadMinitool Partition Wizard Crack Free Download
Minitool Partition Wizard Crack Free Download
v3r2eptd2q
Why Every Cables and Wires Manufacturer Needs a Cloud-Based ERP Solutions
Why Every Cables and Wires Manufacturer Needs a Cloud-Based ERP SolutionsWhy Every Cables and Wires Manufacturer Needs a Cloud-Based ERP Solutions
Why Every Cables and Wires Manufacturer Needs a Cloud-Based ERP Solutions
Absolute ERP
Computer Architecture Patterson chapter 1 .ppt
Computer Architecture Patterson chapter 1 .pptComputer Architecture Patterson chapter 1 .ppt
Computer Architecture Patterson chapter 1 .ppt
jaysen110
Wondershare Filmora 14.3.2 Crack + License Key Free Download
Wondershare Filmora 14.3.2 Crack + License Key Free DownloadWondershare Filmora 14.3.2 Crack + License Key Free Download
Wondershare Filmora 14.3.2 Crack + License Key Free Download
arshadkhokher01
20 Excel Shortcuts That Will Instantly Save You Hours.pdf
20 Excel Shortcuts That Will Instantly Save You Hours.pdf20 Excel Shortcuts That Will Instantly Save You Hours.pdf
20 Excel Shortcuts That Will Instantly Save You Hours.pdf
mohammadasim74
SolidWorks 2025 Crack free Download updated
SolidWorks 2025 Crack  free Download updatedSolidWorks 2025 Crack  free Download updated
SolidWorks 2025 Crack free Download updated
sanasabaa73
ChatGPT and DeepSeek: Which AI Tool Delivers Better User Experience?
ChatGPT and DeepSeek: Which AI Tool Delivers Better User Experience?ChatGPT and DeepSeek: Which AI Tool Delivers Better User Experience?
ChatGPT and DeepSeek: Which AI Tool Delivers Better User Experience?
Ava Isley
Projects Panama, Valhalla, and Babylon: Java is the New Python v0.9
Projects Panama, Valhalla, and Babylon: Java is the New Python v0.9Projects Panama, Valhalla, and Babylon: Java is the New Python v0.9
Projects Panama, Valhalla, and Babylon: Java is the New Python v0.9
Yann-Ga谷l Gu辿h辿neuc
Online Software Testing Training Institute in Delhi Ncr
Online Software Testing Training Institute in Delhi NcrOnline Software Testing Training Institute in Delhi Ncr
Online Software Testing Training Institute in Delhi Ncr
Home
Wondershare Filmora Crack Free Download
Wondershare Filmora  Crack Free DownloadWondershare Filmora  Crack Free Download
Wondershare Filmora Crack Free Download
zqeevcqb3t
SketchUp Pro Crack [2025]-Free Download?
SketchUp Pro Crack [2025]-Free Download?SketchUp Pro Crack [2025]-Free Download?
SketchUp Pro Crack [2025]-Free Download?
kiran10101khan
Advance Website Helpdesk Customer Support Ticket Management Odoo
Advance Website Helpdesk Customer Support Ticket Management OdooAdvance Website Helpdesk Customer Support Ticket Management Odoo
Advance Website Helpdesk Customer Support Ticket Management Odoo
Aagam infotech
Douwan Preactivated Plus Crack 2025-Latest
Douwan Preactivated Plus Crack 2025-LatestDouwan Preactivated Plus Crack 2025-Latest
Douwan Preactivated Plus Crack 2025-Latest
mubeen010khan
Hire Odoo Developer OnestopDA Experts.
Hire Odoo Developer  OnestopDA Experts.Hire Odoo Developer  OnestopDA Experts.
Hire Odoo Developer OnestopDA Experts.
OnestopDA

Thesis Final Presentation

  • 1. Institute of Information Technology University of Dhaka SELECTION AND REPRESENTATION OF ATTRIBUTES FOR SOFTWARE DEFECT PREDICTION Supervised by Dr. Mohammad Shoyaib Associate Professor Presented by Sadia Sharmin BSSE-0426
  • 2. CONTENTS Background Motivation Problem Specification Objectives of Research Literature Review Methodology Result Analysis and Discussion Future Work 2January2016 2
  • 3. BACKGROUND Software Defect Any flaw or imperfection in a software work product or software process Software Defect Prediction An approach to find out the defected part earlier before testing/releasing the product 2January2016 3
  • 4. AN OVERVIEW OF SOFTWARE DEFECT PREDICTION PROCESS 2January2016 4 Data Set Pre- processing Attribute Selection Testing Data Prediction Result Training Data Prediction Model Training
  • 5. MOTIVATION Identifying the software bugs in an early stage Allocating the test resources efficiently Minimizing the cost of software development Improving the quality and productivity of software 2January2016 5
  • 6. WHY NEED PRE-PROCESSING Noisy Data Outliers Missing value or Conflicting value Inconsistency 2January2016 6
  • 7. WHY NEED ATTRIBUTE SELECTION Attributes are not equally important No standard set of attributes 2January2016 7
  • 8. OBJECTIVES OF RESEARCH To find out how the existing pre-processing can be used with the attribute selection methods more efficiently. To survey the existing methods and propose a proper attribute selection method. 2January2016 8
  • 9. A GENERAL SOFTWARE DEFECT-PRONENESS PREDICTION FRAMEWORK [1] Defect prediction framework : Data pre-processor: Log-filtering Feature selector: Forward Selection , Backward Elimination Learning algorithms : Na誰ve Bayes, J48, OneR 2January2016 9
  • 10. A GENERAL SOFTWARE DEFECT-PRONENESS PREDICTION FRAMEWORK [1] Small changes to data representation can have a major impact Feature selection one attribute at a time is not a practical solution for large datasets Different learning schemes should be chosen carefully for different datasets There is no clear indication about which combination should be used for a particular dataset 2January2016 10
  • 11. HOW MANY SOFTWARE METRICS SHOULD BE SELECTED FOR DEFECT PREDICTION?[2] Five filter-based feature ranking technique Methodology Min-max normalization Pair of each independent attribute and class attribute Ranking the attribute Subset selection (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, and 20) 2January2016 11
  • 12. HOW MANY SOFTWARE METRICS SHOULD BE SELECTED FOR DEFECT PREDICTION?[2] Three metrics on average can be enough for building an effective prediction model Eliminating 98.5% of the available metrics improves the result It is not confirmed that it will work with all datasets 2January2016 12
  • 13. CHOOSING SOFTWARE METRICS FOR DEFECT PREDICTION: AN INVESTIGATION ON FEATURE SELECTION TECHNIQUES[3] Hybrid attribute selection approach Feature ranking Feature subset selection Removal of 85% metrics can enhance the performance of the prediction model 2January2016 13
  • 14. METHODOLOGY SAL: Selection of Attribute with Log filtering 2January2016 14 Pre-process the data with logarithmic filter Rank the Attribute Select the best set of attributes Build the predictor
  • 17. ATTRIBUTE RANKING 2January2016 17 A1 A2 A3 A4 A5 An A1 0.564 A2 0.764 A3 0.685 A4 0.798 A5 0.892 . . An 0.789 Individual Balance value
  • 19. ATTRIBUTE RANKING Individual Balance value: A1 0.034, A2 0.034, A3 0.456, A4 0.348, A5 0.784, ..., An 0.789. Pairwise combinations (A1A2, A1A3, ..., A3A1, A3A2, ..., AmAn) and their pairwise Balance values: A1A2 0.896, A1A3 0.734, ..., A3A1 0.587, A3A2 0.669, ..., AmAn 0.897. Average Balance value for each attribute = (individual Balance value + average of its n pairwise Balance values) / 2: A1 0.765, A2 0.534, A3 0.679, A4 0.869, A5 0.887, ..., An 0.897. Balance values sorted in decreasing order: A5 0.887, A4 0.869, A10 0.765, A8 0.750, A9 0.696, ..., An 0.523.
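The averaging rule on the ranking slides can be sketched in Python. The helper names `average_balance` and `rank_attributes`, the dictionary layout, the use of unordered pairs, and the three-attribute values are illustrative assumptions. In the actual study each Balance value would come from training and evaluating the classifier on that attribute or pair.

```python
def average_balance(individual, pairwise):
    """Combine each attribute's own Balance with the mean Balance of its pairs.

    individual: {attribute: Balance of the attribute used alone}
    pairwise:   {(a, b): Balance of attributes a and b used together}
    Returns {attribute: (individual value + mean of its pairwise values) / 2}.
    """
    scores = {}
    for attr, own in individual.items():
        # Collect the Balance of every pair that contains this attribute.
        pair_values = [v for pair, v in pairwise.items() if attr in pair]
        # With no pairs available, fall back to the individual value.
        pair_mean = sum(pair_values) / len(pair_values) if pair_values else own
        scores[attr] = (own + pair_mean) / 2
    return scores

def rank_attributes(scores):
    """Return attribute names sorted by score in decreasing order."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical Balance values for three attributes.
individual = {"A1": 0.50, "A2": 0.70, "A3": 0.60}
pairwise = {("A1", "A2"): 0.80, ("A1", "A3"): 0.60, ("A2", "A3"): 0.90}
scores = average_balance(individual, pairwise)
ranking = rank_attributes(scores)
```

With these toy numbers A2 ranks first because it performs well both alone and in its pairs, which is the behaviour the averaging rule is meant to reward.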
  • 23. SELECT BEST SET OF ATTRIBUTES Ranking of attributes: A5, A4, A10, A8, A9, ..., An. Best set of attributes: none selected yet.
  • 26. SELECT BEST SET OF ATTRIBUTES Ranking of attributes: A4, A10, A8, A9, ..., An. Best set of attributes: A5 (1st ranked, Balance value 0.887).
  • 29. SELECT BEST SET OF ATTRIBUTES Ranking of attributes: A10, A8, A9, ..., An. Best set of attributes: A5 (1st ranked, 0.887, previous value). Candidate: A4 (2nd ranked). Combined Balance value of A5A4: 0.891 (new value). New value > previous value, so A4 is added.
  • 34. SELECT BEST SET OF ATTRIBUTES Ranking of attributes: A10, A8, A9, ..., An. Best set of attributes: A5, A4 (combined Balance value of A5A4: 0.891).
  • 36. SELECT BEST SET OF ATTRIBUTES Ranking of attributes: A8, A9, ..., An. Best set of attributes: A5, A4 (A5A4 0.891, previous value). Candidate: A10 (3rd ranked). Combined Balance value of A5A4A10: 0.856 (new value). New value < previous value, so A10 is discarded.
  • 41. SELECT BEST SET OF ATTRIBUTES Ranking of attributes: A8, A9, ..., An. Best set of attributes: A5, A4. Continue this process for the remaining attributes.
  • 42. SELECT BEST SET OF ATTRIBUTES Best set of attributes: A5, A4, A9, A12, A7.
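The selection walk-through amounts to a greedy forward pass over the ranked attributes: a candidate joins the best set only if the combined Balance value improves, otherwise it is discarded. A minimal Python sketch follows, assuming a caller-supplied `evaluate` function (hypothetical) that returns the Balance value of a classifier trained on a given attribute subset. The slides seed the comparison with A5's ranking score of 0.887; the sketch takes the seed from `evaluate` instead, which is a simplification.

```python
def select_best_set(ranking, evaluate):
    """Greedy pass over attributes ranked best-first.

    ranking:  attribute names in decreasing order of average Balance.
    evaluate: callable taking a list of attributes and returning the
              Balance value obtained with that subset (assumed helper).
    """
    best_set = [ranking[0]]
    best_score = evaluate(best_set)
    for candidate in ranking[1:]:
        trial = best_set + [candidate]
        score = evaluate(trial)
        if score > best_score:
            # New value > previous value: keep the candidate.
            best_set, best_score = trial, score
        # Otherwise the candidate is discarded and the set is unchanged.
    return best_set, best_score

# Toy lookup holding the three values shown on the slides; a real
# evaluate() would train and test a classifier instead.
slide_values = {
    ("A5",): 0.887,
    ("A5", "A4"): 0.891,
    ("A5", "A4", "A10"): 0.856,
}
best, score = select_best_set(["A5", "A4", "A10"],
                              lambda attrs: slide_values[tuple(attrs)])
```

Run on the three slide values, the pass keeps A4 (0.891 > 0.887) and discards A10 (0.856 < 0.891). Each attribute is evaluated once, so the cost grows linearly with the number of attributes rather than with the number of subsets.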
  • 43. PERFORMANCE MEASUREMENT SCALES Confusion matrix (actual vs. predicted): TP, FN, FP, TN. Area Under the ROC Curve (AUC): ROC plot of true positive rate against false positive rate, both axes ranging from 0 to 1.
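The slides report Balance without defining it. Assuming the definition used in the defect prediction literature cited here (for example Song et al. [1]), Balance is one minus the normalized Euclidean distance from the point (pf, pd) to the ideal ROC point (0, 1), where pd is the true positive rate and pf is the false positive rate. A small Python sketch with hypothetical confusion-matrix counts:

```python
import math

def balance(tp, fn, fp, tn):
    """Balance = 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2)."""
    pd = tp / (tp + fn)   # true positive rate (probability of detection)
    pf = fp / (fp + tn)   # false positive rate (probability of false alarm)
    return 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)

# Hypothetical counts: 40 defective modules and 160 defect-free modules.
score = balance(tp=30, fn=10, fp=40, tn=120)
```

A perfect predictor (pd = 1, pf = 0) scores 1, and the value drops as the operating point moves away from the top-left corner of the ROC plot.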
  • 44. RESULT AND DISCUSSIONS Data sets: NASA MDP repository and PROMISE repository. Classifier: Naïve Bayes. Performance metrics: Balance and AUC (Area Under the ROC Curve). Programming language: Java. Machine learning tool: WEKA.
  • 45. RESULT AND DISCUSSIONS Comparison of AUC values of different methods
      Data set  Wahono [4]  Abaei [5]  Ren [6] lowest  Ren [6] highest  SAL (proposed)
      CM1       0.702       0.723      0.550           0.724            0.7946
      KC1       0.79        0.790      0.592           0.800            0.8006
      KC2       -           -          0.591           0.796            0.8449
      KC3       0.677       -          0.569           0.713            0.8322
      KC4       -           -          -               -                0.8059
      MC1       -           -          -               -                0.8110
      MC2       0.739       -          -               -                0.7340
      MW1       0.724       -          0.534           0.725            0.7340
      PC1       0.799       -          0.692           0.882            0.8369
      PC2       0.805       -          -               -                0.8668
      PC3       0.78        0.795      -               -                0.8068
      PC4       0.861       -          -               -                0.9049
      PC5       -           -          -               -                0.9624
      JM1       -           0.717      -               -                0.7167
      AR1       -           -          -               -                0.8167
      AR3       -           -          0.580           0.699            0.8590
      AR4       -           -          0.555           0.671            0.8681
      AR5       -           -          0.614           0.722            0.925
      AR6       -           -          -               -                0.7566
  • 46. RESULT AND DISCUSSIONS Comparison of Balance values of different methods
      Data set  Song [1]  Wang [7]  Jobaer [8]  SAL (proposed)
      CM1       0.695     0.663     0.5500      0.680
      JM1       0.585     0.678     -           0.6152
      KC1       0.707     0.718     -           0.7244
      KC2       -         0.753     -           0.7835
      KC3       0.708     0.693     0.6037      0.7529
      KC4       0.691     -         -           0.7036
      MC1       0.793     -         -           0.6904
      MC2       0.614     0.620     -           0.6847
      MW1       0.661     0.636     0.7202      0.6577
      PC1       0.668     0.688     0.5719      0.7040
      PC2       -         -         0.7046      0.7468
      PC3       0.711     0.749     0.7114      0.7232
      PC4       0.821     0.854     0.7450      0.8272
      PC5       0.904     -         -           0.9046
      AR1       0.411     -         -           0.6651
      AR3       0.661     -         -           0.8238
      AR4       0.683     -         -           0.7051
      AR6       0.492     -         -           0.5471
  • 47. FUTURE WORK Cross-project defect prediction. Using other publicly available datasets.
  • 48. REFERENCES
      [1] Song, Qinbao, Zihan Jia, Martin Shepperd, Shi Ying, and Jin Liu. "A general software defect-proneness prediction framework." Software Engineering, IEEE Transactions on 37, no. 3 (2011): 356-370.
      [2] Wang, Huanjing, Taghi M. Khoshgoftaar, and Naeem Seliya. "How many software metrics should be selected for defect prediction?" In FLAIRS Conference. 2011.
      [3] Gao, Kehan, Taghi M. Khoshgoftaar, and Huanjing Wang. "An empirical investigation of filter attribute selection techniques for software quality classification." In Information Reuse & Integration, 2009. IRI'09. IEEE International Conference on, pp. 272-277. IEEE, 2009.
      [4] Wahono, Romi Satria, and Nanna Suryana Herman. "Genetic Feature Selection for Software Defect Prediction." Advanced Science Letters 20, no. 1 (2014): 239-244.
      [5] Abaei, Golnoush, and Ali Selamat. "A survey on software fault detection based on different prediction approaches." Vietnam Journal of Computer Science 1, no. 2 (2014): 79-95.
      [6] Ren, Jinsheng, Ke Qin, Ying Ma, and Guangchun Luo. "On software defect prediction using machine learning." Journal of Applied Mathematics 2014 (2014).
  • 49. REFERENCES
      [7] Wang, Shuo, and Xin Yao. "Using class imbalance learning for software defect prediction." Reliability, IEEE Transactions on 62, no. 2 (2013): 434-443.
      [8] Khan, Jobaer, Alim Ul Gias, Md Saeed Siddik, Md Hafizur Rahman, Shah Mostafa Khaled, and Mohammad Shoyaib. "An attribute selection process for software defect prediction." In Informatics, Electronics & Vision (ICIEV), 2014 International Conference on, pp. 1-4. IEEE, 2014.