Slides from my SPLASH 2010 presentation:
Kalle Aaltonen, Petri Ihantola, Otto Seppälä (2010). Mutation analysis vs. code coverage in automated assessment of students’ testing skills. In: SPLASH ’10: Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion. Reno/Tahoe, Nevada, USA: ACM, pp. 153–160. ISBN: 978-1-4503-0240-1. http://dx.doi.org/10.1145/1869542.1869567
Mutation Analysis vs. Code Coverage in Automated Assessment of Students’ Testing Skills
4. Mutation Analysis
• Create variations automatically from the original program
• Simulate bugs
• A good test will catch many of these mutants
• Assuming these mutants are really different from the original
• We hope this will provide better feedback/grading
• We used a byte-code level mutation analysis tool called Javalanche
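The idea on this slide can be sketched in a few lines. The example below is only an illustration under stated assumptions: the mutants are written by hand by swapping one comparison operator, whereas the study generated them automatically at byte-code level with Javalanche. All names (`make_largest`, `weak_suite`, `strong_suite`, `mutation_score`) are hypothetical.

```python
import operator


def make_largest(compare):
    """Build a 'largest element' function around the given comparison operator."""
    def largest(values):
        best = values[0]
        for v in values[1:]:
            if compare(v, best):
                best = v
        return best
    return largest


# The original program uses '>'; each hand-made mutant swaps in another operator.
original = make_largest(operator.gt)
mutants = {
    "> to <": make_largest(operator.lt),
    "> to !=": make_largest(operator.ne),
    # Equivalent mutant: it returns the same value as the original for numbers,
    # so no test can catch it.
    "> to >=": make_largest(operator.ge),
}


def weak_suite(largest):
    """One test on sorted input; it still executes every statement of largest()."""
    return largest([1, 2, 3]) == 3


def strong_suite(largest):
    """Adds unsorted input and duplicates."""
    return (largest([1, 2, 3]) == 3
            and largest([3, 1, 2]) == 3
            and largest([2, 2, 1]) == 2)


def is_killed(suite, mutant):
    """A mutant is killed when the suite fails (or crashes) on it."""
    try:
        return not suite(mutant)
    except Exception:
        return True


def mutation_score(suite, mutants):
    """Fraction of mutants that the suite kills."""
    killed = sum(is_killed(suite, m) for m in mutants.values())
    return killed / len(mutants)


for name, suite in [("weak", weak_suite), ("strong", strong_suite)]:
    assert suite(original)  # a suite must pass on the original program
    print(name, round(mutation_score(suite, mutants), 2))
# prints:
# weak 0.33
# strong 0.67
```

Both suites execute every statement of `largest`, yet they kill a different number of mutants. The `>=` mutant survives both suites because it behaves exactly like the original, which is why the slide adds the caveat that mutants must really differ from the original.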
7. Some Results
[Figure: two scatter plots of mutation score (y-axis, 0 to 1) against code coverage (x-axis, 0 to 1)]
• Data: BST, Hashing, Disjoint Sets assignments
• Most students get full points for coverage
• Mutation scores are more widely distributed
9. About the Validity of the Results
[Figure: chart with a percentage axis from 40 % to 100 %, comparing four test suites]
Best Suite: mutation score 98.0 %
Random Suite 1: mutation score 85.4 %
Random Suite 2: mutation score 72.0 %
Worst Suite: mutation score 54.8 %
11. Conclusions
• Can be used to pick up suspicious solutions
– High code coverage but low mutation score
• Reduces the importance of unit tests written by the teacher
– Also able to ensure that unspecified features are tested (i.e. specified)
• Immediate feedback
– When compared to running all tests against each solution
• Complex parts of the code get more attention
• Able to give feedback from teacher’s own tests
• Should be combined with other test adequacy metrics
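The first conclusion, picking up suspicious solutions, could be automated roughly as follows. This is a minimal sketch: the function name `flag_suspicious`, both thresholds, and the sample values are assumptions made for illustration and do not come from the study.

```python
def flag_suspicious(results, min_coverage=0.9, max_mutation_score=0.6):
    """Return ids of submissions with high coverage but a low mutation score.

    `results` maps a submission id to a (coverage, mutation_score) pair,
    both in the range 0..1. The default thresholds are arbitrary placeholders.
    """
    return [
        submission_id
        for submission_id, (coverage, score) in results.items()
        if coverage >= min_coverage and score <= max_mutation_score
    ]


# Invented example values, not data from the study.
results = {
    "student_a": (1.00, 0.95),  # thorough tests
    "student_b": (1.00, 0.40),  # covers everything, checks very little
    "student_c": (0.70, 0.35),  # weak on both metrics, already visible from coverage
}

print(flag_suspicious(results))  # prints ['student_b']
```

Only `student_b` is flagged: low coverage already shows up in a coverage-based grade, so the extra signal from mutation analysis is the case where coverage alone looks perfect.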