
Mutation Analysis vs. Code Coverage in
Automated Assessment of Students’
Testing Skills
Kalle Aaltonen, Petri Ihantola and Otto Seppälä (Splash – ETS’10)
Aalto University, Finland

What Do We Do?
•  Believe in testing
•  Provide programming assignments
– for hundreds of students per course
– where students are asked to submit:
•  Their implementation
•  Unit tests covering their own implementation
– Use Web-CAT for automated assessment
•  Grade =
our tests passing (%) * student’s tests passing (%) *
line or branch coverage of student’s tests
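The grade formula above can be sketched in a few lines of Java. This is only an illustration: the class and parameter names are invented here, and all three inputs are assumed to be fractions between 0 and 1.

```java
final class GradeSketch {
    /**
     * Grade as the product of three fractions in [0, 1]:
     * share of the teacher's tests that pass, share of the student's
     * own tests that pass, and line or branch coverage of the student's tests.
     */
    static double grade(double teacherTestsPassing,
                        double studentTestsPassing,
                        double coverage) {
        return teacherTestsPassing * studentTestsPassing * coverage;
    }

    public static void main(String[] args) {
        // Hypothetical submission: 90 % of teacher's tests pass,
        // all of the student's own tests pass, 80 % coverage.
        System.out.println(grade(0.9, 1.0, 0.8));
    }
}
```

Because the factors are multiplied, a test suite that executes every line but asserts nothing still contributes a full coverage factor.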

How Students Test
Three different tests with the same code coverage:
assertTrue(1 < 2);
fibonacci(6);
assertTrue(fibonacci(6) >= 0);
assertEquals(8,fibonacci(6));

Mutation Analysis
•  Create variations automatically from the original program
•  Simulate bugs
•  A good test will catch many of these mutants
•  Assuming these mutants are really different from the original
•  We hope this will provide better feedback/grading
•  We used a byte-code level mutation analysis tool called Javalanche
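A minimal sketch of the idea, assuming hand-written mutants rather than the byte-code mutants a tool such as Javalanche generates. The class name, the three mutants, and the parameterised helper `fibVariant` are invented for illustration; the three tests mirror the Fibonacci tests shown earlier. The mutation score of a test is taken here as the fraction of mutants on which it fails.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntUnaryOperator;
import java.util.function.Predicate;

final class MutationScoreSketch {
    // One parameterised loop stands in for the original program and its mutants.
    static int fibVariant(int n, int initialCurr, boolean inclusiveBound, boolean subtract) {
        int curr = initialCurr, prev = 0;
        for (int i = 0; inclusiveBound ? i <= n : i < n; i++) {
            int temp = curr;
            curr = subtract ? curr - prev : curr + prev;
            prev = temp;
        }
        return prev;
    }

    // Assumed original: curr starts at 1, loop runs while i < n, values are added.
    static final IntUnaryOperator ORIGINAL = n -> fibVariant(n, 1, false, false);

    // Hypothetical mutants, each differing from the original in one place.
    static final List<IntUnaryOperator> MUTANTS = Arrays.<IntUnaryOperator>asList(
            n -> fibVariant(n, 1, true, false),    // i < n   replaced by i <= n
            n -> fibVariant(n, 1, false, true),    // +       replaced by -
            n -> fibVariant(n, -1, false, false)); // curr = 1 replaced by curr = -1

    // Each test returns true when it passes on the given implementation.
    static final Predicate<IntUnaryOperator> TRIVIAL =
            fib -> { fib.applyAsInt(6); return 1 < 2; };
    static final Predicate<IntUnaryOperator> WEAK = fib -> fib.applyAsInt(6) >= 0;
    static final Predicate<IntUnaryOperator> STRICT = fib -> fib.applyAsInt(6) == 8;

    // Fraction of mutants that the test detects (fails on).
    static double mutationScore(Predicate<IntUnaryOperator> test) {
        long killed = MUTANTS.stream().filter(m -> !test.test(m)).count();
        return (double) killed / MUTANTS.size();
    }

    public static void main(String[] args) {
        System.out.println(mutationScore(TRIVIAL)); // kills no mutants
        System.out.println(mutationScore(WEAK));    // kills one of three
        System.out.println(mutationScore(STRICT));  // kills all three
    }
}
```

All three tests pass on the original and run the same lines, so coverage cannot tell them apart, while the mutation score separates them.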

Mutation Analysis
Examples of Mutants
int Fib(int N) {
    int curr = 1, prev = 0;
    for (int i = 0; i <= N; i++) {
        int temp = curr;
        curr = curr + prev;
        prev = temp;
    }
    return prev;
}
int Fib(int N) {
    int curr = 1, prev = 0;
    for (int i = 0; i < N; i++) {
        int temp = curr;
        prev = temp;
    }
    return prev;
}
int Fib(int N) {
    for (int i = 0; i < N; i++) {
        int temp = curr;
        prev = temp;
    }
    return prev;
}
int Fib(int N) {
    for (int i = 1; i <= N; i++) {
        int temp = curr;
        prev = temp;
    }
    return prev;
}

Some Results
[Two plots of mutation score against code coverage, both axes ranging from 0 to 1]
•  Data: BST, Hashing, Disjoint Sets assignments
•  Most students get full points from the coverage
•  Mutation scores more widely distributed

About the Validity of the Results
[Chart with an axis from 40 % to 100 %, comparing four test suites]
•  Best Suite: mutation score 98.0 %
•  Random Suite 1: mutation score 85.4 %
•  Random Suite 2: mutation score 72.0 %
•  Worst Suite: mutation score 54.8 %

Conclusions
•  Can be used to pick up suspicious solutions
–  High code coverage but low mutation score
•  Reduces the importance of unit tests written by the teacher
–  Also able to ensure that unspecified features are tested (i.e. specified)
•  Immediate feedback
–  When compared to running all tests against each solution
•  Complex parts of the code get more attention
•  Able to give feedback from teacher’s own tests
•  Should be combined with other test adequacy metrics
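The first conclusion, picking up solutions with high code coverage but low mutation score, could be sketched as a simple filter. The thresholds, class names, and sample data below are invented for illustration and are not taken from the study.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SuspiciousSolutionsSketch {
    // Assumed cut-offs; a real course would have to choose its own.
    static final double HIGH_COVERAGE = 0.9;
    static final double LOW_MUTATION_SCORE = 0.6;

    static final class Submission {
        final String student;
        final double coverage;      // fraction in [0, 1]
        final double mutationScore; // fraction in [0, 1]

        Submission(String student, double coverage, double mutationScore) {
            this.student = student;
            this.coverage = coverage;
            this.mutationScore = mutationScore;
        }
    }

    // Returns the students whose tests cover the code but detect few mutants.
    static List<String> suspicious(List<Submission> submissions) {
        List<String> flagged = new ArrayList<>();
        for (Submission s : submissions) {
            if (s.coverage >= HIGH_COVERAGE && s.mutationScore < LOW_MUTATION_SCORE) {
                flagged.add(s.student);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<Submission> all = Arrays.asList(
                new Submission("student-a", 1.0, 0.95),
                new Submission("student-b", 1.0, 0.40),
                new Submission("student-c", 0.5, 0.30));
        System.out.println(suspicious(all)); // prints [student-b]
    }
}
```

Only student-b is flagged: student-c also has a low mutation score, but the low coverage already shows up in the traditional metric.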

Future Directions
•  Evaluate in practice
•  Data we analyzed is from a course where traditional coverage was used to provide feedback from tests.
•  Testability – Test Adequacy – Correctness
•  Use source code mutants directly as feedback

Thank You!
Questions, comments?
petri@cs.hut.fi
Graphics:
Vte.Moncho, http://www.flickr.com/photos/maniacpictures/
Don Solo, http://www.flickr.com/photos/donsolo/
licensed under the Creative Commons license
