A Decade of Comment Quality Assessment:
A Systematic Literature Review
Pooja Rani, Arianna Blasi, Natallia Stulova, Sebastiano Panichella, Alessandra Gorla, Oscar Nierstrasz
Journal First
University of Zurich, Switzerland
ESEM 2023

How do software developers spend their time?
Xia, X., Bao, L., Lo, D., Xing, Z., Hassan, A. E., & Li, S. (2017). Measuring program comprehension: A large-scale field study with professionals. IEEE Transactions on Software Engineering.
Comprehension: 58%
Navigation: 24%
Others: 13%
Editing: 5%

Code comments
/**
 * A class representing a window on the screen.
 *
 * For example:
 * <pre>
 *    Window win = new Window(parent);
 *    win.show();
 * </pre>
 *
 * @author Sami Shaio
 * @version 1.13, 06/08/06
 * @see java.awt.BaseWindow
 * @see java.awt.Button
 */
class Window extends BaseWindow {
    // ...
}

High-quality comments support developers in various activities
Trustworthy form of documentation
- McMillan et al. 2010
- Dekel et al. 2009

Is this a high-quality comment?
- Is it correct?
- Is it consistent?
- Is it sufficient?
- What about other quality attributes?

Quality is a multi-dimensional concept (Gillies et al. 2011)

How do researchers measure comment quality?

Systematic literature review
- Timeline: 10 years (2011-2020)
- 195 software engineering venues
- 332 proceedings
- 2353 candidate papers
- 47 relevant papers

Quality attributes
21 quality attributes

Consistency
Completeness
Accuracy
Readability
Up-to-date-ness
Content relevance
Maintainability
Spelling and grammar

Some are frequently considered:
Consistency, Completeness, Accuracy, Readability, Up-to-date-ness, Content relevance, Maintainability
Some are rarely considered:
Conciseness, Usability, Correctness, Traceability, Accessibility, Coherence, Format, Information organization, Understandability, Documentation technology, Internationalization, Author-related

Techniques
Manual assessment
ML-based
Heuristic-based
Metric-based
Static analysis
NLP
DNN-based
First order logic
Empirical
fNIRS

Manual assessment is still the most frequent technique to measure quality attributes.

Deep learning-based techniques have not been extensively explored for comment analysis.

Which of these quality attributes are important to which developers?
Do code reviewers care more about accurate comments than about complete comments? When?
How can these quality attributes be measured automatically?
Can LLMs help developers write better comments?

Are comments worth the effort?

Future work
Task: Design, Implementation, Maintenance
Information: Summary, Usage, Pointer
Quality attributes: Accessibility, Readability, Author-related, Completeness, Consistency, Correctness
Audience: User, Developer

Replication: https://zenodo.org/record/4729054
Paper
pooja-rani-01985b68/

Comments in multiple languages

/**
 *
 * For example:
 * <pre>
 *    win.show();
 * </pre>
 *
 * @version 1.13, 06/08/06
 */
class Window extends BaseWindow {
    // ...
}

class OneHotCategorical(Distribution):
    r"""
    Creates a one-hot categorical distribution parameterized by :attr:`probs` or
    :attr:`logits`.

    Samples are one-hot coded vectors of size ``probs.size(-1)``.

    .. note:: The `probs` argument must be non-negative, finite and have a non-zero sum,
              and it will be normalized to sum to 1 along the last dimension. :attr:`probs`
              will return this normalized value.
              The `logits` argument will be interpreted as unnormalized log probabilities
              and can therefore be any real number. It will likewise be normalized so that
              the resulting probabilities sum to 1 along the last dimension. :attr:`logits`
              will return this normalized value.

    See also: :func:`torch.distributions.Categorical` for specifications of
    :attr:`probs` and :attr:`logits`.

    Example::

        >>> m = OneHotCategorical(torch.tensor([ 0.25, 0.25, 0.25, 0.25 ]))
        >>> m.sample()  # equal probability of 0, 1, 2, 3
        tensor([ 0., 0., 0., 1.])

    Args:
        probs (Tensor): event probabilities
        logits (Tensor): event log probabilities (unnormalized)
    """

Smalltalk class comment

Methodology
[Figure: multi-step study selection process resulting in 47 papers]

47 papers over years
[Figure: number of papers per year, 2011-2020]

Dimensions analyzed
- Quality attributes (e.g., consistency, completeness)
- Techniques (e.g., heuristic-based, machine learning-based)
- Comment types (e.g., method comments, inline comments)
- Data availability (e.g., tool, dataset)

Code comment metrics
Consistency:
CIC measures the overlap between the terms used in a comment and those used in the code. A higher CIC value is associated with higher readability of that code.
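The slide does not spell out how CIC computes the overlap. Below is a minimal Python sketch. It assumes a Jaccard-style ratio, |comment terms ∩ code terms| / |comment terms ∪ code terms|, and a naive tokenizer that splits camelCase and snake_case identifiers. The function names `comment_terms`, `code_terms` and `cic` are hypothetical. A real implementation would also filter stop words and language keywords, which this sketch does not do.

```python
import re


def comment_terms(comment: str) -> set[str]:
    # Naive: every alphabetic word in the comment, lowercased (no stop-word filtering).
    return {w.lower() for w in re.findall(r"[A-Za-z]+", comment)}


def code_terms(code: str) -> set[str]:
    # Extract identifier-like tokens (language keywords included, by assumption),
    # then split camelCase and snake_case identifiers into lowercase terms.
    terms: set[str] = set()
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code):
        spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", ident)
        terms.update(p.lower() for p in re.split(r"[_\s]+", spaced) if p)
    return terms


def cic(comment: str, code: str) -> float:
    # Assumed Jaccard-style overlap: shared terms divided by all distinct terms.
    c, i = comment_terms(comment), code_terms(code)
    union = c | i
    return len(c & i) / len(union) if union else 0.0
```

Take the comment "Return the window size" and the code `int getWindowSize() { return windowSize; }`. The sketch finds 3 shared terms (return, window, size) out of 6 distinct terms, so it returns 0.5.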
Readability:
Flesch reading-ease test
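The Flesch reading-ease score is computed as 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). Higher scores indicate easier text. Below is a minimal Python sketch. It assumes a heuristic syllable counter (vowel groups, minus a trailing silent "e"), which only approximates true syllable counts. It also assumes naive sentence splitting on ".", "!" and "?". The function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    # Heuristic: count groups of consecutive vowels (treating "y" as a vowel),
    # drop a trailing silent "e" (but not "-le"), and never return less than 1.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    # Naive sentence split on terminal punctuation; words are alphabetic runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Take a comment such as "The cat sat on the mat." Every word has one syllable, so the sketch returns 206.835 - 1.015 × 6 - 84.6 × 1 ≈ 116.1. Very simple text can score above 100.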

Data availability
Nearly 50% of the studies still fall short on replicability, as their datasets or tools are often not publicly accessible.

Comment types
[Figure: percentage of studies (0%-80%) per comment type (Code comments, Method comments, API documentation, License comments, Inline comments, TODO comments, Software documentation, Deprecation comments), broken down by language (Java, Python, C++, C, C#, Other)]

[Figure: the comment types chart, annotated with "52% of the studies"]

[Figure: the comment types chart, annotated with "48% of the studies"]

