David E. Losada
Fabio Crestani
A Test Collection for Research on
Depression and Language Use
CLEF 2016, Évora (Portugal)
350 million
people suffer from
depression
early
intervention
is fundamental
human expert + technology
current technology
doesn't support
early alerts
reactive
works with very
explicit signals
too often, too late!
instigate research on the onset of depression
proactive technologies
track temporal evolution
early alerts
Text analytics
natural language can be indicative of
personality, social status, emotions,
mental health, disorders, ...
linguistic markers
use of personal pronouns
statistical properties of text
topic models
psychometrics
content vs style
social words
verb tense
positive/negative emotions
psychological processes
cognitive processes
Lack of data on depression & language
few collections available
focus on 2-class categorisation
no temporal dimension, no early risk analysis
little context about the tweet writer
difficult to assess whether a mention of
depression is genuine
no way to extract a long history of
tweets (e.g. several years)
A Thin Line
no way to extract any history
short messages, little context
A Test Collection for Research on Depression and Language Use
large history for each redditor (several years)
many subreddits (communities) about different
medical conditions (e.g. depression or
anorexia)
long messages
terms & conditions allow use
for research purposes
depression group vs control group
Adopted extraction method from
Coppersmith et al. 2014:
pattern matching search
search for explicit mentions of diagnosis
(e.g. "I was diagnosed with depression")
vaguer statements do not count as a diagnosis
(e.g. "I am depressed", "I think I have depression")
manual inspection of the results
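The pattern matching step can be illustrated with a minimal sketch. The regular expression below is a hypothetical pattern written for this example (the slides do not list the actual search patterns). It accepts explicit first-person reports of a diagnosis and rejects vaguer statements; anything it accepts would still go through manual inspection.

```python
import re

# Hypothetical pattern (not taken from the slides): a first-person,
# explicit mention of having been diagnosed with depression.
DIAGNOSIS_RE = re.compile(
    r"\bI\s+(?:was|am|got|have\s+been)\s+(?:\w+\s+)?"
    r"diagnosed\s+with\s+(?:\w+\s+)?depression\b",
    re.IGNORECASE,
)


def mentions_diagnosis(text: str) -> bool:
    """Return True if the text contains an explicit self-reported diagnosis."""
    return DIAGNOSIS_RE.search(text) is not None


examples = [
    "I was diagnosed with depression",
    "I am depressed",
    "I think I have depression",
    "My wife has depression",
]
# Candidate writings that a human annotator would then review.
candidates = [t for t in examples if mentions_diagnosis(t)]
```

Only the first example is kept as a candidate here; the other three show why a plain keyword search for "depression" would be too permissive.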
control group:
large set of random redditors
from a wide range of subreddits
(news, media, ...)
also included some false positives
from the depression subreddit
(e.g. "My wife has depression", "I am a
student interested in depression")
retrieved all history
from any subreddit
his/her posts +
his/her comments to other posts
often several years of text
removed the post/comment with
the explicit mention of the
diagnosis (depression group)
redditor profile
pre- & post-diagnosis text
organised the writings in
chronological order
XML archives
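A minimal sketch of how one redditor profile could be stored: the writings are sorted chronologically and serialised to XML. The element names (INDIVIDUAL, ID, WRITING, DATE, TEXT) and the subject identifier are assumptions made for this example; the slides do not show the actual schema of the archives.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def build_profile(subject_id, writings):
    """Build an XML profile from (datetime, text) pairs given in any order.

    Element names are hypothetical; they are not taken from the slides.
    """
    root = ET.Element("INDIVIDUAL")
    ET.SubElement(root, "ID").text = subject_id
    # Oldest writing first, so the profile can be read sequentially.
    for when, text in sorted(writings, key=lambda w: w[0]):
        writing = ET.SubElement(root, "WRITING")
        ET.SubElement(writing, "DATE").text = when.isoformat(sep=" ")
        ET.SubElement(writing, "TEXT").text = text
    return root


profile = build_profile(
    "subject0001",  # hypothetical anonymised identifier
    [
        (datetime(2013, 2, 15, 9, 30), "second writing"),
        (datetime(2013, 2, 13, 18, 5), "first writing"),
    ],
)
xml_string = ET.tostring(profile, encoding="unicode")
```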
collection: main statistics
early prediction task
detect early traces of depression
for each subject, sequentially process
pieces of evidence...
[Figure: John Doe's writings (posts or comments), revealed one at a time in chronological order: 2/13/13, 2/15/13, 3/1/13, ...]
tradeoff: early decision vs more informed decision
when should I fire an alarm?
early prediction task: performance metric
After seeing k texts, a system makes a binary
decision d about John Doe:
d=1 => possible risk of depression
d=0 => non-risk case
[Figure: John Doe's writings (posts or comments) in chronological order, numbered (1) 2/13/13, (2) 2/15/13, ..., (k) 3/10/14; the decision d is made after writing k]
Early Risk Detection Error:
$$
ERDE_o(d,k) =
\begin{cases}
c_{fp} & \text{(false positive)} \\
c_{fn} & \text{(false negative)} \\
c_{tp} \cdot lc_o(k) & \text{(true positive)} \\
0 & \text{(true negative)}
\end{cases}
$$
Usually, $c_{fn} \gg c_{fp}$
$c_{fn} = 1$, $c_{fp}$ = expected proportion of positive cases (e.g. 0.01)
True Positive cost: $c_{tp} \cdot lc_o(k)$
$c_{tp} = c_{fn}$ (late detection is equivalent to no detection)
Latency cost function $lc_o(k)$
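The error measure can be sketched in a few lines. The four cases and the cost settings follow the slide; the shape of the latency cost is an assumption, since the slides only show a plot. This sketch uses a sigmoid, lc_o(k) = 1 - 1/(1 + e^(k-o)), which stays near 0 for decisions made well before o writings and approaches 1 afterwards.

```python
import math


def latency_cost(k: int, o: int) -> float:
    """Assumed sigmoid latency cost: ~0 for k << o, 0.5 at k == o, ~1 for k >> o."""
    if k - o > 50:  # avoid overflow in exp(); the cost is already ~1 here
        return 1.0
    return 1.0 - 1.0 / (1.0 + math.exp(k - o))


def erde(decision: int, truth: int, k: int, o: int,
         c_fp: float = 0.01, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Early Risk Detection Error for one subject.

    decision: 1 (risk) or 0 (non-risk), emitted after seeing k writings.
    truth:    1 (depression group) or 0 (control group).
    """
    if decision == 1 and truth == 0:
        return c_fp                       # false positive
    if decision == 0 and truth == 1:
        return c_fn                       # false negative
    if decision == 1 and truth == 1:
        return latency_cost(k, o) * c_tp  # true positive, penalised by delay
    return 0.0                            # true negative


early = erde(decision=1, truth=1, k=2, o=50)
late = erde(decision=1, truth=1, k=100, o=50)
```

With these settings a correct alarm after 2 writings costs almost nothing, while the same alarm after 100 writings costs nearly as much as missing the case.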
experiments
Subjects      Training   Test
control       403        352
depression    83         54
Training (403 control, 83 depression):
single doc representations: the writings of each subject (e.g. John Doe, Jane Doe) are merged into one document
feature-based representations (tfidf weights), one labelled sparse vector per subject:
1:0.4 2:0.5 .......... +1
1:0.3 3:0.7 .......... -1
depression language classifier: logistic regression (L1 regularisation)
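The representation step can be sketched as follows. This is an illustration only: whitespace tokenisation, the plain tf * log(N/df) weighting and the label-first sparse layout (as in the LIBSVM text format) are assumptions, and the sketch stops before the classifier is trained.

```python
import math
from collections import Counter


def tfidf_lines(subjects):
    """subjects: list of (label, writings); label is +1 or -1.

    Returns one sparse line per subject: '<label> <index>:<weight> ...'.
    """
    # One document per subject: all of his/her writings concatenated.
    docs = [" ".join(writings).lower().split() for _, writings in subjects]
    terms = sorted({t for d in docs for t in d})
    vocab = {term: i + 1 for i, term in enumerate(terms)}
    df = Counter(t for d in docs for t in set(d))  # document frequency
    n = len(docs)
    lines = []
    for (label, _), doc in zip(subjects, docs):
        tf = Counter(doc)
        weights = {
            vocab[t]: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()
        }
        # Terms present in every document get weight 0 and are dropped.
        feats = " ".join(f"{i}:{w:.3f}" for i, w in sorted(weights.items()) if w > 0)
        lines.append(f"{label:+d} {feats}")
    return lines


lines = tfidf_lines([
    (+1, ["i feel sad", "sad again"]),
    (-1, ["i like news"]),
])
```

The resulting lines could then be fed to any L1-regularised logistic regression trainer.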
Test (352 control, 54 depression), four decision strategies:
random (after 1st message): rand({0,1})
minority class (after 1st message): always 1 (risk case)
first n: the depression language classifier makes its decision after seeing the first n writings
dynamic: after each new writing, ask the depression language classifier: confident about risk?
yes: we finish and predict 1 (risk case)
no: we wait and see more evidence...
[Figure: a subject's writings numbered 1, 2, ..., n in chronological order (2/13/13, 2/15/13, 3/1/13, ...) feeding the classifier]
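The four strategies can be sketched as functions that return the decision d together with k, the number of writings seen when the decision was made. The scorer, the cue words and both thresholds are hypothetical stand-ins for the trained depression language classifier and its settings, which the slides do not specify.

```python
import random

CUES = {"hopeless", "tired"}  # hypothetical cue words for the toy scorer


def toy_score(text):
    """Stand-in for the classifier: a pseudo-probability of risk in [0, 1]."""
    hits = sum(word in CUES for word in text.lower().split())
    return min(1.0, hits / 3)


def random_baseline(writings, rng):
    return rng.choice([0, 1]), 1          # random decision after the 1st message


def minority_baseline(writings):
    return 1, 1                           # always "risk" after the 1st message


def first_n(writings, score, n, threshold=0.5):
    seen = writings[:n]                   # decide once, using the first n writings
    return int(score(" ".join(seen)) >= threshold), len(seen)


def dynamic(writings, score, threshold=0.9):
    for k in range(1, len(writings) + 1):
        if score(" ".join(writings[:k])) >= threshold:
            return 1, k                   # confident about risk: stop and alert
    return 0, len(writings)               # never confident: non-risk at the end


john_doe = ["nice weather today", "i feel hopeless", "hopeless and tired again"]
d_dynamic = dynamic(john_doe, toy_score)
d_first2 = first_n(john_doe, toy_score, n=2)
```

In this toy run the dynamic strategy raises the alarm at the third writing, whereas first n with n=2 commits too early and misses the case; this is the tradeoff between an early decision and a more informed one.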
random/minority: poor F1 & ERDE
first n: good F1 but slow at detecting risk cases
dynamic: best balance between correctness & time
results
new collection on
depression & language
early risk detection
algorithms
(preliminary baselines)
methodology for
benchmark
construction
temporal dimension
conclusions
This research was funded by the Swiss
National Science Foundation
(project "Early risk prediction on the Internet:
an evaluation corpus", 2015).
We also thank the
Ministerio de Economía y Competitividad
of the Government of Spain &
FEDER Funds (ref. TIN2015-64282-R).
Acknowledgements:
Ehnero. picture pg 1.CC BY NC 2.0.
Gerald Gabernig. picture pg 2.CC BY 2.0.
ankxt. picture pg 3.CC BY 2.0.
NEC Corporation of America. picture pg 4.CC BY 2.0.
Jordi Borràs i Vivó. picture pgs 5-6. CC BY NC ND 2.0.
Helen Harrop. picture pg 7.CC BY SA 2.0.
Nilufer Gadgieva. picture pg 8.CC BY NC 2.0.
Alix May. picture pg 9.CC BY NC 2.0.
Justin Lincoln. picture pg 10.CC BY SA 2.0.
Grace McDunnough. picture pgs 11-18 (top).CC BY NC ND 2.0.
Andy Kennelly. picture pgs 19-21.CC BY NC 2.0.
Joel Olives. picture pgs 22-23 (left).CC BY 2.0.
Tim Morgan. picture pg 23 (right).CC BY 2.0.
Conor Lawless. picture pg 24.CC BY 2.0.
Oscar Rethwill. picture pgs 25-32.CC BY 2.0.
Emily. picture pgs 33-37.CC BY NC 2.0.
Tiberiu Ana. picture pg 38.CC BY 2.0.
woodleywonderworks. picture pg 39 (left), 40 (left).CC BY 2.0.
Niko Kaiser. picture pg 39 (right), 41-47.CC BY 2.0.
John Sheets. picture pg 48.CC BY NC 2.0.
Anders Sandberg. picture pg 49.CC BY NC 2.0.
See-ming Lee. picture pg 51.CC BY NC 2.0.
