REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2016 REVEAL consortium
Stefanie Wiegand & Stuart E. Middleton
University of Southampton IT Innovation Centre
{sw,sem}@it-innovation.soton.ac.uk
Veracity & Velocity of Social Media Content during Breaking News: Analysis of November 2015 Paris Shootings
Overview
• Introduction
• Experiment
• Results
• Discussion
• Future work
Introduction: What's this all about?
• Problems:
    • Journalists verifying breaking-news UGC (user-generated content) must balance speed vs. accuracy
    • The echo chamber effect can make false rumours go viral
    • Automate information gathering → journalists make the final decision
• Ideas:
    • First 60 mins after a UGC post: filter posts by attribution to trusted sources
        • Visualise traffic patterns for posts attributed to trusted and untrusted sources
        • Can traffic analysis help to verify / debunk content?
    • First 5 mins: rank UGC not seen before by mention count
        • Provide a ranked list of likely eyewitness UGC every 5 mins
        • Can we produce a high-quality eyewitness UGC feed?
Experiment: Experiment setup
• Data
    • 5 viral UGC posts (3 eyewitness, 2 debunked), manually identified
    • 38 GB of serialised data covering the first 6 h after the first attack
    • 5.9M posts, ~40k attributed sources, ~418k unique URLs
    • ~160k to 1.8M posts in the first hour per UGC test case
• Technology
    • Target UGC image/video → TinEye → duplicate images/videos
    • Posts → text extraction → sources → PostgreSQL
    • PostgreSQL → triple store → trust knowledge model → trusted posts
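The text side of the pipeline above (posts → text extraction → sources → trust knowledge model → trusted posts) can be illustrated with a minimal Python sketch. It is not the REVEAL implementation: it assumes attributed sources can be approximated by @mentions in the post text, replaces the PostgreSQL/triple-store trust model with a plain dictionary, and uses hypothetical account names (trusted_news, rumour_mill) and a hypothetical labelling rule.

```python
import re
from typing import Dict, List

# Hypothetical trust lookup standing in for the trust knowledge model
# (a triple store in the slides; a plain dict here).
TRUST: Dict[str, str] = {
    "trusted_news": "trusted",
    "rumour_mill": "untrusted",
}

# Assumption: attributed sources are approximated by @mentions in the text.
MENTION_RE = re.compile(r"@(\w+)")


def extract_sources(text: str) -> List[str]:
    """Return the lower-cased @mentions in a post, in order of appearance."""
    return [name.lower() for name in MENTION_RE.findall(text)]


def classify_post(text: str, trust: Dict[str, str] = TRUST) -> str:
    """Label a post 'trusted', 'untrusted' or 'unknown'.

    Hypothetical rule: one trusted source is enough to mark the post trusted;
    otherwise any untrusted source marks it untrusted; posts whose sources
    are all absent from the lookup are 'unknown'.
    """
    labels = {trust.get(src, "unknown") for src in extract_sources(text)}
    if "trusted" in labels:
        return "trusted"
    if "untrusted" in labels:
        return "untrusted"
    return "unknown"


if __name__ == "__main__":
    for post in ["Photo of the scene via @trusted_news",
                 "RT @rumour_mill: look at this",
                 "Anyone know if this is real?"]:
        print(classify_post(post), "|", post)
```

Run as a script, this labels the three example posts trusted, untrusted and unknown respectively. A real system would resolve sources from richer attribution patterns than @mentions.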
Experiment: Experiment method
• Verification (Experiment 1)
    • Filter trusted and untrusted content in the first 60 mins after each of the 5 target UGC posts
    • Examine the velocity of trusted and untrusted sources mentioning the target UGC
    • When is the target UGC attributed to trusted sources?
• Identification (Experiment 2)
    • Temporally segment the first 5 mins of posts after each of the 5 target event times
    • Filter out old URLs (including alternative URLs)
    • Rank by mention frequency
    • Does the target UGC appear near the top of the ranked list?
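The identification steps of Experiment 2 (segment, filter old URLs, rank by mention frequency) might look roughly like the following Python sketch. Assumptions not taken from the slides: each post is reduced to a timestamp plus its list of URLs, an "old" URL is one already mentioned before the segment starts, and alternative URLs are resolved through a hypothetical alias dictionary (e.g. shortened link → canonical URL).

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# A post reduced to (timestamp, URLs it mentions).
Post = Tuple[datetime, List[str]]


def rank_new_urls(posts: Iterable[Post], start: datetime,
                  aliases: Dict[str, str],
                  window: timedelta = timedelta(minutes=5)) -> List[Tuple[str, int]]:
    """Rank URLs mentioned in [start, start + window) by mention count,
    excluding URLs already seen before `start`.

    `aliases` maps alternative URLs (hypothetical, e.g. short links) to a
    canonical URL so that all variants are counted and filtered together.
    """
    end = start + window
    old: set = set()
    counts: Counter = Counter()
    for ts, urls in posts:
        for url in urls:
            key = aliases.get(url, url)
            if ts < start:
                old.add(key)          # already circulating before the segment
            elif ts < end:
                counts[key] += 1      # mention inside the 5-minute segment
    for key in old:
        counts.pop(key, None)         # filter old URLs
    return counts.most_common()       # highest mention count first


if __name__ == "__main__":
    t0 = datetime(2000, 1, 1, 12, 0)  # arbitrary segment start
    posts = [
        (t0 - timedelta(minutes=10), ["http://old.example/a"]),
        (t0 + timedelta(minutes=1), ["http://img.example/1", "http://old.example/a"]),
        (t0 + timedelta(minutes=2), ["http://short.example/x"]),
        (t0 + timedelta(minutes=3), ["http://img.example/2"]),
        (t0 + timedelta(minutes=7), ["http://img.example/2"]),  # outside segment
    ]
    aliases = {"http://short.example/x": "http://img.example/1"}
    print(rank_new_urls(posts, t0, aliases))
    # [('http://img.example/1', 2), ('http://img.example/2', 1)]
```

Because the short link is folded into its canonical URL before counting, the image URL collects both mentions, while the URL that was already circulating before the segment is dropped.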
Results: Experiment 1 - Case P1
Figure (P1): content items [#] vs. time [min] over the first 60 mins; series: trusted, unknown, untrusted, total
Results: Experiment 1 - Case P2
Figure (P2): content items [#] vs. time [min] over the first 60 mins; series: trusted, unknown, untrusted, total
Results: Experiment 1 - Case P3
Figure (P3): content items [#] vs. time [min] over the first 60 mins; series: trusted, unknown, untrusted, total
Results: Experiment 1 - Case P3
Figure (trusted/untrusted P3): content items [#] vs. time [min] over the first 60 mins; series: trusted, untrusted
Results: Experiment 1 - Case D1
Figure (D1): content items [#] vs. time [min] over the first 60 mins; series: trusted, unknown, untrusted, total
Results: Experiment 1 - Case D2
Figure (D2): content items [#] vs. time [min] over the first 60 mins; series: trusted, unknown, untrusted, total
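The Experiment 1 figures plot content items against time for trusted, unknown and untrusted sources plus the total. A minimal Python sketch of how such series could be computed from labelled posts follows. It assumes each post is given as (minutes since the target UGC was first posted, trust label); since the slides do not say whether the plotted counts are per-minute or cumulative, both are supported through a `cumulative` flag.

```python
from itertools import accumulate
from typing import Dict, Iterable, List, Tuple

CLASSES = ("trusted", "unknown", "untrusted")


def velocity(items: Iterable[Tuple[float, str]], minutes: int = 60,
             cumulative: bool = True) -> Dict[str, List[int]]:
    """Count content items per minute for each trust class, plus a total.

    items: (minutes since the target UGC was first posted, trust label).
    Items outside [0, minutes) or with an unrecognised label are ignored.
    """
    series = {c: [0] * minutes for c in CLASSES}
    for t, label in items:
        if 0 <= t < minutes and label in series:
            series[label][int(t)] += 1
    # Per-minute total across the three classes.
    series["total"] = [sum(col) for col in zip(*(series[c] for c in CLASSES))]
    if cumulative:
        series = {k: list(accumulate(v)) for k, v in series.items()}
    return series


if __name__ == "__main__":
    demo = [(0.5, "trusted"), (1.2, "unknown"), (1.9, "untrusted"), (59.9, "unknown")]
    s = velocity(demo)
    print(s["total"][:3], s["total"][-1])  # [1, 3, 3] 4
```

Keeping the trusted and untrusted series separate from the total is what makes a zoomed view like the second P3 panel possible: the trusted and untrusted counts are tiny compared with the unknown traffic.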
Results: Experiment 2
Target image ID                                     P1              P2              P3              D1              D2
Followers of author                                 335             1.4k            218             2.8k            151k
Content likes                                       11              408             35              17k             29k
Content retweets                                    83              3.3k            194             22k             30k
Total tweets in 60-min window                       483918          162111          811079          1501000         1837173
Unique mentioned URLs in 60-min window              785             4331            535             7907            13252
Rank of target image set in 5-min segment (top x%)  9/653 (2%)      1/603 (1%)      61/1097 (6%)    427/11605 (4%)  1/11337 (1%)
Eyewitness content in 5-min segment (total)         25              2               12              29              30
Eyewitness content in 5-min segment (unique)        4               1               4               13              14
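The "top x percent" figures in the table are consistent with rounding rank / total × 100 up to the next whole percent (e.g. 9/653 ≈ 1.4%, shown as 2%). That rule is inferred from the numbers rather than stated on the slides; a small Python check using integer ceiling division:

```python
def top_percent(rank: int, total: int) -> int:
    """Express a 1-based rank among `total` items as a whole 'top x percent',
    rounding up (integer ceiling division avoids floating-point edge cases)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must satisfy 1 <= rank <= total")
    return (100 * rank + total - 1) // total


# Rank / total pairs copied from the Experiment 2 table.
TABLE = {"P1": (9, 653), "P2": (1, 603), "P3": (61, 1097),
         "D1": (427, 11605), "D2": (1, 11337)}

if __name__ == "__main__":
    for case, (rank, total) in TABLE.items():
        print(f"{case}: {rank}/{total} ({top_percent(rank, total)}%)")
```

This reproduces the table's 2%, 1%, 6%, 4% and 1%. Rounding to the nearest percent instead would give 1% for P1 and 0% for P2 and D2, which is why rounding up is the assumed rule.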
Discussion: How is this useful to journalists?
• Posts by trusted sources matter for verification
    • The wisdom of the crowd is not always wise
    • The Twitter "echo chamber" is less useful than a post by a trusted source
• Easier/faster to spot new eyewitness UGC
    • Filter feeds to tens of posts, not thousands
    • Reduce information overload for journalists in the first 5 mins
• Additional analysis can further improve eyewitness UGC identification
    • Eyewitness classification
    • Image analysis (e.g. Exif metadata)
    • Author profile pages
Future work: Where to go from here
• Cross-check known facts
    • Extend the knowledge model to support this
        • e.g. image classification of weather/lighting → time & location of event
        • e.g. mentions of known event actors
• Use linked open data to visualise source bias
    • This can include political, religious or other bias
• Observational study of journalists verifying UGC
    • Expert journalists demonstrate best-practice verification on specific examples
    • We train our algorithms on observed best practice
    • We check our algorithms' results against the journalists' ground truth
Any questions?
Stefanie Wiegand & Stuart E. Middleton
University of Southampton IT Innovation Centre
email: {sw|sem}@it-innovation.soton.ac.uk
web: www.it-innovation.soton.ac.uk
twitter: @RevealEU, @IT_Innov, @stuart_e_middle
Many thanks for your attention!
