This document summarizes research on the veracity and velocity of social media content during breaking news events. The researchers studied Twitter posts from the first six hours after the November 2015 Paris shootings, focusing on five viral user-generated content (UGC) posts: three from eyewitnesses and two that were later debunked. The research asked how quickly eyewitness posts can be identified, and whether filtering content by attribution to trusted versus untrusted sources helps to verify information. The results showed that the target posts were attributed to trusted sources within the first hour, and that eyewitness posts could be identified and ranked within the top 6% of newly mentioned content within the first five minutes. The aim is to help journalists verify information more quickly during breaking news.
Veracity & Velocity of Social Media Content during Breaking News: Analysis of November 2015 Paris Shootings
Stefanie Wiegand & Stuart E. Middleton
University of Southampton IT Innovation Centre
{sw,sem}@it-innovation.soton.ac.uk
REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2016 REVEAL consortium
Overview
Introduction
Experiment
Results
Discussion
Future work
What's this all about?
Problems:
Journalists verifying breaking-news user-generated content (UGC) must trade off speed against accuracy
The echo chamber can make false rumours go viral
Automate information gathering; journalists make the final decision
Ideas:
First 60 mins of a UGC post: filter by attribution to trusted sources
Visualise traffic patterns for posts attributed to trusted and untrusted sources
Can traffic analysis help to verify / debunk content?
First 5 mins: rank UGC not seen before by mention count
Provide a ranked list of likely eyewitness UGC every 5 mins
Can we produce a high-quality eyewitness UGC feed?
Introduction
Experiment setup
Data
5 viral UGC posts (3 eyewitness, 2 debunked), manually identified
38 GB of serialised data covering the first 6 h after the first attack
5.9M posts, ~40k attributed sources, ~418k unique URLs
~160k to 1.8M posts in the first hour per UGC test case
Technology
[Pipeline diagram; components: target UGC image/video, TinEye, duplicate images/videos, posts, text extraction, sources, PostgreSQL, triple store, trust knowledge model, trusted posts]
Experiment
Experiment method
Verification (Experiment 1)
Filter (un-)trusted content in the first 60 mins of 5 target UGC posts
Examine velocity of trusted and untrusted sources mentioning target UGC
When is target UGC attributed to trusted sources?
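The verification step above can be sketched in Python. This is a minimal illustration under stated assumptions, not the REVEAL implementation: each post is assumed to be already reduced to its publication time and the single source it is attributed to, and trusted/untrusted sources are passed in as plain sets (in the experiment this information came from the trust knowledge model). `Post`, `trust_velocity`, `first_trusted_mention` and the 10-minute bin width are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

TRUST_CLASSES = ("trusted", "unknown", "untrusted")


@dataclass
class Post:
    """Hypothetical reduced post: publication time and the source it is attributed to."""
    created: datetime
    source: str


def classify(source, trusted, untrusted):
    """Map an attributed source to a trust class; anything not listed is 'unknown'."""
    if source in trusted:
        return "trusted"
    if source in untrusted:
        return "untrusted"
    return "unknown"


def trust_velocity(posts, start, trusted, untrusted, window_min=60, bin_min=10):
    """Count posts per trust class (plus 'total') in each bin of the first window_min minutes."""
    n_bins = window_min // bin_min
    counts = {c: [0] * n_bins for c in TRUST_CLASSES + ("total",)}
    for post in posts:
        offset = (post.created - start) / timedelta(minutes=1)  # minutes since start
        if not 0 <= offset < window_min:
            continue  # ignore posts outside the analysis window
        b = int(offset // bin_min)
        counts[classify(post.source, trusted, untrusted)][b] += 1
        counts["total"][b] += 1
    return counts


def first_trusted_mention(posts, start, trusted):
    """Minutes from start to the earliest post attributed to a trusted source, or None."""
    offsets = [(p.created - start) / timedelta(minutes=1)
               for p in posts if p.source in trusted and p.created >= start]
    return min(offsets) if offsets else None


if __name__ == "__main__":
    t0 = datetime(2015, 11, 13, 22, 0)  # hypothetical upload time of a target UGC post
    posts = [Post(t0 + timedelta(minutes=m), s) for m, s in
             [(2, "@witness"), (14, "@news_desk"), (15, "@rumour_mill"), (75, "@news_desk")]]
    counts = trust_velocity(posts, t0, trusted={"@news_desk"}, untrusted={"@rumour_mill"})
    print(counts["trusted"], counts["untrusted"], counts["total"])
    print(first_trusted_mention(posts, t0, {"@news_desk"}))
```

The counts here are per bin rather than cumulative; the slides do not say which of the two the plots show.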
Identification (Experiment 2)
Temporally segment the first 5 mins of posts after each of the 5 target event times
Filter old URLs (including alternative URLs)
Rank by mention frequency
Does target UGC appear near the top of the ranked list?
Experiment
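The identification step can be sketched in the same spirit. Assumptions (none of them taken from the slides): each post is a `(created, urls)` pair; a URL is "old" if it was already mentioned before the event time; alternative URLs for the same content are collapsed through a caller-supplied `canonical` dictionary; and a URL counts once per post. `rank_new_urls` and `rank_of` are hypothetical names.

```python
from collections import Counter
from datetime import datetime, timedelta


def rank_new_urls(posts, event_time, canonical=None, segment_min=5):
    """Rank URLs that are new in the segment_min minutes after event_time by mention count.

    posts: iterable of (created: datetime, urls: list[str]) pairs.
    canonical: optional dict mapping alternative URLs to one canonical URL.
    Returns (url, mentions) pairs sorted by descending mentions.
    """
    canonical = canonical or {}
    seg_end = event_time + timedelta(minutes=segment_min)
    seen_before = set()
    counts = Counter()
    for created, urls in posts:
        keys = {canonical.get(u, u) for u in urls}  # collapse alternatives; once per post
        if created < event_time:
            seen_before.update(keys)  # already circulating before the segment: "old"
        elif created < seg_end:
            counts.update(keys)
    for old in seen_before:
        counts.pop(old, None)  # drop old URLs (alternatives were already collapsed)
    # ties broken alphabetically so the ranking is deterministic
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def rank_of(ranking, target):
    """1-based position of target in the ranking, or None if absent."""
    for i, (url, _) in enumerate(ranking, start=1):
        if url == target:
            return i
    return None


if __name__ == "__main__":
    t0 = datetime(2015, 11, 13, 22, 0)  # hypothetical event time
    posts = [
        (t0 - timedelta(minutes=30), ["http://example.com/old.jpg"]),
        (t0 + timedelta(minutes=1), ["http://example.com/new.jpg", "http://example.com/old.jpg"]),
        (t0 + timedelta(minutes=2), ["http://short.example/x"]),
        (t0 + timedelta(minutes=4), ["http://example.com/other.jpg"]),
        (t0 + timedelta(minutes=9), ["http://example.com/other.jpg"]),  # after the segment
    ]
    ranking = rank_new_urls(posts, t0, canonical={"http://short.example/x": "http://example.com/new.jpg"})
    print(ranking)
    print(rank_of(ranking, "http://example.com/new.jpg"))
```

Keeping the ranking as a sorted list makes it straightforward to hand the top few entries to a journalist every 5 minutes and to report where a known target lands.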
Experiment 1 - Case P1
Results
[Figure P1: content items [#] (0 to 400) vs. time [min] over the first 60 minutes; series: trusted, unknown, untrusted, total]
Experiment 1 - Case P2
Results
[Figure P2: content items [#] (0 to 1200) vs. time [min] over the first 60 minutes; series: trusted, unknown, untrusted, total]
Experiment 1 - Case P3
Results
[Figure P3: content items [#] (0 to 250) vs. time [min] over the first 60 minutes; series: trusted, unknown, untrusted, total]
Experiment 1 - Case P3
Results
[Figure "trusted/untrusted P3": content items [#] (0 to 5) vs. time [min] over the first 60 minutes; series: trusted, untrusted]
Experiment 1 - Case D1
Results
[Figure D1: content items [#] (0 to 3500) vs. time [min] over the first 60 minutes; series: trusted, unknown, untrusted, total]
Experiment 1 - Case D2
Results
[Figure D2: content items [#] (0 to 3500) vs. time [min] over the first 60 minutes; series: trusted, unknown, untrusted, total]
Experiment 2
Results
Target image ID | P1 | P2 | P3 | D1 | D2
Followers of author | 335 | 1.4k | 218 | 2.8k | 151k
Content likes | 11 | 408 | 35 | 17k | 29k
Content retweets | 83 | 3.3k | 194 | 22k | 30k
Total # of tweets in 60-minute window | 483918 | 162111 | 811079 | 1501000 | 1837173
Total # of unique mentioned URLs in 60-minute window | 785 | 4331 | 535 | 7907 | 13252
Rank of target image set / total ranked items in 5-minute segment (top x percent) | 9 / 653 (2%) | 1 / 603 (1%) | 61 / 1097 (6%) | 427 / 11605 (4%) | 1 / 11337 (1%)
Total # of eyewitness content items in 5-minute segment | 25 | 2 | 12 | 29 | 30
Unique # of eyewitness content items in 5-minute segment | 4 | 1 | 4 | 13 | 14
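The "top x percent" figures in the Experiment 2 results are consistent with dividing the target's rank by the number of ranked items in its 5-minute segment and rounding up to the next whole percent. This rounding rule is an inference from the numbers, not something the slides state. For example, for P1, 9 / 653 ≈ 1.4%, reported as 2%. The small check below reproduces all five values under that assumption; `RANKINGS` and `top_percent` are illustrative names.

```python
import math

# (rank, number of ranked items) per target image, copied from the Experiment 2 results
RANKINGS = {
    "P1": (9, 653),
    "P2": (1, 603),
    "P3": (61, 1097),
    "D1": (427, 11605),
    "D2": (1, 11337),
}


def top_percent(rank, total):
    """Rank as a 'top x percent' figure, rounded up to a whole percent (assumed rounding rule)."""
    return math.ceil(100 * rank / total)


if __name__ == "__main__":
    for image_id, (rank, total) in RANKINGS.items():
        exact = 100 * rank / total
        print(f"{image_id}: {rank}/{total} = {exact:.2f}% -> top {top_percent(rank, total)}%")
```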
How is this useful to journalists?
Posts by trusted sources matter for verification
Wisdom of the crowds is not always wisdom at all
The Twitter "echo chamber" is less useful than a post by a trusted source
Easier/faster to spot new eyewitness UGC
Filter feeds to tens of posts, not thousands
Reduce information overload for journalists in the first 5 mins
Additional analysis can further improve eyewitness UGC identification
Eyewitness classification
Image analysis (e.g. EXIF metadata)
Author profile pages
Discussion
Where to go from here
Cross-check known facts
Extend the knowledge model to support this
e.g. image classification of weather/lighting, checked against the time & location of the event
e.g. mentions of known event actors
Use linked open data to visualise source bias
this can include political, religious or other bias
Observational study of journalists verifying UGC
Expert journalists demonstrate best-practice verification on specific examples
We train our algorithms on observed best practice
We check our algorithms' results against journalists' ground truth
Future work
Any questions?
Stefanie Wiegand & Stuart E. Middleton
University of Southampton IT Innovation Centre
email: {sw|sem}@it-innovation.soton.ac.uk
web: www.it-innovation.soton.ac.uk
twitter: @RevealEU, @IT_Innov, @stuart_e_middle
Many thanks for your attention!