These are the slides of the overview of the eighth Author Profiling task at PAN-CLEF 2020, presented online. This year's task aimed at profiling fake news spreaders on Twitter.
Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreaders on Twitter
1. 8th Author Profiling task at PAN
Profiling Fake News Spreaders
on Twitter
PAN-AP-2020 CLEF 2020
Online, 22-25 September
Francisco Rangel
Symanto Research
Paolo Rosso
PRHLT Research Center
Universitat Politècnica de Valencia
Bilal Ghanem
Symanto Research
Anastasia Giachanou
PRHLT Research Center
Universitat Politècnica de Valencia
2. Introduction
Author profiling aims at identifying
personal traits such as age, gender,
personality traits, native language,
language variety… from writings.
This is crucial for:
- Marketing.
- Security.
- Forensics.
2
Author
Profiling
PAN’20
3. Task goal
Given a Twitter feed, determine whether
its author is keen to spread fake news or
not.
Two languages: English and Spanish.
4. Corpus
(EN) English / (ES) Spanish; "Keen" = keen to spread fake news, "Not keen" = not keen to spread fake news
Split      EN Keen  EN Not keen  EN Total  ES Keen  ES Not keen  ES Total
Training   150      150          300       150      150          300
Test       100      100          200       100      100          200
Total      250      250          500       250      250          500
Methodology
1. Selection of fake news from Politifact, Snopes, and related fact-checking sites (+ manual review).
2. Collection of tweets responding to the previous news:
2.1. Manual inspection to ensure that the tweet refers to the news.
2.2. Manual annotation of those tweets supporting vs. rejecting the news.
3. Timeline collection
3.1. Manual review of the tweets to label the fake ones.
3.2. Users with one or more fake tweets are keen to spread them. Otherwise, they are not.
3.3. Removal of tweets referring explicitly to the fake news (to avoid bias).
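The user-level labeling rule in steps 3.2 and 3.3 can be sketched as follows. This is a minimal illustration, not the organisers' tooling: the `Tweet` record and its `is_fake` / `refers_to_news` flags are hypothetical stand-ins for the outcome of the manual reviews described in steps 2 and 3.1.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical record; the flags stand in for the manual annotations.
    text: str
    is_fake: bool         # outcome of the manual review in step 3.1
    refers_to_news: bool  # tweet explicitly refers to the collected fake news


def label_user(timeline: list[Tweet]) -> tuple[bool, list[str]]:
    """Return (keen_to_spread, cleaned_texts) for one user's timeline."""
    # Step 3.2: one or more fake tweets -> keen to spread fake news.
    keen = any(t.is_fake for t in timeline)
    # Step 3.3: drop tweets that explicitly refer to the fake news (avoids bias).
    cleaned = [t.text for t in timeline if not t.refers_to_news]
    return keen, cleaned
```

The label is decided before the explicit references are removed, so a user stays labeled as keen even though the tweets that triggered the label are no longer in the feed a system sees.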
6. Baselines
- RANDOM: a baseline that randomly generates the predictions among the different classes.
- LSTM: a Long Short-Term Memory neural network that uses FastText embeddings to represent texts.
- CHAR N-GRAMS: character n-grams with values for n from 2 to 6, with an SVM.
- WORD N-GRAMS: word n-grams with values for n from 1 to 3, with a neural network.
- EIN: the Emotionally-Infused Neural (EIN) network, with word embeddings and emotional features as the input of an LSTM.
- Symanto (LDSE): this method represents documents on the basis of the probability distribution of occurrence of their words in the different classes. The key concept of LDSE is a weight representing the probability of a term belonging to each category (fake news spreader / non-spreader). The distribution of weights for a given document should be closer to the weights of its corresponding category. LDSE takes advantage of the whole vocabulary.
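To make the LDSE weight concrete, here is a simplified sketch. It assumes whitespace tokenisation and raw term counts, computes W(t, c) as the share of the occurrences of term t that fall in documents of class c, and classifies a document by the class with the highest mean weight over its known terms. The function names are hypothetical, and the published LDSE builds a richer set of per-class statistics from these weights and trains a classifier on them, so treat this only as an illustration of the core idea.

```python
from collections import Counter, defaultdict


def term_class_weights(docs, labels):
    """Return (W, classes), where W[t][c] is the share of term t's occurrences in class c."""
    per_class = defaultdict(Counter)  # class -> term -> count
    total = Counter()                 # term -> count over all documents
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        per_class[label].update(tokens)
        total.update(tokens)
    classes = sorted(per_class)
    weights = {t: {c: per_class[c][t] / n for c in classes} for t, n in total.items()}
    return weights, classes


def represent(doc, weights, classes):
    """One value per class: mean weight of the document's known terms."""
    tokens = [t for t in doc.lower().split() if t in weights]
    if not tokens:
        return {c: 0.0 for c in classes}
    return {c: sum(weights[t][c] for t in tokens) / len(tokens) for c in classes}


def predict(doc, weights, classes):
    """Simplification: pick the class whose mean weight is highest."""
    rep = represent(doc, weights, classes)
    return max(classes, key=lambda c: rep[c])
```

Usage: `weights, classes = term_class_weights(train_texts, train_labels)`, then `predict(text, weights, classes)`. Whatever the vocabulary size, each document is reduced to one value per class, which is what keeps the representation low-dimensional.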
7. Participation
66 participants
33 working notes
22 countries
https://mapchart.net/world.html
14. Best results at PAN'20
Buda and Bolonyai
- n-Grams
- Stylistic features
- Logistic Regression ensemble
Pizarro
- word and char n-grams
- SVM
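Since n-gram features underpin both the CHAR N-GRAMS baseline and the best-performing systems, the sketch below shows one way to count them over a user's concatenated tweets. The ranges reuse the baseline values (characters n = 2..6, words n = 1..3); the winners' exact settings, weighting, and classifier configuration are not given in these slides, and the SVM / logistic regression step is left out. Function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n_min: int = 2, n_max: int = 6) -> Counter:
    """Count every character n-gram of length n_min..n_max (case kept as-is)."""
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams["c:" + text[i:i + n]] += 1
    return grams


def word_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> Counter:
    """Count lowercased, whitespace-tokenised word n-grams of length n_min..n_max."""
    tokens = text.lower().split()
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams["w:" + " ".join(tokens[i:i + n])] += 1
    return grams


def ngram_features(text: str) -> Counter:
    # The "c:" and "w:" prefixes keep the two feature spaces apart when merged.
    return char_ngrams(text) + word_ngrams(text)
```

In practice these counts would be vectorised and weighted (for instance with scikit-learn's `TfidfVectorizer(analyzer="char", ngram_range=(2, 6))`) before training a classifier such as an SVM.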
15. Conclusions
● Several approaches to tackle the task:
○ n-Grams + SVM prevailing.
● Best results in English:
○ Over 67% on average.
○ Best (75%): Buda and Bolonyai - n-Grams + Stylistic features + Logistic Regression ensemble
● Best results in Spanish:
○ Over 73% on average.
○ Best (82%): Pizarro - char & word n-Grams + SVM.
● Error analysis:
○ English:
■ False positives (real news spreaders as fake news spreaders): 35.50%
■ False negatives (fake news spreaders as real news spreaders): 30.03%
○ Spanish:
■ False positives (real news spreaders as fake news spreaders): 20.23%
■ False negatives (fake news spreaders as real news spreaders): 35.09%
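The error figures above can be read as per-class error rates. A minimal sketch of one common definition follows; this is an assumption, since the slides do not spell out the normalisation used. The false positive rate is the share of real news spreaders predicted as fake news spreaders, and the false negative rate is the share of fake news spreaders predicted as real news spreaders. The labels "fake" and "real" are hypothetical.

```python
def error_rates(y_true, y_pred, positive="fake"):
    """Return (false_positive_rate, false_negative_rate) for a binary task.

    Assumed definitions:
      FP rate = actual non-positives predicted positive / actual non-positives
      FN rate = actual positives predicted non-positive / actual positives
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    negatives = sum(1 for t in y_true if t != positive)
    positives = sum(1 for t in y_true if t == positive)
    fp_rate = fp / negatives if negatives else 0.0
    fn_rate = fn / positives if positives else 0.0
    return fp_rate, fn_rate
```

With a balanced test set such as the one in the corpus table (100 users per class), accuracy equals 1 - (FP rate + FN rate) / 2.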
Looking at the results, we can conclude:
● It is feasible to automatically identify Fake News Spreaders with high precision
○ ...even when only textual features are used.
● We have to bear in mind false positives since, especially in English, they amount to about one-third of the total predictions, and misclassification might lead to ethical or legal implications.
17. Industry at PAN (Author Profiling)
Organisation
Sponsors
This year, the winners of the task are (ex aequo):
● Jakab Buda and Flora Bolonyai, Eötvös
Loránd University, Hungary
● Juan Pizarro, Chile