This document describes a game designed to crowdsource the labeling of text data for computational linguistics research. The game harnesses people's motivation to play for entertainment to solve problems, such as sarcasm detection, that are difficult for computers. Players work in pairs to categorize short texts as positive, negative, or sarcastic. Matching categorizations earn points, with bonus points for matching keywords. The goal is to build a labeled dataset that can train machine learning algorithms; the labeling process relies on self-organization, as players form clusters around different difficulty levels.
5. +
Self Organization in Humans
- Millions of people play games and spend countless hours on entertainment.
- To give you an example:
  Time spent in 1 year > 9 billion hours
  Time it took to build this = 7 million hours
6. +
Self Organization in Humans
- Millions of people play games and spend countless hours on entertainment.
- To give you an example:
  Time spent in 1 year > 9 billion hours
  Time it took to build this = 20 million hours
7. +
Self Organization in Humans
- Could we use this human time to do something useful while people are still entertained?
- We think, yes!
- So, we designed a game that people play purely for entertainment.
- As a side effect of playing this game, they solve a problem that computers currently can't solve easily.
8. +
Problems Hard for Computers – Easy for Humans
- Machine translation.
- Identifying objects in a given image (computer vision).
- Detecting sarcasm in a given text.
- And many more...
9. +
Detecting Sarcasm in Text
- Sarcasm transforms the polarity of an apparently positive or negative utterance into its opposite.
- Why it's difficult: some of the best approaches in computational linguistics rely on machine learning techniques that require large datasets. However, the currently available datasets are small, which limits the training of algorithms.
- Our aim in this project is to construct a text corpus that computational linguistics researchers can use to train their existing algorithms or to create more accurate ones.
10. +
Did you get it?
Game Rules
- Web-based multiplayer game.
- Two users play a single instance of the game.
- Players have limited communication with each other and can't know their partner's identity.
- Each user gets a small paragraph (at most 4 sentences).
- The user categorizes it as positive, negative, or sarcastic. The user can also give two words that she finds useful for identifying the category. Each word that matches one of the partner's words earns a bonus score.
13. +
[Game screenshot: "Did you get it?" Score: 155. Time left: 1:30. Text: "Sometimes I need what only you can provide: your absence." This text is: Positive / Negative / Sarcastic. Points: 4. Controls: Key Words, Pass, Submit.]
14. +
[Game screenshot: "Did you get it?" Score: 985. Time left: 0:22. Text: "Marriage is the chief cause of divorce!" This text is: Positive / Negative / Sarcastic. Points: 6. Controls: Key Words, Pass, Submit.]
15. +
[Game screenshot: "Did you get it?" Score: 65. Time left: 2:55. Text: "The 100% American is 99% idiot." This text is: Positive / Negative / Sarcastic. Points: 5. Key Words entered: "100%", "99%". Controls: Pass, Submit.]
18. +
[Game screenshot: "Did you get it?" Score: 0. Time left: 9:58. Text: "Por favor, tráeme un vaso de agua." (Spanish: "Please bring me a glass of water.") This text is: Positive / Negative / Sarcastic. Points: 23. Controls: Key Words, Pass, Submit.]
20. +
[Game screenshot: "Did you get it?" Score: 480. Time left: 5:15. Text: "Where she sits she shines, and where she shines she sits." This text is: Positive / Negative / Sarcastic. Points: 25. Controls: Key Words, Pass, Submit.]
21. +
How it works?
- Each user starts with an initial score of zero.
- The user chooses the pack they want to play. Pack 1 is the easiest and Pack 4 is the most difficult.
- Each time a user agrees with their partner on a particular text, their score increases by the points shown below the text.
- If one user clicks Pass, the other user has to do the same: they can't choose any other option until they pass that question.
22. +
How it works?
- Bonus score:
  - Let's say the score of the given text is X points.
  - If the players submit one matching key word, both get a bonus score of X.
  - If the players submit two matching key words, both get a bonus score of 3X.
  - Key words are useful metadata for training computational linguistics algorithms.
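The bonus rule can be illustrated with a minimal Python sketch. The function names `keyword_bonus` and `round_score` are hypothetical, and two details are assumptions not stated on the slides: key words are compared ignoring case and surrounding whitespace, and the bonus is only paid when the two players' categories also match.

```python
def keyword_bonus(points, keywords_a, keywords_b):
    """Bonus for matching key words: X for one match, 3X for two matches."""
    # Assumption: matching ignores case and surrounding whitespace.
    norm_a = {k.strip().lower() for k in keywords_a if k.strip()}
    norm_b = {k.strip().lower() for k in keywords_b if k.strip()}
    matches = len(norm_a & norm_b)
    if matches >= 2:
        return 3 * points
    if matches == 1:
        return points
    return 0


def round_score(points, label_a, label_b, keywords_a, keywords_b):
    """Score each player gains for one text worth `points` (X)."""
    # Assumption: no points and no bonus unless the categories match.
    if label_a != label_b:
        return 0
    return points + keyword_bonus(points, keywords_a, keywords_b)
```

For a text worth 5 points where both players choose "sarcastic" and both enter "100%" and "99%", `round_score(5, "sarcastic", "sarcastic", ["100%", "99%"], ["99%", "100%"])` gives 5 + 3 * 5 = 20.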
24. +
How it works?
- Dataset:
  - Twitter dataset with user IDs (@userid) and hashtags (#hashtag) removed.
  - In practice, any sort of data can be used, e.g. product reviews, opinions, etc.
- Implementation:
  - JavaScript and Node.js (>90%) + Python (<10%)
  - MongoDB for storing data
- Platform: Heroku
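As an illustration of the dataset preparation step, the following Python sketch removes @userid and #hashtag tokens from a tweet. It is not the project's actual preprocessing code; the function name and the regular expression are assumptions, and the pattern treats a mention or hashtag as `@` or `#` followed by letters, digits, or underscores.

```python
import re

# Assumption: a mention or hashtag is "@" or "#" followed by word characters.
MENTION_OR_HASHTAG = re.compile(r"[@#]\w+")


def strip_mentions_and_hashtags(tweet):
    """Remove @userid and #hashtag tokens, then collapse leftover whitespace."""
    without_tags = MENTION_OR_HASHTAG.sub("", tweet)
    return " ".join(without_tags.split())
```

Removing the whole hashtag, rather than only the `#` symbol, keeps labels such as #sarcasm from revealing the answer to players.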
25. +
How it works?
- When N players agree that a specific text is positive, negative, or sarcastic, we tag the text with that type.
- When users submit a result for a specific text, the points associated with that text are recalculated as:
  Points = Points + Pheromone * Weighted sum
  - When users agree, pheromone = 1; otherwise, pheromone = -1.
  - The weighted sum is a constant that scales each adjustment to the points of the text.
- In our game:
  N = 10
  Weighted sum = 0.02
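The tagging rule can be sketched in Python as a simple vote counter. The class name `TextVotes` is hypothetical, and the sketch assumes that "N players agree" means N submissions for the same category in total, not necessarily consecutive ones; the slides do not specify this. Only N = 10 comes from the slide.

```python
from collections import Counter

LABELS = ("positive", "negative", "sarcastic")
N_AGREEMENT = 10  # N as given for the game


class TextVotes:
    """Collects player submissions for one text until it can be tagged."""

    def __init__(self, threshold=N_AGREEMENT):
        self.threshold = threshold
        self.votes = Counter()
        self.tag = None  # stays None until some label reaches the threshold

    def submit(self, label):
        """Record one player's label and return the tag (or None if untagged)."""
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        if self.tag is None:
            self.votes[label] += 1
            # Assumption: N total votes for one label tag the text.
            if self.votes[label] >= self.threshold:
                self.tag = label
        return self.tag
```

Once a text is tagged, later submissions in this sketch leave the tag unchanged.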
26. +
Self Organization of Players
- Task: constructing a corpus for computational linguistics researchers
- Agents: players
- Program: incentive structure (points and entertainment)
- Pattern formation: clusters around different difficulty packs
- Communication: pheromone-based model
28. +
Communication Scheme
- Inspired by the pheromone-based model.
- Pheromones are values used to alter the points associated with each sentence, according to the following formula:
  P_t = P_{t-1} + (pheromone intensity) * W
  P_t: points associated with the sentence at time t
  P_{t-1}: points associated with the sentence at time t-1
  Pheromone intensity = 1 when players disagree
  Pheromone intensity = -1 when players agree
  W: weight constant, 0 < W < 1
29. +
Agents
- Users are also referred to as agents.
- Agents have two operations:
  sense_sentence_score(sentence_id)
  modify_sentence_score(sentence_id)
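A rough Python sketch of the two agent operations, applying the update rule from the Communication Scheme slide, might look as follows. The class `SentenceBoard`, the in-memory dictionary, and the extra `players_agree` argument are assumptions made for illustration (the slides list only `sentence_id` as a parameter). The pheromone intensities default to the values on the Communication Scheme slide and are exposed as parameters; the default weight of 0.02 is the value given for the game.

```python
class SentenceBoard:
    """In-memory store of the points attached to each sentence (hypothetical)."""

    def __init__(self, initial_points, weight=0.02):
        if not 0 < weight < 1:
            raise ValueError("weight W must satisfy 0 < W < 1")
        self.points = dict(initial_points)
        self.weight = weight

    def sense_sentence_score(self, sentence_id):
        """Read the points currently associated with a sentence."""
        return self.points[sentence_id]

    def modify_sentence_score(self, sentence_id, players_agree,
                              agree_intensity=-1, disagree_intensity=1):
        """Apply P_t = P_{t-1} + (pheromone intensity) * W and return P_t."""
        # Assumption: the caller reports whether the two players agreed.
        intensity = agree_intensity if players_agree else disagree_intensity
        self.points[sentence_id] += intensity * self.weight
        return self.points[sentence_id]
```

With these defaults, a disagreement raises a sentence's points by W and an agreement lowers them by W, so repeated play gradually moves each sentence's points.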
31. +
Self Organization: A Different Perspective

Aspect                           | Robot Based Model                          | Human Based Model
Task                             | Moving a rock                              | Creating metadata for ML algorithms
Agent                            | Robots                                     | Humans
Program                          | Set of rules                               | Incentive structure (more points or entertainment or both)
Communication mechanism          | Electromagnetic radiation                  | Pheromone deposit
Patterns                         | Local interactions causing global behavior | People forming clusters
Fault-Tolerant                   | Yes                                        | Yes. System is independent of agents
Dependence on Initial Conditions | It depends                                 | Yes
32. +
Conclusion
- Play the game.
- Better still, collaborate with us to further develop this game. Email us for access to the code on GitHub.
- When robots become dominant, they will still need humans. So, our species is safe.