Matteo Manca @mattemanca
matteomanca@gmail.com
Addressing Big Societal Challenges with Digital Behavioral Data, London, UK – November 15th, 2017
From Social Science to Computational Social Science:
Is Web Data the Key to a More Effective Analysis?
Traditional social science approaches
Surveys, interviews:
● Pros: accurate information about user behavior (residence, mobility patterns, and habits in general).
● Cons: high costs, small samples, limited in space and time, updated very infrequently. Subjects are aware that they are being studied (this generates a bias!).
Addressing	Big	Societal	Challenges	with	Digital	Behavioral	Data	Workshop	@mattemanca
Social Science: the study of social phenomena, i.e., how people interact to produce collective behavior.
New	approaches:	computational	social	science
● Shopping
● Economic activities
● Social media activity
● Traveling
● Professional
● Preferences and opinions
● Connections
● ...
Social media systems
Smart devices
Digital Behavioral Data
Opportunity for studying human behavior and interactions
Challenges:
● How to access the data
● How to work with huge amounts of data
● How to use this data:
○ Ethical concerns
○ Privacy
○ …
New sources of data
Wireless sensors:
● Pros: compared with surveys, no limitation in time and a high update frequency.
● Cons: high costs due to the installation and management of the sensors, and spatial limitation.
Mobile phone network:
● Pros: large-scale studies, good update frequency, no limitation in time and space.
● Cons: the data are not freely and publicly available, for privacy, security, and proprietary reasons.
New sources of data
Social media data:
● Pros: covers all aspects of user behavior and life, no temporal or spatial limitations, allows large-scale studies, accessible in (almost) real time.
● Cons: data sampling, might not be fully accessible, etc.
Digital	Behavioral	Data	Case	Study:
Using social media to characterize urban mobility patterns: State-of-the-art survey and case-study
Digital	Behavioral	Data	Case	Study
Research question:
To what extent can social media data be exploited to gain knowledge about urban dynamics and mobility patterns in a city or, more generally, in an urban area?
Barcelona case study:
Explore urban mobility patterns: local citizens vs. tourists.
[Using social media to characterize urban mobility patterns: State-of-the-art survey and case-study.
Matteo Manca, Ludovico Boratto, Victor Morell Roman, Oriol Martori i Gallissà, Andreas
Kaltenbrunner – Online Social Networks and Media (OSNEM)]
Digital	Behavioral	Data	Case	Study
Dataset: Tweets published in Barcelona from Jan 01, 2015 to Dec 31, 2015.
Pre-processing: data cleaning, filtering, and application of a heuristic to classify locals and tourists.
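The slide does not say which heuristic separates locals from tourists. As a rough sketch only, one could assume that a user whose tweets in the city span more than a threshold number of days is a local. The function name and the 30-day threshold below are illustrative assumptions, not values from the study.

```python
from datetime import date

def classify_users(tweets, max_tourist_span_days=30):
    """Label each user 'local' or 'tourist' from the time span of their tweets.

    tweets: iterable of (user_id, date) pairs for tweets posted in the city.
    max_tourist_span_days: assumed threshold, not taken from the study.
    """
    first_seen, last_seen = {}, {}
    for user, day in tweets:
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day
        if user not in last_seen or day > last_seen[user]:
            last_seen[user] = day
    labels = {}
    for user in first_seen:
        span = (last_seen[user] - first_seen[user]).days
        labels[user] = "tourist" if span <= max_tourist_span_days else "local"
    return labels

# Made-up sample: alice tweets across many months, bob within one week.
sample = [
    ("alice", date(2015, 1, 10)), ("alice", date(2015, 9, 2)),
    ("bob", date(2015, 7, 1)), ("bob", date(2015, 7, 6)),
]
labels = classify_users(sample)
```

A span-based rule is easy to audit, but it would mislabel locals who tweet rarely, so any real threshold would need validation against known cases.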
Digital	Behavioral	Data	Case	Study
One-hop paths performed by users in Barcelona:
Shorter paths are shown in warmer colors; the longer a path, the colder its color tone.
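A one-hop path can be read as the move between two consecutive geotagged tweets of the same user. The sketch below follows that reading, which is an assumption (the study may add time windows or other filters), and measures each hop with the standard haversine formula. The tuple layout and sample coordinates are illustrative.

```python
import math
from itertools import groupby

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def one_hop_paths(tweets):
    """Yield (user, origin, destination, km) for consecutive tweets of a user.

    tweets: list of (user_id, timestamp, lat, lon); timestamps must be sortable.
    """
    ordered = sorted(tweets, key=lambda t: (t[0], t[1]))
    for user, group in groupby(ordered, key=lambda t: t[0]):
        points = [(lat, lon) for _, _, lat, lon in group]
        for a, b in zip(points, points[1:]):
            if a != b:  # skip consecutive tweets from the same spot
                yield user, a, b, haversine_km(*a, *b)

tweets = [
    ("u1", 1, 41.4036, 2.1744),  # near Sagrada Família (approximate coordinates)
    ("u1", 2, 41.3870, 2.1700),  # near Plaça de Catalunya (approximate coordinates)
    ("u2", 1, 41.3870, 2.1700),  # single tweet: no path
]
paths = list(one_hop_paths(tweets))
```

The hop lengths produced this way are what a color scale like the one on the slide would be mapped from.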
Digital	Behavioral	Data	Case	Study
Flow	diagrams	of	locals	between	districts	comparing	working	days	and	weekends.	
Panels: locals during working days; locals during weekends.
● CV: Ciutat Vella
● Ex: Eixample
● Gr: Gràcia
● HG: Horta-Guinardó
● LC: Les Corts
● NB: Nou Barris
● SA: Sant Andreu
● SM: Sant Martí
● Mj: Sants-Montjuïc
● SG: Sarrià-Sant Gervasi
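Flow diagrams of this kind are typically drawn from an origin-destination count per pair of districts. A minimal sketch, assuming each hop has already been mapped to the district codes above and carries a timestamp (the tuple layout is an assumption, and public holidays are treated as ordinary days):

```python
from collections import Counter
from datetime import datetime

def district_flows(hops):
    """Count origin->destination district flows, split by day type.

    hops: iterable of (origin_district, destination_district, datetime).
    Returns {'working_day': Counter, 'weekend': Counter}.
    """
    flows = {"working_day": Counter(), "weekend": Counter()}
    for origin, dest, when in hops:
        # weekday() is 0 for Monday ... 6 for Sunday; public holidays are ignored here
        day_type = "weekend" if when.weekday() >= 5 else "working_day"
        flows[day_type][(origin, dest)] += 1
    return flows

hops = [
    ("Gr", "Ex", datetime(2015, 3, 2, 9, 0)),    # Monday
    ("Gr", "Ex", datetime(2015, 3, 3, 9, 30)),   # Tuesday
    ("NB", "CV", datetime(2015, 3, 7, 18, 0)),   # Saturday
]
flows = district_flows(hops)
```

Running the same function separately on hops from locals and from tourists would give the four matrices compared in the flow diagrams.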
Digital	Behavioral	Data	Case	Study
Flow	diagrams	of	tourists	between	districts	comparing	working	days	and	weekends.	
Panels: tourists during working days; tourists during weekends.
Digital	Behavioral	Data	Case	Study
Analysis	of	user	behavior
Probability that a user visits L locations.
Frequency ranking of the top-L most frequent tweet locations, for users who tweet from at least 5, 10, 30, or 50 different locations.
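The frequency ranking described here can be computed per user by sorting that user's locations by visit count. The sketch below also builds a normalized 1/L curve for comparison, since the talk's conclusions approximate the rank probability by 1/L. The function names and the sample visits are illustrative assumptions.

```python
from collections import Counter

def rank_frequency(locations):
    """Return visit shares sorted by rank (rank 1 = most visited location)."""
    counts = Counter(locations)
    total = sum(counts.values())
    return [n / total for _, n in counts.most_common()]

def zipf_reference(n_ranks):
    """Normalized 1/L shares for ranks 1..n_ranks (assumed reference curve)."""
    weights = [1 / rank for rank in range(1, n_ranks + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]

# Made-up visit log for one user with four distinct locations.
visits = ["home"] * 6 + ["office"] * 3 + ["gym"] * 2 + ["bar"]
observed = rank_frequency(visits)
reference = zipf_reference(len(observed))
```

Plotting `observed` against `reference` on log-log axes is one simple way to check how closely a user follows the 1/L shape.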
Digital	Behavioral	Data	Case	Study
Path	statistics	for	tourists	and	locals
Boxplots of the path distribution
Cumulative distribution function
Digital Behavioral Data Case Study
Case Study Conclusions
● The most central districts (such as Ciutat Vella) are visited more during weekends, to the detriment of others such as Nou Barris.
● Tourists have the same behavior during working days and during weekends.
● Most tourist paths involve the two most touristic districts of Barcelona, i.e., Ciutat Vella and Eixample.
● Locals are more likely to cover short or long distances, while tourists more commonly cover intermediate distances.
● In multi-hop paths, the average distance per hop is inversely proportional to the number of path hops.
● Regardless of the number of path hops, tourists' paths involve, on average, longer hops than those of locals.
● The probability of finding a user at the location with rank L can be approximated by the function 1/L.
Open	Issues
Digital	Behavioral	Data:	open	issues
Bias	in	data
Human	data	collected	through	the	web	are	biased:
● Gender
● Age
● Technological
● Racial
● …
Example: social media data or mobile phone data may be biased.
Biased input → Algorithm → Biased results
Digital	Behavioral	Data:	open	issues
Activity	bias
Percentage of users creating 50% of content on Facebook, Amazon Reviews, Twitter, and Wikipedia.
[Baeza-Yates and Saez-Trumper, Hypertext'15]
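The metric on this slide, the share of users who create 50% of the content, can be computed from per-user post counts. A minimal sketch, assuming one integer count per user (the function name and the sample counts are made up for illustration, not taken from the cited paper):

```python
def share_of_users_for_half_content(posts_per_user):
    """Smallest fraction of users whose posts add up to at least 50% of all content."""
    counts = sorted(posts_per_user, reverse=True)  # most active users first
    total = sum(counts)
    if total == 0:
        raise ValueError("no content to attribute")
    running = 0
    for n_users, count in enumerate(counts, start=1):
        running += count
        if 2 * running >= total:
            return n_users / len(counts)

# Made-up counts: one very active user and a long tail.
posts = [500, 200, 100, 50, 50, 40, 30, 20, 5, 5]
share = share_of_users_for_half_content(posts)
```

A small value means a few heavy contributors dominate the data, which is the activity bias the slide warns about.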
Digital	Behavioral	Data:	open	issues
● Not fully accessible
○ Sampling bias: data retrieved through Twitter’s public streaming APIs cannot exceed 1% of all tweets posted at a given moment (the resulting bias is hard to estimate).
● Bots, organizations, and media are “special” users.
● Privacy issues
Acknowledgement
Andreas Kaltenbrunner, David Laniado, Ludovico Boratto, Pablo Aragón
Matteo	Manca	@mattemanca
matteomanca@gmail.com
Thank	you!