Big Data, Big Tourism
Tourism and Mechanics
/sirmmo/big-data-big-tourism
What are «Big Data»?
• Excel gets stuck working on a dataset? => «medium» data
• Stata/R struggle working on a dataset? => «big» data
Where do we get the data?
• Tourists
  • Have sensors
  • Are sensors
  • Are actors
• Attractions
  • Are sensors
  • Are actors
• Hotels, restaurants
  • Are sensors
  • Have sensors
Can we access the data?
• Tourists
  • Have sensors
  • Are sensors
  • Are actors
• Attractions
  • Are sensors
  • Are actors
• Hotels, restaurants
  • Are sensors
  • Have sensors
Private Sector
Government
Open(able/ish) Data
Almost always
Ok so who owns that data?
• Government
  • Bureaucracy-driven data
  • Incoherent
  • Inconsistent
  • Irregular production
• Private Sector
  • Deeply integrated with user experience
  • Very «behavioral», and as such very «real»
  • Very business-oriented metrics
Scraping
• Time consuming
• Power consuming
• Illegal (up to a certain point)
• Unavoidable (up to a certain point)
Scraping
• It relies on the fact that (most of) the web is based on HTML
• And HTML is text
• And JavaScript is text
• And CSS is text
• Everything can be read before the render
• Or after the render
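Since HTML is text, the "read before the render" idea can be sketched with nothing but Python's standard library. This is a minimal sketch, not a production scraper: the page source, the `price` class name and the `PriceParser` helper are all hypothetical, and a real scraper would first download the page over HTTP.

```python
from html.parser import HTMLParser

# Hypothetical page source, as a scraper would receive it before any rendering.
RAW_HTML = """
<html><body>
  <div class="hotel"><h2>Hotel Aurora</h2><span class="price">120</span></div>
  <div class="hotel"><h2>Hotel Borea</h2><span class="price">95</span></div>
</body></html>
"""


class PriceParser(HTMLParser):
    """Collects the text of every <span class="price"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        # Only keep text that appears inside a price span
        if self.in_price and data.strip():
            self.prices.append(int(data.strip()))


parser = PriceParser()
parser.feed(RAW_HTML)
prices = parser.prices
print(prices)  # [120, 95]
```

Reading "after the render" is a different matter: content injected by JavaScript is not in the raw text, so it typically needs a browser driver or a headless browser.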
Tools
• Not easy for «complex» sites
• Some cases come up
• Some tools help
• Knowledge of an XML query language (e.g. XPath) or CSS selectors may be required
• Some tools are very advanced
• Selenium browser driver
• «Headless» browsers
• Chrome
  • https://chrome.google.com/webstore/detail/scraper/mbigbapnjcgaffohmbkdlecaccepngjd?hl=en
  • https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn?hl=en
  • https://chrome.google.com/webstore/detail/advanced-web-scraper/gpolcofcjjiooogejfbaamdgmgfehgff
• Firefox
  • https://addons.mozilla.org/en-US/firefox/addon/datascraper/
• Web
  • https://www.import.io/
  • https://scrapinghub.com/portia/
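To give an idea of why some query-language knowledge helps, here is a small sketch of XPath-style selection using Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset. The listing fragment, class names and URLs in it are invented for illustration; ElementTree also requires well-formed XML, so real-world HTML usually needs a more tolerant parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed listing fragment; real pages are rarely this tidy.
LISTING = """
<ul>
  <li class="hotel"><a href="/h/1">Hotel Aurora</a><span class="price">120</span></li>
  <li class="ad"><a href="/promo">Special offer</a></li>
  <li class="hotel"><a href="/h/2">Hotel Borea</a><span class="price">95</span></li>
</ul>
"""

root = ET.fromstring(LISTING)

# Select only <li class="hotel"> items, skipping the advertisement entry.
hotels = []
for item in root.findall(".//li[@class='hotel']"):
    name = item.find("a").text
    price = int(item.find("span[@class='price']").text)
    hotels.append((name, price))

print(hotels)  # [('Hotel Aurora', 120), ('Hotel Borea', 95)]
```

The selector is the fragile part: when a site changes its structure, expressions like `.//li[@class='hotel']` are what has to be rewritten.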
Cases and issues of scraping
• Booking.com
  • Amazing website
  • Easy navigation for the user
• Issues
  • They know!!!
  • The website gets a complete structural overhaul every 6-9 months
  • They tend to hate scrapers
  • The page is empty when first loaded
Cases and issues of scraping
• AirBnB
  • Nice navigation
  • Full overhaul every 3 months
• Issues
  • The site really tracks what kind of user is accessing it
  • Only 13 pages are visible
  • They are randomly generated every day for the major areas
Cases and issues of scraping
• Weather
  • Many sources
  • Many formats
• Issues
  • Normalization of vocabulary
    • Bad weather == Rain == Rainy == Cloud Icon == ???
  • Normalization of ranges
  • Normalization of numbers
  • Normalization of periodicity
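The vocabulary and number normalization issues listed above can be sketched as follows. Everything here is an assumption made for illustration: the category names, the mapping itself (deciding that "Bad weather" means rain is exactly the kind of judgment call the slide questions) and the sample readings.

```python
# Hypothetical vocabulary map: each source's label -> one shared category.
VOCABULARY = {
    "rain": "rain",
    "rainy": "rain",
    "bad weather": "rain",   # assumption: a debatable editorial choice
    "cloud icon": "cloudy",  # assumption: icon names arrive as text labels
    "cloudy": "cloudy",
    "sunny": "clear",
    "clear": "clear",
}


def normalize_condition(label):
    """Map a source-specific label to a shared category, or 'unknown'."""
    return VOCABULARY.get(label.strip().lower(), "unknown")


def to_celsius(value, unit):
    """Normalize a temperature reading to degrees Celsius."""
    unit = unit.strip().upper()
    if unit == "C":
        return float(value)
    if unit == "F":
        return (float(value) - 32.0) * 5.0 / 9.0
    raise ValueError(f"unsupported unit: {unit}")


# Made-up readings from three imaginary sources with different conventions.
readings = [
    {"source": "a", "condition": "Rainy", "temp": 59, "unit": "F"},
    {"source": "b", "condition": "Bad weather", "temp": 15, "unit": "C"},
    {"source": "c", "condition": "???", "temp": 15.0, "unit": "c"},
]

normalized = [
    (r["source"], normalize_condition(r["condition"]),
     round(to_celsius(r["temp"], r["unit"]), 1))
    for r in readings
]
print(normalized)
```

Labels that are not in the map fall back to `unknown` rather than being guessed, which keeps the ambiguous cases visible for manual review.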
Apps
• Questionnaire to get the user to explicitly give data
• Information-driven application to track user data
• Gamification and/or information platform to elaborate and give data back
Explicit data
• Relies on the user's deliberate actions
• Requires real, willing consent to share information
• Stops at political correctness
• Implies (almost always) anonymity
• Questionnaire
• In-place review
• In-place comment
• Bureaucracy
Behavioral data
• Almost always true
• Difficult to get
• Easily contextualizable
• Interactive
• Interconnected
• Application
• Platform
• Social Media integration
• Gamification
• Social Media involvement
Cool, so what can be done?
Getting Data
• Municipalities are setting up open wireless networks
  • Users can be tracked
  • Services can be offered (and instrumented)
• Museums can track users within their premises
• Social Media interactions
Using Data
• Analysis of the context of specific behaviours
• Automated storytelling for city visits
• Pricing methodologies
• Destination brand analysis
Big and Big-ish Data Tools
• The problem is computational power
• Lots of work on AI
  • Classification
  • Generation
• Machine Learning
  • Correlations
• Data Warehouses
  • Mondrian - http://community.pentaho.com/projects/mondrian/
• Big Data DBs
  • Cassandra - http://cassandra.apache.org/
  • Hadoop - http://hadoop.apache.org/
• Big Data Search
  • BigQuery - https://cloud.google.com/bigquery/
  • GraphQL - http://graphql.org/
• Big Data AI/ML
  • TensorFlow - https://www.tensorflow.org/
  • SciPy - https://www.scipy.org/
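As a small-scale illustration of the "Correlations" item above, here is a sketch of a Pearson correlation computed with the standard library only. The `pearson` helper and the price/occupancy figures are made up for the example; on real volumes one of the tools listed above would take over.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Invented illustrative data: nightly price vs. occupancy rate of one hotel.
nightly_price = [80, 90, 100, 110, 120]
occupancy = [0.95, 0.90, 0.80, 0.70, 0.65]

r = pearson(nightly_price, occupancy)
print(f"price/occupancy correlation: {r:.3f}")  # strongly negative for this toy data
```

A coefficient near -1 or +1 only signals a linear relationship in the sample; it says nothing about causation, which matters when the result feeds pricing decisions.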
A few open questions
• Impact of crowdfunding on tourism-bound projects
• Impact of meta-search engines on pricing
• Impact (or lack thereof) of destination information websites on user decisions
• How can the user be «vetted» in order to tailor the tourist experience around her?
• Would such a vetting process affect customer return decisions?
One more thing: Watch out!!
Thanks! Questions?
@ingmmo
marco.montanari@gmail.com
http://ingmmo.com, https://medium.com/@ingmmo
sirmmo
http://it.linkedin.com/in/montanarim/
https://www.facebook.com/marco.montanari
marco.montanari
/sirmmo/big-data-big-tourism
