Mining the Web
how user-generated content (UGC) can become a data
source for tourism research

Peter A. Johnson and Dr. Renee Sieber, McGill University




TTRA Canada Annual Conference, Guelph, Ontario
Thursday, October 15, 2009
Outline

 What is user-generated content (UGC)?
 Examples of tourism-related UGC
 Tripadvisor study
 Challenges to UGC
What is UGC?
 User-generated content is:
    content made publicly available over
      the Internet
    reflects creative effort
    created outside of professional
      routines and practices (OECD, 2007)



http://www.oecd.org/dataoecd/57/14/38393115.pdf
Tripadvisor study
 A popular travel rating site
 Determine the range and nature of reviews
  of Nova Scotia
 Start search queries using "nova scotia" and
  "halifax nova scotia"
 Web scrape as many reviews as possible
Web Scraping
 Specialized computer software (robot or
  spider)
 Automated extraction of website data
 Simulates clicks to drill down through a
  web page
 Outputs thousands of records in hours
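
The extraction step described on this slide can be sketched in Python. This is a minimal illustration, not the tool used in the study: the HTML in SAMPLE_PAGE, the "review" class, and the data-rating attribute are all hypothetical, and real review pages use their own site-specific markup.

```python
from html.parser import HTMLParser

# Hypothetical markup (an assumption for illustration only).
SAMPLE_PAGE = """
<div class="review" data-rating="5"><p>Lovely harbour view.</p></div>
<div class="review" data-rating="2"><p>Room was noisy.</p></div>
"""


class ReviewParser(HTMLParser):
    """Collect (rating, text) pairs from <div class="review"> blocks."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._rating = None  # rating of the review currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "review":
            self._rating = int(attrs.get("data-rating", 0))
            self._text = []

    def handle_data(self, data):
        # Keep text only while inside a review block.
        if self._rating is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._rating is not None:
            self.reviews.append((self._rating, "".join(self._text).strip()))
            self._rating = None


def extract_reviews(html):
    """Return a list of (rating, text) tuples found in the given HTML."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.reviews


print(extract_reviews(SAMPLE_PAGE))
```

A full robot or spider would also fetch each page and follow links to further result pages; that part is left out here so the sketch runs without network access.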
5730 total reviews
[Bar chart: number of reviews by category]
 Attractions: 153
 Restaurants: 1513
 Accommodations: 4064
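
As a quick check on this chart, the three bar values (153, 1513, and 4064) add up to the 5730 total, and each category's share follows from simple division. The pairing of values with categories is read from the bar positions in the chart, so treat it as an assumption.

```python
# Review counts by category, as read from the bar chart (assumed pairing).
counts = {"Attractions": 153, "Restaurants": 1513, "Accommodations": 4064}

total = sum(counts.values())

# Share of each category as a percentage of all scraped reviews.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total)
print(shares)
```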
Web Scraping Results
Survey vs. UGC
                     Survey              UGC
 Sample Type         Controlled          Uncontrolled
 Question Type       Open/Close Ended    Generally Open-Ended
 Research Approach   Investigative       Exploratory
77 Reviewed Locations
Accommodation Reviews
Attraction Reviews
                   activity
attractions
Restaurant Reviews
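Total Destination Review Breakdown
[Pie chart: share of reviews by destination]
 Legend: Halifax, Annapolis Royal, Baddeck, Lunenburg, Dartmouth, Yarmouth, Digby, Other
 Slice values (not matched to legend order): 40%, 33%, 6%, 5%, 5%, 4%, 4%, 3%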
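Accommodation Review Ratings
[Pie chart: share of accommodation reviews by star rating]
 Legend: One Star, Two Stars, Three Stars, Four Stars, Five Stars
 Slice values (not matched to legend order): 53%, 22%, 10%, 8%, 7%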
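Attraction Review Ratings
[Pie chart: share of attraction reviews by star rating]
 Legend: One Star, Two Stars, Three Stars, Four Stars, Five Stars
 Slice values (not matched to legend order): 56%, 23%, 9%, 7%, 5%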
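Restaurant Review Ratings
[Pie chart: share of restaurant reviews by star rating]
 Legend: One Star, Two Stars, Three Stars, Four Stars, Five Stars
 Slice values (not matched to legend order): 37%, 32%, 17%, 8%, 7%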
Challenges with UGC

 Quality varies widely
 Vendetta/self-promotion
 Legal grey area
 Generalizability?
The Future

 Data gathering and analysis:
       geolocate reviewers
       content analysis of reviews
 Secondary UGC: reviews of reviews
 Instant feedback: iPhone effect
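
One simple form of the content analysis mentioned above is a term-frequency count over review text. The sketch below is only an illustration under stated assumptions: the sample reviews and the stopword list are invented, and a real analysis would use a fuller stopword list and a coding scheme suited to the research question.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "and", "was", "were", "of", "in", "to", "is", "it", "but"}


def top_terms(reviews, n=5):
    """Return the n most frequent non-stopword terms across all reviews."""
    words = []
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        words.extend(t for t in tokens if t not in STOPWORDS)
    return Counter(words).most_common(n)


# Invented review texts, for demonstration only.
sample = [
    "The room was clean and the staff were friendly.",
    "Friendly staff, great view of the harbour.",
    "Great lobster, but the room was small.",
]

print(top_terms(sample, 4))
```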
Tripadvisor iPhone Application
Yelp iPhone Application
Take home points

 UGC is an emerging source of data for
  tourism research
 Challenges:
    getting and using UGC
    how to use results at larger scales
Thank You!
     Further Reading
   Girardin, F., Dal Fiore, F., Ratti, C., and Blat, J. (2008). Leveraging explicitly
    disclosed location information to understand tourist dynamics: a case study.
    Journal of Location Based Services 2(1), 41-56.

   Goodchild, M.F. (2007). Citizens as Sensors: The World of Volunteered
    Geography. GeoJournal 69, 211-221.

   Gorman, S.P. (2007). Is academia missing the boat for the GeoWeb
    revolution? A response to Harvey's commentary. Environment and Planning
    B: Planning and Design 34(6), 949-950.

   Haklay, M., Singleton, A., and Parker, C. (2008). Web Mapping 2.0: The
    Neogeography of the GeoWeb. Geography Compass 2(6), 2011-2039.

    Contact: peter.johnson2@mail.mcgill.ca


Editor's Notes

  • #3: Outline, what to expect from the next 15 minutes