This document summarizes research on using Twitter data from the Czech Republic for data mining purposes. It finds that there are approximately 10,000 active Twitter users in the Czech Republic, with 44% of tweets in Czech and 33% in English. Location data shows the majority of tweets come from Prague, Brno, and Ostrava. Different data mining methods like frequency analysis and semantic similarity are used to analyze topics and identify opinion leaders. The document also finds that Czechs tweet most on Tuesdays and Thursdays, and that Twitter can sometimes predict future search trends based on concerts and events more quickly than Google.
Twitter as a data mining source
1. Czech Twitter
a data mining source
Josef Šlerka, WebExpo 2009
Twitter is a free social networking and micro-blogging
service that enables its users to send and
read messages known as tweets.
Tweets are text-based posts of up to 140 characters
displayed on the author's profile page and delivered
to the author's subscribers, who are known as followers.
4. Data mining is the process of extracting
patterns from data. As more data are gathered,
data mining is becoming an increasingly
important tool to transform these data into information.
Variations include text mining and
web mining, including semantic analysis
5. Twitter Data mining
- makes it easy to use all data mining methods
- adds "time" & "space"
- provides a real-time picture
- easily connects with other social media (about 30% of
users have the same nickname on all platforms)
6. Data mining - different methods
- different variations of semantic distance and
similarity measures (Jaccard index)
- frequency analysis based on time (are people
happier in the morning or in the evening?)
- frequency analysis based on location
- one of the results -> identification of opinion
makers in the social networks
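As an illustration of the Jaccard index mentioned above, here is a minimal sketch that compares two texts by the overlap of their word sets. The function name, the whitespace tokenization, and the sample tweets are assumptions made for this example; the slides do not say how the similarity was actually computed.

```python
def jaccard_index(a: str, b: str) -> float:
    """Jaccard index of the word sets of two texts: |A & B| / |A | B|."""
    set_a = set(a.lower().split())
    set_b = set(b.lower().split())
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty texts
    return len(set_a & set_b) / len(set_a | set_b)


# Made-up sample tweets, for illustration only
tweet_1 = "concert in prague tonight"
tweet_2 = "great concert tonight in brno"

similarity = jaccard_index(tweet_1, tweet_2)  # 3 shared words out of 6 distinct
distance = 1.0 - similarity                   # Jaccard distance
print(similarity, distance)
```

A higher index means the two tweets share more of their vocabulary; the distance form (1 minus the index) is the one usually used for clustering.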
8. Transmission News = 5 APIs in one
5x Twitter News Service accounts
1x Yahoo Geo
1x Google Search AJAX
1x Google Maps
1x Open Calais
and a little bit of Wikipedia
14. Sparrow 1.0
application methodology
- archives all tweets located in the Czech Republic at
hourly intervals via the Twitter API (starting June 2009)
- automatically detects language
- identifies Czech tweets with a word count dictionary
- compares Czech Twitter statistics with foreign
countries' statistics
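The dictionary-based identification of Czech tweets could look roughly like the sketch below. The word list, the 0.3 threshold, and both function names are hypothetical and chosen only for this example; the slides do not describe Sparrow's real dictionary or scoring.

```python
import re

# Hypothetical mini-dictionary of frequent Czech words (illustrative only;
# a real dictionary would be far larger)
CZECH_WORDS = {"je", "na", "se", "že", "to", "jsem", "ale", "jak", "už", "dnes"}


def czech_word_share(text: str) -> float:
    """Fraction of the words in `text` that appear in the Czech dictionary."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in CZECH_WORDS)
    return hits / len(words)


def looks_czech(text: str, threshold: float = 0.3) -> bool:
    """Treat a tweet as Czech when enough of its words are in the dictionary.

    The threshold is an assumed value, not one taken from the slides.
    """
    return czech_word_share(text) >= threshold


print(looks_czech("Dnes je v Praze hezky"))         # 2 of 5 words match
print(looks_czech("Nice weather in Prague today"))  # no words match
```

Short tweets and words shared between languages (for example "to") make such a count noisy, so a real dictionary and threshold would need tuning against labeled data.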
15. Sparrow 1.0 - June 2009 stats
- about 700,000 tweets
- created by 10,628 unique users who enabled their
geo-location (CZ) or tweeted in Czech
- 5,880 users tweeted at least once in Czech
- 2,424 Czech-writing users revealed their geo-location
(usually about 30% of users do that)
16. How many Twitter users are in the Czech Republic?
Between 6,000 and 8,000 users write in Czech
1,000 to 2,000 users prefer English
There are about
10,000 active Twitter users in CR
17. What are the dynamics of Czech Twitter?
Every four weeks the number of users with at
least one tweet rises by about 25%
The number of active users rises by 3-5% each week
The absolute number of tweets rises by about 25% too
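To put the weekly and four-week figures on a common scale, the sketch below converts between them under the assumption of steady compound growth. That assumption and the function names are introduced here for illustration; the slides give only the raw percentages, and the two figures describe different groups of users, so they need not agree.

```python
def four_week_growth(weekly_rate: float) -> float:
    """Growth over four weeks if the same weekly rate compounds each week."""
    return (1.0 + weekly_rate) ** 4 - 1.0


def weekly_growth(four_week_rate: float) -> float:
    """Constant weekly rate that compounds to the given four-week rate."""
    return (1.0 + four_week_rate) ** 0.25 - 1.0


low = four_week_growth(0.03)    # 3% per week compounded over four weeks
high = four_week_growth(0.05)   # 5% per week compounded over four weeks
implied = weekly_growth(0.25)   # weekly rate implied by 25% per four weeks

print(f"{low:.1%} to {high:.1%} per four weeks; {implied:.1%} per week")
```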
18. What characteristics do Czech tweets have?
2 % are RT
4 % use a "#"
21.5 % are replies and conversation
34.6 % include a link
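Shares like these can be obtained by flagging each tweet with a few text patterns and counting. The sketch below is one way to do that; the regular expressions, the function names, and the four sample tweets are assumptions for illustration, not the rules used to produce the figures above.

```python
import re


def tweet_features(text: str) -> dict:
    """Flag simple surface features of a tweet using heuristic patterns."""
    return {
        "retweet": bool(re.match(r"\s*RT\b", text)),    # starts with "RT"
        "hashtag": bool(re.search(r"#\w+", text)),      # contains a #tag
        "reply": text.lstrip().startswith("@"),         # starts with @user
        "link": bool(re.search(r"https?://\S+", text)), # contains a URL
    }


def feature_shares(tweets: list) -> dict:
    """Percentage of tweets that show each feature."""
    names = ["retweet", "hashtag", "reply", "link"]
    if not tweets:
        return {name: 0.0 for name in names}
    flagged = [tweet_features(t) for t in tweets]
    return {
        name: 100.0 * sum(f[name] for f in flagged) / len(tweets)
        for name in names
    }


# Made-up sample, for illustration only
sample = [
    "RT @user: hello",
    "@friend see you at #webexpo",
    "new post http://example.com",
    "just a plain tweet",
]
shares = feature_shares(sample)
print(shares)
```

A tweet can carry several features at once (the second sample is both a reply and uses a hashtag), so the percentages do not have to sum to 100.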
23. This is what we've learned in a few months:
- Czechs tweet most often on Tuesday or Thursday, and
least often on Saturday
Around the world the most popular day is Tuesday, and the
least popular is Sunday
- The number of tweets rises steadily from the beginning to
the end of the month, then falls and begins rising again.
That means people tweet more at the end of the month
than at the beginning.
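A time-based frequency analysis of this kind reduces to counting tweets per weekday. The sketch below shows the idea on four made-up timestamps; the function name and the data are assumptions for illustration and are not taken from the archive described in the slides.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def tweets_per_weekday(timestamps: list) -> Counter:
    """Count tweets by the weekday of their timestamp."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday
    return Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)


# Made-up timestamps, for illustration only
sample_times = [
    datetime(2009, 6, 2, 9, 0),    # Tuesday
    datetime(2009, 6, 2, 18, 30),  # Tuesday
    datetime(2009, 6, 4, 12, 0),   # Thursday
    datetime(2009, 6, 6, 10, 0),   # Saturday
]
counts = tweets_per_weekday(sample_times)
busiest_day, busiest_count = counts.most_common(1)[0]
print(busiest_day, busiest_count)
```

The same counting works for day of month or hour of day by swapping the key taken from each timestamp.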