The document describes a project to collect comments associated with news articles from various sources like Facebook, Disqus, and Twitter. The comments would be organized by topic and could then be queried to analyze hot topics and public sentiment about different news stories. Data would be gathered through APIs from these sources and by crawling RSS feeds to stay current on new articles.
2. Recognizing I have a problem
● Addicted to news
● My typical browsing pattern:
– Google News
– Hey that's interesting
– Read one article
– Read all the comments on that story in every newspaper, especially those where I know I'll disagree with the general opinion
5. Thanks
Platynereis dumerilii
- PhD in statistics (Cambridge University)
- Bioengineering (French Grande École)
- Master's in CS (French Grande École)
Jean-Baptiste Pettit
6. Why is it interesting
● Every comment is associated with the topics extracted from its article's title
– Possibility to query all the comments for a particular topic
– Possibility to infer hot topics
– Possibility to estimate people's mood about the news
● Mixing different data sources
– Comments will be queried on a regular basis
– Twitter feed will be streamed
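
To make the topic idea concrete, here is a minimal sketch of how comments could be indexed by the words of their article's title, then queried per topic and ranked to surface hot topics. Everything in it is an assumption for illustration: the `CommentIndex` class, the `title_topics` helper, and the tiny stop-word list are hypothetical, and "topic" is simplified to "non-trivial word in the title". Mood estimation is not covered.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (assumption, not from the slides).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "is", "as", "at", "by"}


def title_topics(title):
    """Return the set of lower-cased title words that are not stop words."""
    words = re.findall(r"[a-z0-9']+", title.lower())
    return {w for w in words if w not in STOP_WORDS}


class CommentIndex:
    """In-memory map from topic word to the comments of matching articles."""

    def __init__(self):
        self._by_topic = defaultdict(list)

    def add(self, article_title, comment_text):
        # File the comment under every topic found in the article title.
        for topic in title_topics(article_title):
            self._by_topic[topic].append(comment_text)

    def comments_for(self, topic):
        # Query all the comments for a particular topic.
        return list(self._by_topic.get(topic.lower(), []))

    def hot_topics(self, n=3):
        # Rank topics by how many comments they have attracted.
        counts = Counter({t: len(c) for t, c in self._by_topic.items()})
        return counts.most_common(n)
```

With this sketch, adding comments for two articles whose titles both mention "France" makes `comments_for("France")` return both comments, and `hot_topics(1)` puts `france` first.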
7. The data
● Keep up with the latest articles by crawling RSS feeds (XML, but that's OK)
● For each new article, collect comments for 1 day via
– Facebook comments API
– Disqus API
→ JSON
● Twitter feed associated with the articles (streaming API)
→ JSON
● Data is easy to engineer if needed
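
The crawling step above could be sketched roughly as follows, using only the Python standard library. It parses an RSS 2.0 document, spots articles not seen before, and checks whether an article is still inside the one-day comment window. The function names, the `SAMPLE_FEED` content, and its URLs are made up for illustration; the calls to the Facebook, Disqus, and Twitter APIs are deliberately left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Comments are collected for one day after publication (from the slide).
COLLECTION_WINDOW = timedelta(days=1)

# Made-up feed used only to exercise the functions below.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example News</title>
  <item>
    <title>First story</title>
    <link>http://news.example.com/first</link>
    <pubDate>Mon, 06 Jan 2014 10:00:00 GMT</pubDate>
  </item>
  <item>
    <title>Second story</title>
    <link>http://news.example.com/second</link>
    <pubDate>Sun, 05 Jan 2014 08:00:00 GMT</pubDate>
  </item>
</channel></rss>"""


def parse_rss(xml_text):
    """Return a list of {title, link, published} dicts from an RSS 2.0 string."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            # RSS dates use the RFC 822 format, which email.utils can parse.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return articles


def new_articles(articles, seen_links):
    """Return articles whose link is not in seen_links, and record them as seen."""
    fresh = [a for a in articles if a["link"] not in seen_links]
    seen_links.update(a["link"] for a in fresh)
    return fresh


def still_collecting(article, now):
    """True while the article is within the one-day comment window."""
    return now - article["published"] <= COLLECTION_WINDOW
```

Running `new_articles(parse_rss(SAMPLE_FEED), seen)` twice with the same `seen` set returns both articles the first time and an empty list the second, which is the behaviour a periodic crawler needs.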