Comments Engineering
Flocking comments
Recognizing I have a problem
● Addicted to news
● My typical browsing pattern:
– Google News
– Hey that's interesting
– Read one article
– Read all the comments on that story in every newspaper, especially those where I know I'll disagree with the general opinion
Pipeline
[Diagram. Components: RSS Crawler, URL, Comments Fetcher, Topics inference, Streaming API, Filtering, Classif. Outputs: Comments By Topic, Current topics]
Thanks
Platynereis dumerilii
- PhD in statistics (Cambridge University)
- Bioengineering (French Grande Ecole)
- Master's in CS (French Grande Ecole)
Jean-Baptiste Pettit
Why is it interesting
● Every comment is associated with the topics inferred from its article's title
– All the comments for a particular topic can be queried
– Hot topics can be inferred
– People's mood about the news can be estimated
● Mixing different data sources
– Comments will be queried on a regular basis
– Twitter feed will be streamed
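The idea of grouping comments by title-derived topics and ranking hot topics can be sketched as follows. This is only a minimal illustration, not the pipeline from the talk: the slides do not say how topics are inferred, so `topics_from_title` is a hypothetical stand-in that keeps title words minus a small stopword list, and the article records, `index_comments` and `hot_topics` are likewise made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical stopword list; the slides do not say how topics are inferred.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for", "is"}


def topics_from_title(title):
    """Naive stand-in for topic inference: lowercased title words minus stopwords."""
    words = (w.strip(".,:;!?\"'()").lower() for w in title.split())
    return {w for w in words if w and w not in STOPWORDS}


def index_comments(articles):
    """Map each topic to the comments of every article whose title mentions it."""
    by_topic = defaultdict(list)
    for article in articles:
        for topic in topics_from_title(article["title"]):
            by_topic[topic].extend(article["comments"])
    return by_topic


def hot_topics(by_topic, n=3):
    """Rank topics by comment volume, a crude proxy for 'hot'."""
    counts = Counter({topic: len(comments) for topic, comments in by_topic.items()})
    return counts.most_common(n)


# Made-up sample data for illustration only.
articles = [
    {"title": "Election results in France", "comments": ["c1", "c2", "c3"]},
    {"title": "France wins the final", "comments": ["c4"]},
]
by_topic = index_comments(articles)
```

With this index, `by_topic["france"]` gathers the comments from both articles, and `hot_topics(by_topic, 1)` returns the topic with the most comments.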
The data
● Keep up with the latest articles by crawling RSS feeds (XML, but that's OK)
● For each new article, collect comments for one day via
– Facebook comments API
– Disqus API
→ JSON
● Twitter feed associated with the articles (streaming API)
→ JSON
● Data is easy to engineer if needed
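The RSS crawling step described above might look roughly like this. It is a sketch under assumptions, not the crawler from the talk: the feed content, the `news.example.com` URLs, and the helpers `parse_rss` and `new_articles` are hypothetical, and the network fetch and the Facebook, Disqus and Twitter calls are left out. Only the standard-library XML parser is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example runs without network access.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>https://news.example.com/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>https://news.example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Extract the title and link of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
        })
    return items


def new_articles(items, seen_links):
    """Return items not seen before and record their links in seen_links."""
    fresh = [it for it in items if it["link"] not in seen_links]
    seen_links.update(it["link"] for it in fresh)
    return fresh


seen = set()
first_pass = new_articles(parse_rss(SAMPLE_FEED), seen)
second_pass = new_articles(parse_rss(SAMPLE_FEED), seen)
```

The first pass yields both articles; the second pass over the same feed yields none, so each article URL would be handed to the comment fetchers only once.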
