
Comments Engineering
Flocking comments

Recognizing I have a problem
● Addicted to news
● My typical browsing pattern:
– Google News
– "Hey, that's interesting"
– Read one article
– Read all the comments about that story in every newspaper, especially those where I know I'll disagree with the general opinion

Pipeline
● RSS Crawler
● Articles tracking
● Batch Topics inference
● Fetching Content
● Comments Fetcher
● Speed Layer
● Batch Layer
● JSON API
● Frontend

Pipeline: row layout
● Row keys: Timestamp1 + URLHash1, Timestamp2 + URLHash2, Timestamp3 + URLHash3, Timestamp4 + URLHash4
● links: one JSON object per row key
● JSON contains: Title, URL, Description, PubDate
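
The "Timestamp + URLHash" key and its JSON value can be sketched as below. This is a minimal illustration, not the implementation behind the slides: the function names, the MD5 fingerprint, the zero-padded epoch-seconds timestamp, and the `+` separator are all assumptions chosen for the example.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_row_key(pub_date: datetime, url: str) -> str:
    """Build a 'timestamp + URL hash' row key (exact format is an assumption)."""
    # Zero-padded epoch seconds, so keys sort chronologically as plain strings.
    timestamp = f"{int(pub_date.timestamp()):010d}"
    # MD5 is used only as a compact fingerprint of the URL, not for security.
    url_hash = hashlib.md5(url.encode("utf-8")).hexdigest()
    return f"{timestamp}+{url_hash}"


def make_article_record(title: str, url: str, description: str,
                        pub_date: datetime) -> tuple[str, str]:
    """Return (row_key, json_string) holding the four fields named on the slide."""
    payload = {
        "Title": title,
        "URL": url,
        "Description": description,
        "PubDate": pub_date.isoformat(),
    }
    return make_row_key(pub_date, url), json.dumps(payload)


if __name__ == "__main__":
    published = datetime(2014, 1, 1, tzinfo=timezone.utc)
    key, value = make_article_record(
        "Example title", "http://example.com/a", "Example description", published
    )
    print(key)
    print(value)
```

Leading with the timestamp keeps rows in publication order, and the hash suffix separates articles published in the same second.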

Pipeline: speed and batch layers
● Real time: filtering and classification
● Runs batch queries (most liked users, spam detection...)
● Data (key layout):
– Tweets (Topic/TweetID)
– Comments (Topic/timestamp+URL)
– Comments (User/timestamp+URL)
– ...
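
The "Topic/timestamp+URL" and "User/timestamp+URL" notation suggests composite keys that lead with the field to be queried. The sketch below shows why that helps, using a sorted in-memory list as a stand-in for the real store. The helper names and the exact key format are hypothetical.

```python
from bisect import bisect_left


def comment_key(prefix: str, timestamp: int, url: str) -> str:
    """Compose a 'prefix/timestamp+URL' key; prefix is a topic or a user name."""
    # Zero-padding keeps string order equal to chronological order.
    return f"{prefix}/{timestamp:010d}+{url}"


def prefix_scan(sorted_keys: list[str], prefix: str) -> list[str]:
    """Return every key that starts with prefix, assuming sorted_keys is sorted."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches


if __name__ == "__main__":
    keys = sorted([
        comment_key("politics", 2, "http://example.com/b"),
        comment_key("sports", 1, "http://example.com/a"),
        comment_key("politics", 1, "http://example.com/a"),
    ])
    for key in prefix_scan(keys, "politics/"):
        print(key)
```

Storing the same comment under both a topic key and a user key trades extra storage for cheap range scans on either dimension.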

Thanks
Jean-Baptiste Pettit
- PhD in statistics (Cambridge University)
- Master's in CS (French Grande École)
- Master's in bioengineering (French Grande École)
Platynereis dumerilii

Why is it interesting?
● Every comment is associated with article topics taken from the title
– Possibility to query all the comments for a particular topic
– Possibility to infer hot topics
– Possibility to estimate people's mood about the news
● Mixing different data sources
– Comments will be queried on a regular basis
– Twitter feed will be streamed
The data
● Keep up with the latest articles by crawling RSS feeds (XML, but that's OK)
● For new articles, fetch comments for 1 day via
– Facebook comments API
– Disqus API
→ JSON
● Twitter feed associated with the articles (streaming API)
→ JSON
● Data is easy to engineer if needed
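
Turning RSS XML into JSON records can be sketched with the Python standard library alone. The sample feed and the function name are made up for the example, and the output field names mirror the Title, URL, Description, PubDate fields mentioned in the pipeline slides. Fetching the feed over HTTP is left out.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only to exercise the parser.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First article</title>
      <link>http://example.com/first</link>
      <description>Short summary</description>
      <pubDate>Mon, 06 Jan 2014 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def rss_items_to_json(xml_text: str) -> list[str]:
    """Convert each <item> of an RSS 2.0 document into a JSON string."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append(json.dumps({
            # findtext returns the default when the child element is missing.
            "Title": item.findtext("title", default=""),
            "URL": item.findtext("link", default=""),
            "Description": item.findtext("description", default=""),
            "PubDate": item.findtext("pubDate", default=""),
        }))
    return records


if __name__ == "__main__":
    for record in rss_items_to_json(SAMPLE_FEED):
        print(record)
```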
