Comments Engineering
Flocking comments
Recognizing I have a problem
- Addicted to news
- My typical browsing pattern:
  - Google News
  - Hey, that's interesting
  - Read one article
  - Read all the comments about the news in all the journals, especially those where I know I'll disagree with the general opinion
Pipeline
- RSS Crawler
- Articles tracking
- Batch Topics inference
- Fetching Content
- Comments Fetcher
- Speed Layer
- Batch Layer
- JSON API
- Frontend
        Timestamp1 + URLHash1   Timestamp2 + URLHash2   Timestamp3 + URLHash3   Timestamp4 + URLHash4
links   JSON object             JSON object             JSON object             JSON object

Each JSON object contains: Title, URL, Description, PubDate
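
The layout above can be read as one stored JSON object per article, keyed by a timestamp followed by a hash of the URL. A minimal sketch of building such a key and value follows. The epoch-seconds encoding, the truncated SHA-1 hash, the "+" separator, and the function names `make_row_key` and `make_link_record` are all assumptions for illustration, not details taken from the slides.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_row_key(pub_date: datetime, url: str) -> str:
    """Build a 'timestamp + URL hash' key (encoding is an assumption)."""
    # Zero-padded epoch seconds keep lexicographic order equal to time order.
    timestamp = f"{int(pub_date.timestamp()):010d}"
    # A short SHA-1 prefix stands in for the URL hash (hypothetical choice).
    url_hash = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
    return f"{timestamp}+{url_hash}"


def make_link_record(title: str, url: str, description: str, pub_date: datetime):
    """Return (row_key, json_string) for one article."""
    payload = {
        "Title": title,
        "URL": url,
        "Description": description,
        "PubDate": pub_date.isoformat(),
    }
    return make_row_key(pub_date, url), json.dumps(payload)


key, value = make_link_record(
    "Example headline",
    "https://example.com/news/1",
    "Short summary of the article.",
    datetime(2014, 1, 1, tzinfo=timezone.utc),
)
print(key)
print(value)
```

Putting the timestamp first means keys sort by publication time, so the most recent articles can be read as a contiguous range.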
Speed Layer: real-time filtering and classification
Batch Layer: runs batch queries (most liked users, spam detection...)
- Tweets (Topic/TweetID)
- Comments (Topic/timestamp+URL)
- Comments (User/timestamp+URL)
...
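
The Topic/timestamp+URL and User/timestamp+URL entries look like composite keys, which would make "all comments for a topic" a prefix scan over sorted keys. The sketch below illustrates that idea with a small in-memory table. `SortedTable`, `comment_key`, and the exact separators are hypothetical and stand in for whatever store the pipeline actually uses.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Tiny in-memory stand-in for a store with lexicographically sorted keys."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key: str, value: dict) -> None:
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: str):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._values[key]


def comment_key(first: str, timestamp: int, url: str) -> str:
    # 'first' is the topic or the user, matching Topic/timestamp+URL and
    # User/timestamp+URL; the separators are an assumption.
    return f"{first}/{timestamp:010d}+{url}"


comments_by_topic = SortedTable()
comments_by_topic.put(
    comment_key("economy", 1388534500, "https://example.com/a"), {"text": "second"}
)
comments_by_topic.put(
    comment_key("economy", 1388534400, "https://example.com/a"), {"text": "first"}
)
comments_by_topic.put(
    comment_key("sports", 1388534450, "https://example.com/b"), {"text": "other"}
)

economy = [value["text"] for _, value in comments_by_topic.scan_prefix("economy/")]
print(economy)
```

Storing the same comment under both a topic-first and a user-first key trades extra writes for cheap reads along either dimension.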
Thanks
Platynereis dumerilii
Jean-Baptiste Pettit
- PhD in statistics (Cambridge University)
- Master's in CS (French Grande École)
- Master's in Bio engineering (French Grande École)
Why is it interesting?
- Every comment is associated with article topics from the title
- Possibility to query all the comments for a particular topic
- Possibility to infer hot topics
- Possibility to estimate people's mood about the news
- Mixing different data sources
  - Comments will be queried on a regular basis
  - Twitter feed will be streamed
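
As a rough illustration of the topic ideas above, topics could be taken from the words of the article title and hot topics ranked by comment volume. This is only a naive sketch: the stop-word list, the `article_title` field, and the functions `topics_from_title` and `hot_topics` are assumptions, and the actual topic inference in the pipeline may work quite differently.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for", "is", "at"}


def topics_from_title(title: str) -> set:
    """Naive topic inference: lower-cased title words minus stop words."""
    words = re.findall(r"[a-z]+", title.lower())
    return {w for w in words if w not in STOP_WORDS and len(w) > 2}


def hot_topics(comments, top_n=3):
    """Rank topics by how many comments are attached to them."""
    counts = Counter()
    for comment in comments:
        counts.update(topics_from_title(comment["article_title"]))
    return counts.most_common(top_n)


comments = [
    {"article_title": "Elections in France", "text": "..."},
    {"article_title": "Elections in France", "text": "..."},
    {"article_title": "France wins the match", "text": "..."},
]
print(hot_topics(comments, top_n=2))
```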
The data
- Keep in touch with the latest articles by crawling RSS feeds (XML, but it's OK)
- For new articles, get comments for 1 day via:
  - Facebook comments API
  - Disqus API
  - JSON
- Twitter feed associated with the articles (streaming API)
  - JSON
- Data is easy to engineer if needed
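
Turning the RSS XML into JSON is a small step. The sketch below uses Python's standard `xml.etree.ElementTree` on an inline sample feed; the sample content and the function name `rss_items_to_json` are made up for illustration, and the output field names follow the Title, URL, Description, PubDate list from the pipeline slide.

```python
import json
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First article</title>
      <link>https://example.com/first</link>
      <description>Summary of the first article.</description>
      <pubDate>Wed, 01 Jan 2014 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def rss_items_to_json(rss_xml: str) -> list:
    """Convert each RSS <item> into a JSON string with four fields."""
    root = ET.fromstring(rss_xml)
    records = []
    for item in root.iter("item"):
        record = {
            "Title": item.findtext("title", default=""),
            "URL": item.findtext("link", default=""),
            "Description": item.findtext("description", default=""),
            "PubDate": item.findtext("pubDate", default=""),
        }
        records.append(json.dumps(record))
    return records


records = rss_items_to_json(SAMPLE_RSS)
print(records[0])
```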
