Content Analytics for Better Search
Otis Gospodnetić, Sematext International
Agenda
Intro: Otis & Sematext
Basic Search
Taming Search Results
Key Phrases
Beyond Search
About Otis Gospodnetić
Member: Apache Lucene, Solr, Nutch, Mahout
Author: Lucene in Action (1st and 2nd editions)
Entrepreneur: Simpy (2004), Lucene Consulting (2005), Sematext Int'l (2007)
Organizer: NY Search & Discovery Meetup
About Sematext
Consulting, development, support:
Big Data (Hadoop, HBase, Voldemort...)
Search (Lucene, Solr, Elastic Search...)
Web Crawling (Nutch)
Machine Learning (Mahout)
Basic Search
Taming Search Results
Related searches (high query volume)
Search results clustering (fuzzy)
Named Entity Recognition (NER)
Faceted search (structured input)
Example: Related Searches
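One common way to derive related searches is from query logs: queries issued in the same user session are treated as related, and pairs seen only rarely are dropped, which is why the technique needs high query volume. The sketch below assumes sessions have already been extracted from the logs as lists of query strings; build_related, min_count and the sample sessions are hypothetical, not taken from the talk.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_related(sessions, min_count=2):
    """Map each query to queries that co-occur with it in at least min_count sessions.

    sessions: list of sessions, each a list of raw query strings (hypothetical log format).
    """
    pair_counts = defaultdict(Counter)
    for session in sessions:
        # Normalize case/whitespace and count each query once per session.
        queries = sorted({q.strip().lower() for q in session if q.strip()})
        for a, b in combinations(queries, 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1
    # Keep only pairs seen often enough to be trusted, most frequent first.
    return {
        query: [other for other, n in counter.most_common() if n >= min_count]
        for query, counter in pair_counts.items()
    }


sessions = [
    ["solr faceting", "solr facet query", "lucene"],
    ["solr faceting", "solr facet query"],
    ["solr faceting", "nutch crawler"],
    ["lucene", "lucene in action"],
    ["lucene", "lucene in action", "solr faceting"],
]
related = build_related(sessions)
print(related["solr faceting"])
```

Raising min_count trades coverage for precision; with little traffic, most query pairs never reach the threshold and the suggestion lists come out empty.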
Example: Results Clustering
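Results clustering groups the top hits for a query by topic. As a minimal stand-in for the fuzzy clustering mentioned above, the sketch below labels clusters with terms shared by several result titles and lets a result belong to more than one cluster (overlapping, soft membership). The function names, the stopword list and the sample titles are hypothetical; dedicated engines such as Carrot2 build much richer phrase-based labels.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def cluster_results(results, query_terms=(), min_docs=2, max_clusters=5):
    """Return {label_term: [result indices]}; one result may appear under several labels."""
    excluded = {t.lower() for t in query_terms}
    # Query terms match (almost) every hit, so they make useless labels.
    doc_terms = [set(tokenize(r)) - excluded for r in results]
    # Document frequency of each term across the result list.
    df = Counter(term for terms in doc_terms for term in terms)
    clusters = {}
    for term, count in df.most_common():
        if count < min_docs or len(clusters) >= max_clusters:
            break
        clusters[term] = [i for i, terms in enumerate(doc_terms) if term in terms]
    return clusters


results = [
    "Jaguar car dealership and reviews",
    "Used Jaguar car prices",
    "Jaguar big cat habitat",
    "Big cat conservation in the rainforest",
    "Jaguar XF car review",
]
clusters = cluster_results(results, query_terms=["jaguar"])
print(clusters)
```

Here "big" and "cat" come out as two separate clusters with identical members; merging such duplicates into a single phrase label ("big cat") is one of the things phrase-based clustering engines do.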
Example: Named Entities
Sorry, no screenshot, but I know sites use this! Really, I do! :)
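Named entity recognition is usually done with trained statistical models (Apache OpenNLP, for example, ships name finders). The sketch below swaps in something much simpler, a dictionary (gazetteer) lookup that prefers the longest match, only to show the input and output shape: free text in, typed entity spans out, which could then be indexed as structured fields. The gazetteer contents and tag_entities are hypothetical.

```python
import re


def tag_entities(text, gazetteer):
    """Longest-match dictionary lookup.

    gazetteer maps a lowercase phrase to an entity type, e.g. {"new york": "LOCATION"}.
    Returns a list of (surface_text, entity_type) tuples in text order.
    """
    tokens = re.findall(r"\w+", text)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(p.split()) for p in gazetteer), default=0)
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(lowered[i:i + n])
            if phrase in gazetteer:
                entities.append((" ".join(tokens[i:i + n]), gazetteer[phrase]))
                i += n
                break
        else:
            # No entity starts at token i; move on one token.
            i += 1
    return entities


gazetteer = {
    "otis gospodnetić": "PERSON",
    "sematext": "ORG",
    "apache lucene": "PRODUCT",
    "lucene": "PRODUCT",
    "new york": "LOCATION",
}
text = "Otis Gospodnetić of Sematext spoke about Apache Lucene at a New York meetup."
entities = tag_entities(text, gazetteer)
print(entities)
```

Longest match is what makes "Apache Lucene" win over the shorter "lucene" entry; a dictionary approach cannot, however, find entities it has never seen, which is where statistical models earn their keep.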
Example: Faceted Search
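Faceted search needs structured input: each document carries field values (type, language and so on), and for the current result set the engine counts how many hits fall under each value so users can drill down. Solr computes these counts server-side (for example via the facet.field request parameter); the sketch below only illustrates the counting itself over in-memory documents. facet_counts and the sample documents are hypothetical, and fields are assumed to be single-valued.

```python
from collections import Counter


def facet_counts(docs, facet_fields, filters=None):
    """Apply exact-match filters, then count field values over the remaining docs.

    Returns (matching_docs, {field: Counter(value -> hit count)}).
    """
    filters = filters or {}
    matching = [d for d in docs if all(d.get(f) == v for f, v in filters.items())]
    counts = {
        field: Counter(d[field] for d in matching if field in d)
        for field in facet_fields
    }
    return matching, counts


docs = [
    {"id": 1, "type": "article", "lang": "en"},
    {"id": 2, "type": "article", "lang": "de"},
    {"id": 3, "type": "video", "lang": "en"},
    {"id": 4, "type": "article", "lang": "en"},
    {"id": 5, "type": "slides"},
]
# The user has clicked the "lang: en" facet value.
hits, counts = facet_counts(docs, ["type", "lang"], filters={"lang": "en"})
print([d["id"] for d in hits], dict(counts["type"]))
```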
Content Analysis: Key Phrases
Related searches

More Related Content
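Once documents have been reduced to key phrases, one simple way to surface more related content is to rank other documents by how much their key-phrase sets overlap. The sketch below uses Jaccard similarity, J(A, B) = |A ∩ B| / |A ∪ B|; the choice of Jaccard, the function names and the sample data are assumptions for illustration, not necessarily what the talk used.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_content(doc_id, doc_phrases, top_n=3):
    """Rank other documents by key-phrase overlap with doc_id, best first."""
    target = doc_phrases[doc_id]
    scored = [
        (other, jaccard(target, phrases))
        for other, phrases in doc_phrases.items()
        if other != doc_id
    ]
    # Drop documents with no shared phrase; break score ties by id for stable output.
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


doc_phrases = {
    "a": {"faceted search", "solr", "query logs"},
    "b": {"faceted search", "solr"},
    "c": {"named entity recognition", "query logs"},
    "d": {"hadoop", "hbase"},
}
print(related_content("a", doc_phrases))
```

Document "b" shares two of the three distinct phrases with "a" (score 2/3), "c" shares one of four (0.25), and "d" shares none, so it is left out entirely.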

Key Phrases for Better Search
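A basic way to extract key phrases from a collection is to count multi-word n-grams, discard candidates that start or end with a stopword, and keep the frequent ones. The sketch below does just that; the stopword list, the thresholds and key_phrases are hypothetical, and raw frequency is only a stand-in for statistical collocation scores such as the log-likelihood ratio, which are often used instead.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def key_phrases(texts, min_n=2, max_n=3, min_count=2, top_k=10):
    """Return up to top_k (phrase, count) pairs, most frequent and longest first."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for n in range(min_n, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip candidates such as "of the" or "search for".
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    ranked = [(phrase, c) for phrase, c in counts.items() if c >= min_count]
    ranked.sort(key=lambda pc: (-pc[1], -len(pc[0].split()), pc[0]))
    return ranked[:top_k]


texts = [
    "Faceted search helps users narrow search results.",
    "Search results clustering groups search results by topic.",
    "Faceted search needs structured fields.",
]
print(key_phrases(texts))
```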

Editor's Notes

  1. 10 days of data (5K/min)