The document discusses how content analytics can enhance search capabilities. It provides examples of how key phrases, collocations, and statistically improbable phrases can be used to power related searches, cluster results, and enable faceted search. Beyond search, these content analytics techniques can be applied to applications like product recommendations, social media analysis, and customer experience analytics.
33. Definitions: Collocations Collocations are phrases whose words appear together more often than you would expect from how frequently each individual word appears on its own in the same text.
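To make the definition concrete, here is a minimal sketch that scores adjacent word pairs with pointwise mutual information (PMI). PMI is one common way to compare how often two words co-occur with how often they would co-occur if they were independent; the slides do not say which measure the Key Phrase Extractor uses, so treat the choice of PMI, the function name `collocation_scores`, and the `min_count` threshold as assumptions made for illustration.

```python
import math
from collections import Counter


def collocation_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI is an assumed measure here; the source does not name one.
    A positive score means the pair occurs together more often than
    the individual word frequencies alone would predict.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs give unstable estimates, so skip them.
        if count < min_count:
            continue
        p_pair = count / n_bigrams
        p_independent = (unigrams[w1] / n_unigrams) * (unigrams[w2] / n_unigrams)
        scores[(w1, w2)] = math.log2(p_pair / p_independent)
    return scores


# Toy text, whitespace-tokenized; real input would need proper tokenization.
tokens = ("new york is big . i love new york . "
          "the new car is big . york is old").split()
scores = collocation_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(score, 2))
```

On realistic corpora the `min_count` cutoff matters, because PMI tends to overrate pairs that were seen only once or twice.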
36. Definitions: SIPs Statistically Improbable Phrases (SIPs) are phrases that appear in a text more often than you would expect given how often they appear in another text.
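The definition compares one text (call it the foreground) against another (the background). A minimal sketch of that comparison, assuming a smoothed ratio of relative phrase frequencies as the improbability score: the scoring formula, the add-one style smoothing, and the names `ngrams` and `sip_scores` are illustrative choices, not the method described in the slides.

```python
from collections import Counter


def ngrams(tokens, n=2):
    """Return all contiguous n-word phrases as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sip_scores(foreground_tokens, background_tokens, n=2, smoothing=1.0):
    """Rank foreground phrases by how over-represented they are.

    Score = smoothed relative frequency in the foreground divided by
    smoothed relative frequency in the background (an assumed formula).
    Smoothing keeps phrases absent from the background from dividing by zero.
    """
    fg = Counter(ngrams(foreground_tokens, n))
    bg = Counter(ngrams(background_tokens, n))
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    vocab_size = len(set(fg) | set(bg))

    scores = {}
    for phrase, count in fg.items():
        p_fg = (count + smoothing) / (fg_total + smoothing * vocab_size)
        p_bg = (bg[phrase] + smoothing) / (bg_total + smoothing * vocab_size)
        scores[phrase] = p_fg / p_bg
    # Highest score first: the most "improbable" phrases lead the list.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


foreground = "capital gains tax applies when capital gains exceed the limit".split()
background = "the cat sat on the mat and the dog sat on the rug".split()
for phrase, score in sip_scores(foreground, background)[:3]:
    print(phrase, round(score, 2))
```

Sorting by score mirrors the idea of listing the most distinctive phrases first.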
49. SIPs at Amazon Amazon SIPs are the most distinctive phrases in the text of books in the Search Inside! program. To identify SIPs, our computers scan the text of all books in the Search Inside! program. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book.
50. SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!. For example, most SIPs for a book on taxes are tax related. But because we display SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements.
52. SIPs & News Topic Trending The text for the new (or "current") period covers the most recent 7 days, from now back to 7 days ago. The text for the old (or "past") period covers the 7 days before that.
53. Timeline: now | new text | (now - 7 days) | old text | (now - 14 days)
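The two windows can be sketched as a simple split over timestamped documents. This is a minimal illustration under stated assumptions: the `(timestamp, text)` input shape, the function name `split_periods`, and the decision to put a document exactly 7 days old into the old period are all hypothetical choices, not details from the slides.

```python
from datetime import datetime, timedelta


def split_periods(docs, now, window_days=7):
    """Split (timestamp, text) pairs into new-period and old-period texts.

    New period: (now - 7 days, now]. Old period: (now - 14 days, now - 7 days].
    Boundary handling is an assumption; documents older than 14 days
    or newer than `now` are ignored.
    """
    window = timedelta(days=window_days)
    new_start = now - window
    old_start = now - 2 * window

    new_texts, old_texts = [], []
    for timestamp, text in docs:
        if new_start < timestamp <= now:
            new_texts.append(text)
        elif old_start < timestamp <= new_start:
            old_texts.append(text)
    return new_texts, old_texts


now = datetime(2024, 1, 15)
docs = [
    (now - timedelta(days=1), "story from yesterday"),
    (now - timedelta(days=8), "story from last week"),
    (now - timedelta(days=20), "story too old to count"),
]
new_texts, old_texts = split_periods(docs, now)
print(new_texts, old_texts)
```

With the texts separated this way, the new period can serve as the text being examined and the old period as the comparison text in a SIP-style score, so phrases that rise to the top are those gaining ground in the current week.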
54. Customer Experience Mindshare Technologies (MT) is a Voice of the Customer company that helps companies make operational improvements based on customer feedback. MT's client list includes many of the world's largest restaurant chains, hotels, car rental agencies, and telecommunications companies. Much of the feedback we collect is from surveys that contain open-ended questions where customers can leave comments. MT has used the Key Phrase Extractor to unlock the value contained in these comments. We are able to identify common problems experienced by customers and are even able to detect emerging topics that are starting to catch fire. Mindshare's clients are able to leverage this information and make operational changes that improve customer experiences.