Slideshows by User: opennlp
Tue, 21 Nov 2017 15:26:43 GMT
SlideShare feed for Slideshows by User: opennlp
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media Streams
/slideshow/big-data-spain-2017-deriving-actionable-insights-from-high-volume-media-streams/82457728
Media analysts must analyze high volumes of real-time news feeds and social media streams, a tedious process because they need to write search profiles for entities. Python tools like NLTK do not scale to large production data sets and cannot be plugged into distributed, scalable frameworks like Apache Flink. As a streaming-first engine, Apache Flink is well suited to ingesting multiple streams of news feeds, social media, blogs, and similar sources, and to running streaming analytics on them. Natural language processing tools like Apache OpenNLP can be plugged into Flink streaming pipelines to perform common NLP tasks such as Named Entity Recognition (NER), chunking, and text classification. In this talk, we'll build a real-time media analyzer that performs NER on the individual incoming streams, calculates the co-occurrences of the named entities, aggregates them across multiple streams, indexes the results into a search engine, and queries those results for actionable insights. We'll also show how to handle multilingual documents when calculating co-occurrences. NLP practitioners will come away from this talk with a better understanding of how the various Apache OpenNLP components can help process large streams of data feeds and how they can be plugged into a highly scalable, distributed framework like Apache Flink.
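The co-occurrence step described in the abstract can be illustrated with a minimal sketch. This is not the talk's implementation and it uses neither Apache Flink nor Apache OpenNLP: it assumes NER has already run and produced one set of entity names per document, and it shows only the counting and cross-stream aggregation logic with the Python standard library. The function names `cooccurrences` and `aggregate` and the sample streams `news` and `social` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def cooccurrences(entities: Set[str]) -> Counter:
    """Count each unordered pair of distinct entities found in one document."""
    # Sorting gives every pair a canonical order, so (A, B) and (B, A) share one key.
    return Counter(combinations(sorted(entities), 2))


def aggregate(streams: Iterable[Iterable[Set[str]]]) -> Counter:
    """Sum the per-document pair counts over all documents of all streams."""
    total: Counter = Counter()
    for stream in streams:
        for doc_entities in stream:
            total.update(cooccurrences(doc_entities))
    return total


# Hypothetical NER output: one set of entity names per document, per stream.
news = [{"Apache Flink", "Apache OpenNLP"}, {"Apache Flink", "Madrid"}]
social = [{"Apache OpenNLP", "Apache Flink", "Madrid"}]

counts = aggregate([news, social])
```

In a streaming setting the same pair counts would typically be maintained incrementally, keyed by entity pair, rather than computed over in-memory lists as here.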
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
opennlp.apache.org