HyperTED - Searching and browsing through fragments of TED Talks
Mariella Sabatino, mariella.sabatino@eurecom.fr
25/09/2014

TED Talks
TED is a global set of conferences held throughout North America, Europe and Asia.
TED Talks address a wide range of topics within the research and practice of science and culture.
Speakers are given a maximum of 18 minutes to present their ideas in the most innovative and engaging way they can, often through storytelling.

Problem
Users are overwhelmed with audiovisual content.
Users browse fast, looking for topics of interest.
Which fragments are potentially relevant, without having to watch the entire video?
It is very difficult to find interesting documents.

Research questions
HOW TO:
1. detect segments of interest in a video?
2. recommend related media fragments within the same video collection?
3. design a web application that provides a rich environment for exploring a video collection?

HyperTED
Browsing and recommendation of Media Fragments of TED Talks, based on entities extracted from the subtitles.
Integration of the Media Fragments concept with the subtitle enrichment performed by NERD, on a Node.js server.

Research question 1
HOW TO detect segments of interest in a video?

What is a NER task?
Named Entity Recognition (NER) aims to locate elements of a textual document and classify them into pre-defined categories such as:
names of people;
names of organizations;
places;
temporal and numerical expressions.
These elements are called entities; the categories are taken from ontologies.

For example
"This is Nikita, a security guard from one of the bars in St. Petersburg."
NER output:
Nikita: PERSON
security guard: FUNCTION
St. Petersburg: LOCATION
Category: the type assigned in the NER task.
Disambiguation: a Natural Language Processing (NLP) task that links each entity to a URI in a knowledge base, e.g. http://dbpedia.org/resource/Saint_Petersburg.
Example taken from the transcript of https://www.ted.com/talks/2089

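To make the example concrete, here is a minimal Python sketch of how the entities in that sentence could be stored as character-offset annotations. The `Entity` class and the `annotate` helper are hypothetical, not NERD's actual output format; only the labels, categories and the DBpedia URI come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    """One recognized entity: a text span, its category and an optional KB link."""
    label: str                 # surface form as it appears in the text
    category: str              # NER type, e.g. PERSON or LOCATION
    start: int                 # character offset where the span begins
    end: int                   # character offset just past the span
    uri: Optional[str] = None  # disambiguation target in a knowledge base


def annotate(text: str, label: str, category: str, uri: Optional[str] = None) -> Entity:
    """Hypothetical helper: build an Entity for the first occurrence of `label`."""
    start = text.index(label)  # raises ValueError if the label is absent
    return Entity(label, category, start, start + len(label), uri)


sentence = "This is Nikita, a security guard from one of the bars in St. Petersburg."
entities = [
    annotate(sentence, "Nikita", "PERSON"),
    annotate(sentence, "security guard", "FUNCTION"),
    annotate(sentence, "St. Petersburg", "LOCATION",
             "http://dbpedia.org/resource/Saint_Petersburg"),
]

for e in entities:
    # The offsets slice back to the original surface form.
    print(e.category, repr(sentence[e.start:e.end]), e.uri or "")
```

Keeping offsets rather than just labels is what later allows an entity to be tied back to a position, and hence a time, in the subtitles.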
NER extractors
Web tools that use NER algorithms.
Open APIs for research use.

NERD
Compares the performance of the NER tools available on the Web.
Unifies the results of the NER extractors in a common output.
http://nerd.eurecom.fr/

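The unification step can be pictured as mapping each extractor's own type labels onto one shared set of categories. The Python sketch below is illustrative only: the extractor names, their native type labels and the `TYPE_MAP` table are hypothetical, not NERD's actual mapping rules or ontology.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from (extractor, native type) to a shared category.
# The extractor names and type labels are made up for illustration.
TYPE_MAP: Dict[Tuple[str, str], str] = {
    ("extractorA", "Person"): "Person",
    ("extractorA", "City"): "Location",
    ("extractorB", "PER"): "Person",
    ("extractorB", "LOC"): "Location",
    ("extractorB", "ORG"): "Organization",
}


def unify(extractor: str, raw: List[dict]) -> List[dict]:
    """Convert one extractor's raw results into a common record format.

    Each raw result is assumed to have 'text', 'type', 'start' and 'end' keys.
    Types with no mapping fall back to a catch-all category, 'Thing'.
    """
    return [
        {
            "label": item["text"],
            "category": TYPE_MAP.get((extractor, item["type"]), "Thing"),
            "start": item["start"],
            "end": item["end"],
            "extractor": extractor,
        }
        for item in raw
    ]


from_a = unify("extractorA", [{"text": "Nikita", "type": "Person", "start": 8, "end": 14}])
from_b = unify("extractorB", [{"text": "Nikita", "type": "PER", "start": 8, "end": 14},
                              {"text": "TED", "type": "MISC", "start": 0, "end": 3}])
# Both extractors now agree on the category for "Nikita".
print(from_a[0]["category"], from_b[0]["category"], from_b[1]["category"])
```

Once every extractor's output shares one schema, results can be compared against a reference annotation or merged, as in the "Combined" row of the evaluation.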
NER extractors evaluation
Documents analyzed: 5 short TED Talks
Number of evaluators: 1
Evaluation steps:
1. selection of the meaningful concepts in the subtitles;
2. run of each extractor;
3. comparison of the results.
Precision: the fraction of retrieved items (here, extracted entities) that are relevant.
Recall: the fraction of relevant items that are retrieved.
F-measure: a single accuracy score that combines Precision and Recall.

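The F-measure values in the evaluation table are consistent with the balanced F1 score, the harmonic mean of precision P and recall R: F = 2PR / (P + R). A small Python helper makes the computation explicit. Because the table's F values appear to have been computed from unrounded precision and recall, recomputing them from the two-decimal figures gives slightly different numbers.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall.

    Returns 0.0 when both inputs are 0, the usual convention
    (this matches the Saplo row, where P = R = 0).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute the DataTXT row from its rounded precision and recall.
print(round(f_measure(0.21, 0.36), 4))  # → 0.2653
```

The harmonic mean punishes imbalance: an extractor with high precision but very low recall, such as AlchemyAPI (0.15 / 0.03), still ends up with a low F-measure.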
NER extractors evaluation

Extractor                  Precision   Recall   F-measure
AlchemyAPI                 0.15        0.03     0.05
DataTXT                    0.21        0.36     0.27
DBpedia Spotlight          0.14        0.37     0.20
Lupedia                    0.18        0.02     0.04
OpenCalais                 0.27        0.09     0.13
Saplo                      0.00        0.00     0.00
TextRazor                  0.17        0.40     0.24
THD                        0.12        0.05     0.07
Wikimeta                   0.13        0.08     0.10
Yahoo! Content Analysis    0.52        0.13     0.20
Zemanta                    0.44        0.18     0.25
Combined                   0.11        0.54     0.19

Media Fragments
A Media Fragment is a part of a multimedia object.
Temporal fragments: sections along the time dimension of the media resource, with a start and an end point.
http://www.w3.org/TR/media-frags/

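The W3C Media Fragments URI specification linked above addresses a temporal fragment with a `t` parameter in the URI fragment: for example, `talk.mp4#t=10,20` selects seconds 10 to 20 in Normal Play Time (NPT). The Python sketch below parses only a subset of that syntax (seconds, `mm:ss` or `hh:mm:ss` values with an optional `npt:` prefix) and ignores the other dimensions and time formats the specification defines. The example URL is hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


def npt_to_seconds(value: str) -> float:
    """Convert an NPT value ('75', '1:15' or '0:01:15.5') to seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_temporal_fragment(uri: str) -> Tuple[float, Optional[float]]:
    """Return (start, end) in seconds from a '#t=' temporal media fragment.

    A missing start defaults to 0; a missing end is returned as None,
    meaning 'until the end of the media'.
    """
    params = parse_qs(urlparse(uri).fragment)
    if "t" not in params:
        raise ValueError("no temporal fragment in " + uri)
    value = params["t"][0]
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start_str, _, end_str = value.partition(",")
    start = npt_to_seconds(start_str) if start_str else 0.0
    end = npt_to_seconds(end_str) if end_str else None
    return start, end


print(parse_temporal_fragment("http://example.org/talk.mp4#t=10,20"))  # (10.0, 20.0)
print(parse_temporal_fragment("http://example.org/talk.mp4#t=npt:0:01:15,0:02:00"))  # (75.0, 120.0)
```

Because the fragment lives after `#`, the same video URL can be shared with different time ranges, which is what makes fragment-level linking possible.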
Media Fragment creation: chapters
TED Talks have paragraphs: a human-made subdivision of the subtitles.

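One way to picture this step: if each subtitle cue carries the index of the paragraph it belongs to, a chapter fragment spans from the earliest cue start to the latest cue end in that paragraph. The `Cue` structure and its `paragraph` field are assumptions for illustration, not the format of TED's subtitle data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cue:
    start: float    # seconds
    end: float      # seconds
    text: str
    paragraph: int  # hypothetical: index of the paragraph this cue belongs to


def chapters_from_cues(cues: List[Cue], video_url: str) -> List[str]:
    """Return one temporal media fragment URI per paragraph, in talk order."""
    spans: Dict[int, Tuple[float, float]] = {}
    for cue in cues:
        if cue.paragraph in spans:
            start, end = spans[cue.paragraph]
            spans[cue.paragraph] = (min(start, cue.start), max(end, cue.end))
        else:
            spans[cue.paragraph] = (cue.start, cue.end)
    # ':g' drops trailing zeros, so 9.0 becomes '9' in the URI.
    return [f"{video_url}#t={start:g},{end:g}"
            for _, (start, end) in sorted(spans.items())]


cues = [
    Cue(0.0, 4.5, "This is Nikita,", 0),
    Cue(4.5, 9.0, "a security guard ...", 0),
    Cue(9.0, 15.2, "Next paragraph ...", 1),
]
print(chapters_from_cues(cues, "http://example.org/talk.mp4"))
# ['http://example.org/talk.mp4#t=0,9', 'http://example.org/talk.mp4#t=9,15.2']
```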
Media Fragment creation: hot spots
Extraction of topics from TextRazor and of entities from NERD;
Clustering of consecutive chapters that talk about similar topics;
Filtering of those fragments based on annotation relevance.
The hot spots are those fragments whose relative relevance falls within the first quarter of the final score distribution.

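The three steps above can be sketched in Python under several stated assumptions. Chapter similarity is measured here as Jaccard overlap of annotation labels with a hypothetical threshold of 0.3. A fragment's score is taken as the sum of its annotation relevances. "First quarter" is read as the highest-scoring 25% of fragments. These are illustrative choices, not necessarily the scoring used in HyperTED.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Fragment:
    start: float  # seconds
    end: float    # seconds
    # Hypothetical: topic or entity label -> relevance score from the extractors.
    annotations: Dict[str, float] = field(default_factory=dict)

    def score(self) -> float:
        """Assumed fragment score: the sum of its annotation relevances."""
        return sum(self.annotations.values())


def jaccard(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Overlap of two label sets: 0 = disjoint, 1 = identical."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def merge_similar(chapters: List[Fragment], threshold: float = 0.3) -> List[Fragment]:
    """Cluster runs of consecutive chapters whose labels overlap by >= threshold."""
    merged: List[Fragment] = []
    for ch in chapters:
        if merged and jaccard(merged[-1].annotations, ch.annotations) >= threshold:
            last = merged[-1]
            combined = dict(last.annotations)
            for label, rel in ch.annotations.items():
                combined[label] = max(combined.get(label, 0.0), rel)  # keep the strongest
            merged[-1] = Fragment(last.start, ch.end, combined)
        else:
            merged.append(Fragment(ch.start, ch.end, dict(ch.annotations)))
    return merged


def hot_spots(fragments: List[Fragment]) -> List[Fragment]:
    """Keep fragments whose score is in the top quarter (assumed reading of 'first quarter')."""
    if not fragments:
        return []
    scores = sorted((f.score() for f in fragments), reverse=True)
    k = max(1, len(scores) // 4)  # size of the top quarter, at least one fragment
    cutoff = scores[k - 1]
    return [f for f in fragments if f.score() >= cutoff]


chapters = [
    Fragment(0, 60, {"education": 0.9, "school": 0.7}),
    Fragment(60, 120, {"education": 0.8, "teacher": 0.6}),
    Fragment(120, 180, {"music": 0.5}),
    Fragment(180, 240, {"violin": 0.4}),
    Fragment(240, 300, {"climate": 0.3}),
]
merged = merge_similar(chapters)
print([(f.start, f.end) for f in hot_spots(merged)])  # [(0, 120)]
```

Merging before ranking means that a topic spread over two adjacent chapters is scored as one longer fragment rather than as two weaker ones.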
Research question 2
HOW TO recommend related media fragments within the same video collection?

Search engine indexing
A search engine is a system able to access information that has previously been stored and indexed.
Search engine indexing is the process of collecting, parsing and storing data to make searches faster.
We use it to index the annotations in our database.

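The core data structure behind this kind of indexing is an inverted index. It maps each term to the documents containing it, so a query looks up terms instead of scanning every document. The toy Python version below indexes annotation labels per fragment; Lucene's real index is far more elaborate (term dictionaries, postings with frequencies and positions, scoring), and the fragment IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_index(docs: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each lower-cased term to the set of document IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term.lower()].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], *terms: str) -> List[str]:
    """Return IDs of documents containing all query terms (boolean AND), sorted."""
    sets = [index.get(t.lower(), set()) for t in terms]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


fragments = {
    "talk1#chapter2": ["Saint_Petersburg", "security"],
    "talk1#chapter3": ["Saint_Petersburg", "music"],
    "talk2#chapter1": ["music", "education"],
}
index = build_index(fragments)
print(search(index, "music"))                      # ['talk1#chapter3', 'talk2#chapter1']
print(search(index, "music", "saint_petersburg"))  # ['talk1#chapter3']
```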
Annotation-based index
WHY ANNOTATIONS?
Because they carry the meaning of the talk.
Because they contain some very useful attributes:
timing references (startNPT and endNPT);
uuid;
relevance references.
WHICH ANNOTATIONS? Entities and topics.

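As an illustration, an annotation could be stored as one JSON document carrying the attributes listed above. Apart from `startNPT`, `endNPT`, `uuid` and the relevance score, the field names are hypothetical, as are the index name `annotations` and the host. The sketch only builds the HTTP request for Elasticsearch's document index API and does not send it. Current versions expose that API as `PUT /<index>/_doc/<id>`; 2014-era 1.x releases used a mapping type in place of `_doc`.

```python
import json
import uuid
from urllib.request import Request


def annotation_doc(label: str, kind: str, start_npt: float, end_npt: float,
                   relevance: float, talk_id: str) -> dict:
    """Build one annotation document; `kind` is 'entity' or 'topic'."""
    return {
        "uuid": str(uuid.uuid4()),
        "label": label,          # hypothetical field: entity label or topic name
        "kind": kind,            # hypothetical field
        "startNPT": start_npt,   # seconds from the start of the talk
        "endNPT": end_npt,
        "relevance": relevance,
        "talk": talk_id,         # hypothetical field linking back to the video
    }


def index_request(doc: dict, host: str = "http://localhost:9200",
                  index: str = "annotations") -> Request:
    """Prepare (but do not send) a PUT request storing `doc` under its uuid."""
    return Request(
        url=f"{host}/{index}/_doc/{doc['uuid']}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


doc = annotation_doc("Saint_Petersburg", "entity", 12.4, 15.0, 0.8, "2089")
req = index_request(doc)
print(req.get_method(), req.full_url)
# With a running node, urllib.request.urlopen(req) would store the document.
```

Storing the timing references in each document means a search hit can be turned directly into a temporal media fragment of the talk.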
ElasticSearch
ElasticSearch is an open-source search engine.
It uses Apache Lucene for indexing.
It aims to make full-text search easy by hiding the complexities of Lucene behind a simple RESTful API.

ElasticSearch: how to make a query
ElasticSearch provides a full Query DSL, based on JSON, to define queries. The basic queries include, for example, term and prefix queries.

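For instance, a `term` query matches one exact value and a `prefix` query matches values that start with a given string. Neither analyzes its input, so both are best used on exact-value (keyword) fields. They can be combined inside a `bool` query. The Python sketch below only assembles JSON bodies for the `_search` endpoint. The field names `kind` and `label`, and the assumption that `label` is an exact-value field, are hypothetical and would have to match the actual index mapping.

```python
import json


def term_query(field: str, value: str) -> dict:
    """Exact match on one value; the value is not analyzed."""
    return {"query": {"term": {field: value}}}


def prefix_query(field: str, prefix: str) -> dict:
    """Match documents whose field value starts with `prefix` (also not analyzed)."""
    return {"query": {"prefix": {field: prefix}}}


def entity_prefix_search(prefix: str) -> dict:
    """Combine both: entity annotations whose label starts with `prefix`."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"kind": "entity"}},
                    {"prefix": {"label": prefix}},
                ]
            }
        }
    }


# This JSON body would be sent to POST /<index>/_search.
print(json.dumps(entity_prefix_search("Saint"), indent=2))
```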
Recommendation
Interlinking through chapters and topics
Interlinking to OpenCourseWare and the Open University

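A simple way to interlink fragments through their topics is to rank the other fragments by how many topics they share with the current one. The Python sketch below uses Jaccard similarity over topic sets. This is an assumed scoring choice for illustration, not necessarily HyperTED's recommendation algorithm, and the fragment IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def recommend(current: str, topics: Dict[str, Set[str]],
              k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k (fragment_id, score) pairs most similar to `current`."""
    mine = topics[current]
    scored = []
    for frag_id, other in topics.items():
        if frag_id == current:
            continue
        union = mine | other
        score = len(mine & other) / len(union) if union else 0.0
        if score > 0:  # only fragments sharing at least one topic
            scored.append((frag_id, score))
    # Highest score first; ties broken by ID for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


topics = {
    "talkA#t=0,95": {"education", "creativity", "school"},
    "talkB#t=120,300": {"education", "school"},
    "talkC#t=40,180": {"creativity", "art"},
    "talkD#t=0,60": {"climate"},
}
print(recommend("talkA#t=0,95", topics))
```

The same scoring could be applied to external resources, such as course material, once they are annotated with topics from the same vocabulary.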
Research question 3
HOW TO design a web application that provides a rich environment for exploring a video collection?

Architecture

Demo
http://linkedtv.eurecom.fr/mediafragmentplayer

Conclusions
Evaluation of NER tools in the context of TED Talks
Hot spot detection based on topics and entities
Recommendation algorithm: hyperlinks between fragments of TED Talks and external educational resources
A nice, responsive UI

Publications
HyperTED is one of the apps submitted to the LinkedUp Challenge: http://linkedup-challenge.org/
José Luis Redondo García, Mariella Sabatino, Pasquale Lisena and Raphaël Troncy. Detecting Hot Spots in Web Videos. In International Semantic Web Conference (ISWC 2014), Demo.
