ESBM: An Entity Summarization Benchmark
Qingxia Liu1, Gong Cheng1, Kalpa Gunaratna2, and Yuzhong Qu1
1 National Key Laboratory for Novel Software Technology, Nanjing University, China
2 Samsung Research America, Mountain View CA, USA
 Introduction
 Creating ESBM
 Analyzing ESBM
 Evaluating with ESBM
 Conclusion
2020.06 2
Outline
Entity Summarization
Description of Tim Berners-Lee:
<Tim Berners Lee, alias, TimBL>
<Tim Berners Lee, name, Tim Berners-Lee>
<Tim Berners Lee, givenName, Tim>
<Tim Berners Lee, birthYear, 1955>
<Tim Berners Lee, birthDate, 1955-06-08>
<Tim Berners Lee, birthPlace, England>
<Tim Berners Lee, birthPlace, London>
<Tim Berners Lee, type, People Educated At Emanuel School>
<Tim Berners Lee, type, Scientist>
<Tim Berners-Lee, child, Ben Berners-Lee>
<Tim Berners-Lee, child, Alice Berners-Lee>
<Conway Berners-Lee, child, Tim Berners-Lee>
<Weaving the Web, author, Tim Berners-Lee>
<Tabulator, author, Tim Berners-Lee>
<Paul Otlet, influenced, Tim Berners-Lee>
<John Postel, influenced, Tim Berners-Lee>
<World Wide Web, developer, Tim Berners-Lee>
<World Wide Web Foundation, foundedBy, Tim Berners-Lee>
<World Wide Web Foundation, keyPerson, Tim Berners-Lee>
<Tim Berners Lee, type, Living People>
<Tim Berners Lee, type, Person>
<Tim Berners Lee, type, Agent>
<Tim Berners-Lee, award, Royal Society>
<Tim Berners-Lee, award, Royal Academy of Engineering>
<Tim Berners-Lee, award, Order of Merit>
<Tim Berners-Lee, award, Royal Order of the British Empire>
<Tim Berners-Lee, spouse, Rosemary Leith>
Summary:
<Tim Berners Lee, birthDate, 1955-06-08>
<Tim Berners Lee, birthPlace, England>
<Tim Berners Lee, type, Scientist>
<Tim Berners-Lee, award, Royal Society>
<World Wide Web, developer, Tim Berners-Lee>
 RDF Data: T
 triple t ∈ T: <subj, pred, obj>
 Entity Description: Desc(e)
 Desc(e) = {t ∈ T: subj(t) = e or obj(t) = e}
 triple t ∈ Desc(e): <e, property, value>
 values: class, entity, literal
 Entity Summarization (ES): S(e, k)
 S ⊆ Desc(e), |S| ≤ k
Entity Summarization
[Figure: the description of Tim Berners-Lee drawn as a graph. Edges are properties (birthPlace, type, author, influenced, name, givenName, birthYear, birthDate, award); nodes are values (England, Scientist, Person, Royal Society, Weaving the Web, Paul Otlet, John Postel, Tim, Tim Berners-Lee, 1955-06-08, 1955).]
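The definitions of Desc(e) and S(e, k) can be sketched in a few lines of Python. This is only an illustration: the `Triple` type, the function names `describe` and `is_valid_summary`, and the tiny triple set are assumptions made for the example, not part of ESBM.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subj: str
    pred: str
    obj: str


def describe(triples, entity):
    """Desc(e): every triple whose subject or object is the entity."""
    return {t for t in triples if t.subj == entity or t.obj == entity}


def is_valid_summary(summary, description, k):
    """S(e, k) must be a subset of Desc(e) holding at most k triples."""
    return set(summary) <= set(description) and len(summary) <= k


# Illustrative triples; the last one is made up and does not involve the entity.
T = {
    Triple("Tim Berners-Lee", "birthPlace", "England"),
    Triple("Tim Berners-Lee", "type", "Scientist"),
    Triple("World Wide Web", "developer", "Tim Berners-Lee"),
    Triple("Weaving the Web", "author", "Tim Berners-Lee"),
    Triple("England", "type", "Country"),
}

desc = describe(T, "Tim Berners-Lee")
summary = {
    Triple("Tim Berners-Lee", "type", "Scientist"),
    Triple("World Wide Web", "developer", "Tim Berners-Lee"),
}
print(len(desc), is_valid_summary(summary, desc, k=5))
```

Note that triples in which the entity is the object (such as the `developer` triple) belong to the description as well.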
 Limitations
 Task specificity
 Single dataset
 Small size
 Incomplete coverage of triple types
Existing Benchmarks
1 http://yovisto.com/labs/iswc2012
2 http://wiki.knoesis.org/index.php/FACES
 Motivation
Research Challenges for Entity Summarization:
 Lack of good benchmarks
 Lack of evaluation efforts
 Contributions
 Created an Entity Summarization Benchmark (ESBM v1.2)
 overcoming the limitations of existing benchmarks
 meeting the desiderata for a successful benchmark
 Evaluated entity summarizers with ESBM
 made the most extensive evaluation effort to date
 evaluated 9 existing general-purpose entity summarizers
 evaluated 1 supervised learning-based entity summarizer for reference
Our Work
Creating ESBM
 To satisfy seven desiderata for a successful benchmark[18]
 accessibility, affordability, clarity, relevance, solvability, portability, scalability
 To overcome limitations of available benchmarks
 General-purpose summaries
 Including class-, entity-, literal-valued triples
 Multiple datasets
 Currently largest available benchmark
Design Goals
[18] Sim, S.E., Easterbrook, S.M., Holt, R.C.: Using benchmarking to advance research: A challenge to software engineering. In: ICSE 2003. pp. 74-83 (2003).
 Datasets
 DBpedia
 imported dump files: instance types, instance types transitive, YAGO types, mappingbased literals, mappingbased objects, labels, images, homepages, persondata, geo coordinates mappingbased, and article categories
 LinkedMDB
 removed triples: owl:sameAs
 Entities
sampled from seven large classes:
 DBpedia: Agent, Event, Location, Species, Work
 LinkedMDB: Film, Person
 Triples per entity
 By class: 25.88-52.44 triples
 Overall: 37.62 triples
Entity Descriptions
Ground-Truth Summaries
 Task
 30 users
 each assigned 35 entities
 175 entities
 each assigned to 6 users
 Each user created two summaries for each entity
 for k=5 and k=10
 Total
 6 top-5 summaries and 6 top-10 summaries for each entity
 175*6*2=2100 ground-truth summaries
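The counts on this slide are mutually consistent, which a short check makes explicit. The variable names are chosen for this sketch only; the numbers are the ones stated above.

```python
users = 30
entities_per_user = 35
entities = 175
users_per_entity = 6
summary_sizes = (5, 10)  # one top-5 and one top-10 summary per assignment

# Both ways of counting (user, entity) assignments must agree.
assignments = users * entities_per_user
assert assignments == entities * users_per_entity

total_summaries = assignments * len(summary_sizes)
print(assignments, total_summaries)
```

Counting from the user side (30 x 35) or from the entity side (175 x 6) gives the same number of assignments, and two summaries per assignment yields the 2100 ground-truth summaries.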
 Usage
 ESBM v1.2: specified training-validation-test splits for 5-fold cross validation
 Early versions: EYRE 2018 workshop, EYRE 2019 workshop
 Desiderata
 Accessibility: permanent identifier on w3id.org
 Affordability: open-source, example code for evaluation
 Clarity: documented clearly and concisely
 Relevance: entities sampled from real datasets
 Solvability: not trivial and not too difficult
 Portability: any general-purpose entity summarizer that can process RDF data
 Scalability: reasonably large and diverse to evaluate mature entity summarizers
The ESBM Benchmark
Analyzing ESBM
 175 entities, 6584 triples, 2100 ground-truth summaries
Basic Statistics
[Figure: proportion of triples selected into ground-truth summaries, for top-5 and top-10 summaries]
Overlap between top-5 and top-10 summaries: 4.91 triples
 Literal-valued triples constitute a large proportion in ground-truth summaries.
 30% in top-5 ground-truth summaries and 25% in top-10 summaries
 Participants are not inclined to select multiple values of a property.
 The average number of distinct properties in top-5 ground-truth summaries is 4.70 (very close to 5)
Triple Composition
Three bars in each group: Entity descriptions, Top-5 ground-truth summaries, Top-10 ground-truth summaries
 Entity Description
 Jaccard similarity between property sets from each pair of classes is very low.
Entity Heterogeneity
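The pairwise Jaccard comparison of property sets can be sketched as follows. The property sets here are hypothetical and far smaller than the real ones; they only show how the measure behaves, and the function names are assumptions of this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(property_sets):
    """Similarity for every unordered pair of classes, keyed by the class pair."""
    return {
        (c1, c2): jaccard(property_sets[c1], property_sets[c2])
        for c1, c2 in combinations(sorted(property_sets), 2)
    }


# Hypothetical property sets per class, for illustration only.
property_sets = {
    "Agent": {"type", "name", "birthDate", "award"},
    "Work": {"type", "name", "author", "releaseDate"},
    "Species": {"type", "kingdom", "family"},
}

scores = pairwise_jaccard(property_sets)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

A low score for every pair means that classes share few properties, so a summarizer cannot rely on one fixed list of important properties across classes.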
 Ground-truth Summaries
 Popular properties:
 properties that appear in more than 50% of the ground-truth summaries for each class
 Only 1 to 2 out of 13.24 properties are popular in top-5 ground-truth summaries
 The importance of properties is generally contextualized by concrete entities.
Entity Heterogeneity
 Average overlap between 6 ground-truth summaries
 Moderate degree of agreement
 Comparable with those reported for other benchmarks
Inter-Rater Agreement
[2] Cheng, G., Tran, T., Qu, Y.: RELIN: relatedness and informativeness-based centrality for entity summarization. In: ISWC 2011, Part I. pp. 114-129 (2011).
[7] Gunaratna, K., Thirunarayan, K., Sheth, A.P.: FACES: diversity-aware entity summarization using incremental hierarchical conceptual clustering. In: AAAI 2015. pp. 116-122 (2015).
[8] Gunaratna, K., Thirunarayan, K., Sheth, A.P., Cheng, G.: Gleaning types for literals in RDF triples with application to entity summarization. In: ESWC 2016. pp. 85-100 (2016).
Evaluating with ESBM
 Existing Entity Summarizers
 RELIN, DIVERSUM, LinkSUM, FACES, FACES-E, CD
 MPSUM, BAFREC, KAFCA
 ORACLE Entity Summarizer
 k triples that are selected by the most participants into ground-truth summaries
 Supervised Learning-Based Entity Summarizer
 6 models:
 SMOreg, LinearRegression, MultilayerPerceptron, AdditiveRegression, REPTree, RandomForest
 7 features:
 gfT (global frequency of property), lf (local frequency of property), vfT (frequency of value), si (self-information of triple)
 isC (value is class), isE (value is entity), isL (value is literal)
Participating Entity Summarizers
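The ORACLE summarizer described above can be sketched as a vote count over the ground-truth summaries of one entity. The function name, the tuple representation of triples, and the lexicographic tie-breaking rule are assumptions of this sketch; the slides do not say how ties are broken.

```python
from collections import Counter


def oracle_summary(ground_truths, k):
    """Return the k triples chosen by the most participants.

    ground_truths -- iterable of summaries, each a collection of triples
    Ties are broken lexicographically (an assumption made for determinism).
    """
    votes = Counter(t for summary in ground_truths for t in set(summary))
    ranked = sorted(votes, key=lambda t: (-votes[t], t))
    return ranked[:k]


# Three illustrative ground-truth summaries for one entity.
a = ("Tim Berners-Lee", "type", "Scientist")
b = ("Tim Berners-Lee", "birthPlace", "England")
c = ("Tim Berners-Lee", "award", "Royal Society")
d = ("Tim Berners-Lee", "birthYear", "1955")
e = ("Tim Berners-Lee", "spouse", "Rosemary Leith")

ground_truths = [{a, b, c}, {a, b, d}, {a, c, e}]
top1 = oracle_summary(ground_truths, k=1)
top3 = oracle_summary(ground_truths, k=3)
print(top1, top3)
```

Because ORACLE is built from the ground truth itself, it is an upper-bound reference for other summarizers rather than a method that could be applied to unseen entities.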
 Evaluation Criteria
 Sm: machine-generated entity summary
 Sh: human-made ground-truth summary
 P ≥ R if |Sm| < |Sh| = k
Settings
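A minimal sketch of how a machine-generated summary Sm could be scored against ground-truth summaries Sh. The formulas on the slide did not survive extraction, so this assumes the standard set-overlap definitions of precision, recall, and F1, and it assumes that scores are averaged over the ground-truth summaries of an entity; function names are illustrative.

```python
def precision_recall_f1(machine, human):
    """Set-overlap precision, recall, and F1 of Sm against one Sh."""
    machine, human = set(machine), set(human)
    overlap = len(machine & human)
    p = overlap / len(machine) if machine else 0.0
    r = overlap / len(human) if human else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def mean_f1(machine, ground_truths):
    """Average F1 over all ground-truth summaries (averaging is an assumption)."""
    scores = [precision_recall_f1(machine, h)[2] for h in ground_truths]
    return sum(scores) / len(scores)


# Sh has k = 5 triples; Sm returns only 3, two of which are correct.
s_h = {"t1", "t2", "t3", "t4", "t5"}
s_m = {"t1", "t2", "t9"}
p, r, f1 = precision_recall_f1(s_m, s_h)
print(p, r, f1)
```

When a summarizer returns fewer than k triples, the same overlap is divided by a smaller denominator for precision than for recall, so the two values differ whenever the overlap is non-empty.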
Overall Results
Results on Different Entity Types
k=5
k=10
 F1 results
 RandomForest and REPTree achieve the highest F1.
 Four methods outperform all the existing entity summarizers.
 Two methods only fail to outperform existing entity summarizers in one setting.
Results of Supervised Learning
This demonstrates the power of supervised learning for entity summarization.
Results of Supervised Learning
 Features
for each t = <e, p, v> in Desc(e):
 gfT: # triples in the dataset where p appears
 lf: # triples in Desc(e) where p appears
 vfT: # triples in dataset where v appears
 si: self-information of triple t
 isC: whether v is a class
 isE: whether v is an entity
 isL: whether v is a literal
 Results
 significantly effective: gfT, lf
 for LinkedMDB: vfT, si
 not significant: isC, isE, isL
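The seven features can be sketched as plain counts over a small triple list. Several details are assumptions of this sketch rather than statements from the slides: a value counts as "appearing" in a triple if it is the subject or the object, self-information is taken as the negative base-2 log of the share of dataset triples with the same property and value, and `value_kind` is a hypothetical lookup that says whether a value is a class, an entity, or a literal.

```python
import math
from collections import Counter


def triple_features(dataset, entity, value_kind):
    """Return {triple: feature dict} for every triple in Desc(entity).

    dataset    -- list of (subj, pred, obj) tuples standing in for T
    value_kind -- hypothetical lookup: value -> "class" | "entity" | "literal"
    """
    desc = [t for t in dataset if t[0] == entity or t[2] == entity]

    gf = Counter(p for _, p, _ in dataset)  # property counts in the dataset
    lf = Counter(p for _, p, _ in desc)     # property counts in Desc(e)

    vf = Counter()    # triples in the dataset that mention a value
    pair = Counter()  # triples that mention a (property, value) pair
    for s, p, o in dataset:
        for term in {s, o}:
            vf[term] += 1
            pair[(p, term)] += 1

    features = {}
    for s, p, o in desc:
        v = o if s == entity else s  # the value is the other end of the triple
        kind = value_kind[v]
        features[(s, p, o)] = {
            "gf": gf[p],
            "lf": lf[p],
            "vf": vf[v],
            # Assumed definition: rarer (property, value) pairs carry more information.
            "si": -math.log2(pair[(p, v)] / len(dataset)),
            "isC": int(kind == "class"),
            "isE": int(kind == "entity"),
            "isL": int(kind == "literal"),
        }
    return features


# Illustrative data only.
dataset = [
    ("TimBL", "type", "Scientist"),
    ("TimBL", "birthYear", "1955"),
    ("Ada", "type", "Scientist"),
    ("WWW", "developer", "TimBL"),
]
value_kind = {"Scientist": "class", "1955": "literal", "WWW": "entity"}
feats = triple_features(dataset, "TimBL", value_kind)
print(feats[("TimBL", "type", "Scientist")])
```

Each feature dict could then be turned into a vector and passed to a regression model that scores triples; that training step is outside this sketch.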
 Existing entity summarizers
 Leading systems: BAFREC, MPSUM
 Supervised Learning method
 Outperforms existing entity summarizers
 Comparing with ORACLE
 Still a large gap for improvement
Summary of Evaluation Results
Entity summarization on ESBM is a non-trivial task.
Conclusion
 Evaluation Criteria
 semantic overlap between triples
 Representativeness of Ground Truth
 general-purpose vs. task-specific
 Form of Ground Truth
 set-based vs. scoring-based
Limitations
 Contributions
 Created an Entity Summarization Benchmark: ESBM
 overcoming the limitations of existing benchmarks
 Evaluated entity summarizers with ESBM
 the most extensive evaluation effort to date
 ESBM
 The currently largest available benchmark for entity summarization
 Entity summarization on ESBM is a non-trivial task
 Permanent link: https://w3id.org/esbm/
 GitHub repository: nju-websoft/ESBM
Take-home Message
 An Upcoming Paper
 Junyou Li, Gong Cheng, Qingxia Liu, Wen Zhang, Evgeny Kharlamov, Kalpa Gunaratna, Huajun Chen.
Neural Entity Summarization with Joint Encoding and Weak Supervision.
IJCAI-PRICAI 2020
 Deep learning based
 Significantly outperformed all the existing systems on ESBM
Thank you!
Questions?

ESBM: An Entity Summarization Benchmark (ESWC 2020)

  • 1. ESBM: An Entity Summarization Benchmark Qingxia Liu1, Gong Cheng1, Kalpa Gunaratna2, and Yuzhong Qu1 1 National Key Laboratory for Novel Software Technology, Nanjing University, China 2 Samsung Research America, Mountain View CA, USA
  • 2. Introduction Creating ESBM Analyzing ESBM Evaluating with ESBM Conclusion 2020.06 2 Outline
  • 3. 2020.06 3 Entity Summarization <Tim Berners Lee, alias, TimBL> <Tim Berners Lee, name, Tim Berners-Lee> <Tim Berners Lee, givenName, Tim> <Tim Berners Lee, birthYear, 1955> <Tim Berners Lee, birthDate, 1955-06-08> <Tim Berners Lee, birthPlace, England> <Tim Berners Lee, birthPlace, London> <Tim Berners Lee, type, People Educated At Emanuel School> <Tim Berners Lee, type, Scientist> <Tim Berners-Lee, child, Ben Berners-Lee> <Tim Berners-Lee, child, Alice Berners-Lee> <Conway Berners-Lee, child, Tim Berners-Lee> <Weaving the Web, author, Tim Berners-Lee> <Tabulator, author, Tim Berners-Lee> <Paul Otlet, influenced, Tim Berners-Lee> <John Postel, influenced, Tim Berners-Lee> <World Wide Web, developer, Tim Berners-Lee> <World Wide Web Foundation, foundedBy, Tim Berners-Lee> <World Wide Web Foundation, keyPerson, Tim Berners-Lee><Tim Berners Lee, type, Living People> <Tim Berners Lee, type, Person> <Tim Berners Lee, type, Agent> <Tim Berners-Lee, award, Royal Society> <Tim Berners-Lee, award, Royal Academy of Engineering > <Tim Berners-Lee, award, Order of Merit> <Tim Berners-Lee, award, Royal Order of the British Empire> <Tim Berners-Lee, spouse, Rosemary Leith> <Tim Berners Lee, birthDate, 1955-06-08> <Tim Berners Lee, birthPlace, England> <Tim Berners Lee, type, Scientist> <Tim Berners-Lee, award, Royal Society> <World Wide Web, developer, Tim Berners-Lee> Description of Tim Berners-Lee: Summary:
  • 4. RDF Data: T triple tT: <subj, pred, obj> Entity Description: Desc(e) Desc(e) ={tT: subj(t)=e or obj(t)=e} triple tDesc(e): <e, property, value> values: class, entity, literal Entity Summarization (ES): S(e, k) SDesc(e) , |S|k 2020.06 4 Entity Summarization Tim Berners-Lee England Scientist Royal Society Weaving the Web Person Paul Otlet Tim Tim Berners-Lee John Postel 1955-06-08 1955 valuesproperties birthPlace type type author influenced influenced name givenName birthYear birthDate award
  • 5. Limitations Task specificness Single dataset Small size Triple incomprehensiveness 2020.06 5 Existing Benchmarks 1 http://yovisto.com/labs/iswc2012 2 http://wiki.knoesis.org/index.php/FACES
  • 6. Motivation Research Challenges for Entity Summarization: Lack of good benchmarks Lack of evaluation efforts Contributions Created an Entity Sumarization Benchmark (ESBM v1.2) overcoming the limitations of existing benchmarks meeting the desiderata for a successful benchmark Evaluated entity summarizers with ESBM made the most extensive evaluation effort to date evaluated 9 existing general-purpose entity summarizers evaluated 1 supervised learning-based entity summarizer for reference 2020.06 6 Our Work
  • 8. To satisfy seven desiderata for a successful benchmark[18] accessibility, affordability, clarity, relevance, solvability, portability, scalability To overcome limitations of available benchmarks General-purpose summaries Including class-, entity-, literal-valued triples Multiple datasets Currently largest available benchmark 2020.06 8 Design Goals [18] Sim, S.E., Easterbrook, S.M., Holt, R.C.: Using benchmarking to advance research: A challenge to software engineering. In: ICSE 2003. pp. 74{83 (2003).
  • 9. Datasets DBpedia imported dump files: instance types, instance types transitive, YAGO types, mappingbased literals, mappingbased objects, labels, images, homepages, persondata, geo coordinates mappingbased, and article categories LinkedMDB removed triples: owl:sameAs Entities sampled from seven large classes: DBpedia: Agent, Event, Location, Species, Work LinkedMDB: Film, Person Triples per entity By class: 25.88-52.44 triples Overall: 37.62 triples 2020.06 9 Entity Descriptions
  • 10. 2020.06 10 Ground-Truth Summaries Task 30 users each assigned 35 entities 175 entities each assigned to 6 users Each user created two summaries for each entity for k=5 and k=10 Total 6 top-5 summaries and 6 top-10 summaries for each entity 175*6*2=2100 ground-truth summaries
  • 11. Usage ESBM v1.2: specified training-validation-test splits for 5-fold cross validation Early versions: EYRE 2018 workshop, EYRE 2019 workshop Desiderata Accessibility: permanent identifier on w3id.org Affordability: open-source, example code for evaluation Clarity: documented clearly and concisely Relevance: entities sampled from real datasets Solvability: not trivial and not too difficult Portability: any general-purpose entity summarizer that can process RDF data Scalability: reasonably large and diverse to evaluate mature entity summarizers 2020.06 11 The ESBM Benchmark
  • 13. 175 entities, 6584 triples, 2100 ground-truth summaries 2020.06 13 Basic Statistics Proportion of triples been selected into ground-truth summaries Overlap: 4.91 triples Top-5 summary Top-10 summary Overlap between top-5 and top-10 summaries
  • 14. Triple Composition. Literal-valued triples constitute a large proportion of ground-truth summaries: 30% in top-5 and 25% in top-10 ground-truth summaries. Participants are not inclined to select multiple values of a property: the average number of distinct properties in top-5 ground-truth summaries is 4.70 (very close to 5). [Figure: three bars in each group: entity descriptions, top-5 ground-truth summaries, top-10 ground-truth summaries.]
  • 15. Entity Heterogeneity: Entity Descriptions. The Jaccard similarity between the property sets of each pair of classes is very low.
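The Jaccard comparison on this slide can be illustrated with a minimal sketch. The class names come from the deck, but the property sets below are made up for illustration and are not taken from ESBM.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical property sets per class (illustrative only, not ESBM data).
class_properties = {
    "Film": {"actor", "director", "label", "runtime"},
    "Person": {"birthDate", "birthPlace", "label"},
    "Species": {"family", "genus", "label"},
}

# Similarity for every unordered pair of classes.
pairwise = {
    (c1, c2): jaccard(class_properties[c1], class_properties[c2])
    for c1, c2 in combinations(sorted(class_properties), 2)
}

for pair, score in pairwise.items():
    print(pair, round(score, 3))
```

Low scores across all pairs would indicate that classes share few properties, which is the heterogeneity the slide reports.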
  • 16. Entity Heterogeneity: Ground-truth Summaries. Popular properties: properties that appear in >50% of the ground-truth summaries for each class. Only 1~2/13.24 properties are popular in top-5 ground-truth summaries. The importance of properties is generally contextualized by concrete entities.
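The ">50%" rule for popular properties can be sketched as follows. The function name, the representation of a summary as a list of (property, value) pairs, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def popular_properties(summaries, threshold=0.5):
    """Return the properties that occur in more than `threshold` of the summaries.

    `summaries` is a list of ground-truth summaries for one class; each summary
    is a list of (property, value) pairs. This layout is an assumption.
    """
    counts = Counter()
    for summary in summaries:
        # Count each property at most once per summary.
        counts.update({prop for prop, _ in summary})
    return {p for p, c in counts.items() if c / len(summaries) > threshold}


# Hypothetical ground-truth summaries for one class (illustrative only).
person_summaries = [
    [("birthDate", "1955-06-08"), ("type", "Scientist")],
    [("birthDate", "1955-06-08"), ("award", "Royal Society")],
    [("birthDate", "1955-06-08"), ("type", "Scientist")],
    [("birthPlace", "London"), ("birthPlace", "England")],
]

print(popular_properties(person_summaries))
```

Here `birthDate` appears in 3 of 4 summaries and is popular, while `type` appears in exactly half and does not pass the strict ">50%" test.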
  • 17. Inter-Rater Agreement. Average overlap between the 6 ground-truth summaries: a moderate degree of agreement, comparable with those reported for other benchmarks. [2] Cheng, G., Tran, T., Qu, Y.: RELIN: relatedness and informativeness-based centrality for entity summarization. In: ISWC 2011, Part I. pp. 114-129 (2011). [7] Gunaratna, K., Thirunarayan, K., Sheth, A.P.: FACES: diversity-aware entity summarization using incremental hierarchical conceptual clustering. In: AAAI 2015. pp. 116-122 (2015). [8] Gunaratna, K., Thirunarayan, K., Sheth, A.P., Cheng, G.: Gleaning types for literals in RDF triples with application to entity summarization. In: ESWC 2016. pp. 85-100 (2016).
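One plausible reading of "average overlap" is the mean number of triples shared by each pair of ground-truth summaries for an entity. The sketch below follows that reading, which is an assumption (the slide does not spell out the formula), and uses toy triple identifiers.

```python
from itertools import combinations


def average_pairwise_overlap(summaries):
    """Mean size of the intersection over all unordered pairs of summaries.

    Each summary is a set of triple identifiers. Returns 0.0 when fewer
    than two summaries are given.
    """
    pairs = list(combinations(summaries, 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) for a, b in pairs) / len(pairs)


# Toy summaries from three hypothetical raters (ESBM uses six per entity).
rater_summaries = [
    {"t1", "t2", "t3"},
    {"t2", "t3", "t4"},
    {"t3", "t4", "t5"},
]

print(average_pairwise_overlap(rater_summaries))
```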
  • 19. Participating Entity Summarizers. Existing entity summarizers: RELIN, DIVERSUM, LinkSUM, FACES, FACES-E, CD, MPSUM, BAFREC, KAFCA. ORACLE entity summarizer: the k triples selected by the most participants into ground-truth summaries. Supervised learning-based entity summarizer: 6 models: SMOreg, LinearRegression, MultilayerPerceptron, AdditiveRegression, REPTree, RandomForest; 7 features: gfT (global frequency of property), lf (local frequency of property), vfT (frequency of value), si (self-information of triple), isC (value is class), isE (value is entity), isL (value is literal).
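The ORACLE summarizer described above can be sketched as a simple vote count over ground-truth summaries. The function name and the lexicographic tie-breaking are assumptions, since the slide does not say how ties are resolved.

```python
from collections import Counter


def oracle_summary(ground_truth_summaries, k):
    """Pick the k triples chosen by the most participants.

    Each ground-truth summary is a set of (subject, property, value) tuples.
    Ties are broken lexicographically, which is an assumption of this sketch.
    """
    votes = Counter()
    for summary in ground_truth_summaries:
        votes.update(set(summary))
    ranked = sorted(votes, key=lambda triple: (-votes[triple], triple))
    return ranked[:k]


# Toy ground truth from three hypothetical participants.
birth = ("Tim Berners-Lee", "birthDate", "1955-06-08")
scientist = ("Tim Berners-Lee", "type", "Scientist")
award = ("Tim Berners-Lee", "award", "Royal Society")
ground_truth = [{birth, scientist}, {birth, award}, {birth, scientist}]

print(oracle_summary(ground_truth, k=2))
```

Because ORACLE is built from the ground truth itself, it serves as an upper reference point rather than a practical summarizer.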
  • 20. Settings. Evaluation criteria: Sm: machine-generated entity summary; Sh: human-made ground-truth summary. P and R may differ if |Sm| < |Sh| = k.
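The extracted slide text keeps only the symbols Sm and Sh, so the sketch below assumes the usual set-based definitions: precision P = |Sm ∩ Sh| / |Sm|, recall R = |Sm ∩ Sh| / |Sh|, and F1 as their harmonic mean. Triple identifiers are placeholders.

```python
def precision_recall_f1(s_m: set, s_h: set):
    """Set-based precision, recall and F1 of summary s_m against ground truth s_h."""
    overlap = len(s_m & s_h)
    p = overlap / len(s_m) if s_m else 0.0
    r = overlap / len(s_h) if s_h else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1


# Ground truth with k = 5 triples; the summarizer returned only 4.
s_h = {"t1", "t2", "t3", "t4", "t5"}
s_m = {"t1", "t2", "t3", "t9"}

p, r, f1 = precision_recall_f1(s_m, s_h)
print(p, r, round(f1, 4))  # 0.75 0.6 0.6667
```

When a summarizer returns exactly k triples, P, R, and F1 coincide under these definitions; they separate only when it returns fewer than k, as in the example.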
  • 22. Results on Different Entity Types. [Figure: results for k=5 and k=10.]
  • 23. Results of Supervised Learning. F1 results: RandomForest and REPTree achieve the highest F1. Four methods outperform all the existing entity summarizers. Two methods only fail to outperform existing entity summarizers in one setting. This demonstrates the power of supervised learning for entity summarization.
  • 24. Results of Supervised Learning. Features for each t=<e, p, v> in Desc(e): gfT: number of triples in the dataset where p appears; lf: number of triples in Desc(e) where p appears; vfT: number of triples in the dataset where v appears; si: self-information of triple t; isC: whether v is a class; isE: whether v is an entity; isL: whether v is a literal. Results: significantly effective: gfT, lf; for LinkedMDB: vfT, si; not significant: isC, isE, isL.
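The seven features can be sketched over a toy dataset. Several choices here are assumptions not given on the slide: Desc(e) is restricted to triples whose subject is e, self-information is taken as the negative base-2 log of the share of dataset triples with the same (property, value) pair, and the class/entity/literal test is a simple heuristic. The dataset is made up for illustration.

```python
import math
from collections import Counter


def triple_features(dataset, entity):
    """Compute gfT, lf, vfT, si, isC, isE, isL for each triple describing `entity`.

    `dataset` is a list of (subject, property, value) tuples.
    """
    prop_freq = Counter(p for _, p, _ in dataset)
    value_freq = Counter(v for _, _, v in dataset)
    pair_freq = Counter((p, v) for _, p, v in dataset)
    subjects = {s for s, _, _ in dataset}
    # Assumption: only triples with `entity` as subject are considered.
    desc = [t for t in dataset if t[0] == entity]
    local_freq = Counter(p for _, p, _ in desc)

    features = {}
    for s, p, v in desc:
        is_class = p == "type"  # heuristic: values of "type" are classes
        is_entity = not is_class and v in subjects  # heuristic: value is described elsewhere
        features[(s, p, v)] = {
            "gfT": prop_freq[p],
            "lf": local_freq[p],
            "vfT": value_freq[v],
            # Assumed definition of self-information for this sketch.
            "si": -math.log2(pair_freq[(p, v)] / len(dataset)),
            "isC": is_class,
            "isE": is_entity,
            "isL": not is_class and not is_entity,
        }
    return features


# Toy dataset of 8 triples (illustrative only).
toy_dataset = [
    ("TimBL", "type", "Scientist"),
    ("TimBL", "birthPlace", "London"),
    ("TimBL", "birthPlace", "England"),
    ("TimBL", "birthYear", "1955"),
    ("London", "type", "City"),
    ("Ada", "type", "Scientist"),
    ("Ada", "birthPlace", "London"),
    ("Ada", "birthYear", "1815"),
]

feats = triple_features(toy_dataset, "TimBL")
print(feats[("TimBL", "birthPlace", "London")])
```

A feature dictionary like this could then be fed, one row per triple, to a regression model of the kind listed on the earlier slide, with the ground-truth selection count as the target.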
  • 25. Summary of Evaluation Results. Existing entity summarizers: leading systems are BAFREC and MPSUM. Supervised learning method: outperforms existing entity summarizers. Comparing with ORACLE: still a large gap for improvement. Entity summarization on ESBM is a non-trivial task.
  • 27. Limitations. Evaluation criteria: semantic overlap between triples. Representativeness of ground truth: general-purpose vs. task-specific. Form of ground truth: set-based vs. scoring-based.
  • 28. Take-home Message. Contributions: created an Entity Summarization Benchmark, ESBM, overcoming the limitations of existing benchmarks; evaluated entity summarizers with ESBM, the most extensive evaluation effort to date. ESBM: the currently largest available benchmark for entity summarization; entity summarization on ESBM is a non-trivial task. Permanent link: https://w3id.org/esbm/ GitHub repository: nju-websoft/ESBM. An upcoming paper: Junyou Li, Gong Cheng, Qingxia Liu, Wen Zhang, Evgeny Kharlamov, Kalpa Gunaratna, Huajun Chen. Neural Entity Summarization with Joint Encoding and Weak Supervision. IJCAI-PRICAI 2020. Deep learning based; significantly outperformed all the existing systems on ESBM.
  • 29. Thank you! Questions?