Emir Muñoz
Fujitsu (Ireland) Limited 
National University of Ireland Galway 
LD4IE 2014 @ ISWC, Riva del Garda, Trentino, Italy. Oct 20th, 2014 
http://bit.ly/1xYTR6Z 
(@emir_munoz)
2
<subject, predicate, object>
Domain(predicate) = ?
Range(predicate) = ?
3
Let's run the following SPARQL query over the endpoint:
select distinct ?obj where
{?sub <http://dbpedia.org/property/isbn> ?obj}
The endpoint response is a table with the values for the isbn property:
0
71090
6176526
2
2.7073
140043853
1107020697
2940013968264
0978-02-02+02:00
http://dbpedia.org/resource/N/a
"?"@en
"ISBN 0-312-85182-0"@en
"See text"@en
"various"@en
"ISBN 978-0-465-02656-2, ISBN 0-14-017997-6"@en
"ISBN 0-553-07875-5 & ISBN 0-553-56166-9"@en
"The Claiming of Sleeping Beauty: ISBN 0-452-26656-4"@en
"-2.0"^^<http://dbpedia.org/datatype/second>
"TBA"@en
"not available"@en
"[[#Bibliography"@en
And some more ...
So, what is the correct range for isbn?
4
LOV statistics (as of July 7th, 2014):
446 vocabularies
10 classes and 20 properties on average
The range of isbn is http://schema.org/Text
5
But still, is it what I'm looking for? What is the syntax?
6
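The isbn values listed earlier mix several kinds of RDF terms, which is one reason a declared range such as http://schema.org/Text says little about the actual syntax. A minimal sketch, assuming the values are available as strings in the notation of the endpoint's result table; the function name `term_kind` and its labels are illustrative and not part of the original work.

```python
import re
from collections import Counter

# A few objects of dbp:isbn copied from the query result shown earlier.
SAMPLE_VALUES = [
    '0',
    '2.7073',
    '1107020697',
    '0978-02-02+02:00',
    'http://dbpedia.org/resource/N/a',
    '"ISBN 0-312-85182-0"@en',
    '"See text"@en',
    '"-2.0"^^<http://dbpedia.org/datatype/second>',
    '"TBA"@en',
]


def term_kind(term: str) -> str:
    """Return a coarse label for the kind of RDF term (labels are hypothetical)."""
    if re.fullmatch(r'".*"\^\^<[^>]+>', term):
        return "typed literal"
    if re.fullmatch(r'".*"@[A-Za-z-]+', term):
        return "language-tagged literal"
    if re.match(r'https?://', term):
        return "IRI"
    return "plain value"


counts = Counter(term_kind(v) for v in SAMPLE_VALUES)
for kind, n in counts.most_common():
    print(f"{kind}: {n}")
```

Even this small sample falls into four different kinds, so a single declared range cannot describe what users actually write.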
Property: apoapsis
Etymology: apo- + apsis
Noun: apoapsis (plural apoapsides)
(astronomy) The point of a body's elliptical orbit about the system's centre of mass where the distance between the body and the centre of mass is at its maximum.
[http://en.wiktionary.org/wiki/apoapsis]
[Figure labels: Earth, Satellite]
dbr:17049_Miron dbo:apoapsis 4.01288e+11
7
https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/ontology/OntologyDatatypes.scala
8
Sample objects of <subject, predicate, object> triples with predicate <http://dbpedia.org/property/date>, with the pattern of each column after Lerman et al. (JAIR 2003):
First column, [NUM-NUM-NUM+NUM:NUM] (plain literal):
1488-07-28+02:00
1982-05-23+02:00
2007-04-11+02:00
Second column, [ALPHA<space>NUM] (plain literal + lang):
"September 2012"@en
"August 2012"@en
"July 2009"@en
Third column, [--NUM-NUM+NUM:NUM] (typed literal):
"--08-26+02:00"^^<http://www.w3.org/2001/XMLSchema#gMonthDay>
"--01-24+02:00"^^<http://www.w3.org/2001/XMLSchema#gMonthDay>
"--06-11+02:00"^^<http://www.w3.org/2001/XMLSchema#gMonthDay>
9
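The three column patterns for dbp:date can be reproduced with a very small tokenizer. This is a sketch under stated assumptions: it works on the lexical form only (language tags and datatype IRIs are dropped), and the regular expression and the helper name `coarse_pattern` are illustrative choices, not the method used in the original work.

```python
import re
from collections import defaultdict

# Digit runs, letter runs, whitespace runs, or any other single character.
TOKEN_RE = re.compile(r"[0-9]+|[A-Za-z]+|\s+|.")


def coarse_pattern(value: str) -> str:
    """Replace digit runs by NUM and letter runs by ALPHA; keep punctuation as is."""
    parts = []
    for tok in TOKEN_RE.findall(value):
        if tok.isdigit():
            parts.append("NUM")
        elif tok.isalpha():
            parts.append("ALPHA")
        elif tok.isspace():
            parts.append("<space>")
        else:
            parts.append(tok)
    return "[" + "".join(parts) + "]"


# Lexical forms taken from the dbp:date sample above.
samples = [
    "1488-07-28+02:00", "1982-05-23+02:00",
    "September 2012", "August 2012",
    "--08-26+02:00", "--01-24+02:00",
]

groups = defaultdict(list)
for s in samples:
    groups[coarse_pattern(s)].append(s)

for pattern, members in groups.items():
    print(pattern, members)
```

Grouping by pattern separates the sample into the same three families that the slide shows as columns.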
Content patterns (after Lerman et al., JAIR 2003, with more specific categories)
Values are decomposed into tokens, and each token is represented by a syntactic class.
[Figure: an example input set of values and the set of content patterns it generates]
10
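The idea of representing each token by a syntactic class, from specific to general, can be sketched as follows. The class names are the ones that appear in the pattern listings of this deck; the digit-count thresholds, the `MIXED_CASE` class, and the function names `class_chain` and `generalize` are assumptions made for illustration.

```python
import re

# Digit runs, letter runs, or single punctuation characters (whitespace is skipped).
TOKEN_RE = re.compile(r"[0-9]+|[A-Za-z]+|[^0-9A-Za-z\s]")


def class_chain(token: str) -> list:
    """Classes of one token, ordered from most specific to most general."""
    if token.isdigit():
        digits = len(token)
        # Assumed thresholds: up to 2 digits is small, up to 4 is medium.
        if digits <= 2:
            size = "SMALL_NUMBER"
        elif digits <= 4:
            size = "MEDIUM_NUMBER"
        else:
            size = "LARGE_NUMBER"
        return [token, size, "NUMBER", "ALPHANUMERIC"]
    if token.isalpha():
        if token.islower():
            case = "ALL_LOWERCASE"
        elif token.isupper():
            case = "ALL_UPPERCASE"
        else:
            case = "MIXED_CASE"  # hypothetical class, not named in the slides
        return [token, case, "ALPHA", "ALPHANUMERIC"]
    return [token, "PUNCTUATION"]


def generalize(value: str, level: int) -> str:
    """Pattern of a value at a given level (0 = literal tokens, higher = more general)."""
    classes = []
    for tok in TOKEN_RE.findall(value):
        chain = class_chain(tok)
        classes.append(chain[min(level, len(chain) - 1)])
    return " ".join(classes)


for level in range(4):
    print(level, generalize("September 2012", level))
```

A learner can then choose, per token position, the most specific class that still covers enough of the observed values.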
Dataset (version 3.9): 2.4 billion RDF triples, 53,230 properties
Split of the values before the method is applied:
19.25% plain literals
18.02% typed literals
62.73% without lang or datatype (xsd:string)
11
For the apoapsis example, we extracted one pattern:
http://dbpedia.org/ontology/apoapsis LARGE/FLOAT_NUMBER 1.0
And we also found some other related properties:
http://dbpedia.org/ontology/Planet/apoapsis LARGE/FLOAT_NUMBER 1.0
http://dbpedia.org/ontology/Spacecraft/apoapsis LARGE/FLOAT_NUMBER 1.0
http://dbpedia.org/property/apoapsis NUMBER 0.9230769230769231
http://dbpedia.org/property/apoapsis LARGE/FLOAT_NUMBER 0.75213675
For the date example, we extracted 7 patterns:
http://dbpedia.org/property/date -- SMALL_NUMBER - SMALL_NUMBER 0.2
http://dbpedia.org/property/date ALPHANUMERIC MEDIUM_NUMBER 0.166
http://dbpedia.org/property/date ALPHANUMERIC 2012 0.032
http://dbpedia.org/property/date ALPHANUMERIC.ALPHANUMERIC 0.012
And more ...
12
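The number attached to each pattern is not defined in the slides. One plausible reading, and only an assumption here, is the share of a property's values that the pattern covers; this would explain why NUMBER (0.92) scores higher than the more specific LARGE/FLOAT_NUMBER (0.75) for dbp:apoapsis. A minimal sketch of such a coverage score for single-token values, with a made-up sample and hypothetical helper names:

```python
import re
from collections import Counter


def classes_of(value: str) -> list:
    """All classes a single-token value belongs to (thresholds are assumptions)."""
    if re.fullmatch(r"[0-9]+", value):
        digits = len(value)
        if digits <= 2:
            size = "SMALL_NUMBER"
        elif digits <= 4:
            size = "MEDIUM_NUMBER"
        else:
            size = "LARGE_NUMBER"
        return [size, "NUMBER", "ALPHANUMERIC"]
    if re.fullmatch(r"[A-Za-z]+", value):
        return ["ALPHA", "ALPHANUMERIC"]
    return ["OTHER"]  # hypothetical catch-all class


def coverage(values: list) -> dict:
    """Share of values covered by each class."""
    counts = Counter(c for v in values for c in classes_of(v))
    return {c: n / len(values) for c, n in counts.items()}


# Made-up sample for illustration, not DBpedia data.
values = ["401288000000", "152100000", "7500", "unknown"]
scores = coverage(values)
for cls, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{cls} {score:.2f}")
```

Because general classes cover everything their sub-classes cover, the scores of one property can sum to more than 1, as they do in the listing above.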
The user has this value: 2014-10-20.
Which property can they use?
dbp:dateCreated, dbp:dateOfProduction, dbp:dateOpened, dbp:dateSigned, dbp:dateOfPremiere, dbp:date, among others. 
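Searching for a property by value amounts to computing the pattern of the value and looking it up in the pattern database. The sketch below assumes a tiny in-memory database: the vcard:bday row is taken from this deck, while the `ex:` properties, their scores, and the helper names are invented for illustration.

```python
import re

# Digit runs, letter runs, or single punctuation characters (whitespace is skipped).
TOKEN_RE = re.compile(r"[0-9]+|[A-Za-z]+|[^0-9A-Za-z\s]")


def to_pattern(value: str) -> str:
    """Pattern of a value; the digit-count thresholds are assumptions."""
    out = []
    for tok in TOKEN_RE.findall(value):
        if tok.isdigit():
            if len(tok) <= 2:
                out.append("SMALL_NUMBER")
            elif len(tok) <= 4:
                out.append("MEDIUM_NUMBER")
            else:
                out.append("LARGE_NUMBER")
        elif tok.isalpha():
            out.append("ALPHA")
        else:
            out.append(tok)
    return " ".join(out)


# (property, pattern, score) rows.
PATTERN_DB = [
    ("vcard:bday", "MEDIUM_NUMBER - SMALL_NUMBER - SMALL_NUMBER", 0.5),  # from the deck
    ("ex:created", "MEDIUM_NUMBER - SMALL_NUMBER - SMALL_NUMBER", 0.8),  # invented
    ("ex:label", "ALPHA ALPHA", 0.6),                                    # invented
]


def candidate_properties(value: str) -> list:
    """Properties whose stored pattern equals the value's pattern, best score first."""
    wanted = to_pattern(value)
    hits = [(score, prop) for prop, pat, score in PATTERN_DB if pat == wanted]
    return [prop for score, prop in sorted(hits, reverse=True)]


print(candidate_properties("2014-10-20"))
```

Exact pattern equality is the simplest matching rule; a fuller version would also try the more general classes of each token.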
What is the property dbp:admCtrOf used for? 
"town of republic significance of Meleuz"@en (http://dbpedia.org/resource/Meleuz) 
"town of oblast significance of Oktyabrsk"@en (http://dbpedia.org/resource/Oktyabrsk) 
"town of republic significance of Sortavala"@en (http://dbpedia.org/resource/Sortavala) 
⇒ It is used to declare "Administrative Control Of"
13
Check for atypical values (outliers) 
Close look into the most (in)frequent patterns 
Possible errors during automatic extraction 
For the dbp:isbn property we can find the following values: 
"summer or autumn 380"@en 
"Late November"@en 
"Fall 1040"@en 
680 
"December, 67 BC"@en 
"April-July 1799"@en 
http://dbpedia.org/resource/New_Year's_Day 
http://dbpedia.org/resource/Second_Intermediate_Period_of_Egypt 
"New moon day of Kartika, celebrations begin two days prior and end two days after that date"@en 
Are they valid or invalid values?
14
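Checking for atypical values can be sketched as flagging every value whose pattern is rare among the values of the same property. The sample below is a toy list assembled from values shown in this deck, and the 25% threshold and the helper names are assumptions.

```python
import re
from collections import Counter

# Digit runs, letter runs, whitespace runs, or any other single character.
TOKEN_RE = re.compile(r"[0-9]+|[A-Za-z]+|\s+|.")


def coarse_pattern(value: str) -> str:
    """Replace digit runs by NUM and letter runs by ALPHA; keep punctuation as is."""
    parts = []
    for tok in TOKEN_RE.findall(value):
        if tok.isdigit():
            parts.append("NUM")
        elif tok.isalpha():
            parts.append("ALPHA")
        elif tok.isspace():
            parts.append("<space>")
        else:
            parts.append(tok)
    return "[" + "".join(parts) + "]"


def rare_values(values: list, min_share: float = 0.25) -> list:
    """Values whose pattern covers less than min_share of all values."""
    patterns = [coarse_pattern(v) for v in values]
    counts = Counter(patterns)
    return [v for v, p in zip(values, patterns)
            if counts[p] / len(values) < min_share]


values = [
    "1488-07-28+02:00",
    "1982-05-23+02:00",
    "2007-04-11+02:00",
    "Late November",
    "680",
]
print(rare_values(values))
```

Flagged values are only candidates for review: a rare pattern may be an extraction error or simply an unusual but legitimate way of writing the value.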
LD4IE Challenge 2014
A vCard, which may be annotated with the hCard microformat:
E-mail: user1@domain.com
Given name: John
Surname: Snow
Birthday: 1986-02-14
We can use our database to extract and validate the email:
vcard:email mailto : ALPHA PUNCTUATION ALL_LOWERCASE . ALL_LOWERCASE 0.82
vcard:email mailto : ALPHA PUNCTUATION ALL_LOWERCASE . com 0.69
vcard:email mailto : ALPHA @ ALPHANUMERIC . ALL_LOWERCASE 0.54
vcard:email mailto : ALPHA @ ALPHANUMERIC . com 0.46
vcard:email mailto : ALL_UPPERCASE ****@ ALL_LOWERCASE . ALL_LOWERCASE 0.36
And also the birthday:
vcard:bday NUMBER - SMALL_NUMBER - SMALL_NUMBER 0.5
vcard:bday MEDIUM_NUMBER - SMALL_NUMBER - SMALL_NUMBER 0.5
15
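One way to use such patterns for extraction is to compile each pattern into a regular expression and run it over the text. This is a sketch only: the mapping from classes to regular expressions is an assumption, and the e-mail pattern is adapted from the ones above (no `mailto` prefix, ALPHANUMERIC local part) so that it fits the plain-text vCard.

```python
import re

# Assumed regular expression for each syntactic class.
CLASS_REGEX = {
    "SMALL_NUMBER": r"[0-9]{1,2}",
    "MEDIUM_NUMBER": r"[0-9]{3,4}",
    "NUMBER": r"[0-9]+",
    "ALPHA": r"[A-Za-z]+",
    "ALL_LOWERCASE": r"[a-z]+",
    "ALPHANUMERIC": r"[A-Za-z0-9]+",
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a space-separated pattern; unknown tokens are matched literally."""
    parts = [CLASS_REGEX.get(tok, re.escape(tok)) for tok in pattern.split()]
    # The lookarounds stop a match from starting or ending inside a longer token.
    return re.compile(r"(?<![A-Za-z0-9])" + "".join(parts) + r"(?![A-Za-z0-9])")


BDAY = pattern_to_regex("MEDIUM_NUMBER - SMALL_NUMBER - SMALL_NUMBER")
EMAIL = pattern_to_regex("ALPHANUMERIC @ ALPHANUMERIC . ALL_LOWERCASE")

text = ("E-mail: user1@domain.com Given name: John "
        "Surname: Snow Birthday: 1986-02-14")

print(EMAIL.findall(text))
print(BDAY.findall(text))
print(BDAY.fullmatch("14/02/1986") is not None)  # validation of a single value
```

The same compiled expressions serve both use cases: `findall` extracts candidate values from text, and `fullmatch` validates a value that is already isolated.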
Extraction of lexico-syntactic patterns from Linked Data (LD) datasets
500,000 content patterns
Different use cases:
Search for properties
Validation of values
Information extraction based on patterns
Future work:
Consistency analysis of knowledge bases
Extension of the patterns to cover other knowledge bases
Among others
16
http://emunoz.org 
@emir_munoz 
Emir.Munoz@ie.fujitsu.com
https://github.com/emir-munoz/ld-patterns/

Learning Content Patterns from Linked Data
