SSONDE: Semantic Similarity On liNked Data Entities
(Example/DEMO)

Riccardo Albertoni
ralbertoni@delicias.dia.fi.upm.es
Ontology Engineering Group
Departamento de Inteligencia Artificial
Facultad de Informática
Universidad Politécnica de Madrid

Acknowledgment: SSONDE in its current (pre)release results mainly from the
research activity I did at IMATI-CNR in collaboration with M. De Martino.
Linked data: crawling architectural pattern

[Figure: SSONDE sits on top of the crawling layer (LDSpider/Fuseki, LDIF) and supports building analysis services, cluster analysis, and explorative search on resources.]

Riccardo Albertoni                                                             2
Getting started

        - SSONDE's code is hosted as a Google Code project,
          http://code.google.com/p/ssonde/,
          licensed as open source (GNU GPL v3).

        - Information on getting started is available at
          http://code.google.com/p/ssonde/wiki/GettingStarted




Applying SSONDE

Let's apply SSONDE
 - to Linked Data resources exposed by third parties:
      information about CNR provided by Gangemi et al. (data.cnr.it)
 - to work out similarities among researchers according to their
   research interests




                                             Date: 15/03/2012
What to download/install for this example

        - SSONDE
                     - svn checkout http://ssonde.googlecode.com/svn/SSONDEv1 SSONDEV1DEMO

        For crawling data:
        - LDSpider
                     - code.google.com/p/ldspider/
        - Fuseki
                     - http://jena.apache.org/documentation/serving_data/index.html




Crawling RDF fragments from data.cnr.it

        Which resources are you interested in?
        1. Identify a set of seed entities.
                      E.g., researchers:
                       http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleEsterno/ID226
                       (Riccardo Albertoni)
        2. Figure out which entity features (i.e., object
        properties and data properties) are interesting.
                      Analyse the schemas deployed to characterize that
                       kind of resource, e.g.,
                          http://www.cnr.it/ontology/cnr/pubblicazioni.owl
                           (prefix: pub)
                              pub:autoreCNRDi (the papers written by the
                               authors)
                              dc:subject (the authors' scientific interests)

Crawling RDF fragments from data.cnr.it (continued)

        3. Create a seed file, seedDATACNRIT.txt, and crawl the metadata
        of the resources to be analyzed:
                      java -jar ./LDSpider/ldspider-1.1e.jar -s seedDATACNRIT.txt \
                       -b 5 -1 -oe http://localhost:3030/XXX/update
        3 bis. If you have a more precise idea of which
          features you need, the properties to be
          traversed can be listed with the -f option:
                      java -jar ./LDSpider/ldspider-1.1e.jar -b 5 -1 \
                     -f http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autoreCNRDi \
                     -f http://www.cnr.it/ontology/cnr/pubblicazioni.owl#coautore \
                     -f http://purl.org/dc/terms/subject \
                     -f http://www.w3.org/2004/02/skos/core#broader \
                     -s seedDATACNRIT.txt -oe "http://localhost:3030/XXX/update"
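
The crawl in steps 3 and 3 bis can also be scripted. The Python sketch below writes the seed file and assembles the LDSpider command line from the options shown on this slide (-s, -b, -f, -oe); it does not start the crawl. The helper names are hypothetical, the one-URI-per-line seed layout is an assumption, and XXX in the endpoint URL is the placeholder dataset name used on the slide.

```python
from pathlib import Path

LDSPIDER_JAR = "./LDSpider/ldspider-1.1e.jar"  # jar path as used on the slide


def write_seed_file(path, seed_uris):
    """Write one seed URI per line (assumed seed-file layout)."""
    Path(path).write_text("\n".join(seed_uris) + "\n", encoding="utf-8")


def build_ldspider_command(seed_file, endpoint, follow=()):
    """Assemble the argument list using only options shown on the slide."""
    cmd = ["java", "-jar", LDSPIDER_JAR, "-b", "5", "-1"]
    for prop in follow:
        cmd += ["-f", prop]  # one -f per property to traverse
    cmd += ["-s", seed_file, "-oe", endpoint]
    return cmd


seeds = ["http://www.cnr.it/ontology/cnr/individuo/unitaDiPersonaleEsterno/ID226"]
follow = [
    "http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autoreCNRDi",
    "http://www.cnr.it/ontology/cnr/pubblicazioni.owl#coautore",
    "http://purl.org/dc/terms/subject",
    "http://www.w3.org/2004/02/skos/core#broader",
]
command = build_ldspider_command(
    "seedDATACNRIT.txt", "http://localhost:3030/XXX/update", follow
)
print(" ".join(command))
```

To actually crawl, call write_seed_file("seedDATACNRIT.txt", seeds) and pass command to subprocess.run(command, check=True) while Fuseki is running.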



[Figure: SSONDE architecture, from top to bottom]
  Configuration: list of instances (Java class to generate the list), ref. context, kind of store, ref. rules (e.g., JENA rules)
  Similarity: context layer, ontology layer, data layer
  Output: similarity matrix in CSV; n-most similar entities in JSON
  Data wrappers: JENA MEM, JENA SDB, JENA TDB, Virtuoso wrapper, ...
  Local data store/cache: RDF dumps, SDB rep., TDB rep., Virtuoso, ...
  Crawling architectural pattern (linked data consumption): LDSpider + Fuseki, LDIF
  Web of data (third-party served linked datasets): RDF dumps, HTTP dereferenceable URIs, SPARQL endpoints

Configuration file 1

        {
            "StoreConfiguration":{
                 "KindOfStore":"JENATDB",
                 "RDFDocumentURIs":[ ],
                 "TDBDirectory":"data/CNRIT/TDB-0.8.9/CNRR/"
            },
            "InstanceConfiguration":{
                 "InstanceURIsClass":"application.dataCNRIt.GetResearcherIMATIplusCoauthor"
            },
            "OutputConfiguration":{
                 "KindOfOutput":"JSONOrderedResult",
                 "NumberOfOrderedResult":20,
                 "FilePath":"conf/dataCNRIt/ComplexContextResearchInterest/CRRIIntPub.res.json"
            },
            "ContextConfiguration":{
                 "ContextFilePath":"conf/dataCNRIt/ComplexContextResearchInterest/CCRIIntPub.ctx"
            }
        }

        Notes:
        - InstanceConfiguration: list of LOD entity URIs, produced by a Java class implementing ListOfInputInstances.
        - OutputConfiguration: similarity matrix in CSV, or JSON encoding of the top n-most similar entities.
        - ContextConfiguration: context encoded in an in-house text format (hopefully soon in JSON).
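
Because the configuration is plain JSON, it can be generated and checked before SSONDE is run. The Python sketch below rebuilds the configuration above as a dictionary, verifies that the four sections are present, and serializes it. The helper names are hypothetical, and writing NumberOfOrderedResult as the integer 20 is an assumption, since the value on the slide is ambiguous.

```python
import json

REQUIRED_SECTIONS = (
    "StoreConfiguration",
    "InstanceConfiguration",
    "OutputConfiguration",
    "ContextConfiguration",
)


def make_config(tdb_dir, instance_class, result_path, context_path, top_n=20):
    """Build a configuration dictionary with the keys shown on the slide."""
    return {
        "StoreConfiguration": {
            "KindOfStore": "JENATDB",
            "RDFDocumentURIs": [],
            "TDBDirectory": tdb_dir,
        },
        "InstanceConfiguration": {"InstanceURIsClass": instance_class},
        "OutputConfiguration": {
            "KindOfOutput": "JSONOrderedResult",
            "NumberOfOrderedResult": top_n,  # assumption: integer value
            "FilePath": result_path,
        },
        "ContextConfiguration": {"ContextFilePath": context_path},
    }


def missing_sections(config):
    """Return the required top-level sections absent from config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


config = make_config(
    "data/CNRIT/TDB-0.8.9/CNRR/",
    "application.dataCNRIt.GetResearcherIMATIplusCoauthor",
    "conf/dataCNRIt/ComplexContextResearchInterest/CRRIIntPub.res.json",
    "conf/dataCNRIt/ComplexContextResearchInterest/CCRIIntPub.ctx",
)
assert not missing_sections(config)
text = json.dumps(config, indent=4)  # valid JSON, ready to be written to a file
print(text)
```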

Data.cnr.it: defining a context

[Figure: example graph with researchers (Res 225, Res 226), publications (pub:22, pub:26, pub:29), and topics (Topic:2, Topic:23, Topic:25, Topic:26, Topic:27), connected by pub:autoreCNRDi, dc:subject, and skos:broader edges; part of the graph is crawled from data.cnr.it and part from DBpedia.]

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX pub: <http://www.cnr.it/ontology/cnr/pubblicazioni.owl#>
[owl:Thing]-> {{}, {(pub:autoreCNRDi, Inter),(dc:subject, Simil)}}
[owl:Thing, dc:subject]-> {{},{(skos:broader, Inter)}}

Note: no data properties are considered in this context.
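
To make the two context rules concrete, the Python sketch below applies them to a toy graph. It is only an illustration under stated assumptions: Inter is read as "compare by shared values" and Simil as "compare by the similarity of the values", Jaccard overlap and a best-match average stand in for SSONDE's actual measures, and the graph edges are invented (only the node labels echo the figure).

```python
def inter(values_a, values_b):
    """Jaccard overlap of two value sets (stand-in for the Inter operation)."""
    union = values_a | values_b
    return len(values_a & values_b) / len(union) if union else 0.0


def simil(values_a, values_b, value_similarity):
    """Average, over values_a, of the best match found in values_b."""
    if not values_a or not values_b:
        return 0.0
    best = [max(value_similarity(a, b) for b in values_b) for a in values_a]
    return sum(best) / len(best)


# Hypothetical toy graph: the edges are invented for illustration.
papers = {"Res225": {"pub22", "pub26"}, "Res226": {"pub26", "pub29"}}
subjects = {"Res225": {"Topic25"}, "Res226": {"Topic26"}}
broader = {"Topic25": {"Topic27"}, "Topic26": {"Topic27", "Topic2"}}


def topic_similarity(t1, t2):
    """Rule [owl:Thing, dc:subject]: topics compared by skos:broader, Inter."""
    return inter(broader.get(t1, set()), broader.get(t2, set()))


def researcher_similarity(r1, r2):
    """Rule [owl:Thing]: pub:autoreCNRDi by Inter, dc:subject by Simil."""
    by_papers = inter(papers[r1], papers[r2])
    by_subjects = simil(subjects[r1], subjects[r2], topic_similarity)
    return (by_papers + by_subjects) / 2  # equal weights: an assumption


score = researcher_similarity("Res225", "Res226")
print(round(score, 3))
```

Here the two researchers share one of three publications, and their topics share one of two broader topics, so both rules contribute to the final score.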
Run SSONDE

From the directory where you have downloaded SSONDE
  (e.g., SSONDEV1DEMO):

 java -classpath "./lib/*:./bin" SSONDEv1.SemSim -conf JSONconfigurationFile1

For example:

 java -classpath "./lib/*:./bin/" SSONDEv1.SemSim -conf conf/dataCNRIt/ComplexContextResearchInterest/CCRIIntPub.param.json




Example tools to interpret SSONDE results
SSONDE does not have its own GUI;
results are interpreted by processing its similarity matrix with:

1) Excel files
2) Hierarchical Clustering Explorer 3.0, Human-Computer Interaction Lab,
   University of Maryland,
   http://www.cs.umd.edu/hcil/multi-cluster/
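
Without a GUI, a few lines of scripting are often enough for a first look at the results. The Python sketch below reads a similarity matrix from CSV and lists the n most similar entities for a given one. The matrix layout (a header row of entity names and one labeled row per entity) and the toy values are assumptions, not SSONDE's documented output format.

```python
import csv
import io


def read_matrix(handle):
    """Parse a CSV matrix: header row of column names, then labeled rows."""
    rows = list(csv.reader(handle))
    columns = rows[0][1:]
    return {
        row[0]: {col: float(val) for col, val in zip(columns, row[1:])}
        for row in rows[1:]
    }


def most_similar(matrix, entity, n):
    """Return the n entities most similar to entity, best first."""
    others = [(name, s) for name, s in matrix[entity].items() if name != entity]
    others.sort(key=lambda pair: pair[1], reverse=True)
    return others[:n]


# Toy data standing in for a real result file (hypothetical values).
toy_csv = io.StringIO(
    ",Res225,Res226,Res227\n"
    "Res225,1.0,0.4,0.7\n"
    "Res226,0.5,1.0,0.1\n"
    "Res227,0.7,0.2,1.0\n"
)
matrix = read_matrix(toy_csv)
print(most_similar(matrix, "Res225", 2))
```

For a real file, replace toy_csv with open(path, newline="") and adjust read_matrix if the actual layout differs.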

Hierarchical Clustering
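
As a rough illustration of what a hierarchical clustering tool does with a similarity matrix, the Python sketch below performs agglomerative clustering with single linkage, merging the two most similar clusters until no pair reaches a threshold. Single linkage, the threshold, and the toy symmetric matrix are assumptions chosen for brevity; Hierarchical Clustering Explorer offers its own options, and SSONDE's similarity values may not be symmetric.

```python
def single_linkage(similarity, threshold):
    """Merge the most similar pair of clusters until none reaches threshold."""
    clusters = [{name} for name in similarity]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: similarity of the closest pair of members.
                score = max(
                    similarity[a][b] for a in clusters[i] for b in clusters[j]
                )
                if score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score < threshold:
            break
        i, j = best_pair
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Toy symmetric similarity matrix (hypothetical values).
similarity = {
    "Res225": {"Res225": 1.0, "Res226": 0.9, "Res227": 0.2},
    "Res226": {"Res225": 0.9, "Res226": 1.0, "Res227": 0.1},
    "Res227": {"Res225": 0.2, "Res226": 0.1, "Res227": 1.0},
}
clusters = single_linkage(similarity, threshold=0.5)
print(clusters)
```

Lowering the threshold merges more clusters; recording the score at each merge would give the levels of a dendrogram.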




What next?

        (i) Semantic similarity optimization:
                (a) caching of intermediate similarity results;
                (b) adoption of the MapReduce paradigm to speed up the
                    assessment of semantic similarity.
        (ii) Domain-driven extensions at the data layer:
                (a) defining new data layer measures suited to
                    geo-referenced entities;
                (b) the multilingual similarity recently proposed by Kartic
                    and Jorge could be considered for inclusion in the
                    data layer of SSONDE.
        (iii) Definition of interfaces for sifting entities according to
            their similarity, e.g., exploiting existing visualization
            frameworks (e.g., Exhibit, Google Visualization, and
            JavaScript InfoVis Toolkit).
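
The caching of intermediate similarity results mentioned in item (i) can be pictured with ordinary memoization. In the Python sketch below, a placeholder pairwise similarity function is wrapped in functools.lru_cache so that repeated comparisons of the same pair are computed only once. The function body and the counter are purely illustrative; this is not how SSONDE implements or plans to implement caching.

```python
from functools import lru_cache

evaluations = 0  # counts how often the underlying computation really runs


@lru_cache(maxsize=None)
def cached_similarity(entity_a, entity_b):
    """Placeholder similarity; stands in for an expensive computation."""
    global evaluations
    evaluations += 1
    # Character overlap of the two names: a dummy measure, not SSONDE's.
    shared = set(entity_a) & set(entity_b)
    return len(shared) / len(set(entity_a) | set(entity_b))


first = cached_similarity("Topic25", "Topic26")
second = cached_similarity("Topic25", "Topic26")  # served from the cache
print(first == second, evaluations)
```

The cache is keyed on the ordered pair of arguments, so (a, b) and (b, a) are stored separately, which suits a measure that is not symmetric.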

What I'd like you to think about

         - Can SSONDE be useful in any of your current
           research activities?
         - Can SSONDE be deployed in some of your future
           projects (proposals)?
         - Are you interested in contributing to
           SSONDE?
        Further details are available in:
         - R. Albertoni, M. De Martino, "SSONDE: Semantic
           Similarity On liNked Data Entities", 6th Metadata and
           Semantics Research Conference, 28-30 November
           2012, Cádiz (Spain) [to appear]
         - Framework documentation:
                      http://code.google.com/p/ssonde/wiki/GettingStarted
