+




    The Open Provenance
    Model Vocabulary
    Jun Zhao
    University of Oxford
    Jun.zhao@zoo.ox.ac.uk
+
    Outline
       Background about data.gov.uk

       The use cases
            XML serialization
            Data transformation on the fly
            Complex and nested processes
            Provenance of non-digital artifacts

       The Open Provenance Model Vocabulary (OPMV)
            The rationale
            An overview
            Examples

       Future work

       Summary
+
    data.gov.uk

       Linking UK government data

       Aims:
            Provide a set of best practices for government agencies
            Provide the minimum set of tooling and specification to facilitate
              the publication and consumption of data
            Encourage responsible data publishing
+




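    [Figure: XSLT transformation workflow. Input (downloaded from; unzipped from, etc.)
    -> XSLT Processor -> output: RDF File (made accessible).
    Other diagram labels: XSLT Parameter Binding, XSLT Stylesheet, XSLT Template.
    Provenance to record: who, when, which version, how.]
    Contributed by Jeni Tennison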
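The questions on this slide (who, when, which version, how) can be captured as a small set of provenance statements for one XSLT run. The following is a minimal sketch in Python, not the data.gov.uk implementation: the function name `describe_xslt_run`, every `ex:` identifier, and the `ex:stylesheetVersion` property are assumptions made for illustration, while the `opmv:` and `time:` terms are the ones that appear elsewhere in these slides.

```python
from datetime import datetime, timezone


def describe_xslt_run(source, stylesheet, version, output, agent, when):
    """Return provenance triples for one XSLT transformation (illustrative only)."""
    process = "ex:xsltRun1"  # hypothetical identifier for this run
    instant = "_:t0"         # blank node standing for the generation time
    return [
        (output, "rdf:type", "opmv:Artifact"),
        (output, "opmv:wasGeneratedBy", process),       # how
        (process, "rdf:type", "opmv:Process"),
        (process, "opmv:used", source),                 # the input file
        (process, "opmv:used", stylesheet),             # the XSLT stylesheet
        (process, "opmv:wasPerformedBy", agent),        # who
        (stylesheet, "ex:stylesheetVersion", version),  # which version (assumed property)
        (output, "opmv:wasGeneratedAt", instant),       # when
        (instant, "time:inXSDDateTime", when.isoformat()),
    ]


triples = describe_xslt_run(
    source="ex:input.xml",
    stylesheet="ex:style.xsl",
    version="1.2",
    output="ex:output.rdf",
    agent="ex:publisher",
    when=datetime(2010, 10, 7, 12, 9, tzinfo=timezone.utc),
)
```

Keeping the record as plain (subject, predicate, object) tuples makes it easy to serialize later in whichever RDF syntax the publisher prefers.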
+
    On-the-fly Transformation
    [Figure: a data transformation wrapper.
    Diagram labels: http://mytransportatio.db/j10; Data transformation wrapper.
    Provenance to record: who, when, which version, how.]
    Contributed by Stuart Williams
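When data is transformed on request, the wrapper itself is the natural place to record who, when, which version, and how. A minimal sketch in Python, assuming a hypothetical decorator `with_provenance` and a placeholder transformation `to_rdf`; neither name comes from the slides, and a real wrapper would query the underlying database.

```python
from datetime import datetime, timezone


def with_provenance(version, agent):
    """Decorator: make a transformation return (result, provenance record)."""
    def decorate(transform):
        def wrapper(source_uri, *args, **kwargs):
            started = datetime.now(timezone.utc)
            result = transform(source_uri, *args, **kwargs)
            provenance = {
                "who": agent,
                "when": started.isoformat(),
                "which_version": version,
                "how": transform.__name__,
                "source": source_uri,
            }
            return result, provenance
        return wrapper
    return decorate


@with_provenance(version="0.1", agent="ex:wrapperService")
def to_rdf(source_uri):
    # Placeholder transformation for illustration only.
    return f"<{source_uri}> a <http://example.org/Record> ."


result, provenance = to_rdf("http://example.org/db/row1")
```

Because the record is produced on every call, each response can carry provenance that matches the exact request that generated it.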
+
      Complex Data Creation Pipeline
    [Figure: a data creation pipeline.
    Transformations: GATE Pipeline, GateXMLRegressionTransformation,
    GateXMLRdfaTransformation, RdfaRdfXmlTransformation.
    GATE pipeline components:]
            Document Reset PR
            ANNIE English Tokeniser
            ANNIE English Splitter
            ANNIE POS Tagger
            Data.gov.uk Morphological Analyzer
            Data.gov.uk Flexible Roof Gazetteer
            Data.gov.uk Generic Gazetteer
            GATE Noun Phrase Chunker
            Data.gov.uk Generic Transducer
            TSO Coreference
    Courtesy of Paul Appleby from TSO (Data Enrichment Service)
    Services used by executions
    [Figure: provenance at two levels of detail.
    Services: S3, S2, S1 (edge label: accessedService).
    Level 1, provenance of execution at a higher level: processes p3, p2, p1
    (edge label: wasTriggeredBy).
    Level 0, provenance of execution at a detailed level: processes p4, p5, p21, p22
    (edge labels: followed, iterationOfProcess, hasParentProcess).
    Data: d6, d5, d4, d3, d2, d1, annotated "an artifact" and "a data collection"
    (edge labels: wasGeneratedBy, wasDerivedFrom).]
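The two levels in this diagram are connected by hasParentProcess, so a detailed (level 0) record can be summarized at the higher level by replacing each process with its parent. A minimal sketch in Python; the function name `lift_to_parent` is hypothetical, and the specific pairings of data items, processes, and parents below are assumed for illustration because the exact edges of the diagram are not recoverable from the text.

```python
def lift_to_parent(generated_by, parent_of):
    """Map each artifact to the higher-level process containing its generator.

    Processes without a recorded parent are kept unchanged.
    """
    return {
        artifact: parent_of.get(process, process)
        for artifact, process in generated_by.items()
    }


# Level 0: which detailed process generated which data item (assumed pairings).
generated_by = {"d6": "p4", "d5": "p5", "d3": "p21", "d2": "p22"}

# hasParentProcess links from detailed to higher-level processes (assumed pairings).
parent_of = {"p4": "p3", "p5": "p3", "p21": "p2", "p22": "p2"}

summary = lift_to_parent(generated_by, parent_of)
```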
+
    Non-digital Data Objects

       Organizations
            Organizational structure changes over time
            Origin organization, resulting Organization

       Boundary

       Legislation




    An organization ontology: http://www.epimorphics.com/public/vocabulary/org.html
+
    The Challenges

       Data in different representations, physical forms, and
         granularities

       No tooling support

       Provenance across different types of systems
            Identification
            Different terminologies
+
    The Gaps

       A vocabulary able to describe the provenance of all types
         of data, from different systems

       A vocabulary providing enough terms to describe
         provenance accurately

       Guidance on creating and publishing provenance on the Web

       Tool support for creating and publishing provenance on the
         Web

       Provenance access
+
    The Open Provenance Model
    Vocabulary
       Based on the Open Provenance Model

       Enable responsible data publication, in order to trace the
         responsible agents and to reproduce results

       Enable describing the provenance of any type of data

       An alternative implementation of the OWL OPM Serialization
+
    The Rationale

       Grounded in existing Semantic Web (SW) technologies
            Does not explicitly define a graph class (OPMGraph)
            Uses Named Graphs instead

       Reuse existing vocabularies

       Lightweight
            3 classes and 12 properties
            Reuse 3 classes from the W3C Time Ontology

       Easy to use and extend
+
    Overview of the Vocabulary

       Defined as a vocabulary expressed using OWL

       Implement the core concepts of the Open Provenance Model

       No specific granularity prescribed

       Partitioned into:
            The Core Module
            Other typed modules: common, xml, gate, sparql
+
    Overview of OPMV
    [Figure: the OPMV classes and properties.
    Classes: Agent, Artifact, Process; time:TemporalEntity with subclasses
    time:Interval and time:Instant.
    Properties shown: wasDerivedFrom, used, wasGeneratedBy, wasControlledBy,
    wasTriggeredBy, wasUsedAt, wasGeneratedAt, wasPerformedAt, wasStartedAt,
    wasEndedAt, withRespectOf.
    Legend: object properties implementing OPM; object properties not as exactly
    defined in OPM; rdfs:subClassOf relationships.
    Prefix time: http://www.w3.org/2006/time#]
+
    The When and Who of an Artifact

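    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    @prefix time: <http://www.w3.org/2006/time#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix opmv: <http://purl.org/net/opmv/ns#> .

    # The artifact: when it was generated, and by which process
    _:d0
        rdf:type    opmv:Artifact ;
        opmv:wasGeneratedAt       _:t0 ;
        opmv:wasGeneratedBy [
            rdf:type    opmv:Process ;
            opmv:wasPerformedBy       _:p0
        ]
    .

    # The "when": a single instant
    _:t0
        rdf:type    time:Instant ;
        time:inXSDDateTime "2010-10-07T12:09:00Z"^^xsd:dateTime ;
    .

    # The "who": the agent that performed the process
    _:p0
        rdf:type    opmv:Agent, foaf:Agent ;
    .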
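Answering "when and who" for an artifact means following two short paths: wasGeneratedAt to an instant, and wasGeneratedBy to a process and then wasPerformedBy to an agent. A minimal sketch in Python over plain triples; the helper names `objects` and `when_and_who` are hypothetical, and `_:proc0` is a label assumed here for the anonymous process node in the slide.

```python
TRIPLES = [
    ("_:d0", "rdf:type", "opmv:Artifact"),
    ("_:d0", "opmv:wasGeneratedAt", "_:t0"),
    ("_:d0", "opmv:wasGeneratedBy", "_:proc0"),
    ("_:proc0", "rdf:type", "opmv:Process"),
    ("_:proc0", "opmv:wasPerformedBy", "_:p0"),
    ("_:t0", "rdf:type", "time:Instant"),
    ("_:t0", "time:inXSDDateTime", "2010-10-07T12:09:00Z"),
    ("_:p0", "rdf:type", "opmv:Agent"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def when_and_who(triples, artifact):
    """Return (generation times, performing agents) for an artifact."""
    times = [
        stamp
        for instant in objects(triples, artifact, "opmv:wasGeneratedAt")
        for stamp in objects(triples, instant, "time:inXSDDateTime")
    ]
    agents = [
        agent
        for process in objects(triples, artifact, "opmv:wasGeneratedBy")
        for agent in objects(triples, process, "opmv:wasPerformedBy")
    ]
    return times, agents


times, agents = when_and_who(TRIPLES, "_:d0")
```

In practice the same lookup would more likely be written as a SPARQL query against a triple store; the list comprehension version only makes the two paths explicit.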
+
    The Creation of an Artifact (PC 3)

    pc1:p5 rdf:type opmv:Process ; rdfs:label "Reslice 1" .
    pc1:a3 rdf:type opmv:Artifact ;
        opmv:wasGeneratedBy [
            rdf:type opmv:Process ;
            opmv:used pc1:p1 ;
            opmv:wasPerformedAt [
                rdf:type time:Interval ;
                time:hasBeginning [
                    a time:Instant ;
                    time:inXSDDateTime "{PROCESS START TIME}"^^xsd:dateTime ] ;
                time:hasEnd [
                    a time:Instant ;
                    time:inXSDDateTime "{PROCESS END TIME}"^^xsd:dateTime ]
            ]
        ] .
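The {PROCESS START TIME} and {PROCESS END TIME} placeholders are meant to be filled with xsd:dateTime values. A minimal sketch in Python of one way to generate the time:Interval fragment from two timestamps; the function names `xsd_datetime` and `interval_turtle` are hypothetical, and the sketch assumes timezone-aware datetimes, which it normalizes to UTC.

```python
from datetime import datetime, timezone


def xsd_datetime(dt):
    """Format a timezone-aware datetime as a Turtle xsd:dateTime literal in UTC."""
    stamp = dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f'"{stamp}"^^xsd:dateTime'


def interval_turtle(start, end):
    """Return a Turtle blank node describing a time:Interval from start to end."""
    if end < start:
        raise ValueError("a process cannot end before it starts")
    return (
        "[ rdf:type time:Interval ;\n"
        f"  time:hasBeginning [ a time:Instant ; time:inXSDDateTime {xsd_datetime(start)} ] ;\n"
        f"  time:hasEnd [ a time:Instant ; time:inXSDDateTime {xsd_datetime(end)} ] ]"
    )


fragment = interval_turtle(
    datetime(2010, 10, 7, 12, 9, tzinfo=timezone.utc),
    datetime(2010, 10, 7, 12, 10, tzinfo=timezone.utc),
)
```

The returned string can be used as the object of opmv:wasPerformedAt. Rejecting an end time earlier than the start time catches a common data-entry error before it is published.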
+
    The Provenance of An Organization

    @prefix org: <http://www.w3.org/ns/org#> .

    eg:org1
        rdf:type org:Organization, opmv:Artifact ;
        org:resultedFrom [    ### subPropertyOf opmv:wasGeneratedBy
            rdf:type org:ChangeEvent, opmv:Process ;
            org:originalOrganization eg:org0 ;         ### subPropertyOf opmv:used
        ]
    .
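The comments in this example rely on rdfs:subPropertyOf: if org:resultedFrom is a sub-property of opmv:wasGeneratedBy, then every org:resultedFrom statement also implies an opmv:wasGeneratedBy statement, so generic provenance tools can read organization data. A minimal sketch of that inference in Python; the function name `infer_super_properties` and the node label `_:change` are hypothetical, and the sketch assumes the sub-property mapping has no cycles.

```python
# Sub-property declarations as stated in the slide's comments.
SUB_PROPERTY_OF = {
    "org:resultedFrom": "opmv:wasGeneratedBy",
    "org:originalOrganization": "opmv:used",
}


def infer_super_properties(triples, sub_property_of):
    """Add the triples implied by rdfs:subPropertyOf (acyclic mapping assumed)."""
    inferred = set(triples)
    for s, p, o in triples:
        # Walk up the property hierarchy, adding one implied triple per step.
        while p in sub_property_of:
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred


data = {
    ("eg:org1", "org:resultedFrom", "_:change"),
    ("_:change", "org:originalOrganization", "eg:org0"),
}

closure = infer_super_properties(data, SUB_PROPERTY_OF)
```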
+
    Using Named Graphs for OPM Accounts
    pc1:gr_273 {
        pc1:p5 rdf:type opmv:Process ;
              rdfs:label "Reslice 1" .

        pc1:a3 rdf:type opmv:Artifact ;
              opmv:wasGeneratedBy [
                  rdf:type opmv:Process ;
                  opmv:used pc1:p1 ] .

        pc1:p1 rdf:type opmv:Artifact .
    }

    pc1:gr_273 rdf:type <http://www.w3.org/2004/03/trix/rdfg-1/Graph> .
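A named graph adds a fourth element, the graph name, to each statement, which is how one OPM account can be kept apart from another. A minimal sketch in Python that groups quads by graph name; the function name `accounts` and the `rdfg:` prefix abbreviation are assumptions for illustration, and the anonymous process from the slide is omitted for brevity.

```python
from collections import defaultdict

# (subject, predicate, object, graph); None marks the default graph.
QUADS = [
    ("pc1:p5", "rdf:type", "opmv:Process", "pc1:gr_273"),
    ("pc1:a3", "rdf:type", "opmv:Artifact", "pc1:gr_273"),
    ("pc1:p1", "rdf:type", "opmv:Artifact", "pc1:gr_273"),
    ("pc1:gr_273", "rdf:type", "rdfg:Graph", None),
]


def accounts(quads):
    """Group triples by the named graph (account) that contains them."""
    by_graph = defaultdict(list)
    for s, p, o, g in quads:
        if g is not None:  # statements in the default graph belong to no account
            by_graph[g].append((s, p, o))
    return dict(by_graph)


grouped = accounts(QUADS)
```

The statement typing pc1:gr_273 as a graph sits outside the braces in the slide, so it is modeled here as a default-graph quad and is not counted as part of the account.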
+
    Comparison with OPM OWL

       A more intuitive OWL ontology and RDF representation

       Takes full advantage of SW technologies

       Lacks explicit semantics for graph membership

       Less expressive, e.g. no cardinality constraints
+
    Future Development

       More typed modules

       A guide on how to publish provenance
            Where and how much
            What is the minimum provenance
            How to represent the information
+
    Summary

       The vocabulary is well accepted by the data.gov.uk team, who
         find it easy to understand

       Experimental adoption, not yet large-scale production

       Guidance is still missing on what provenance information should
         be created and published, and how

       It is not yet clear how provenance information will be used
+
    This work was created by Jun Zhao and is licensed under a Creative
    Commons Attribution-Share Alike 3.0 License
    (http://creativecommons.org/licenses/by-sa/3.0/)

2010 10 provxg_datagovuk
