Provenance in the Dynamic, Collaborative New Science




                    Dr Jun Zhao
               Department of Zoology
                University of Oxford
              jun.zhao@zoo.ox.ac.uk
2011-03 Provenance Workshop, Edinburgh
Technological infrastructure for the preservation and efficient
retrieval and reuse of scientific workflows in a range of disciplines
Packaging, preserving and publishing
Astronomy Use Case: A Repeater's Story
   Dealing with large amounts of tabular data
   Many small scripts, to avoid creating black-box processes
   Resources shared locally; public access only after publication
   Data must be updated frequently from external data repositories
   Data updates must be tested before they are applied
   Data must be stored locally, with versioning
   "... we don't like to spread [the tasks] and lose control of who is doing what ..."
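The repeater's requirements (frequent updates from external repositories, testing updates before applying them, local storage with versioning) can be illustrated with a minimal sketch. It assumes a dataset is an in-memory list of rows; `VersionedTable`, `propose_update`, the checks and the `ra`/`dec` column names are hypothetical and not part of any Wf4Ever tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Row = dict
Check = Callable[[List[Row]], bool]


@dataclass
class VersionedTable:
    """Hypothetical local store that keeps every accepted version of a dataset."""
    name: str
    versions: List[List[Row]] = field(default_factory=list)

    def latest(self) -> List[Row]:
        # An empty list stands in for "no data yet".
        return self.versions[-1] if self.versions else []

    def propose_update(self, rows: List[Row], checks: List[Check]) -> bool:
        """Store `rows` as a new version only if every check passes."""
        if all(check(rows) for check in checks):
            self.versions.append(list(rows))  # older versions are kept untouched
            return True
        return False  # a rejected update leaves the history unchanged


def non_empty(rows: List[Row]) -> bool:
    return len(rows) > 0


def has_columns(*cols: str) -> Check:
    # Builds a check that every row carries the required columns.
    return lambda rows: all(c in row for row in rows for c in cols)


# Usage: the second update lacks "dec", so it is rejected and not stored.
table = VersionedTable("catalogue")
checks = [non_empty, has_columns("id", "ra", "dec")]
accepted = table.propose_update([{"id": 1, "ra": 10.5, "dec": -3.2}], checks)
rejected = table.propose_update([{"id": 2, "ra": 11.0}], checks)
print(accepted, rejected, len(table.versions))  # True False 1
```

Keeping every accepted version, rather than overwriting, is what lets a repeater go back to the exact data a result was computed from.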
Research Objects
http://www.wf4ever-project.org
   Aggregation: pointers or literals of internal and external content;
   Identity: equivalence, equality;
   Metadata: a reusable object;
   Lifecycle: stages of development; impacts on available functionality;
   Versioning: recording changes;
   Security: access, authentication, ownership, trust;
   Graceful degradation of understanding: opaque RO domain content;
   Mixed stewardship;
   Provenance:
       of compound objects that bundle things together;
       of evolutions;
       of dynamic objects and static objects.
ROs are Content Aware Objects
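A minimal sketch of how several of the Research Object properties above might fit together in one data structure: URIs for identity, aggregation of internal and external resources, a lifecycle stage that restricts functionality, and a change history that doubles as provenance and a version trail. The class names, the `live`/`archived` stage names and the example.org URIs are assumptions for illustration; security and stewardship are not modelled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resource:
    uri: str                       # identity: every resource is named by a URI
    internal: bool                 # True: the RO holds the content; False: it only points to it
    literal: Optional[str] = None  # small content may be embedded directly


@dataclass
class ResearchObject:
    uri: str
    lifecycle: str = "live"  # hypothetical stages: "live" or "archived"
    resources: Dict[str, Resource] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)
    history: List[dict] = field(default_factory=list)  # provenance of changes

    def _record(self, action: str, target: str, agent: str) -> None:
        # Log who changed what and when; this list doubles as a version trail.
        self.history.append({
            "action": action,
            "target": target,
            "agent": agent,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def _check_writable(self) -> None:
        # The lifecycle stage restricts what the RO can do.
        if self.lifecycle == "archived":
            raise PermissionError(f"{self.uri} is archived and read-only")

    def aggregate(self, res: Resource, agent: str) -> None:
        self._check_writable()
        self.resources[res.uri] = res
        self._record("aggregate", res.uri, agent)

    def remove(self, uri: str, agent: str) -> None:
        self._check_writable()
        del self.resources[uri]
        self._record("remove", uri, agent)

    def snapshot(self, snapshot_uri: str) -> "ResearchObject":
        """Freeze the current state as a new, archived RO with its own URI."""
        return ResearchObject(
            uri=snapshot_uri,
            lifecycle="archived",
            resources=dict(self.resources),
            metadata={**self.metadata, "snapshotOf": self.uri},
            history=list(self.history),
        )


# Usage: aggregate one local script and one external dataset, then snapshot.
ro = ResearchObject("http://example.org/ro/astro-1")
ro.aggregate(Resource("http://example.org/ro/astro-1/clean.py", internal=True), agent="alice")
ro.aggregate(Resource("http://example.org/external/catalogue.csv", internal=False), agent="bob")
v1 = ro.snapshot("http://example.org/ro/astro-1/v1")
print(v1.lifecycle, len(v1.resources))  # archived 2
```

Giving each snapshot its own URI and a `snapshotOf` link is one way to record equivalence between versions without losing their separate identities.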
Biology Use Case: A Reuser's Story
   Takes a set of genes from gene expression experiments performed by others, as
    reported in a scientific paper
   Performs a 'dry' analysis to understand which genes and which biological processes
    were disturbed by which chemical compounds:
       basic Affymetrix data processing
       statistical analysis to identify genes that are significantly
        differentially expressed under different conditions (with/without the
        compounds)
       finding the pathways that are most prominent among the filtered genes
Biology Use Case: A Reuser's Story
   Searches for existing experiments on myExperiment (http://myexperiment.org)
   Challenge: understanding the workflow
       Performs test runs with test data and with his own data
       Reads others' logs
       Reads annotations on the workflows
   Reuses scripts from colleagues and performs tests that his colleagues are
    familiar with
How Can It Be Supported?
   A reference to the source of the data and to the people to acknowledge for it
   The initial hypothesis
   The conceptual workflow, or a summary of the experiment plan
   References to the workflows that were tested, with comments on how they apply to
    the user's use case
   The user's own workflow, possibly with a backlog of previous versions that the
    user wishes to keep for reference (with notes and comments)
   The runs of the user's own workflow, their results and the recorded steps that led to
    the results, in some cases with comments for later reference (e.g. 'here I used
    parameter A, next time I may try B')
   The final hypothesis, with comments
   A reference to the results of the workflow
   Design logs that record the user's considerations while building the workflow
   Run logs that record the user's considerations while running the workflow and
    interpreting its results
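Several items in this list (the runs of the user's workflow, their parameters, results and comments such as 'here I used parameter A, next time I may try B') amount to a run log. The sketch below assumes each workflow step is a plain Python function and identifies inputs and outputs by a hash of their JSON form; `RunLog`, `filter_genes` and the `threshold` parameter are hypothetical stand-ins, not the statistical method used in the use case.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


def digest(value: Any) -> str:
    """Short content hash, so a result can be traced back to the run that produced it."""
    blob = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


@dataclass
class RunRecord:
    workflow: str
    version: int
    params: Dict[str, Any]
    input_digest: str
    output_digest: str
    comment: str = ""  # free-text note for later reference


@dataclass
class RunLog:
    runs: List[RunRecord] = field(default_factory=list)

    def run(self, workflow: str, version: int, step: Callable[..., Any],
            data: Any, comment: str = "", **params: Any) -> Any:
        # Execute the step, then record parameters, input/output hashes and the note.
        result = step(data, **params)
        self.runs.append(RunRecord(workflow, version, dict(params),
                                   digest(data), digest(result), comment))
        return result

    def produced_by(self, result: Any) -> List[RunRecord]:
        # Retrieve every recorded run whose output matches this result.
        target = digest(result)
        return [r for r in self.runs if r.output_digest == target]


def filter_genes(scores: Dict[str, float], threshold: float) -> List[str]:
    # Hypothetical stand-in for the statistical filtering step.
    return sorted(gene for gene, score in scores.items() if score >= threshold)


# Usage
log = RunLog()
scores = {"geneA": 3.1, "geneB": 0.4, "geneC": 2.2}
kept = log.run("de-filter", 2, filter_genes, scores,
               comment="here I used threshold 2.0, next time I may try 1.5",
               threshold=2.0)
print(kept)                             # ['geneA', 'geneC']
print(log.produced_by(kept)[0].params)  # {'threshold': 2.0}
```

Hashing by content, rather than by file name, means the same result can be matched to its run even after it has been copied or renamed.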
Where is Linked Data?
The Role of Linked Data in Wf4Ever
   Collaborative science
   Dynamic science
   Open science
Provenance Challenge
   Identity
   Context
   Storage
   Retrieval
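One way to read these four challenges in Linked Data terms, sketched under assumptions: identity as URIs, context as the named graph a statement is recorded in, storage as a quad store, and retrieval as pattern matching plus a transitive walk over derivation links. `ProvenanceStore` is a toy in-memory stand-in for a real RDF store; the predicate names are borrowed from the W3C PROV-O vocabulary, which postdates this talk, and the example.org URIs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional, Set, Tuple

PROV = "http://www.w3.org/ns/prov#"  # predicate names borrowed from W3C PROV-O
EX = "http://example.org/"           # hypothetical namespace for the example

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]


class ProvenanceStore:
    """Toy quad store: every triple is stored inside a named graph (its context)."""

    def __init__(self) -> None:
        self.graphs: Dict[str, Set[Triple]] = defaultdict(set)

    def add(self, s: str, p: str, o: str, graph: str) -> None:
        # Storage: the graph URI records the context the statement came from.
        self.graphs[graph].add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None, graph: Optional[str] = None) -> Iterator[Quad]:
        """Retrieval: yield quads matching a pattern; None acts as a wildcard."""
        names = [graph] if graph is not None else list(self.graphs)
        for g in names:
            for ts, tp, to in self.graphs.get(g, set()):
                if (s is None or s == ts) and (p is None or p == tp) and (o is None or o == to):
                    yield ts, tp, to, g

    def lineage(self, entity: str) -> Set[str]:
        """Follow prov:wasDerivedFrom transitively, across all contexts."""
        seen: Set[str] = set()
        frontier = [entity]
        while frontier:
            current = frontier.pop()
            for _, _, source, _ in self.match(s=current, p=PROV + "wasDerivedFrom"):
                if source not in seen:  # also guards against derivation cycles
                    seen.add(source)
                    frontier.append(source)
        return seen


# Usage: two runs recorded in separate contexts, queried together.
store = ProvenanceStore()
store.add(EX + "results/genes", PROV + "wasDerivedFrom", EX + "data/normalised", EX + "run/2")
store.add(EX + "data/normalised", PROV + "wasDerivedFrom", EX + "data/raw", EX + "run/1")
store.add(EX + "data/raw", PROV + "wasAttributedTo", EX + "people/lab-a", EX + "run/1")
print(sorted(store.lineage(EX + "results/genes")))
# ['http://example.org/data/normalised', 'http://example.org/data/raw']
print({quad[3] for quad in store.match(s=EX + "data/raw")})  # {'http://example.org/run/1'}
```

Keeping the graph URI alongside each statement lets a query answer not only what a result was derived from but also in which run that claim was made.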
Take home
   Provenance should be user-driven
   Linked Data should be a means to an end
   http://www.wf4ever-project.org
Acknowledgement
   Marco Roos of Leiden University (NL) and Jose
    Enrique Ruiz of Instituto de Astrofísica de
    Andalucía (Spain)
   Carole Goble of University of Manchester (UK)
    and Jose Manuel Gomez of iSOCO (Spain)
   Hui Hua and Jenny Molly of University of
    Oxford (UK)
