Digital Enterprise Research Institute                                          www.deri.ie




            Capturing interactive data transformation
             operations using provenance workflows

             Tope Omitola, Andre Freitas, Edward Curry, Sean
             O'Riain, Nicholas Gibbins and Nigel Shadbolt



  SWPM Workshop 28.05.2012, Herakleion, Crete


 Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Outline




           Motivation
           Interactive data transformations (IDTs)
           IDT & Provenance
           Modelling IDTs
           Provenance Representation
           Provenance Capture
           Case Study
           Conclusion
Motivation




           Dataspaces:
                 High number of heterogeneous data sources
                 Complex data transformation environment
                 Need for both repeatable and once-off data
                  transformations
           Traditional ETL approaches for data transformation/integration:
                 Based on scripting/programming
                 Focus on repeatable data transformation processes
Interactive Data Transformation (IDTs)




        Based on user-interaction paradigms for creating data
         transformations
        Maps GUI elements to data transformation operations
        Instant feedback on each iteration
        Complementary to existing ETL tools
        Lowers the barrier to data transformation for
         non-programmers (reduces programming effort)
        Example platforms: Google Refine, Potter's Wheel,
         Wrangler
Interactive Data Transformation (IDTs)
Challenges




           How to model IDTs?

           Facilitating the reuse of previous IDTs

           Representing IDTs
                                                           Provenance

           Making IDT platforms provenance-aware

           Enabling transportability across IDT and ETL
            platforms
IDT & Provenance




           Provenance supports representation of interactive
            data transformations
           Output: a provenance descriptor which shows the
            relationship between the inputs, the outputs, and
            the applied transformation operations
           Both retrospective and prospective provenance
IDT




           IDT model
           Formal model (Algebra for IDT)
           Provenance representation
           Provenance capture of IDTs
IDT Model: Core Elements




           Schema and instance data
           Set of predefined operations
           GUI elements mapping to predefined operations
           User actions
                 Operation selection
                 Parameter selection
                 Operation composition (workflow)
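
The core elements above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `OPERATIONS`, `GUI_BINDINGS`, `UserAction`, and `apply_workflow`, the GUI element identifiers, and the two example operations are all hypothetical, and a dataset is assumed to be a list of row dictionaries.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Dataset = List[Row]  # assumption: instance data as a list of row dictionaries


# Set of predefined operations (hypothetical examples).
def uppercase_column(data: Dataset, column: str) -> Dataset:
    return [{**row, column: str(row[column]).upper()} for row in data]


def drop_column(data: Dataset, column: str) -> Dataset:
    return [{k: v for k, v in row.items() if k != column} for row in data]


OPERATIONS: Dict[str, Callable[..., Dataset]] = {
    "uppercase": uppercase_column,
    "drop": drop_column,
}

# GUI elements mapped to predefined operations (element ids are made up).
GUI_BINDINGS: Dict[str, str] = {
    "menu.edit.to_uppercase": "uppercase",
    "menu.column.remove": "drop",
}


@dataclass
class UserAction:
    gui_element: str                                       # operation selection
    params: Dict[str, Any] = field(default_factory=dict)   # parameter selection


def apply_workflow(data: Dataset, actions: List[UserAction]) -> Dataset:
    """Operation composition: apply each selected operation in order."""
    for action in actions:
        operation = OPERATIONS[GUI_BINDINGS[action.gui_element]]
        data = operation(data, **action.params)
    return data


rows = [{"name": "ada", "year": 1955}]
result = apply_workflow(rows, [
    UserAction("menu.edit.to_uppercase", {"column": "name"}),
    UserAction("menu.column.remove", {"column": "year"}),
])
```

Each operation returns a new dataset instead of mutating its input, so the intermediate state after every user action remains available for inspection.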
IDT Model
Formalizing the mapping from IDT to Provenance




           Definition 1: A provenance-based interactive data
            transformation engine consists of a set of
            transformations (or activities) on a set of datasets,
            generating outputs in the form of other datasets or
            events which may trigger further transformations

           Definition 2: An interactive data transformation
            event consists of the input dataset, the output
            dataset(s), the applied transformation function,
            and the time the transformation took place
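
One possible reading of Definitions 1 and 2 as data structures is sketched below. The class names `TransformationEvent` and `Engine`, the example function `strip_whitespace`, and the choice of a tuple of strings as the dataset type are assumptions made for illustration; they do not come from the slides.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Tuple

Dataset = Tuple[str, ...]  # assumption: a dataset is an immutable tuple of cell values


@dataclass(frozen=True)
class TransformationEvent:
    """Definition 2: input, output(s), applied function, and time."""
    input_dataset: Dataset
    output_datasets: Tuple[Dataset, ...]
    function_name: str
    timestamp: datetime


class Engine:
    """Definition 1 (sketch): applies transformations and records an event for each."""

    def __init__(self) -> None:
        self.events: List[TransformationEvent] = []

    def apply(self, fn: Callable[[Dataset], Dataset], data: Dataset) -> Dataset:
        output = fn(data)
        self.events.append(TransformationEvent(
            input_dataset=data,
            output_datasets=(output,),
            function_name=fn.__name__,
            timestamp=datetime.now(timezone.utc),
        ))
        return output


def strip_whitespace(data: Dataset) -> Dataset:
    # Hypothetical transformation function used only for this example.
    return tuple(cell.strip() for cell in data)


engine = Engine()
cleaned = engine.apply(strip_whitespace, (" a ", "b "))
event = engine.events[0]
```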
Formalizing the mapping from IDT to Provenance




           Definition 3: A run is a function from time to
            dataset(s) and the transformation applied to those
            dataset(s)

           Definition 4: A trace is the sequence of pairs of a
            run and the time the run was made
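
Definitions 3 and 4 can be illustrated with a short Python sketch. It assumes integer logical time steps and reads a trace as the time-ordered list of pairs (value of the run at time t, t); both choices, and the names `make_run` and `build_trace`, are interpretations made here and are not stated in the slides.

```python
from typing import Callable, Dict, List, Tuple

Dataset = Tuple[str, ...]
Step = Tuple[Dataset, str]    # (dataset, name of the transformation applied to it)
Run = Callable[[int], Step]   # Definition 3: a function from time to a step


def make_run(steps: Dict[int, Step]) -> Run:
    """Build a run from a time -> step table (integer time is an assumption)."""
    def run(t: int) -> Step:
        return steps[t]
    return run


def build_trace(run: Run, times: List[int]) -> List[Tuple[Step, int]]:
    """Definition 4, read here as pairs (run(t), t) ordered by time."""
    return [(run(t), t) for t in sorted(times)]


run = make_run({
    0: (("a ", "b"), "strip"),
    1: (("a", "b"), "upper"),
})
trace = build_trace(run, [1, 0])
```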
Provenance Representation




           Proposed in "Representing Interoperable Provenance
            Descriptions for ETL Workflows"

           Three-layered provenance model:
                 Open Provenance Model Vocabulary Layer
                 Cogs ETL Provenance Vocabulary
                 Domain-Specific Model Layer


           Linked Data standards
Provenance Capture Layers
Provenance Event-Capture Sequence Flow
Case study




        Implementation on the Google Refine (GR) platform
        Example descriptor

   @prefix grf: <http://127.0.0.1:3333/project/1402144365904/> .

   # Process, with a mapping to the actual program
   grf:MassCellChange-1092380975 rdf:type opmv:Process,
       cogs:ColumnOperation, cogs:Transformation ;
     cogs:operationName "MassCellChange"^^xsd:string ;
     cogs:programUsed "com.google.refine.operations.cell.MassEditOperation"^^xsd:string ;
     rdfs:label "Mass edit 1 cells in column ==List of winners=="^^xsd:string .

   # Input artifact
   grf:MassCellChange-1092380975/1_0 rdf:type opmv:Artifact ;
     rdfs:label "* '''1955 [[Meena Kumari]]'[[Parineeta (1953 film)|Parineeta]]''''' as '''Lolita'''"^^xsd:string .

   # Output artifact
   grf:MassCellChange-1092380975/1_1 rdf:type opmv:Artifact ;
     rdfs:label "* '''John Wayne'''"^^xsd:string .

   # Workflow structure
   grf:MassCellChange-1092380975/1_1 opmv:wasDerivedFrom grf:MassCellChange-1092380975/1_0 .
   grf:MassCellChange-1092380975 opmv:used grf:MassCellChange-1092380975/1_0 .
   grf:MassCellChange-1092380975/1_1 opmv:wasGeneratedBy grf:MassCellChange-1092380975 .
   grf:MassCellChange-1092380975/1_1 opmv:wasGeneratedAt "2011-11-16T11:02:14"^^xsd:dateTime .
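
A descriptor of the shape shown above could be assembled with a few lines of Python. The sketch below uses only the standard library and is not the Google Refine extension's actual code: the function names `escape_literal` and `describe_operation` are hypothetical, resources are written as full IRIs in angle brackets, labels are assumed to be single-line strings, and the `rdf:`, `rdfs:`, `xsd:`, `opmv:`, and `cogs:` prefix declarations are assumed to be supplied elsewhere.

```python
from typing import List


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def describe_operation(base: str, op_id: str, op_name: str, program: str,
                       input_label: str, output_label: str,
                       generated_at: str) -> List[str]:
    """Return Turtle-style lines linking a process to its input and output artifacts."""
    process = f"<{base}{op_id}>"
    source = f"<{base}{op_id}/1_0>"   # input artifact
    target = f"<{base}{op_id}/1_1>"   # output artifact
    return [
        f"{process} rdf:type opmv:Process ;",
        f'    cogs:operationName "{escape_literal(op_name)}"^^xsd:string ;',
        f'    cogs:programUsed "{escape_literal(program)}"^^xsd:string .',
        f'{source} rdf:type opmv:Artifact ; '
        f'rdfs:label "{escape_literal(input_label)}"^^xsd:string .',
        f'{target} rdf:type opmv:Artifact ; '
        f'rdfs:label "{escape_literal(output_label)}"^^xsd:string .',
        f"{target} opmv:wasDerivedFrom {source} .",
        f"{process} opmv:used {source} .",
        f"{target} opmv:wasGeneratedBy {process} .",
        f'{target} opmv:wasGeneratedAt "{generated_at}"^^xsd:dateTime .',
    ]


lines = describe_operation(
    base="http://127.0.0.1:3333/project/1402144365904/",
    op_id="MassCellChange-1092380975",
    op_name="MassCellChange",
    program="com.google.refine.operations.cell.MassEditOperation",
    input_label="* '''1955 [[Meena Kumari]]'[[Parineeta (1953 film)|Parineeta]]''''' as '''Lolita'''",
    output_label="* '''John Wayne'''",
    generated_at="2011-11-16T11:02:14",
)
descriptor = "\n".join(lines)
```

Writing the artifacts as full IRIs avoids relying on `/` inside a prefixed name, which Turtle parsers generally reject unless it is escaped.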
Conclusion




           The proposed approach has a low impact on the
            existing IDT process
           Provenance representation supports different data
            models
           Preliminary implementation of a Google Refine
            provenance extension