Slides on Taverna (www.taverna.org.uk) from the talk given at the STFC/NERC workshop "Workflow approaches to investigation of biological complexity", 15-16 October 2013.
Taverna workflows: provenance and reproducibility - STFC/NERC workshop 2013
1. Taverna workflows: provenance
and reproducibility
Aleksandra Pawlik
The University of Manchester
Workflow approaches to investigation of biological complexity
STFC/NERC Workshop 15-16 October 2013
2. Workflows for improvement
Workflows are more than just pipelines...
• Scaling up automated execution
• Bringing together distributed and continually changing resources
• Dealing with different standards, interfaces and implementation
• Support for repeatable analysis
3. Taverna Engine Execution
Workflows in Scufl2
• Functional dataflow, simple control flows, implicit iteration
Linking services and tools
• Different data resources and formats
• "In Workflow Programming" (e.g. Beanshell scripting)
• Provenance collection: W3C PROV-O, OPM
• Plug-in framework
• Infrastructures: Web Services (SOAP, REST), Grid, HPC
• Common tools: Excel spreadsheets, Google Refine, R
• OAuth security plug-in
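As an illustration of the "implicit iteration" item on this slide, the following is a minimal Python sketch, not Taverna code. It assumes a step written for a single value and shows the general idea of the engine applying it element by element when a (possibly nested) list arrives instead. The names `run_with_implicit_iteration` and `to_upper` are hypothetical, and the sketch covers only one input; it does not model how several list inputs are combined.

```python
from typing import Any, Callable


def run_with_implicit_iteration(step: Callable[[Any], Any], value: Any) -> Any:
    """Apply a single-item step to a value, iterating over lists automatically.

    If the value is a list, the step is applied to each element (recursively
    for nested lists) and the results keep the same list structure.
    """
    if isinstance(value, list):
        return [run_with_implicit_iteration(step, item) for item in value]
    return step(value)


def to_upper(sequence: str) -> str:
    """Hypothetical single-item step: upper-case one sequence string."""
    return sequence.upper()


# The same step works on one item and on a nested list of items.
single = run_with_implicit_iteration(to_upper, "acgt")
many = run_with_implicit_iteration(to_upper, ["acgt", ["ttga", "ccat"]])
```

The workflow author writes only the single-item step; the iteration over the list is supplied by the surrounding machinery.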
4. Taverna Workbench
• Customizable for domains (e.g. expose services only for biodiversity)
• Desktop application
• Intermediate results views
• Plug-in framework
[Screenshot labels: List of services; Workflow engine to run workflows; Construct and visualise workflows]
5. Taverna User Spectrum
[Figure: user roles and tools placed along two axes, Taverna concept knowledge and workflow visibility (high to low). Roles: Workflow Engineer; Computational Scientist; Domain Scientist; Domain Scientist (Workflow User). Tools: Workbench; Components; Lite; Player; Domain-Specific Website / Tool / Portal.]
6. reuse
Right apps, right users
• Commodity apps: Web. Spreadsheets. R.
• Customisation
• Mixed workflow / scripting
• Deployment / Portability
• Web based / desktop
• Virtualised deployments
• Cloud hosted service
• A cloud-enabled local host
• Local ownership
• Capability building
[Figure: layered stack. Labels: Apps; Workflow; WFMS; middleware; Infrastructure]
Domain/task specific apps that incorporate (an ecosystem of) workflows. Integrate
Parameterised, integrative, multi-step (data) pipelines, analytics, computational protocols. Can be repetitively reused.
Support design, config. and execution of workflows. Manage utility actions for data, logging, security, compute, error. Shield incompatibilities & complexity.
Legacy, others and your own software, datasets, services, codes, and platforms.
Optimise and manage use of computing infrastructure.
9. Taverna Components
Workflow Blocks made of a workflow
• Well described
• Well behaved
• Well looked after
• Agreed fail
• Agreed formats in and out
• Agreed provenance
Deposited in myExperiment
Grouped into families
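The "agreed formats in and out" and "agreed fail" properties listed on this slide can be pictured with a small Python sketch. It is an illustration only and does not use any Taverna API: `ComponentError`, `make_component`, `is_dna`, `is_fraction` and `gc_content` are all hypothetical names, and the assumption is simply that a component checks its input and output against a shared contract and reports every problem through one failure type.

```python
class ComponentError(Exception):
    """Single agreed failure type raised by every component in this sketch."""


def make_component(name, func, accepts, produces):
    """Wrap a plain function so it honours an agreed input/output contract."""
    def run(value):
        if not accepts(value):
            raise ComponentError(f"{name}: input is not in the agreed format")
        try:
            result = func(value)
        except Exception as exc:
            # Any internal error is reported in the same agreed way.
            raise ComponentError(f"{name}: step failed: {exc}") from exc
        if not produces(result):
            raise ComponentError(f"{name}: output is not in the agreed format")
        return result
    return run


def is_dna(value):
    """Agreed input format (assumed): non-empty string over A, C, G, T."""
    return isinstance(value, str) and len(value) > 0 and set(value) <= set("ACGT")


def is_fraction(value):
    """Agreed output format (assumed): float between 0 and 1."""
    return isinstance(value, float) and 0.0 <= value <= 1.0


gc_content = make_component(
    "gc_content",
    lambda seq: (seq.count("G") + seq.count("C")) / len(seq),
    accepts=is_dna,
    produces=is_fraction,
)
```

Because every component built this way fails in the same manner and exchanges the same formats, components from one family can be swapped without changing the workflow around them.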
10. Provenance: how did you do it?
• The link between computation and results
• Reporting at different scales/levels
• Collecting -> Using provenance
[Figure: two provenance graphs, (i) Trace A and (ii) Trace B. Node labels: d1, d2, w, z, y, df, S0, S1, S2, S4, with primed variants d1', S'2, y', df'.]
PDIFF: comparing provenance traces to diagnose divergence across experimental results [Woodman et al, 2011]
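The idea of comparing two provenance traces to find where results diverge can be sketched in a few lines of Python. This is not the PDIFF algorithm, which works on provenance graphs; it is a deliberately simplified illustration that assumes each trace is a linear list of steps. The `Step` record, the `first_divergence` function and the example values are hypothetical, with node names loosely echoing the labels in the figure.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    """One recorded step: which service ran, on which inputs, giving which output."""
    service: str
    inputs: Tuple[str, ...]
    output: str


def first_divergence(
    trace_a: Sequence[Step], trace_b: Sequence[Step]
) -> Optional[Tuple[int, Optional[Step], Optional[Step]]]:
    """Return (index, step_a, step_b) for the first differing step, or None.

    A missing step (one trace is shorter) is reported as None on that side.
    """
    for index, (a, b) in enumerate(zip_longest(trace_a, trace_b)):
        if a != b:
            return index, a, b
    return None


trace_a = [
    Step("S0", ("d1",), "w"),
    Step("S2", ("w",), "y"),
    Step("S4", ("y",), "df"),
]
trace_b = [
    Step("S0", ("d1",), "w"),
    Step("S2-modified", ("w",), "y-prime"),
    Step("S4", ("y-prime",), "df-prime"),
]

divergence = first_divergence(trace_a, trace_b)
```

Here the final outputs differ, but the comparison points back to the second step, where a different service version was used, as the earliest place the two runs part ways.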
13. The Taverna Suite of Tools
[Figure: components of the Taverna suite. Labels: Workflow Repository; Service Catalogue; User Interfaces; Workbench; Taverna Lite; Player; Command Line; Workflow Engine; Provenance; Activity and Service Plug-in Manager; Workflow Components; Workflow Server; Interaction Server; Virtual Machine; Prog APIs; Web Portals / Gateways; Client User Interfaces; Third Party Tools]
14. Sustainability and user support
Freely available
Open source
Current version 2.4
80,000+ downloads across versions
Windows/Mac OS X/Linux/Unix
Tutorials and Workshops
Active user forum & support
www.taverna.org.uk
15. Taverna in other projects
BioDiversity Virtual e-Laboratory
www.biovel.eu
SCAPE
www.scape-project.eu
Wf4Ever
www.wf4ever-project.org
VPH-Share
www.vph-share.eu
HELIO
www.helio-vo.eu
iPlant Collaborative
www.iplantcollaborative.org
Pacific Northwest National Laboratory
www.pnnl.gov
KBase
www.kbase.us
Scientific Workflows and Provenance Working Group
www.dataone.org
SHIWA
www.shiwa-workflow.eu
16. Products
Methods
Data-centric Computation: scientific workflows over distributed cyber-infrastructure.
Data sharing: libraries and catalogues for all types of scientific artefacts and all types of scientists.
Knowledge Management: metadata, semantics, digital exchange, preservation, publishing.
Software Engineering: software sustainability, software and data policy, training.
Speaker notes

Notes for slide 1:
Title: Time well spent: Workflows for Environmental Omic Analysis.
The contextual analysis of Environmental Omics data is computationally intensive (involving the processing and management of large quantities of data), highly integrative (spanning data from many different disciplines) and rapidly evolving (involving the continuous development of novel methods and technologies). This poses a number of challenges for researchers in the field, including access to appropriate infrastructure, taking advantage of recent advancements and communicating research activities.
Scientific Workflow Management Systems, such as the Taverna Workflow Suite, are a particular class of computer application that manage the design, configuration and execution of repetitive, multi-step analysis processes that are particularly prevalent in Environmental Omics. The system handles the awkward work of accessing the different software and platforms, managing the data and security, handling errors and documenting the process.
Utilising HPC or cloud installations of Taverna also means that there is no requirement to install tools and data sources locally, which reduces local infrastructure and maintenance costs and enables rapid workflow development and testing. Consequently, large-scale analyses can be performed regardless of local infrastructure.
The Taverna Workflow Suite is currently powering the Biodiversity Virtual eLaboratory project (www.biovel.eu); the project is beginning to release a number of useful Environmental Omic workflows in collaboration with Genomic Observatories (http://genomicobservatories.blogspot.co.uk/) and MicroB3 (Ocean Sampling Day. http://www.microb3.eu/news/new-axis-collaboration-biovel-workflows-micro-b3-ocean-sampling-day).
This talk will discuss aspects of workflows and the benefits that adopting workflows as an integral part of Environmental Omics analysis can offer to the community, including reproducibility, knowledge exchange and easier access to high performance infrastructure.

Notes for slide 11:
http://purl.org/wf4ever/model
Research Objects (RO) aggregate related resources, their provenance and annotations
Conveys "everything you need to know" about a study/experiment/analysis/dataset/workflow
Shareable, evolvable, contributable, citable ROs have their own provenance and lifecycles

Notes for slide 12:
Hosted resource - no installation tears
Self-hosting distribution - locality fears
Services and/or workflow engine hosted locally or remotely
HPC/cloud installations avoid cost of local installations on local infrastructures. Some like the comfort of local ownership.
Deployment Infrastructure of BioVeL

Notes for slide 14:
2001, run by Manchester and Oxford