1) The document discusses data provenance in phylogenetics, including a survey of producers' and consumers' attitudes towards different metadata types.
2) It finds that half of all metadata types are critically important to two or more subfields, and that the majority are easy for producers to provide in every subfield.
3) Balancing the needs of producers and consumers of phylogenetic data is an ongoing challenge, and computational data provenance is emerging as a new area that could help address this.
Phylogenetics & Data Provenance: Survey Results
1. Data Provenance for Phyloinformatics: Introduction & Survey Results
Elliott Hauser
UNC Information Science
Karen Cranston
NESCent Informatics
3. What is Phylogenetic Data?
...many things!
Source: DRAFT: Current Best Practices for Publishing Trees Electronically, 2010. Stoltzfus et al.
http://wiki.tdwg.org/twiki/bin/view/Phylogenetics/LinkingTrees2010
4. What is Phylogenetic Data?
<A sample NeXML file>
Source: http://github.com/miapa/miapa-etl/tree/master/nexmlex
5. What is a Minimum Information Standard?
The answer to this question, for a domain:
"What is the minimum information necessary for an independent scientist to carry out an independent analysis of the data?"
Quackenbush, 2005
For Phylogenetics, this is MIAPA:
Minimum Information About a Phylogenetic Analysis
8. Overview:
Producers' and Consumers' attitudes
Most important metadata type
Least important metadata type
Source: Cranston MIAPA survey, 2012 (unpublished)
9. Half of all metadata types are critically important to two+ subfields
Source: Cranston MIAPA survey, 2012 (unpublished)
10. The majority of metadata types are easy to produce for all subfields
Source: Cranston MIAPA survey, 2012 (unpublished)
11. How to balance the needs of Producers and Consumers?
Most important metadata type
Least important metadata type
Source: Cranston MIAPA survey, 2012 (unpublished)
12. Metadata at work: The Open Tree of Life Project
Conflicting Data, Conflicting Needs:
● A Single, 'Best' Tree of Life
● Access to Underlying, Conflicting Trees
13. A new research area:
Computational data provenance
...Huh?
14. A new research area:
Computational data provenance
Computational: The result of a computation
Data provenance: Where/how it came to be
As science becomes more and more computational, we need to know more about our data!
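As a rough illustration of the two definitions above, the sketch below runs a toy computation and stores "where/how it came to be" next to the result. Everything in it is an assumption made for illustration: the function names (`with_provenance`, `count_variable_sites`), the record layout, and the toy alignment are hypothetical and are not taken from MIAPA, NeXML, or the survey.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def sha256_of(text: str) -> str:
    """Return the hex SHA-256 digest of a string (stands in for an input file)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def with_provenance(func, inputs: dict, parameters: dict) -> dict:
    """Run func(inputs, **parameters) and attach a provenance record.

    The record layout is hypothetical; it is not a MIAPA or NeXML structure.
    """
    result = func(inputs, **parameters)
    return {
        "result": result,
        "provenance": {
            "computed_by": func.__name__,          # which computation produced it
            "parameters": parameters,              # how it was configured
            "input_digests": {                     # which inputs it was derived from
                name: sha256_of(data) for name, data in inputs.items()
            },
            "python_version": platform.python_version(),
            "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        },
    }


def count_variable_sites(inputs: dict, ignore_gaps: bool = True) -> int:
    """Toy analysis: count alignment columns with more than one distinct state."""
    variable = 0
    for column in zip(*inputs.values()):
        states = {c for c in column if not (ignore_gaps and c == "-")}
        if len(states) > 1:
            variable += 1
    return variable


# Made-up three-taxon alignment, used only to exercise the sketch.
alignment = {"taxon_a": "ACGT-A", "taxon_b": "ACCTTA", "taxon_c": "ACGTTG"}
record = with_provenance(count_variable_sites, alignment, {"ignore_gaps": True})
print(json.dumps(record, indent=2))
```

The result alone (a single number) says nothing about the inputs or settings behind it; the attached record is what would let a consumer of the data check or repeat the computation.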
16. Discussion
Will our survey results predict actual behavior?
What tools, if any, will preserve and encourage submission of computational data provenance?
Is computational data different from measurement data, classification data, or other types of metadata? If so, does that affect our work?