Keynote talk on reproducibility and replication of data analysis for the AAAS workshop on "Reproducibility in the Field Sciences", May 11-12, 2015, Washington, DC
Ellison keynote - aaas workshop 2015
1. 11-12 May 2015 AAAS Workshop on Reproducibility in the Field Sciences © 2015 Aaron M. Ellison
Challenges for Reproducibility in the Field Sciences
or
It's déjà vu all over again
Aaron M. Ellison
Harvard University, Harvard Forest
Founding Editor, Ecological Archives
Editor-in-Chief, Ecological Monographs
© Lon Schleining
8.
It will be observed, then, that our efforts are not merely to
accumulate as great a mass of animal remains as possible. On the
contrary, we are expending even more time than would be required
for the collection of specimens alone, in rendering what we do obtain
as permanently valuable as we know how, to the ecologist as well as
the systematist. . . . I wish to emphasize what I believe will ultimately
prove to be the greatest value of our museum. This value will not,
however, be realized until the lapse of many years, possibly a
century. . . the student of the future will have access to the original
record of faunal conditions. . .
Joseph Grinnell (1910) The methods and uses of a research museum. Popular Science Monthly 75: 163-169.
9.
It is only upon the distribution of a database that its far-reaching
research, educational, and other socioeconomic values
are recognized. The contribution of any of these products to
scientific and technical knowledge might well assume a value far
greater than the costs of database production and dissemination.
NRC (1999) A question of balance: private rights and the public interest in scientific and technical databases
11.
Reproducibility: Key challenges
• Cultural
  – US
  – International
• Technological
  – Data storage
  – Descriptive metadata
  – Process metadata (Provenance)
13.
Ecology and evolutionary biology stand virtually alone among the
environmental and environment-related sciences in the lack of
some agency- or community-mandated data archiving and data
sharing policy.
Porter & Callahan (1994) Circumventing a dilemma: historical approaches to data sharing in ecological research. In:
Environmental Information Management and Analysis: Ecosystem to Global Scales
1907 Annie Alexander and Joseph Grinnell found the MVZ at Berkeley
1988-1991 ESA Sustainable Biosphere Initiative (Jane Lubchenco)
1994-1995 ESA Committee: Future of Long-term Ecological Data (Kay Gross)
1995-1996 ESA Special Committee: Communications in the Electronic Age (Rob Colwell)
1995-1996 ESA Special Committee: Data Sharing and Archiving (Steward Pickett/Aaron Ellison)
1998- Ecological Archives (Aaron Ellison/William Michener)
2001 Ecological Metadata Language, version 1.0
2005 LTER Data Policy approved: requires data archiving within two years of collection
2008 ILTER Data Policy approved: encourages data archiving commensurate with publication
2009 NEON Data Policy developed: open access to all NEON data, on request
2011 NSF Data Management Requirement implemented for proposals
2015 Ecological Metadata Language, version 2.1
14.
Journal | Data | Code
Ecology (est. 1920) | Available to SME on request | Required (2014)
Ecological Applications (est. 1991) | Required (2014) | Required (2014)
Ecological Monographs (est. 1930) | Required (2011) | Required (2014)
Ecosphere (est. 2010) | Available to SME on request | Available to SME on request
Ecosystem Health & Sustainability (est. 2015) | No | No
Journal of Ecology (est. 1913) | Required (2014) | For computer models (2014)
Journal of Animal Ecology (est. 1932) | Required (2014) | For computer models (2014)
Journal of Applied Ecology (est. 1964) | Required (2014) | For computer models (2014)
Functional Ecology (est. 1987) | Required (2014) | For computer models (2014)
Methods in Ecology & Evolution (est. 2010) | Required (2014) | Encouraged
Oikos (est. 1949) | Strongly encouraged (2015) | Strongly encouraged (2015)
Ecography (Holarctic Ecology) (est. 1978) | Strongly encouraged (2015) | Strongly encouraged (2015)
Oecologia (est. 1968) | Available to SME on request | Available to SME on request
17.
http://www.forestgeo.si.edu/
18.
Reproducibility: Key challenges
• Cultural
  – US
  – International
• Technological
  – Data storage
  – Descriptive metadata
  – Process metadata (Provenance)
20.
changes to the master.txt
Assigned 15 months to cellulose 1bx,3bx,6ad,6bd,7ac, and 3 months to
cellulose 7bd with missing month value. This means each subplot is
sampled in every time period.
Changed all "dc" subplots to "bc", since those were missing from all
plot 5.
2008-2010
2014
2012
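Free-text notes like the ones above are hard to replay. As a minimal sketch, the same kinds of corrections can be applied in a script that records each change as it is made. The data frame `master`, its columns (`sample`, `subplot`, `months`), its rows, and the helper `apply_fix` are all hypothetical stand-ins for illustration; only the sample labels and month values are taken from the note.

```r
# Toy stand-in for master.txt; the columns and rows are hypothetical.
master <- data.frame(
  sample  = c("1bx", "3bx", "7bd", "2dc"),
  subplot = c("bx", "bx", "bd", "dc"),
  months  = c(NA, NA, NA, 15),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply one correction and log what was done and why.
# `state` is a list holding the data frame and the running change log.
apply_fix <- function(state, rows, column, value, reason) {
  state$data[rows, column] <- value
  entry <- sprintf("set %s = %s in %d row(s): %s",
                   column, as.character(value), sum(rows), reason)
  state$log <- c(state$log, entry)
  state
}

state <- list(data = master, log = character(0))

# Fill in missing month values, as described in the note.
state <- apply_fix(state,
                   is.na(state$data$months) & state$data$sample %in% c("1bx", "3bx"),
                   "months", 15, "missing month value")
state <- apply_fix(state,
                   is.na(state$data$months) & state$data$sample == "7bd",
                   "months", 3, "missing month value")

# Recode subplot labels, as described in the note.
state <- apply_fix(state, state$data$subplot == "dc",
                   "subplot", "bc", "'dc' subplots recoded as 'bc'")

# The log can be written out and archived next to the corrected data.
writeLines(state$log)
```

Because the raw file is never edited by hand, rerunning the script on the original data reproduces both the corrected table and the list of changes.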
25.
filled.contour3 <-
  function (x = seq(0, 1, length.out = nrow(z)),
            y = seq(0, 1, length.out = ncol(z)), z, xlim = range(x, finite = TRUE),
            ylim = range(y, finite = TRUE), zlim = range(z, finite = TRUE),
            levels = pretty(zlim, nlevels), nlevels = 20, color.palette = cm.colors,
            col = color.palette(length(levels) - 1), plot.title, plot.axes,
            key.title, key.axes, asp = NA, xaxs = "i", yaxs = "i", las = 1,
            axes = TRUE, frame.plot = axes, mar, ...)
{
  # modification by Ian Taylor of the filled.contour function
  # to remove the key and facilitate overplotting with contour()
  # further modified by Carey McGilliard and Bridget Ferris
  # to allow multiple plots on one page
  if (missing(z)) {
    if (!missing(x)) {
      if (is.list(x)) {
        z <- x$z
        y <- x$y
        x <- x$x
      }
      else {
        z <- x
        x <- seq.int(0, 1, length.out = nrow(z))
      }
    }
    else stop("no 'z' matrix specified")
  }
  else if (is.list(x)) {
    y <- x$y
    x <- x$x
  }
  if (any(diff(x) <= 0) || any(diff(y) <= 0))
    stop("increasing 'x' and 'y' values expected")
  # mar.orig <- (par.orig <- par(c("mar", "las", "mfrow")))$mar
  # on.exit(par(par.orig))
  # w <- (3 + mar.orig[2]) * par("csi") * 2.54
  # par(las = las)
  # mar <- mar.orig
  plot.new()
  # par(mar=mar)
  plot.window(xlim, ylim, "", xaxs = xaxs, yaxs = yaxs, asp = asp)
  if (!is.matrix(z) || nrow(z) <= 1 || ncol(z) <= 1)
    stop("no proper 'z' matrix specified")
  if (!is.double(z))
    storage.mode(z) <- "double"
  .Internal(filledcontour(as.double(x), as.double(y), z, as.double(levels),
                          col = col))
  #AME 1/15/2014: in R 3.0, should be
  # .filled.contour(as.double(x), as.double(y), z, as.double(levels),
  #                 col = col)
  if (missing(plot.axes)) {
    if (axes) {
      title(main = "", xlab = "", ylab = "")
      Axis(x, side = 1)
      Axis(y, side = 2)
    }
  }
  else plot.axes
  if (frame.plot)
    box()
  if (missing(plot.title))
    title(...)
  else plot.title
  invisible()
}
This happens if you use a non-standard API.
You are allowed to do that, but cannot expect
that it is maintained.
The C code underlying base graphics has been
migrated to the graphics package (and hence
no longer uses .Internal() calls).
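The reply above, together with the AME comment in the function, points to the supported route: call the exported `graphics::.filled.contour()` instead of `.Internal(filledcontour(...))`. The following is a minimal sketch of that call on its own, not a full replacement for `filled.contour3`. It assumes the built-in `volcano` matrix as example data and draws to an off-screen `pdf(NULL)` device so that it runs without a display.

```r
# Example data: the 'volcano' elevation matrix shipped with R.
z <- volcano
storage.mode(z) <- "double"
x <- seq(0, 1, length.out = nrow(z))
y <- seq(0, 1, length.out = ncol(z))

# Contour levels and one colour per interval between levels.
levels <- pretty(range(z, finite = TRUE), 20)
cols <- cm.colors(length(levels) - 1)

pdf(NULL)                      # off-screen device; nothing is written to disk
plot.new()
plot.window(range(x), range(y), "", xaxs = "i", yaxs = "i")

# Exported, documented entry point in the graphics package:
# adds filled contours (no key) to an already set-up plot region.
graphics::.filled.contour(x, y, z, levels, cols)

Axis(x, side = 1)
Axis(y, side = 2)
box()
invisible(dev.off())
```

Relying on the exported function rather than an internal call means that the script depends only on behaviour the R maintainers have documented.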
30.
From R Scripts to Provenance Graphs
[Figure: workflow diagram. Components: R Script; Instrumented R Script (instrumented by scientist); R Interpreter; RData Tracker; DDG Database; Textual DDG; Visual DDG; DDG Explorer. Legend: R Scripts, R Environment, Provenance.]
32.
Ecology and evolutionary biology stand virtually alone among the
environmental and environment-related sciences in the lack of
some agency- or community-mandated data archiving and data
sharing policy.
Porter & Callahan (1994) Circumventing a dilemma: historical approaches to data sharing in ecological research. In:
Environmental Information Management and Analysis: Ecosystem to Global Scales
1907 Annie Alexander and Joseph Grinnell found the MVZ at Berkeley
1988-1991 ESA Sustainable Biosphere Initiative (Jane Lubchenco)
1994-1995 ESA Committee: Future of Long-term Ecological Data (Kay Gross)
1995-1996 ESA Special Committee: Communications in the Electronic Age (Rob Colwell)
1995-1996 ESA Special Committee: Data Sharing and Archiving (Steward Pickett/Aaron Ellison)
1998- Ecological Archives (Aaron Ellison/William Michener)
2001 Ecological Metadata Language, version 1.0
2005 LTER Data Policy approved: requires data archiving within two years of collection
2009 NEON Data Policy developed: open access to all NEON data, on request
2011 NSF Data Management Requirement implemented for proposals
2015 Ecological Metadata Language, version 2.1
???
33.
Goal:
Create reproducible science by documenting data provenance: the processes used to create, modify, visualize, analyze, and synthesize data
Challenges:
• Standard tools (e.g., R) do not collect provenance
• Specialized tools (e.g., Kepler) have a steep learning curve
• Computer scientists are interested in control flow, data flow, and abstraction; ecologists are interested in other things
• Lack of community standards
• How much information to collect, manage, store, and use
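To make the idea of process metadata concrete, here is a minimal sketch of recording provenance by hand in base R. It is not the tool shown in the talk: `record_step` is a hypothetical helper, and the fields it stores (step name, timestamp, R version, code, and a short summary of input and output) are one assumed answer to the question of how much information to collect.

```r
# Hypothetical helper: run one processing step and append a provenance
# record describing what ran, on what input, and what it produced.
record_step <- function(prov, name, fun, input) {
  output <- fun(input)
  record <- list(
    step      = name,
    time      = format(Sys.time(), "%Y-%m-%dT%H:%M:%S"),
    r_version = R.version.string,
    code      = paste(deparse(fun), collapse = "\n"),
    input     = paste(class(input)[1], length(input)),
    output    = paste(class(output)[1], length(output))
  )
  list(prov = c(prov, list(record)), value = output)
}

# Invented example measurements with one missing value.
raw <- c(4.1, NA, 5.3, 6.0)

# Chain two steps, passing the provenance list along.
s1 <- record_step(list(), "drop missing values",
                  function(v) v[!is.na(v)], raw)
s2 <- record_step(s1$prov, "compute mean",
                  function(v) mean(v), s1$value)

# Print a one-line summary of each recorded step.
for (r in s2$prov) {
  cat(r$step, ":", r$input, "->", r$output, "\n")
}
```

Even this small record shows the trade-off listed above: storing the code and environment for every step makes the analysis traceable, but the volume of provenance grows with every operation.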
34.
Reproducible
Traceable
Usable
Comparable