Stephen D. Turner, Ph.D.
Bioinformatics Core Director
University of Virginia School of Medicine
bioinformatics.virginia.edu
@strnr
Tools for Improving Rigor &
Reproducibility in Bioinformatics
Slides: bit.ly/madssci2018repro
We Are in the Middle of a
New Movement in Genomics
• Genomics/bioinformatics is advancing at a grueling pace
- New questions
- New study designs
- New technologies, new [fill-in-the-blank]-seq
• New movements have:
- Leaders / method developers / early adopters
- First followers
- Everybody else
• New technology leads to more reproducibility issues
2
CORES!
Reproducibility is hard!
• Genomics data is too large and high-dimensional to easily inspect or visualize.
• Workflows involve multiple steps, and it's hard to inspect every step.
• Unlike in the wet lab, we don't always know what to expect of our genomics data analysis.
• It can be hard to distinguish good results from bad ones.
3
4
Reproducibility:
What's in it for you?
• Your future self will thank you
- Re-running analysis with different parameters
- Re-running analysis with new data
- Documentation
• Faster/cheaper
- Modular workflows
- Reusable code chunks
• Makes collaboration with others easier
5
"Robust research is about doing small things that
stack the deck in your favor to prevent mistakes."
Vince Buffalo, author of Bioinformatics Data Skills (2015).
Obstacles to Reproducibility
1. Bioinformatics software
2. Pipeline / workflow management
3. Documentation
4. Data / code sharing
6
A non-comprehensive list of Bioinformatics Software
7
Bioinformatics Software
• Bioinformatics software implements complex algorithms.
- Dozens of parameters, endless permutations
- Defaults not always optimal
• Perception:
8
ACACTCGCATCCGCACATCGCACTA
GGTCAGCATACGCCGACTCCGACCG
GCGCTATCGCCAGCGGAAATCGCAA
Bioinformatics Software
• Bioinformatics software implements complex algorithms.
- Dozens of parameters, endless permutations
- Defaults not always optimal
• Reality: Software is written by smart people, but they are often:
- Not software engineers
- Not following good practices (version control, modularization, comments, testing)
- Unable to offer long-term maintenance / support
- Focused on graduating / publishing, not support
- And the software is not always easily available
9
• Missing or incomplete documentation
• Distribution is missing files
• Missing third-party package
• Dependencies failed to build
• Runtime error
• Internal compiler error
• My last week:
- samtools: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file
- error while loading shared libraries: libz.so.1: failed to map segment from shared object: Operation not permitted
- /lib64/libc.so.6: version `GLIBC_2.14' not found
11
https://twitter.com/ianholmes/status/288689712636493824
Package managers
12
- Linux: apt-get, yum
- Mac OS: homebrew, macports
- Windows: ?????
- Cross-platform: ?????
Conda
• Cross-platform package manager: Windows, Mac, Linux
• Language agnostic (can be used to install C/C++, Fortran, Go, R, Python, Perl, Java, etc.)
• User-installable: no admin/root privileges needed
• Describes each package with a recipe that defines its dependencies and a build script that installs it
• Channels: conda provides many common packages by default; additional channels add more
• Isolated environments
- Versions and tools can be managed per project
- Avoids conflicts and version incompatibilities between projects
- Environments can be shared via simple text files
13
Conda: Main commands
• conda create -n <environment>
• source activate <environment> (conda 4.4 and later: conda activate <environment>)
• conda search <package>
• conda install <package>
• conda upgrade <package>
• conda uninstall <package>
14
Conda: example
• Create a new environment named madssci:
conda create -n madssci
• Activate that environment:
source activate madssci
• Install some packages (these come from the Bioconda and conda-forge channels):
conda install -c conda-forge -c bioconda blast bioconductor-flowcore
• Install a particular version:
conda install -c conda-forge -c bioconda samtools=0.1.19
15
Bioconda
• bioconda.github.io
• Bioconda is a channel for the conda package manager
• Repository for more than 3,000 bioinformatics packages ready to use with conda install
• >250 contributors have added/updated recipes
• Preprint: Grüning, Björn, et al. "Bioconda: A sustainable and comprehensive software distribution for the life sciences." bioRxiv (2017): 207092.
https://www.biorxiv.org/content/early/2017/10/27/207092
• See also: "Nature TechBlog: Bioconda Promises to Ease Bioinformatics Software Installation Woes"
http://blogs.nature.com/naturejobs/2017/11/03/techblog-bioconda-promises-to-ease-bioinformatics-software-installation-woes/
16
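As the Conda slide notes, an environment can be shared as a plain-text file. The sketch below writes one such file for a hypothetical project environment named madssci that pulls packages from the conda-forge and Bioconda channels; the channel order and the pinned samtools version are illustrative assumptions, not recommendations. The script only writes the file, so it runs without conda installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a shareable environment description.
# Environment name, channel order, and versions are illustrative assumptions.
cat > environment.yml <<'EOF'
name: madssci
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - samtools=0.1.19
  - blast
  - bioconductor-flowcore
EOF

echo "Wrote environment.yml"
# On a machine with conda installed, a collaborator could then run:
#   conda env create -f environment.yml
#   source activate madssci
```

Going the other way, conda env export > environment.yml captures an existing environment with exact package versions, which is usually what you want to archive alongside an analysis.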
Docker
• docker.com
• Lightweight virtualization technology
• Packages software with all of its dependencies into an isolated "container"
• Containers have everything needed to run: code, system tools & libraries
• Like VMs: portable, which supports reproducibility
• Unlike VMs: containers virtualize the OS instead of the hardware, so they are more efficient and more portable: near-native performance, near-instant startup, small images, easy to share
• https://www.docker.com/what-container
• https://blog.docker.com/2016/03/containers-are-not-vms/
17
Containers are an abstraction at
the app layer that packages code
and dependencies together.
Multiple containers can run on the
same machine and share the OS
kernel with other containers, each
running as isolated processes in
user space. Containers take up
less space than VMs (container
images are typically tens of MBs in
size), and start almost instantly.
Virtual machines (VMs) are an
abstraction of physical hardware
turning one server into many
servers. The hypervisor allows
multiple VMs to run on a single
machine. Each VM includes a full
copy of an operating system, one
or more apps, necessary binaries
and libraries - taking up tens of
GBs. VMs can also be slow to
boot.
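A container image is usually described by a Dockerfile. The sketch below writes a minimal one that installs a pinned tool with conda from Bioconda inside the image. The base image (continuumio/miniconda3), the samtools version, the image tag, and the input file name in the comments are assumptions for illustration; building and running the image requires Docker, so the script only writes the file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a minimal Dockerfile: start from a conda base image and install one
# pinned tool. Base image and version are illustrative assumptions.
cat > Dockerfile <<'EOF'
FROM continuumio/miniconda3
RUN conda install -y -c conda-forge -c bioconda samtools=0.1.19 && conda clean -a -y
WORKDIR /data
ENTRYPOINT ["samtools"]
EOF

echo "Wrote Dockerfile"
# With Docker installed (image tag and input.bam are hypothetical):
#   docker build -t madssci/samtools .
#   docker run --rm -v "$PWD":/data madssci/samtools view -H input.bam
```

Because the image carries its own system libraries, shared-library problems like the libbz2 and GLIBC errors listed earlier should only need solving once, when the image is built, rather than on every machine that runs the tool.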
Pipeline / workflow management
18
Pipeline / Workflow Management
• Bioinformatics data analysis is a series of steps involving many different programs, tied together with file-based inputs and outputs.
19
Pipeline / Workflow Management
• Simple solution: a (bash) script
- List of commands
- Pros: quick, easy, portable, universal
- Cons: not scalable, no re-entry / partial execution, assumes dependencies are available, difficult or no parallelization
• Workflow management systems
- Make (installed on most systems)
- Snakemake
- Nextflow
- Galaxy
- Many more: github.com/pditommaso/awesome-pipeline
20
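The missing re-entry of a plain script can be partly worked around by hand. This is a minimal bash sketch, not how any particular workflow manager is implemented: each step runs only if its output file does not already exist, and outputs go to a temporary file first, so an interrupted step does not leave a half-written result behind. The helper name run_step and the toy input file are hypothetical.

```shell
#!/usr/bin/env bash
# Skip-if-done pipeline sketch: re-running the script resumes after the
# last step whose output already exists.
set -euo pipefail

# run_step OUTPUT COMMAND [ARGS...]
# Runs COMMAND (stdout -> OUTPUT) only when OUTPUT is missing or empty.
run_step() {
  local out="$1"
  shift
  if [[ -s "$out" ]]; then
    echo "skip: $out already exists"
    return 0
  fi
  "$@" > "$out.tmp"     # write to a temporary file first...
  mv "$out.tmp" "$out"  # ...then rename, so a failed step leaves no partial output
  echo "done: $out"
}

# Toy input standing in for real data (hypothetical).
printf 'chr2\nchr1\nchr2\nchr3\nchr1\nchr2\n' > regions.txt

run_step sorted.txt sort regions.txt
run_step counts.txt uniq -c sorted.txt
```

Running the script a second time prints skip for both steps. Make, Snakemake, and Nextflow go further: they track which outputs depend on which inputs, so a changed input or step triggers the affected steps to rerun, which this simple existence check cannot detect.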
Nextflow
• nextflow.io
• Di Tommaso, Paolo, et al. "Nextflow enables reproducible computational workflows." Nature Biotechnology 35.4 (2017): 316-319.
• Features:
- Free
- Actively developed
- Supports Docker containers
- Easy, implicitly defined parallelization
- Continuous checkpointing & resumed execution
- Easily portable across execution platforms (SGE, LSF, SLURM, PBS, Amazon AWS, ...)
21
Beware of Pipelineitis
• Pipelines can kill your creativity and force you to think too rigidly.
• Don't pipeline too early, if at all.
• Does it even need to be pipeline-ified?
• Who's running it?
- You, once: don't pipeline-ify. Document, move along.
- You, 2-5 times: a documented script?
- You, 10+ times: consider pipeline-ifying.
- Others: create a shareable pipeline.
• See: Loman & Watson. "So you want to be a computational biologist?" Nat Biotechnol 31 (2013): 996-998.
22
Documentation
23
Dynamic Documentation: RMarkdown
• R: widely used for data science & bioinformatics
• Markdown: a simple markup language that lets you render structured, formatted documents from plain text.
• RMarkdown: embeds R code in a Markdown document.
- Write documents that execute embedded code and integrate the results into the final report.
- Keep code and documentation together.
- Easily re-render the document, re-running the analysis and re-incorporating results on the fly.
- Many output formats: PDF, DOCX, HTML, EPUB, ...
24
25
[Figure: Write plain text document; Embed R code; Rendered output report]
26
[Figure: rendered output with output: pdf_document and with output: word_document]
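A minimal sketch of an RMarkdown source file: a YAML header choosing the output format, Markdown prose, and an R code chunk whose result is inserted into the report when it is rendered. The file name report.Rmd and its contents are hypothetical. The shell script only writes the file; the backtick fence is kept in a variable so it can sit inside the heredoc without being mistaken for anything else.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Backtick fence for R chunks, held in a variable for use inside the heredoc.
fence='```'

# Write a minimal RMarkdown file: YAML header, prose, and one R code chunk.
cat > report.Rmd <<EOF
---
title: "Example report"
output: html_document
---

The mean of the example values is computed below.

${fence}{r}
x <- c(2, 4, 6)
mean(x)
${fence}
EOF

echo "Wrote report.Rmd"
```

With R and the rmarkdown package installed, Rscript -e 'rmarkdown::render("report.Rmd")' runs the chunk and writes report.html. Changing the output line to pdf_document or word_document, as on the slide, produces PDF or DOCX instead (PDF output also needs a LaTeX installation).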
Jupyter notebooks
• jupyter.org
• Jupyter: open-source project developing software, standards, and services for interactive computing across many languages
• Jupyter notebook: free web application for creating documents that contain live code, visualizations, and narrative text.
- Supports >40 programming languages
- Easily shared
- Interactive output
- Multi-user versions for companies, classrooms, labs
27
Data / code sharing
28
Sharing Code
• State of the art, early 2000s:
- "Data/code available upon request"
- "Code available on <lab website>"
- None of the above
• Schultheiss, Sebastian J., et al. "Persistence and availability of web services in computational biology." PLoS ONE 6.9 (2011): e24914.
- Surveyed ~1000 web services published in NAR, 2003-2009
- ~30% unavailable
- ~80% developed by students / non-permanent researchers
• Russell, Pamela H., et al. "A large-scale analysis of bioinformatics code on GitHub." bioRxiv (2018): 321919.
• github.com is becoming the de facto standard for archiving and sharing code
29
Sharing any research output
30
figshare.com
- Free
- Upload any file format
- Get a DOI
- 5 GB max file size
- 20 GB private space
- Unlimited public space
- Launched 2012
- Hosted on S3, multiple redundant copies
- SLA: 10-year persistence
zenodo.org
- Free
- Upload any file format
- Get a DOI
- 50 GB per record
- Higher quota by request
- Unlimited records
- Launched 2013
- Hosted at CERN (est. 1954), with a program defined for the next 20 years
- about.zenodo.org/policies/
- about.zenodo.org/principles/
osf.io
- Free
- Upload any file format
- Get a DOI
- 5 GB per file
- Connect to any external storage provider
- Launched 2013
- Preservation fund guaranteeing 50+ years of persistent availability
- osf.io/faq
31
Slides:
bit.ly/madssci2018repro
doi.org/10.5281/zenodo.1255003
Other Resources
32
Wilson, et al. "Good enough practices in scientific computing." PLoS Computational Biology 13.6 (2017): e1005510. https://doi.org/10.1371/journal.pcbi.1005510
Wilson, et al. "Best practices for scientific computing." PLoS Biology 12.1 (2014): e1001745. https://doi.org/10.1371/journal.pbio.1001745
Other Resources
• 2017: Ten simple rules for making research software more robust:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005412
• 2017: Ten simple rules for responsible big data research:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005399
• 2017: Ten Simple Rules to Enable Multi-site Collaborations through Data Sharing:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005278
• 2016: Ten Simple Rules for Digital Data Storage:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005097
• 2016: Ten Simple Rules for Effective Statistical Practice:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004961
• 2015: Ten Simple Rules for Creating a Good Data Management Plan:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004525
• 2015: Ten Simple Rules for Experiments' Provenance:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004384
• 2015: Ten Simple Rules for a Computational Biologist's Laboratory Notebook:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004385
• 2015: Ten Simple Rules for Reducing Overoptimistic Reporting in Methodological Computational Research:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004191
• 2014: Ten Simple Rules for the Care and Feeding of Scientific Data:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003542
• 2014: Ten Simple Rules for Effective Computational Research:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003506
• 2013: Ten Simple Rules for Reproducible Computational Research:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285
• 2012: Ten Simple Rules for the Open Development of Scientific Software:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002802
• 2014: Ten Simple Rules for Writing a PLOS Ten Simple Rules Article:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003858
33
collections.plos.org/ten-simple-rules
Other Resources
• Baker, Monya. "1,500 Scientists Lift the Lid on Reproducibility." Nature News, vol. 533, no. 7604, May 2016, p. 452. www.nature.com, doi:10.1038/533452a.
• Grüning, Björn, et al. "Practical Computational Reproducibility in the Life Sciences." BioRxiv, Oct. 2017, p. 200683. www.biorxiv.org, doi:10.1101/200683.
• Leek, Jeff. "A Few Things That Would Reduce Stress around Reproducibility/Replicability in Science." Simply Statistics, Nov. 2017, https://simplystatistics.org/2017/11/21/rr-sress/.
• Mesirov, Jill P. "Accessible Reproducible Research." Science, vol. 327, no. 5964, Jan. 2010, pp. 415-16. science.sciencemag.org, doi:10.1126/science.1179653.
• Munafò, Marcus R., et al. "A Manifesto for Reproducible Science." Nature Human Behaviour, vol. 1, no. 1, Jan. 2017, p. 0021. www.nature.com, doi:10.1038/s41562-016-0021.
• Patil, Prasad, et al. "A Statistical Definition for Reproducibility and Replicability." BioRxiv, July 2016, p. 066803. www.biorxiv.org, doi:10.1101/066803.
• Russell, Pamela, et al. "A Large-Scale Analysis of Bioinformatics Code on GitHub." BioRxiv, May 2018, p. 321919. www.biorxiv.org, doi:10.1101/321919.
• Schultheiss, Sebastian J., et al. "Persistence and Availability of Web Services in Computational Biology." PLOS ONE, vol. 6, no. 9, Sept. 2011, p. e24914. PLoS Journals, doi:10.1371/journal.pone.0024914.
34
Stephen D. Turner, Ph.D.
Bioinformatics Core Director
University of Virginia School of Medicine
bioinformatics.virginia.edu
@strnr
THANK YOU
bit.ly/madssci2018repro
doi.org/10.5281/zenodo.1255003

2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
