This document proposes developing an open source cloud bioinformatics community through collaborative software images and data snapshots. It discusses challenges in building open source bioinformatics communities and proposes lowering barriers by creating an integrated community solution that is inclusive, configurable, easy to contribute to, and automated through tools like Amazon EC2 and Fabric. This would provide a common platform and remove installation/distribution barriers to help scale bioinformatics analysis.
1. Developing an open source community for cloud bioinformatics
Brad Chapman
http://bcbio.wordpress.com/
8 June 2010
2. Overview
1 Building open source bioinformatics communities is hard.
2 Developer resources are a productive target.
3 Framework: collaborative software images and data snapshots.
3. Motivation
Open source
OpenBio, Biopython
Graduate school: developed a distributed algorithm. Never reused.
Work
Startup: Automated biological pipelines.
Research hospital: Democratization of analysis.
4. Filters in biological computing
Working in same biological area
Interest in developing open source code
Technical abilities
Your software is good enough
5. Successful bioinformatics
Sean Eddy, HMMER
...the best software in the field is often an unplanned labor of love from a single investigator.
http://selab.janelia.org/people/eddys/blog/?p=313
9. Establishing common platform
The solution
= to all our
problems
Remove install and distribution barriers
Building block for scaling
10. Existing cloud bioinformatics work
JCVI Cloud BioLinux
bioperl-max
MachetEC2
Debian Med
Overlapping set of useful functionality.
11. Integrated community solution
Inclusive but configurable
Easy to contribute
Automated
Bootstrap bare machine to fully ready distributed AMI.
http://github.com/chapmanb/bcbb/tree/master/ec2/biolinux/
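The "Automated" goal above can be illustrated with a small sketch of an idempotent bootstrap: stages run in order, and any stage whose result is already present on the machine is skipped, so a half-configured instance can be finished by re-running the same script. The `Step` class, the `bootstrap` function and the stage names are hypothetical illustrations, not part of Fabric or the biolinux scripts; the `machine` set stands in for state that a real tool would inspect over SSH.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Step:
    """A hypothetical bootstrap stage (illustrative, not a Fabric API)."""
    name: str
    provides: str  # marker recorded on the machine once the stage succeeds


def bootstrap(steps: List[Step], machine: Set[str]) -> List[str]:
    """Run stages in order, skipping any whose marker is already present.

    `machine` stands in for the remote host's state; a real deployment
    tool would check and modify the instance over SSH instead.
    Returns the names of the stages that actually ran.
    """
    ran: List[str] = []
    for step in steps:
        if step.provides in machine:
            continue  # already done, so re-running the bootstrap is safe
        # A real stage would install software here before recording success.
        machine.add(step.provides)
        ran.append(step.name)
    return ran


# Hypothetical stages taking a bare machine to an image-ready state.
STAGES = [
    Step("install program groups", "packages"),
    Step("install library groups", "libraries"),
    Step("bundle machine image", "ami"),
]

if __name__ == "__main__":
    host: Set[str] = set()  # a bare machine
    print(bootstrap(STAGES, host))  # first run: every stage executes
    print(bootstrap(STAGES, host))  # second run: nothing left to do
```

Recording a marker only after a stage completes means an interrupted run can be resumed rather than restarted, which matters when each stage may take a long time on a fresh instance.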
12. Inclusive but configurable
# Top level YAML configuration file specifying
# groups of programs to be installed.
packages:
- python
- r
- erlang
- databases
- viz
- bio_search
- bio_alignment
- bio_nextgen
- bio_sequencing
- bio_visualization
- phylogeny
libraries:
- r-libs
- python-libs
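This file is what makes the image "inclusive but configurable": every group is listed, and a user trims the install by deleting or commenting out lines. The sketch below reads this flat subset of YAML using only the standard library. It is a simplified stand-in for a full parser such as PyYAML's `yaml.safe_load`, and `parse_group_config` is a hypothetical helper, not code from the biolinux repository.

```python
from typing import Dict, List, Optional


def parse_group_config(text: str) -> Dict[str, List[str]]:
    """Parse the flat 'section:' / '- item' subset of YAML shown above.

    Only top-level keys followed by dash-prefixed items are understood;
    anything after '#' is treated as a comment, so commenting out a
    group removes it from the install.
    """
    config: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("- "):
            if current is None:
                raise ValueError(f"list item outside a section: {raw!r}")
            config[current].append(line[2:].strip())
        elif line.endswith(":"):
            current = line[:-1].strip()
            config[current] = []
        else:
            raise ValueError(f"unsupported line: {raw!r}")
    return config


EXAMPLE = """
packages:
- python
- r
# - erlang   (commented out, so this group is skipped)
- bio_nextgen
libraries:
- r-libs
"""

if __name__ == "__main__":
    print(parse_group_config(EXAMPLE))
```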
13. Easy to contribute
# Configuration file defining R specific libraries that
# are installed via CRAN and Bioconductor.
cranrepo: http://software.rc.fas.harvard.edu/mirrors/R/
cran:
- ggplot2
- rjson
- sqldf
- NMF
- ape
biocrepo: http://bioconductor.org/biocLite.R
bioc:
- ShortRead
- BSgenome
- edgeR
- GOstats
- biomaRt
- Rsamtools
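With the library lists kept in data, contributing a package means adding one line to this file. One way an installer could consume it is to render the lists as R commands; the sketch below assumes the YAML has already been loaded into a Python dict with the same keys. It uses R's `install.packages(..., repos=...)` for CRAN and the `biocLite()` function defined by the sourced `biocLite.R` script, which was Bioconductor's installer at the time (current Bioconductor releases use `BiocManager::install()` instead). `build_r_install_script` and `r_vector` are hypothetical helpers.

```python
from typing import Any, Dict, List


def r_vector(names: List[str]) -> str:
    """Render package names as an R character vector, e.g. c("a", "b")."""
    return "c(" + ", ".join(f'"{name}"' for name in names) + ")"


def build_r_install_script(config: Dict[str, Any]) -> str:
    """Turn the CRAN/Bioconductor config into an R script.

    Assumes `config` holds the parsed YAML above, with the keys
    cranrepo, cran, biocrepo and bioc.
    """
    lines = [
        f'repo <- "{config["cranrepo"]}"',
        f'install.packages({r_vector(config["cran"])}, repos=repo)',
        f'source("{config["biocrepo"]}")',  # defines biocLite()
        f'biocLite({r_vector(config["bioc"])})',
    ]
    return "\n".join(lines) + "\n"


# The configuration from the slide, as a loaded dict.
R_LIBS = {
    "cranrepo": "http://software.rc.fas.harvard.edu/mirrors/R/",
    "cran": ["ggplot2", "rjson", "sqldf", "NMF", "ape"],
    "biocrepo": "http://bioconductor.org/biocLite.R",
    "bioc": ["ShortRead", "BSgenome", "edgeR", "GOstats", "biomaRt", "Rsamtools"],
}

if __name__ == "__main__":
    print(build_r_install_script(R_LIBS))
```

A contributor who wants another Bioconductor package only edits the `bioc:` list; the generated script picks it up on the next build without any change to installer code.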