Developing an open source community for cloud bioinformatics
Brad Chapman
http://bcbio.wordpress.com/
8 June 2010
Overview

   1   Building open source bioinformatics communities is hard.
   2   Developer resources are a productive target.
   3   Framework: collaborative software images and data snapshots.
Motivation

    Open source
        OpenBio, Biopython
        Graduate school: developed a distributed algorithm. Never reused.
    Work
        Startup: automated biological pipelines.
        Research hospital: democratization of analysis.
Filters in biological computing

          Working in same biological area

          Interest in developing open source code

          Technical abilities

          Your software is good enough
Successful bioinformatics

  Sean Eddy, HMMER
  "...the best software in the field is often an unplanned labor of love
  from a single investigator."
  http://selab.janelia.org/people/eddys/blog/?p=313
Recognizing contributions
Successful community projects

      OpenBio: BioPerl, Biopython, BioJava
      Bioconductor
 Common theme
 Aimed at developers.
 Biologists benefit indirectly.
Lowering activation energy
Establishing common platform

    [image] = The solution to all our problems


    Remove install and distribution barriers
    Building block for scaling
Existing cloud bioinformatics work

      JCVI Cloud BioLinux
      bioperl-max
      MachetEC2
      Debian Med
  Overlapping set of useful functionality.
Integrated community solution

      Inclusive but configurable
      Easy to contribute
      Automated
 Bootstrap bare machine to fully ready distributed AMI.
 http://github.com/chapmanb/bcbb/tree/master/ec2/biolinux/
Inclusive but configurable
  # Top level YAML configuration file specifying
  # groups of programs to be installed.
  packages:
    - python
    - r
    - erlang
    - databases
    - viz
    - bio_search
    - bio_alignment
    - bio_nextgen
    - bio_sequencing
    - bio_visualization
    - phylogeny
  libraries:
    - r-libs
    - python-libs
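
A minimal sketch of how a top-level file like this could be turned into the two install lists. To stay dependency-free it uses a small hand-written parser that only understands this flat "key:" / "- item" layout; a real YAML library would be the usual choice. The function name `read_main_config` and the shortened example text are assumptions for illustration, not the project's actual code.

```python
def read_main_config(text):
    """Parse a flat 'key:' / '- item' layout into {key: [items]}.

    Hypothetical stand-in for a real YAML parser; it handles only the
    two-level structure shown on the slide.
    """
    config = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith(":"):
            # A new top-level group such as "packages:" or "libraries:".
            current = line[:-1]
            config[current] = []
        elif line.startswith("- ") and current is not None:
            config[current].append(line[2:].strip())
        else:
            raise ValueError("unsupported line: %r" % raw)
    return config


# Shortened example input (assumption: same layout as the slide).
EXAMPLE = """\
# Top level configuration
packages:
  - python
  - r
  - bio_alignment
libraries:
  - r-libs
  - python-libs
"""

config = read_main_config(EXAMPLE)
pkg_install, lib_install = config["packages"], config["libraries"]
print(pkg_install)
print(lib_install)
```

Dropping a group from the install is then a one-line edit to the configuration file, which is the "configurable" half of the slide title.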
Easy to contribute
 # Configuration file defining R specific libraries that
 # are installed via CRAN and Bioconductor.
 cranrepo: http://software.rc.fas.harvard.edu/mirrors/R/
 cran:
  - ggplot2
  - rjson
  - sqldf
  - NMF
  - ape
 biocrepo: http://bioconductor.org/biocLite.R
 bioc:
  - ShortRead
  - BSgenome
  - edgeR
  - GOstats
  - biomaRt
  - Rsamtools
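
One way such a file might be consumed is to generate a short R script from it. The sketch below assumes the config has already been loaded into a Python dict with the same keys as the slide; the function name `build_r_install_script` is hypothetical, and whether the project installs R packages this way is an assumption. The R calls themselves are `install.packages` for CRAN and the `biocLite` script named by `biocrepo` for Bioconductor.

```python
def build_r_install_script(config):
    """Return R source text that installs the CRAN and Bioconductor packages."""
    def r_vector(names):
        # Render a Python list as an R character vector: c("a", "b")
        return "c(%s)" % ", ".join('"%s"' % n for n in names)

    lines = []
    if config.get("cran"):
        lines.append('install.packages(%s, repos="%s")'
                     % (r_vector(config["cran"]), config["cranrepo"]))
    if config.get("bioc"):
        # Sourcing the biocLite script defines the biocLite() installer.
        lines.append('source("%s")' % config["biocrepo"])
        lines.append("biocLite(%s)" % r_vector(config["bioc"]))
    return "\n".join(lines) + "\n"


# Shortened example config (assumption: already parsed into a dict).
r_libs = {
    "cranrepo": "http://software.rc.fas.harvard.edu/mirrors/R/",
    "cran": ["ggplot2", "rjson"],
    "biocrepo": "http://bioconductor.org/biocLite.R",
    "bioc": ["ShortRead", "edgeR"],
}
script = build_r_install_script(r_libs)
print(script)
```

With this shape, contributing a new R library means adding one line under `cran:` or `bioc:`; no installer code changes.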
Automated

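 from fabric.api import sudo

 def install_biolinux():
     # Prepare the EC2 Ubuntu host, then install everything named in the config.
     ec2_ubuntu_environment()
     pkg_install, lib_install = _read_main_config()
     _apt_packages(pkg_install)
     _do_library_installs(lib_install)

 def _ruby_library_installer(config):
     # Install each Ruby gem listed under the "gems" key of the config.
     for gem in config["gems"]:
         sudo("gem install %s" % gem)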


 Fabric: http://docs.fabfile.org/
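
The slide does not show how a library-install step picks the right installer. A plausible shape is a name-to-function lookup, sketched below under stated assumptions: the names `make_installers` and `do_library_installs`, the `pypi` config key, and the example package names are all hypothetical. The command runner is passed in as `run` so the sketch executes locally; with Fabric it would be `sudo`.

```python
def make_installers(run):
    """Build a name -> installer mapping; `run` executes one shell command."""
    def ruby_libs(config):
        for gem in config["gems"]:
            run("gem install %s" % gem)

    def python_libs(config):
        # "pypi" is a hypothetical config key used only for this sketch.
        for pkg in config["pypi"]:
            run("pip install %s" % pkg)

    return {"ruby-libs": ruby_libs, "python-libs": python_libs}


def do_library_installs(to_install, configs, run):
    """Run the installer registered for each requested library group."""
    installers = make_installers(run)
    for name in to_install:
        if name not in installers:
            raise KeyError("no installer registered for %r" % name)
        installers[name](configs[name])


# Record commands in a list instead of running them on a remote host.
commands = []
do_library_installs(
    ["python-libs", "ruby-libs"],
    {"python-libs": {"pypi": ["numpy"]}, "ruby-libs": {"gems": ["bio"]}},
    run=commands.append,
)
print(commands)
```

Supporting another language's libraries then means registering one more small function, alongside a config file listing the packages.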
Ready to use biological data

 % ls /referenceGenomes/            % ls Hsapiens/hg18
 Athaliana                          arachne
 Celegans                           bowtie
 Dmelanogaster                      bwa
 Ecoli                              eland
 Hsapiens                           maq
 Mmusculus                          seq
 Msmegmatis                         snps
 Mtuberculosis_H37Rv                ucsc
 Paeruginosa_UCBPP-PA14
 phiX174
 Rnorvegicus
 Scerevisiae
 Xtropicalis
  http://github.com/chapmanb/bcbb/blob/master/galaxy/galaxy_fabfile.py
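
With a fixed organism/build/tool layout like the one listed above, a pipeline can discover which indexes are available by looking at the directory tree. The following is a small sketch under that assumption; `available_indexes` is a hypothetical helper, and the demo builds a throwaway tree rather than reading `/referenceGenomes/`.

```python
import os
import tempfile


def available_indexes(base_dir, organism, build):
    """List the index subdirectories present for one organism/build, sorted."""
    build_dir = os.path.join(base_dir, organism, build)
    if not os.path.isdir(build_dir):
        return []
    return sorted(
        name for name in os.listdir(build_dir)
        if os.path.isdir(os.path.join(build_dir, name))
    )


# Build a temporary tree that mimics part of the layout on the slide.
demo_root = tempfile.mkdtemp()
for tool in ("bowtie", "bwa", "seq"):
    os.makedirs(os.path.join(demo_root, "Hsapiens", "hg18", tool))

found = available_indexes(demo_root, "Hsapiens", "hg18")
missing = available_indexes(demo_root, "Mmusculus", "mm9")
print(found)
print(missing)
```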
Organization: Codefest 2010




 www.open-bio.org/wiki/Codefest_2010
