Virtual Proteomics Analysis Cluster in the Cloud
It's always sunny on top of the Cloud! An intro to Amazon Web Services
Simon Twigger, Ph.D., Medical College of Wisconsin, Milwaukee
ViPDAC, a stand-alone Proteomics Analysis Suite in the Cloud
"How the humble pipette tip helped us rethink our computing strategy..."
Meet Joe "the" Researcher...
Proteomics: finding and identifying proteins
Rat/tissue sample -> LC-MS/MS -> peptide identification (against a protein DB) -> results & analysis
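To make the "peptide identification" step concrete: search engines such as Sequest, OMSSA and X!Tandem compare each spectrum against peptides derived from the protein database, and only peptides whose mass matches the measured precursor are considered. The Python sketch below shows just that precursor-mass filter, using standard monoisotopic residue masses. The function names, the 0.5 Da tolerance and the toy peptide list are illustrative assumptions; real engines also digest the database, handle modifications and score the fragment ions.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass of a proton, to convert M into [M+H]+


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def candidates(precursor_mh: float, peptides: list[str], tol_da: float = 0.5) -> list[str]:
    """Keep peptides whose [M+H]+ lies within tol_da of the observed precursor.

    tol_da = 0.5 is a hypothetical tolerance chosen for illustration.
    """
    return [p for p in peptides if abs(peptide_mass(p) + PROTON - precursor_mh) <= tol_da]


database_peptides = ["GA", "AAA", "W"]   # toy stand-in for a digested protein DB
print(candidates(147.08, database_peptides))
```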
Current architecture: raw file -> .dta files -> IBM Blade Cluster (Sequest) -> protein IDs, with a Windows machine as head node for preprocessing and storage
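The .dta files in this pipeline are Sequest-style per-spectrum text files: the first line holds the singly protonated precursor mass ([M+H]+) and the charge state, and each following line holds a fragment m/z and its intensity. A minimal Python reader, assuming whitespace-separated values and one spectrum per file (the `Spectrum` class and `parse_dta` function are hypothetical names), might look like this:

```python
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    precursor_mh: float   # singly protonated precursor mass, [M+H]+
    charge: int           # precursor charge state
    peaks: list[tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)


def parse_dta(text: str) -> Spectrum:
    """Parse one .dta spectrum; assumes exactly two columns on every line."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("empty .dta content")
    precursor_mh, charge = float(rows[0][0]), int(rows[0][1])
    peaks = [(float(mz), float(intensity)) for mz, intensity in rows[1:]]
    return Spectrum(precursor_mh, charge, peaks)


example = """1234.56 2
200.10 50.0
300.20 75.0
"""
spec = parse_dta(example)
print(spec.charge, len(spec.peaks))
```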
Finite resource, wait your turn: 1 MCW cluster
Here's the lab's pipette tip. Let me have it when you're done...
What would you do if there was only one tip?
  • Wait in line to use it
  • Run fewer experiments (due to waiting in line)
  • Do small-scale things (it's a small tip; pipetting 5 µl takes all week!)
  • Try fewer things (it's a real pain to keep washing it up)
  • Not try anything weird (what happens if it gets permanently clogged!?)
OK, more computers might be better... but...
  • we don't have the money!
  • we don't have an IT guy/gal
  • we don't have a sysadmin
  • we don't know how to install a cluster
  • we won't use it all the time
Virtual Proteomics Analysis Cluster (ViPDAC): http://proteomics.mcw.edu/vipdac
Current architecture with Sequest: raw file -> .dta files -> IBM Blade Cluster (Sequest) -> protein IDs, with a Windows machine as head node for preprocessing and storage
ViPDAC & Amazon components: S3 (data store); raw file -> .dta files -> EC2 (OMSSA, X!Tandem) -> protein IDs
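One way to spread a search across EC2 nodes is to bundle the .dta files into fixed-size work packets and hand them out to the workers. The Python sketch below assumes that approach; `make_packets`, `assign_round_robin`, the file names and the packet size are all hypothetical. Static round-robin is only the simplest stand-in: a shared work queue from which idle nodes pull the next packet copes better when packets take uneven amounts of time.

```python
def make_packets(dta_files: list[str], packet_size: int) -> list[list[str]]:
    """Split .dta file names into fixed-size packets; the last one may be smaller."""
    if packet_size < 1:
        raise ValueError("packet_size must be at least 1")
    return [dta_files[i:i + packet_size] for i in range(0, len(dta_files), packet_size)]


def assign_round_robin(packets: list[list[str]], n_workers: int) -> list[list[list[str]]]:
    """Deal packets out in turn: worker k gets packets k, k+n, k+2n, ..."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    queues: list[list[list[str]]] = [[] for _ in range(n_workers)]
    for i, packet in enumerate(packets):
        queues[i % n_workers].append(packet)
    return queues


files = [f"scan{i:04d}.dta" for i in range(7)]   # hypothetical file names
packets = make_packets(files, packet_size=3)
print(assign_round_robin(packets, n_workers=2))
```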
ViPDAC & Amazon components: S3 (data store); raw file -> .dta files -> protein IDs, with the EC2 search nodes scaled 2x, 3x, 20x
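The 2x/3x/20x picture can be checked with back-of-the-envelope arithmetic. The sketch assumes perfect parallel speed-up (no start-up or transfer overhead) and nodes billed in whole hours, as EC2 instance-hours were billed at the time; the billing unit is a parameter. For a hypothetical 20 CPU-hour search this gives 20 billed node-hours on 2 nodes, 21 on 3 and 20 on 20, so the 20-node run costs about the same but finishes ten times sooner than the 2-node run.

```python
import math


def run_estimate(total_cpu_hours: float, n_nodes: int, billing_unit_h: float = 1.0):
    """Return (wall_clock_hours, billed_node_hours), assuming perfect parallel speed-up."""
    wall = total_cpu_hours / n_nodes
    # Each node is billed for its run time rounded up to whole billing units.
    billed_per_node = math.ceil(wall / billing_unit_h) * billing_unit_h
    return wall, billed_per_node * n_nodes


for n in (2, 3, 20):
    wall, billed = run_estimate(20, n)   # hypothetical 20 CPU-hour search
    print(f"{n:>2} nodes: {wall:5.2f} h wall clock, {billed:.0f} node-hours billed")
```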
ViPDAC: Create a new analysis job
Job in progress
Wait in line vs. on demand: 1 MCW cluster vs. Molly's ViPDAC, Shama's ViPDAC, Brian's ViPDAC, Bassam's ViPDAC
Equal-opportunity computing, clusters for all: 1 PC vs. 1 ViPDAC or n ViPDACs
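The "wait in line vs. on demand" contrast reduces to a toy queueing model. Assuming four researchers submit a job at the same moment and every cluster, shared or personal, runs one job at a time at the same speed (a deliberate simplification), the shared cluster finishes the jobs one after another while personal ViPDACs finish them together. The function names and the 2-hour job lengths are hypothetical:

```python
from itertools import accumulate


def shared_cluster_finish(job_hours):
    """One shared cluster, first come first served: each job waits for all earlier ones."""
    return list(accumulate(job_hours))


def on_demand_finish(job_hours):
    """One cluster per researcher: every job starts immediately."""
    return list(job_hours)


jobs = {"Molly": 2.0, "Shama": 2.0, "Brian": 2.0, "Bassam": 2.0}  # hypothetical hours
shared = shared_cluster_finish(jobs.values())
own = on_demand_finish(jobs.values())
for name, s, o in zip(jobs, shared, own):
    print(f"{name:<7} shared: {s:4.1f} h   own ViPDAC: {o:4.1f} h")
```

In this model the last researcher in line waits four times as long on the shared cluster, while total compute used is identical in both cases.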
Observations: signing up and starting up is hard for biologists.
  • http://www.directthought.com/
  • http://www.elasticpod.com/
Now what?
  • No need to wait in line to use it
  • No need to run fewer analyses
  • No need to do small-scale things
  • No need to try fewer things
  • No need to avoid trying anything weird
Molly's ViPDAC, Shama's ViPDAC, Brian's ViPDAC, Bassam's ViPDAC
Internal hybrid solution: local and cloud; scale up/down/off
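Scaling the cloud side "up/down/off" needs some rule for how many instances to run. A minimal hypothetical policy is sketched below, assuming work arrives as packets, local nodes are used first, and the cloud only absorbs the overflow up to a cap. The parameter names and numbers are illustrative, not ViPDAC's actual logic.

```python
import math


def cloud_workers(pending_packets: int, packets_per_worker: int,
                  local_workers: int, max_cloud: int) -> int:
    """How many cloud instances to run for the current backlog (hypothetical policy)."""
    if pending_packets <= 0:
        return 0  # "off": nothing queued, so no cloud instances are running
    needed = math.ceil(pending_packets / packets_per_worker)
    # Local nodes take the first share of the work; the cloud covers only the overflow.
    return min(max_cloud, max(0, needed - local_workers))


print(cloud_workers(100, packets_per_worker=10, local_workers=4, max_cloud=20))
```

With 4 local workers, 10 packets per worker and a cap of 20 instances, 35 pending packets need no cloud instances, 100 need 6, and 1,000 hit the cap of 20.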
Clouds & Bioinformatics: our observations so far
  • Use it as a software delivery method
  • Use it to provide computing to virtually anyone
  • Get fast access to large data files (Ensembl, GenBank, etc.)
  • Use it to COMPLEMENT existing clusters/grids
  • AMIs/apps are not easy for non-informatics folks to get going
  • "Cloud-friendly" licensing structures for commercial software?
  • "Grant-friendly" billing options
  • Data transfer for large datasets (next-gen sequencing?)
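Data transfer time for large datasets is easy to underestimate, and a quick estimate makes the concern concrete. The sketch assumes decimal gigabytes, a sustained link rate in megabits per second, and an optional efficiency factor for protocol overhead; the 100 GB, 100 Mbit/s and 70% figures are hypothetical. On an ideal 100 Mbit/s link, 100 GB takes about 2.2 hours.

```python
def transfer_hours(size_gb: float, link_mbit_s: float, efficiency: float = 1.0) -> float:
    """Hours to move size_gb (decimal GB) over a link sustaining link_mbit_s * efficiency."""
    if link_mbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("need a positive link rate and 0 < efficiency <= 1")
    bits = size_gb * 8e9                                # 1 GB = 8e9 bits
    seconds = bits / (link_mbit_s * 1e6 * efficiency)   # Mbit/s -> bit/s
    return seconds / 3600


print(f"{transfer_hours(100, 100):.2f} h")        # ideal link
print(f"{transfer_hours(100, 100, 0.7):.2f} h")   # assuming 70% effective throughput
```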
Acknowledgements
Joey Geiger, Brian Halligan and Andrew Vallejos
Molly Pellitteri-Hahn, Shama Mirsa
Mike Olivier, Andy Greene
NHLBI National Proteomics Center
Low Cost, Scalable Proteomics Data Analysis Using Amazon's Cloud Computing Services and Open Source Search Algorithms. J. Proteome Res., 2009, 8 (6), pp. 3148–3153


Editor's Notes

  • #21: Internally we now use a hybrid solution: Sequest and Mascot run on local clusters, while X!Tandem and OMSSA run on AWS. Raw data can be sent to any or all of these algorithms through an integrated workflow system.
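The integrated workflow described in this note can be pictured as a routing table from search engine to execution site. A minimal Python sketch, assuming only the split stated in the note (Sequest and Mascot local, X!Tandem and OMSSA on AWS); the `route` function and the task tuples are hypothetical, not the actual workflow system:

```python
# Where each search engine runs, per the hybrid set-up described in the note.
SITE = {"Sequest": "local", "Mascot": "local", "X!Tandem": "aws", "OMSSA": "aws"}


def route(raw_file: str, engines: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Group (raw_file, engine) search tasks by execution site."""
    plan: dict[str, list[tuple[str, str]]] = {"local": [], "aws": []}
    for engine in engines:
        if engine not in SITE:
            raise ValueError(f"unknown search engine: {engine}")
        plan[SITE[engine]].append((raw_file, engine))
    return plan


print(route("sample.raw", ["Sequest", "OMSSA", "X!Tandem"]))
```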