Less talking, more doing
Crowd-sourcing the integration of
Galaxy with a high-performance
computing cluster
The Goal
Enable users of the Michigan State University Genomics
Core to perform their own analysis using their High
Performance Computing Cluster infrastructure
Via:
1. Integrated institutional login
2. Import/export data from/to cluster storage while respecting permissions
3. Utilize existing node allocations and quotas; jobs must run as an HPCC user,
not a generic Galaxy user
4. Use the existing installed bioinformatics tools (no installs from the Tool Shed)
The Resources
Institute for Cyber-Enabled Research
• $10 million for developing collaborative, interdisciplinary computational
projects through a faculty scholars program and post-doctoral fellowships
• Home of Michigan State University's HPCC
High Performance Computing Center
• 8, 16, 32, or 64 cores
• 8 GiB to 2 TiB of memory per node
• Advanced GPU and Intel Xeon Phi capabilities also available
• More than 7,000 cores in the main cluster, including an 800-core HTCondor system
• 339 TB scratch storage, 192 TB user storage
The Plan
Do It Ourselves: open agile
deployment
All stakeholders set aside a single work
day to get as much done as possible
Community support solicited via
galaxy-dev@ and Twitter
Public chat room to document our work
March 5th, 2014
Community Assistance
6 people joined our chat room to provide
encouragement and very useful advice
Thanks to Marten Martenson, Alper
Kucukural, Dannon Baker, Lauren M and
Nate Coraor!
Zero to Success in 8 Hours
• No code changes needed
• Only minimal prep beforehand
• Login using existing Shibboleth infrastructure (no new accounts or passwords)
• Jobs running as the user's account with quota control on the existing compute cluster
• Frontend + database running on a VMware ESXi 5.1 virtual machine (4 cores, shared, NetApp NFS backed)
• Deployed using Puppet
• Will be migrating to the community's Puppet configuration
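The slides do not say how a Shibboleth login was matched to a cluster account. As a minimal sketch, assuming the login identity is an email-style string whose local part equals the HPCC username (an assumption, not something the deck confirms), the mapping could look like this. The function names `cluster_username` and `account_exists` are hypothetical helpers, not part of Galaxy.

```python
import pwd


def cluster_username(remote_user: str) -> str:
    """Derive a cluster account name from an identity such as 'user@example.edu'.

    Assumption: the part before '@' is the cluster username.
    A bare username without '@' is returned unchanged.
    """
    local_part, sep, domain = remote_user.strip().partition("@")
    if not local_part or (sep and not domain):
        raise ValueError(f"cannot derive a username from {remote_user!r}")
    return local_part


def account_exists(username: str) -> bool:
    """Return True if the name resolves to an account visible on this host."""
    try:
        pwd.getpwnam(username)  # raises KeyError when the account is unknown
    except KeyError:
        return False
    return True
```

A check like `account_exists(cluster_username(identity))` before job submission would catch users who can log in but have no cluster account yet.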
The Result
Tools using
already
installed
software
The Future
• Filesystem permissions automation (each home directory is its own filesystem and needs the SHARENFS property managed)
• Galaxy upgrade procedure and testing
• More user outreach
courtesy @nodoubleg
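The per-home-directory SHARENFS property mentioned above suggests ZFS datasets, though the slides do not name the filesystem. A minimal sketch of the automation, under that assumption, builds one `zfs set sharenfs=...` command per user without running anything. The helper name `sharenfs_commands`, the parent dataset `tank/home`, and the default value `on` are all illustrative assumptions.

```python
from typing import Iterable, List


def sharenfs_commands(
    usernames: Iterable[str],
    parent_dataset: str,
    share_options: str = "on",
) -> List[List[str]]:
    """Build (but do not run) one 'zfs set sharenfs=...' command per user.

    Assumption: each home directory is the dataset '<parent_dataset>/<username>'.
    """
    commands = []
    for name in sorted(set(usernames)):  # de-duplicate and keep a stable order
        dataset = f"{parent_dataset}/{name}"
        commands.append(["zfs", "set", f"sharenfs={share_options}", dataset])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands for review instead of executing them.
    for cmd in sharenfs_commands(["alice", "bob"], "tank/home"):
        print(" ".join(cmd))
```

Returning argument lists keeps the dry run separate from execution; on the file server each list could then be passed to `subprocess.run(cmd, check=True)`.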
Credits
Dirk Colbry¹, Michael R. Crusoe², Andy Keen¹, Greg Mason¹, Jason Muffett¹, Matthew Scholz¹,
Tracy K. Teal²
¹ Michigan State University, Institute for Cyber-Enabled Research
² Michigan State University, Department of Microbiology and Molecular Genetics
Nicholas Beckloff, Genomics Core Director
