On March 5th, 2014, a team of system administrators and bioinformaticians held a hack-a-thon to integrate Galaxy with the high-performance computing cluster at Michigan State University, complete with single sign-on and the ability to run jobs as the submitting user. They solicited and received strong community support during the hack-a-thon, engaging Galaxy developers and users through IRC and Twitter. In eight hours the team navigated the various integration hurdles with real-time assistance from the Galaxy community. The entire deployment was done as openly as possible, with the various efforts coordinated in a separate public chat channel. While a couple of person-days of preparation and follow-up were needed, scheduling a single day for the bulk of the installation proved critical to getting the job done, and was far more effective than the many hours previously spent merely discussing the idea of deploying Galaxy. The format allowed rapid progress: communication overhead was reduced, and developers could modify or add components, receive prompt feedback, and continue to build on the growing infrastructure. We advocate a similar recipe: virtual machines, the Puppet configuration management system, and agile development enabled by the built-in implementations of Galaxy's various components.
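As a rough illustration of that recipe (the repository URL, paths, and resource names here are assumptions, not the actual MSU manifests), a minimal Puppet manifest for standing up a 2014-era Galaxy instance might look like:

    # Sketch only: the service account, install path, and source are
    # illustrative.
    user { 'galaxy':
      ensure => present,
      home   => '/opt/galaxy',
      shell  => '/bin/bash',
    }

    # Galaxy was distributed via Mercurial at the time; the
    # puppetlabs/vcsrepo module provides an hg provider.
    vcsrepo { '/opt/galaxy/galaxy-dist':
      ensure   => present,
      provider => hg,
      source   => 'https://bitbucket.org/galaxy/galaxy-dist',
      user     => 'galaxy',
      require  => User['galaxy'],
    }

    # Assumes an init script for the Galaxy service is managed elsewhere.
    service { 'galaxy':
      ensure  => running,
      enable  => true,
      require => Vcsrepo['/opt/galaxy/galaxy-dist'],
    }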
Less talking, more doing: Crowd-sourcing the integration of Galaxy with a high-performance computing cluster
1. Less talking, more doing: Crowd-sourcing the integration of Galaxy with a high-performance computing cluster
2. The Goal
Enable users of the Michigan State University Genomics Core to perform their own analyses using the university's High Performance Computing Cluster (HPCC) infrastructure, via:
1. Integrated institutional login
2. Import/export data from/to cluster storage while respecting permissions
3. Utilize existing node allocations and quotas; jobs must run as an HPCC user, not a generic Galaxy user
4. Use the existing installed bioinformatics tools (no installs from the Tool Shed)
A configuration sketch for goals 1 and 3 follows below.
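Goals 1 and 3 correspond to a handful of settings in the universe_wsgi.ini of 2014-era Galaxy. The option names below are real Galaxy settings of that period; the file path and values are assumptions, and the sketch relies on the puppetlabs/inifile module:

    $galaxy_ini = '/opt/galaxy/galaxy-dist/universe_wsgi.ini'

    # Goal 1: trust the REMOTE_USER header set upstream by the
    # Apache + Shibboleth front end.
    ini_setting { 'galaxy use_remote_user':
      path    => $galaxy_ini,
      section => 'app:main',
      setting => 'use_remote_user',
      value   => 'True',
    }

    # Goal 3: submit cluster jobs as the authenticated user rather
    # than the generic "galaxy" account.
    ini_setting { 'galaxy drmaa_external_runjob_script':
      path    => $galaxy_ini,
      section => 'app:main',
      setting => 'drmaa_external_runjob_script',
      value   => 'scripts/drmaa_external_runner.py',
    }

    ini_setting { 'galaxy external_chown_script':
      path    => $galaxy_ini,
      section => 'app:main',
      setting => 'external_chown_script',
      value   => 'scripts/external_chown_script.py',
    }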
3. The Resources
Institute for Cyber-Enabled Research
$10 million for developing collaborative, interdisciplinary computational projects through a faculty scholars program and post-doctoral fellowships
Home of Michigan State University's HPCC
High Performance Computing Center
8, 16, 32, or 64 cores per node
8 GiB to 2 TiB of memory per node
Advanced GPU and Intel Phi capabilities also available
> 7,000 cores in the main cluster, including an 800-core HTCondor system
339 TB scratch storage, 192 TB user storage
4. The Plan
Do It Ourselves: open agile deployment
All stakeholders set aside a single work day to get as much done as possible
Community support solicited via galaxy-dev@ and Twitter
Public chat room to document our work
6. Community Assistance
6 people joined our chat room to provide encouragement and very useful advice
Thanks to Marten Martenson, Alper Kucukural, Dannon Baker, Lauren M, and Nate Coraor!
7. Zero to Success in 8 Hours
No code changes needed
Only minimal prep beforehand
Login using existing Shibboleth infrastructure (no new accounts or passwords)
Jobs running as the user's account, with quota control, on the existing compute cluster (see the job routing sketch below)
Frontend + database running on a VMware ESXi 5.1 virtual machine (4 cores, shared, NetApp NFS-backed)
Deployed using Puppet
Will be migrating to the community's Puppet configuration
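Routing jobs to the existing cluster implies a DRMAA destination in Galaxy's job_conf.xml. One way to keep that file under Puppet control is a plain file resource; the XML follows Galaxy's documented job_conf.xml structure of the time, while the scheduler resource request and paths are assumptions:

    file { '/opt/galaxy/galaxy-dist/job_conf.xml':
      ensure  => file,
      owner   => 'galaxy',
      content => '<?xml version="1.0"?>
    <job_conf>
      <plugins>
        <plugin id="drmaa" type="runner"
                load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
      </plugins>
      <handlers>
        <handler id="main"/>
      </handlers>
      <destinations default="hpcc">
        <destination id="hpcc" runner="drmaa">
          <!-- illustrative scheduler resource request -->
          <param id="nativeSpecification">-l nodes=1:ppn=1,walltime=24:00:00</param>
        </destination>
      </destinations>
    </job_conf>
    ',
    }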
9. The Future
Filesystem permissions automation (each homedir is its own filesystem & needs the SHARENFS property managed; see the sketch below)
Galaxy upgrade procedure & testing
More user outreach
(Photo courtesy of @nodoubleg)
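SHARENFS is a ZFS filesystem property, and Puppet's built-in zfs resource type manages it directly, which suggests one route for the automation slide 9 anticipates. The defined type name, dataset layout, and export options below are hypothetical:

    # Hypothetical wrapper: one ZFS filesystem per homedir, with its
    # NFS export options kept under Puppet control.
    define hpcc::homedir_share ($options = 'rw=@10.0.0.0/8') {
      zfs { "home/${name}":
        ensure   => present,
        sharenfs => $options,
      }
    }

    # Usage: declare one resource per cluster user, e.g.
    # hpcc::homedir_share { 'someuser': }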
10. Credits
Dirk Colbry¹, Michael R. Crusoe², Andy Keen¹, Greg Mason¹, Jason Muffett¹, Matthew Scholz¹, Tracy K. Teal²
¹ Michigan State University, Institute for Cyber-Enabled Research
² Michigan State University, Department of Microbiology and Molecular Genetics
Nicholas Beckloff, Genomics Core Director