© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Angel Pizarro
AWS Research & Technical Computing
April 24, 2018
High Throughput Genomics on AWS
Containers and serverless technology for science
Agenda
- Background and introduction
- Deep dive on application packaging and AWS Batch
- Demo: packaging samtools using Docker and submitting a job
- Encoding and executing full scientific workflows with AWS Lambda and AWS Step Functions
- Demo: running a full workflow
The problem

Genomics data processing
Typical workflow in genomics analysis
- Serial steps
- Parallel steps
- Retry logic
Problem 1: Application packaging
Need to package an application with its dependencies
Problem 2: Application execution
Need to provide inputs and runtime arguments, and to collect the output
Diagram: input and output of the application
Problem 3: Orchestration of execution
Need to define a dependency graph of applications and data
Diagram: one input flowing through dependent applications to two outputs
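
A dependency graph like this can be expressed directly in code. The sketch below uses Python's standard-library `graphlib` (Python 3.9+) to group workflow steps into stages whose members can run in parallel once their inputs exist. The step names and dependencies are hypothetical, loosely modeled on the alignment, variant calling, annotation, and QC steps discussed later in this deck.

```python
from graphlib import TopologicalSorter


def execution_stages(dependencies):
    """Group steps into stages; every step in a stage has all its inputs ready.

    `dependencies` maps each step to the set of steps whose output it needs.
    Raises graphlib.CycleError if the graph is not a DAG.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps that can run concurrently now
        stages.append(ready)
        sorter.done(*ready)  # mark them finished so their dependents become ready
    return stages


# Hypothetical workflow: QC and variant calling both consume the alignment output.
workflow = {
    "alignment": set(),
    "variant_calling": {"alignment"},
    "qc": {"alignment"},
    "annotation": {"variant_calling"},
}
print(execution_stages(workflow))
# [['alignment'], ['qc', 'variant_calling'], ['annotation']]
```

An orchestrator built on this idea would launch every step in a stage at once and wait for the stage to finish before moving on.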
A reference architecture for genomics workflows
Application Layer: Amazon ECR (applications) and Amazon S3 (data)
A reference architecture for genomics workflows
Application Layer: Amazon ECR and Amazon S3
Execution Layer: AWS Batch (job execution)
A reference architecture for genomics workflows
Application Layer: Amazon ECR and Amazon S3
Execution Layer: AWS Batch
Orchestration Layer: AWS Step Functions and AWS Lambda functions
The Application Layer
Bioinformatics application stacks
* Image courtesy of The Broad Institute - https://www.broadinstitute.org/gatk/img/BP_workflow.png
Virtualization of whole pipelines
Pros:
- Easy application publishing
- Clean dependency bundling
Cons:
- Large OS images
- Duplication of basic services
- Long start time
Diagram: GATK v4.0 and GATK v4.0.1 each packaged with its own bins/libs and its own OS
Packaging applications using Docker containers
Diagram: GATK v4.0 and GATK v4.0.1 containers running on shared bins/libs and a shared OS
Pros:
- Easy application publishing
- Clean dependency bundling
- Shared dependencies
- Shared OS services
- Small images
Cons:
- Some cross-container networking issues
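Docker Dockerfile and the build process
FROM ubuntu:16.04
# Refresh the package index first; a fresh image has none, so install would fail
RUN apt-get update && apt-get install -y python-pip python-dev
# PIL is published on PyPI as Pillow; the 6.x series is the last to support Python 2
RUN pip install "Pillow<7"

FROM python:2.7
RUN pip install numpy pandas
Diagram: image layers (IDs 961f9d3583, c6d01316e4, a408d3cfe23, e3fc50a88d0) stacked on the python:2.7 and ubuntu:precise base images; layers 961f9d3583 and c6d01316e4 appear in both stacks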
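
The layer sharing on this slide follows from content addressing: a layer is identified by its own content plus everything beneath it, so images built from the same base and the same leading instructions reference the same layers. The toy model below computes chain IDs with the recursive `ChainID` rule from the OCI image specification, but it substitutes a hash of each instruction string for the real layer digest (`DiffID`). The IDs it produces are illustrative only and will not match what `docker build` reports.

```python
import hashlib


def digest(data: str) -> str:
    """Return an OCI-style digest string for the given text."""
    return "sha256:" + hashlib.sha256(data.encode("utf-8")).hexdigest()


def chain_ids(instructions):
    """Compute a chain ID for each layer of a toy image.

    ChainID(L0)     = DiffID(L0)
    ChainID(L0..Ln) = Digest(ChainID(L0..Ln-1) + " " + DiffID(Ln))
    Assumption: DiffID is faked as the digest of the instruction text.
    """
    ids = []
    for instruction in instructions:
        diff_id = digest(instruction)
        ids.append(diff_id if not ids else digest(ids[-1] + " " + diff_id))
    return ids


base = ["FROM python:2.7"]
image_a = chain_ids(base + ["RUN pip install numpy pandas"])
image_b = chain_ids(base + ["RUN pip install numpy pandas", "RUN pip install scipy"])

# The first two layers get identical IDs, so they can be stored and pulled once.
print(image_a == image_b[:2])  # True
```

Because each ID folds in its parent's ID, changing an early instruction changes every layer above it, which is why Dockerfiles usually put rarely changing steps first.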
Docker container source repositories
Community containers:
- https://dockstore.org/
- http://biocontainers.pro/
Custom developed:
- Control specific version and build features
- Support for S3 download and checkpointing of data
- Scratch space management
- Container metadata management
- Full control over the software stack
- Licensing
- Monitoring
- Security and compliance adherence
Considerations for genomics applications on AWS
- Data staging: use Amazon S3 to store reference and input data, and to store results
- Multi-tenancy: have processes work in temporary directories
- Storage cost/efficiency: each job cleans up after itself before returning
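
The multi-tenancy and cleanup guidance above can be made concrete with a per-job scratch directory that is always removed, even when the job fails. This is a minimal sketch, not the code used in the demo: `job_scratch` and `stage_and_run` are hypothetical names, and the local file write stands in for downloading from Amazon S3 (for example with boto3's `download_file`), which is omitted so the example runs offline.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def job_scratch(job_id, root=None):
    """Create a private scratch directory for one job and always delete it."""
    # mkdtemp gives each job its own uniquely named directory (multi-tenancy).
    path = tempfile.mkdtemp(prefix=f"job-{job_id}-", dir=root)
    try:
        yield path
    finally:
        # Clean up before returning, whether the job succeeded or raised.
        shutil.rmtree(path, ignore_errors=True)


def stage_and_run(job_id, input_bytes, process, root=None):
    """Hypothetical job body: stage input into scratch space, then process it.

    A real job would download the input from Amazon S3 here and upload
    results back to S3 before the scratch directory is removed.
    """
    with job_scratch(job_id, root) as scratch:
        in_path = os.path.join(scratch, "input.dat")
        with open(in_path, "wb") as fh:
            fh.write(input_bytes)
        return process(in_path)
```

For example, `stage_and_run("sample-1", b"ACGT", os.path.getsize)` returns 4 and leaves nothing behind in the temporary directory.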
Demo 1: Application packaging using Docker
The Execution Layer
Introducing AWS Batch
- Fully managed task execution: no software to install or servers to manage; AWS Batch provisions and scales your infrastructure
- Integrated with AWS: AWS Batch jobs can easily and securely interact with services such as Amazon S3, Amazon DynamoDB, and Amazon Rekognition
- Cost-efficient: AWS Batch launches compute resources tailored to your jobs and can provision Amazon EC2 and EC2 Spot Instances
AWS Batch Concepts
- Compute environments: the EC2 resources that do the work
- Scheduler: the resource scheduler, which looks for submitted jobs and their dependencies
- Job queue: the resource you submit jobs to
- Job definition: defines the application, the minimum resources (vCPUs, memory), and the application arguments
- Jobs: the runtime instances of a job definition
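To connect these concepts to the API, the sketch below builds the request for registering a job definition. It only assembles keyword arguments for boto3's `register_job_definition`, so it runs without AWS credentials. The definition name `samtools-stats`, the `input_bam` parameter, its default path, and the ECR image URI are hypothetical placeholders.

```python
import json


def samtools_stats_job_definition(image_uri: str, vcpus: int = 1, memory_mib: int = 2048) -> dict:
    """Build keyword arguments for boto3's batch.register_job_definition().

    Hypothetical: the definition name, the 'input_bam' parameter, and its
    default path are illustrative; image_uri would point at your ECR image.
    """
    return {
        "jobDefinitionName": "samtools-stats",
        "type": "container",
        # Default value for the Ref::input_bam placeholder; overridable at submit time.
        "parameters": {"input_bam": "/scratch/sample.bam"},
        "containerProperties": {
            "image": image_uri,
            "command": ["samtools", "stats", "Ref::input_bam"],
            # Minimum resources; AWS Batch expects string values, MEMORY in MiB.
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }


request = samtools_stats_job_definition(
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/samtools:latest"  # hypothetical image
)
print(json.dumps(request, indent=2))
# With credentials configured, this could be registered with:
#   boto3.client("batch").register_job_definition(**request)
```

AWS Batch replaces `Ref::input_bam` in the command with the parameter value supplied when a job is submitted, falling back to the default in `parameters`.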
Example AWS Batch Job Architecture
Diagram: input files arrive in Amazon S3; an event triggers a Lambda function that submits an AWS Batch job; the job waits in a queue of runnable jobs until the AWS Batch scheduler places it on a compute environment; the job runs the application image with the resource requirements and other parameters from its job definition, under an IAM role for the Batch job, and produces output
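
The event-driven path in this architecture (a new object in S3 triggers a Lambda function that submits a Batch job) might look like the sketch below. The event layout follows the standard S3 event notification format; the `input_uri` job parameter, the `JOB_QUEUE` and `JOB_DEFINITION` environment variables, and the job-naming scheme are assumptions for illustration. The request-building logic is kept separate from the handler so it can be tested without AWS access.

```python
import os
import urllib.parse


def submit_requests_from_s3_event(event, job_queue, job_definition):
    """Turn an S3 event notification into boto3 batch.submit_job() kwargs.

    One job per uploaded object; the object location is passed through the
    hypothetical job-definition parameter 'input_uri'.
    """
    requests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces as '+'), so decode them first.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Batch job names allow only ASCII letters, digits, hyphens, underscores.
        safe = "".join(c if (c.isascii() and c.isalnum()) or c in "-_" else "_" for c in key)
        requests.append({
            "jobName": "process_" + safe[:100],
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
            "parameters": {"input_uri": f"s3://{bucket}/{key}"},
        })
    return requests


def handler(event, context):
    """Lambda entry point (sketch): submit one AWS Batch job per new S3 object."""
    import boto3  # bundled with the AWS Lambda Python runtime; imported lazily here

    batch = boto3.client("batch")
    requests = submit_requests_from_s3_event(
        event, os.environ["JOB_QUEUE"], os.environ["JOB_DEFINITION"]
    )
    return [batch.submit_job(**req)["jobId"] for req in requests]
```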
A Visual Representation of AWS Batch
Executing Job(s)
- Specify Docker run parameters as container overrides
- Specify the job queue
- Specify dependencies (--depends-on)
# memory is specified in MiB
aws batch submit-job --job-name testsamtools_stats \
    --job-queue ${JOB_QUEUE} \
    --job-definition ${JOB_DEFINITION} \
    --container-overrides vcpus=4,memory=6000
# The response (printed to STDOUT) should resemble the following
{ "jobName": "testsamtools_stats", "jobId": "f92b20d3-cdcd-4b92-aa0c-6bfd98a65ac6" }
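
The same submission can be made from code. The sketch below builds keyword arguments for boto3's `submit_job` equivalent to the CLI call above, expressing the override through `resourceRequirements` (memory in MiB) and showing how a downstream job can declare a dependency with `dependsOn`. The function name, queue and definition names, and the downstream job are hypothetical.

```python
def submit_job_request(name, queue, definition, vcpus=None, memory_mib=None,
                       depends_on=(), parameters=None):
    """Build kwargs for boto3's batch.submit_job().

    depends_on: IDs of jobs that must succeed before this job can start.
    """
    request = {"jobName": name, "jobQueue": queue, "jobDefinition": definition}
    overrides = []
    if vcpus is not None:
        overrides.append({"type": "VCPU", "value": str(vcpus)})
    if memory_mib is not None:
        overrides.append({"type": "MEMORY", "value": str(memory_mib)})
    if overrides:
        request["containerOverrides"] = {"resourceRequirements": overrides}
    if depends_on:
        request["dependsOn"] = [{"jobId": job_id} for job_id in depends_on]
    if parameters:
        request["parameters"] = dict(parameters)
    return request


stats = submit_job_request("testsamtools_stats", "my-queue", "samtools-stats",
                           vcpus=4, memory_mib=6000)
# A follow-up job that waits for the stats job (ID from the sample output above):
report = submit_job_request("summarize_stats", "my-queue", "summarize",
                            depends_on=["f92b20d3-cdcd-4b92-aa0c-6bfd98a65ac6"])
# Each dict would be passed as boto3.client("batch").submit_job(**stats), etc.
```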
Demo 2: Executing samtools stats with AWS Batch
The Orchestration Layer
Orchestration of workflows
Initiate Actions and Transitions
Workflow orchestration using serverless technologies
Owning servers means dealing with ...
- Scaling
- Availability and fault tolerance
- Operations and management
- Provisioning and utilization
Serverless means
- No servers to provision or manage
- Scales with usage
- Never pay for idle
- Availability and fault tolerance built in
AWS Lambda provides Functions as a Service
Diagram: an event source (changes in data state, requests to endpoints, new data available) invokes a function (Node.js, Python, Java, C#, Go), which calls services (anything)
Orchestration of workflows
Initiate Actions and Transitions
Diagram: AWS Lambda initiates the actions between input and output
Using AWS Lambda
Bring your own code
- Node.js, Java, Python, C#, Go
- Bring your own libraries (even native ones)
Simple resource model
- Select a memory size (power rating) from 128 MB to 3 GB
- CPU and network allocated proportionately
Flexible use
- Synchronous or asynchronous
- Integrated with other AWS services
Flexible authorization
- Securely grant access to resources and VPCs
- Fine-grained control for invoking your functions
Anatomy of a Lambda function
- Handler() function: the function to be executed upon invocation
- Event object: the data sent during Lambda function invocation
- Context object: methods available to interact with runtime information (request ID, log group, etc.)
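import boto3

batch = boto3.client("batch")  # created once per container, reused across invocations

def handle_request(job, context):
    # 'job' is the event object: the data sent with the invocation
    response = batch.submit_job(jobName=job["name"],
                                jobQueue=job["queue"],
                                jobDefinition=job["definition"])
    return {"jobName": response["jobName"], "jobId": response["jobId"]}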
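
Because AWS Batch jobs usually run far longer than a single Lambda invocation, a common pattern is to pair the submit function above with a second function that a Step Functions Wait/Choice loop calls to check progress. The sketch below is an assumption about how such a function might look, not code from this presentation. It uses boto3's `describe_jobs`; the `jobId` event field and the shape of the returned summary are hypothetical choices for the example.

```python
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def summarize_job(describe_response):
    """Reduce a batch.describe_jobs() response to what a Choice state needs."""
    jobs = describe_response.get("jobs", [])
    if not jobs:
        # Raise so a Retry/Catch rule in the state machine can handle it.
        raise LookupError("job not found")
    job = jobs[0]
    status = job["status"]
    return {"jobId": job["jobId"], "status": status, "done": status in TERMINAL_STATES}


def check_job_status(event, context):
    """Lambda entry point (sketch): look up one job and report its status."""
    import boto3  # bundled with the AWS Lambda Python runtime; imported lazily here

    response = boto3.client("batch").describe_jobs(jobs=[event["jobId"]])
    return summarize_job(response)
```

A Choice state can then branch on `$.status` (for example `StringEquals` "SUCCEEDED" or "FAILED") and loop back through a Wait state for any other value.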
Keep orchestration out of code.
Diagram: sequence, choice, and parallel workflow patterns
AWS Step Functions
Serverless workflow management with zero administration:
- Makes it easy to coordinate the components of distributed applications and microservices using visual workflows
- Automatically triggers and tracks each step, and retries when there are errors, so your application executes in order and as expected
- Logs the state of each step, so when things go wrong, you can diagnose and debug problems quickly
AWS Step Functions
Orchestration of workflows
Initiate Actions and Transitions
Seven State Types
- Task: a single unit of work
- Choice: adds branching logic
- Parallel: forks and joins the data across tasks
- Wait: delays for a specified time
- Fail: stops an execution and marks it as a failure
- Succeed: stops an execution successfully
- Pass: passes its input to its output
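
Because Succeed and Fail are the only terminal state types, and Choice states transition through their rules rather than a single `Next`, a definition's transitions can be sanity-checked mechanically. The sketch below is a simplified, hypothetical checker for top-level states only: it does not recurse into Parallel branches or validate every field of the Amazon States Language. The example state machine and its Lambda ARN are made up.

```python
TERMINAL_TYPES = {"Succeed", "Fail"}


def check_transitions(definition):
    """Return a list of transition problems in an ASL definition (top level only)."""
    states = definition["States"]
    problems = []
    if definition["StartAt"] not in states:
        problems.append(f"StartAt points to unknown state {definition['StartAt']!r}")
    for name, state in states.items():
        kind = state["Type"]
        if kind in TERMINAL_TYPES:
            targets = []  # terminal states end the execution
        elif kind == "Choice":
            targets = [rule["Next"] for rule in state.get("Choices", [])]
            if "Default" in state:
                targets.append(state["Default"])
        else:
            # Task, Pass, Wait, and Parallel need either Next or End.
            if state.get("End") is not True and "Next" not in state:
                problems.append(f"{name}: {kind} state needs Next or End")
            targets = [state["Next"]] if "Next" in state else []
        targets += [catcher["Next"] for catcher in state.get("Catch", [])]
        for target in targets:
            if target not in states:
                problems.append(f"{name}: transition to unknown state {target!r}")
    return problems


definition = {
    "StartAt": "RunQC",
    "States": {
        "RunQC": {"Type": "Task",
                  "Resource": "arn:aws:lambda:us-east-1:123456789012:function:run-qc",
                  "Next": "CheckQC"},
        "CheckQC": {"Type": "Choice",
                    "Choices": [{"Variable": "$.qc_passed", "BooleanEquals": True, "Next": "Done"}],
                    "Default": "QCFailed"},
        "QCFailed": {"Type": "Fail", "Error": "QCFailed", "Cause": "Sample did not pass QC"},
        "Done": {"Type": "Succeed"},
    },
}
print(check_transitions(definition))  # []
```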
Deployment with Step Functions and Lambda
A Genomics Workflow
Diagram: workflow steps for alignment, variant calling, annotation, and QC
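
One way to encode a workflow like this in the Amazon States Language is sketched below, built as a Python dict so it can be serialized to JSON. The ordering is an assumption (variant calling and QC run in parallel after alignment, then annotation), as are the state names, the Lambda ARNs, and the retry settings. Each Task is assumed to invoke a Lambda function that submits or checks an AWS Batch job.

```python
import json


def genomics_state_machine(lambda_arns, max_attempts=3):
    """Build an ASL definition for a hypothetical alignment-to-annotation pipeline.

    `lambda_arns` maps each step name to the (hypothetical) Lambda ARN it invokes.
    """
    def task(step, **transition):
        return {
            "Type": "Task",
            "Resource": lambda_arns[step],
            # Retry failures with exponential backoff before failing the execution.
            "Retry": [{"ErrorEquals": ["States.ALL"], "IntervalSeconds": 30,
                       "MaxAttempts": max_attempts, "BackoffRate": 2.0}],
            **transition,
        }

    def branch(step):
        return {"StartAt": step, "States": {step: task(step, End=True)}}

    return {
        "Comment": "Alignment, then variant calling and QC in parallel, then annotation",
        "StartAt": "Alignment",
        "States": {
            "Alignment": task("Alignment", Next="CallVariantsAndQC"),
            "CallVariantsAndQC": {"Type": "Parallel",
                                  "Branches": [branch("VariantCalling"), branch("QC")],
                                  "Next": "Annotation"},
            "Annotation": task("Annotation", End=True),
        },
    }


arns = {step: f"arn:aws:lambda:us-east-1:123456789012:function:{step.lower()}"
        for step in ("Alignment", "VariantCalling", "QC", "Annotation")}
print(json.dumps(genomics_state_machine(arns), indent=2))
```

The printed JSON could be passed to `aws stepfunctions create-state-machine` through its `--definition` option, together with `--name` and an IAM `--role-arn`.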
Putting it all together
$ aws stepfunctions start-execution \
    --state-machine-arn <your-state-machine-arn> \
    --input file://input.states.json
Screenshots: AWS Command Line Interface, AWS Batch console, Step Functions console, S3 object listing
Demo 3: Implementing a full workflow using Lambda and Step Functions
Alternatives!
AWS Partner Network
Open Source Workflow Orchestration
Cromwell
Bio-IT World workshop, May 15th
Thank you!
We will send a follow-up email with more information on how to get started using AWS Batch for genomics.
