ILM - Pipeline in the cloud
Who are we?
Jim Vanns
Aaron Carey
Production Engineers at ILM London
VFX Pipeline in the Cloud
Experiments with Mesos and Docker
Nomenclature, glossary and other big words
• VFX: Visual Effects
• Pipeline: Data -> Process -> Data, repeat!
• Show: Film
• Sequence: A thematically linked series of (continuous) scenes!
• Shot: An uninterrupted portion of the sequence
What is a VFX pipeline?
Film Scan
Roto
3D
FX
Comp
Lighting
What is a VFX pipeline?
What VFX isn't
• Rendering and sims are our 'Big Data'
• We're not crunching analytics in real time
• Rendering != MapReduce
• Apps run on hardware, not in a browser
• We're not here to rewrite a renderer (not yet...)
Where does the cloud meet VFX?
What's in it for us?
• Reducing Capital Expenditure
• Potentially reducing overheads
• Flexibility
• Giving power back to developers
VFX Studio Infrastructure
• Render Farm
• Database
• Storage
• Workstations
Render Farm
First, what is rendering!?
• Take a virtual 3D representation of a scene
  – 3D models
  – Textures
  – Light sources
  – Static backgrounds (plates)
• Place a virtual camera in the scene
• Compute the 2D image that the camera will see
Rendering in the cloud
• Low-hanging fruit
• Already happening
• Typical farm: 30-50k procs
• Managed by specialist software (Tractor, Deadline, in-house, etc.)
• VFX has been doing clustered computing for decades
What's next?
Mesos
• Open-source framework for scheduling
• Already used at massive scale
• NOT a job scheduler
• We can concentrate on the scheduling logic
• Support for task isolation/containment (e.g. Docker)
Automating our Mesos cluster with Docker and Ansible
• Goals: Quick - Easy - Repeatable
• Didn't want to spend time fighting our config manager (or each other)
• Be able to deploy a virtual studio from scratch in under an hour (including provisioning, building software, deploying, configuration)
• Run multiple versions of the infrastructure at the same time (in the same availability zone/network)
• If something is typed in the terminal, we want to automate and version it
Docker + Ansible was the answer
Automating our Mesos cluster with Ansible
• Heavily using tags and variables in Ansible
• Cloud agnostic: some modification of GCE inventory and launch modules
• Example: creating a multi-host dynamic Zookeeper configuration

- name: Append the zookeeper server entries
  lineinfile:
    dest: /etc/zookeeper/conf/zoo.cfg
    insertafter: EOF
    line: "server.{{ hostvars[item]['zkid'] }}={{ hostvars[item]['ansible_eth0']['ipv4']['address'] }}:2888:3888"
  with_items: "{{ groups['tag_zookeeper_server_' + consul_domain] }}"
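
Against a hypothetical three-node Zookeeper group, the task above would append entries like these to /etc/zookeeper/conf/zoo.cfg (the IDs and addresses here are illustrative, not real hosts):

server.1=10.240.0.11:2888:3888
server.2=10.240.0.12:2888:3888
server.3=10.240.0.13:2888:3888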
Service Discovery in Mesos
• No control over where a service or render runs
• Services may move hosts
• Can't guarantee hosts will have same IP
• Options:
  – Mesos-DNS
  – Homegrown (etcd etc.)
  – Consul
Mesos and Consul
• What is Consul?
• Every host runs an agent
• All DNS lookups on a host go to its agent (example below)
• Consul servers sit outside the Mesos cluster
• Mesos-Consul automates service registration
• Can be used for services outside the cluster
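
A sketch of what that lookup looks like on a host, assuming the agent's default DNS port (8600) and the docker-registry service registered in the next example:

# Ask the local agent directly for a service's address and port
$ dig @127.0.0.1 -p 8600 docker-registry.service.consul SRV +short
# With the host resolver forwarding *.consul to the agent (e.g. via dnsmasq),
# plain hostname lookups resolve too
$ getent hosts docker-registry.service.consul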
Example - Static service outside the cluster
$ ssh -i mykey.pem username@172.100.121.100
$ docker run -d -p 5000:5000 --restart=always \
    -e REGISTRY_STORAGE_S3_ACCESSKEY -e REGISTRY_STORAGE_S3_SECRETKEY \
    -e REGISTRY_STORAGE_S3_REGION -e REGISTRY_STORAGE_S3_BUCKET \
    -e REGISTRY_STORAGE=s3 registry:2.1
$ curl -H "Content-Type: application/json" -X PUT \
    -d '{ "Name": "docker-registry", "Tags": ["docker-registry", "v2"], "Port": 5000 }' \
    http://127.0.0.1:8500/v1/agent/service/register
Example - Static service outside the cluster
- name: Run docker registry container
  docker:
    name: docker-registry
    image: registry:2.1
    state: started
    ports:
      - "5000:5000"
    restart_policy: always
    env:
      REGISTRY_STORAGE_S3_ACCESSKEY:
      REGISTRY_STORAGE_S3_SECRETKEY:
      REGISTRY_STORAGE_S3_REGION:
      REGISTRY_STORAGE_S3_BUCKET:
      REGISTRY_STORAGE: s3

- name: Register registry with consul
  uri:
    url: http://127.0.0.1:8500/v1/agent/service/register
    method: PUT
    body: '{
      "Name": "docker-registry",
      "Tags": ["docker-registry", "v2"],
      "Port": 5000
    }'
    body_format: json
Example - Launching a service on marathon
- name: Submit maya container to marathon
  hosts: "tag_build_docker_{{ consul_domain }}"
  gather_facts: False
  tasks:
    - name: Submit maya job to marathon
      uri:
        url: http://marathon:8080/v2/apps
        method: POST
        status_code: 201,409
        body: '{
          "args": [],
          "container": {
            "type": "DOCKER",
            "docker": {
              "network": "BRIDGE",
              "portMappings": [
                { "containerPort": 5901, "hostPort": 0, "protocol": "tcp" }
              ],
              "image": "docker-registry:5000/studio-local-base/maya",
              "forcePullImage": true,
              "parameters": [
                { "key": "env",    "value": "DISPLAY" },
                { "key": "device", "value": "/dev/dri/card0" },
                { "key": "device", "value": "/dev/nvidia0" },
                { "key": "device", "value": "/dev/nvidiactl" }
              ]
            },
            "volumes": [
              {
                "containerPath": "/tmp/.X11-unix/X0",
                "hostPath": "/tmp/.X11-unix/X0",
                "mode": "RW"
              }
            ]
          },
          "id": "maya",
          "instances": 1,
          "cpus": 4,
          "mem": 8024,
          "constraints": [
            ["gfx", "CLUSTER", "gpu"]
          ]
        }'
        body_format: json
Studio Services
Studio Service Structure
Studio Service Deployment
Database
• Sites (e.g. London, San Francisco, Singapore, etc.)
• Departments
• Shows (film)
• Sequences
• Shots
• Tasks
• Assets
• Data
Modelling studio relationships
Challenges
• New technologies
  – Graph database
  – Query language/APIs
  – Distributed storage engine
• Complexity (both in the data modelling and the system)
• Adoption/Approval
Storage
Cloud Storage Pros and Cons
• Managed
• No more tape archives/backups
But...
• Getting data into the cloud is expensive
• Getting data into the cloud is slooow
Is there another way?
Work in Progress...
• Applications need a POSIX filesystem interface
• Can we cache cloud storage?
  – EFS
  – Avere
  – Homegrown (sketched below)
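
As one sketch of the homegrown route, an S3 bucket can be exposed through a POSIX-ish mount with a local disk cache. s3fs-fuse is our illustration here, not necessarily what a studio would ship; the bucket name and paths are hypothetical:

# Mount the bucket, caching object data on local disk
$ s3fs studio-assets /mnt/assets -o use_cache=/var/cache/s3fs -o allow_other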
Can we create content entirely in the cloud?
Workstations
Can we create content entirely in the Cloud?
• Applications require OpenGL
• OpenGL requires hardware
• Hardware needs drivers
Can we do this in Docker?
Dockerising OpenGL Applications
• NVIDIA drivers must match the host version exactly (install sketch below)
• Driver inside the container must not install a kernel module
• Container requires access to the GPU device and X server
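
A minimal sketch of the driver install inside the image; the installer version is hypothetical, but it must match the host's kernel module exactly:

# User-space libraries only; the host already provides the kernel module
$ sh NVIDIA-Linux-x86_64-352.63.run --silent --no-kernel-module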
Running an OpenGL Docker application
docker run \
  -it \
  -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
  --device=/dev/dri/card0 \
  --device=/dev/nvidia0 \
  --device=/dev/nvidiactl \
  -e DISPLAY \
  docker-registry:5000/studio-local-base/maya
Scheduling a VFX app on Mesos in the cloud
• Must use custom Mesos resources/attributes to only schedule on GPU machines (agent flags sketched below)
• Cloud machines have no monitor
• Remote desktop apps will forward GL calls to the client machine
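
A sketch of how a GPU agent might be labelled at startup, assuming the stock --attributes flag (the master URL is illustrative); the gfx:gpu pair is what the Marathon constraint ["gfx", "CLUSTER", "gpu"] shown earlier matches against:

$ mesos-slave \
    --master=zk://zk1:2181,zk2:2181,zk3:2181/mesos \
    --containerizers=docker,mesos \
    --attributes='gfx:gpu'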
Using VirtualGL
• Intercepts GLX calls on the host
• Calls forwarded to a 2nd (local) X server
• GPU computation is done on the GPU and the output forwarded to the 2D (VNC) X server
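
A sketch of the wiring, assuming TurboVNC provides the 2D X server on :1 and the NVIDIA-driven 3D X server (configured on the next slide) runs on :0:

# Start the 2D (VNC) X server the artist actually connects to
$ /opt/TurboVNC/bin/vncserver :1
# Run the app on :1, with VirtualGL redirecting GLX rendering to :0
$ DISPLAY=:1 vglrun -d :0 maya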
3D X server setup
/etc/X11/xorg.conf
Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GRID K520"
BusID "PCI:0:3:0"
EndSection
Section "Screen"
Identifier "Screen0"
Device "Device0"
Monitor "Monitor0"
DefaultDepth 24
Option "UseDisplayDevice"
"None"
SubSection "Display"
Depth 24
EndSubSection
EndSection
The BusID "PCI:0:3:0" above corresponds to the NVIDIA device at 00:03.0 in the lspci output:

$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K520] (rev a1)
00:1f.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
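
A config along these lines can also be generated rather than hand-written (a sketch, assuming the stock nvidia-xconfig tool), and the headless 3D X server then started on :0:

$ nvidia-xconfig --busid=PCI:0:3:0 --use-display-device=None
$ X :0 &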
Demo
We're Hiring

Editor's Notes

  1. So... this is kind of how a VFX studio does work (show diagram and run through it)
  2. Open source is great!