An overview of our experiments at Industrial Light and Magic to create a fully cloud-based pipeline, built on Mesos and Docker and automated with Ansible.
ILM - Pipeline in the cloud
2. Who are we?
Jim Vanns
Aaron Carey
Production Engineers at ILM London
4. Nomenclature, glossary and other big words
• VFX: Visual Effects
• Pipeline: Data -> Process -> Data, repeat!
• Show: A film
• Sequence: A thematically linked series of (continuous) scenes
• Shot: An uninterrupted portion of the sequence
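The Show > Sequence > Shot hierarchy above maps naturally onto a nested data model. A minimal sketch, assuming hypothetical class and field names (no real ILM schema is implied) and inclusive frame ranges:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical data model: a show contains sequences, a sequence contains shots.

@dataclass
class Shot:
    name: str
    first_frame: int
    last_frame: int

    @property
    def frame_count(self) -> int:
        # Frame ranges are inclusive at both ends.
        return self.last_frame - self.first_frame + 1

@dataclass
class Sequence:
    name: str
    shots: List[Shot] = field(default_factory=list)

@dataclass
class Show:
    name: str
    sequences: List[Sequence] = field(default_factory=list)

    def shot_paths(self) -> List[str]:
        """Return a 'show/sequence/shot' path for every shot in the show."""
        return [
            f"{self.name}/{seq.name}/{shot.name}"
            for seq in self.sequences
            for shot in seq.shots
        ]

show = Show("demo", [Sequence("sq010", [Shot("sh0010", 1001, 1048),
                                        Shot("sh0020", 1001, 1100)])])
print(show.shot_paths())                       # ['demo/sq010/sh0010', 'demo/sq010/sh0020']
print(show.sequences[0].shots[0].frame_count)  # 48
```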
9. What VFX isn't
• Rendering and sims are our 'Big Data'
• We're not crunching analytics in real time
• Rendering != MapReduce
• Apps run on hardware, not in a browser
• We're not here to rewrite a renderer (not yet...)
Where does the cloud meet VFX?
10. What's in it for us?
• Reducing capital expenditure
• Potentially reducing overheads
• Flexibility
• Giving power back to developers
13. First, what is rendering!?
• Take a virtual 3D representation of a scene
  - 3D models
  - Textures
  - Light sources
  - Static backgrounds (plates)
• Place a virtual camera in the scene
• Compute the 2D image that the camera will see
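Computing what the camera sees starts with perspective projection. A minimal sketch of only that geometric step for an ideal pinhole camera, assuming camera-space coordinates (camera at the origin looking down +Z, y pointing down like image rows) and a focal length expressed in pixels; a production renderer then resolves visibility, shading and lighting for every pixel:

```python
from typing import Tuple

def project_point(point: Tuple[float, float, float], focal_length: float,
                  width: int, height: int) -> Tuple[float, float]:
    """Project a camera-space 3D point to pixel coordinates (ideal pinhole model).

    Assumes the principal point sits at the image centre.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide: the farther the point, the closer it lands to the centre.
    u = focal_length * x / z + width / 2
    v = focal_length * y / z + height / 2
    return (u, v)

print(project_point((1.0, 0.5, 4.0), focal_length=800, width=1920, height=1080))
# (1160.0, 640.0)
```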
14. Rendering in the cloud
• Low-hanging fruit
• Already happening
• Typical farm: 30-50k procs
• Managed by specialist software (Tractor, Deadline, in-house, etc.)
• VFX has been doing clustered computing for decades
What's next?
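Farm managers commonly split a shot's frame range into tasks that can run on separate machines. A minimal sketch of that splitting step, assuming inclusive frame ranges and a fixed chunk size; `chunk_frames` is a hypothetical helper, not the API of Tractor, Deadline or any in-house tool:

```python
from typing import List, Tuple

def chunk_frames(first: int, last: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split an inclusive frame range into inclusive (start, end) chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    chunks = []
    start = first
    while start <= last:
        end = min(start + chunk_size - 1, last)  # last chunk may be shorter
        chunks.append((start, end))
        start = end + 1
    return chunks

print(chunk_frames(1001, 1010, 4))  # [(1001, 1004), (1005, 1008), (1009, 1010)]
```

Each chunk would become one farm task; larger chunks amortise per-task startup cost, smaller ones spread a shot across more processors.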
15. Mesos
• Open-source framework for scheduling
• Already used at massive scale
• NOT a job scheduler
• We can concentrate on the scheduling logic
• Support for task isolation/containment (e.g. Docker)
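Because Mesos is not a job scheduler, it offers resources to a framework and leaves the framework to decide what to run on them; that decision is the "scheduling logic" we get to concentrate on. A minimal sketch of one possible policy (first-fit), assuming hypothetical `Offer` and `Task` classes; this does not use the Mesos API or its protobuf types:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Offer:
    """Resources one agent is offering (hypothetical stand-in for a Mesos offer)."""
    host: str
    cpus: float
    mem_mb: int

@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int

def first_fit(offers: List[Offer], pending: List[Task]) -> Dict[str, List[str]]:
    """Assign each pending task to the first offer with enough CPU and memory left."""
    remaining = {o.host: [o.cpus, o.mem_mb] for o in offers}
    plan: Dict[str, List[str]] = {}
    for task in pending:
        for host, free in remaining.items():
            if task.cpus <= free[0] and task.mem_mb <= free[1]:
                free[0] -= task.cpus  # reserve resources within this offer
                free[1] -= task.mem_mb
                plan.setdefault(host, []).append(task.name)
                break
    # Tasks missing from the plan stay pending; unused resources would be declined.
    return plan

offers = [Offer("agent-a", 4, 8192), Offer("agent-b", 2, 4096)]
tasks = [Task("render-1", 2, 4096), Task("render-2", 2, 4096),
         Task("render-3", 2, 4096), Task("render-4", 4, 1024)]
print(first_fit(offers, tasks))
# {'agent-a': ['render-1', 'render-2'], 'agent-b': ['render-3']}
```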
16. Automating our Mesos cluster with Docker and Ansible
• Goals: quick, easy, repeatable
• Didn't want to spend time fighting our config manager (or each other)
• Be able to deploy a virtual studio from scratch in under an hour (including provisioning, building software, deploying and configuration)
• Run multiple versions of the infrastructure at the same time (in the same availability zone/network)
• If something is typed in the terminal, we want to automate and version it
Docker + Ansible was the answer
17. Automating our Mesos cluster with Ansible
• Heavy use of tags and variables in Ansible
• Cloud-agnostic, with some modification of the GCE inventory and launch modules
• Example: creating a multi-host dynamic ZooKeeper configuration
- name: Append the zookeeper server entries
  lineinfile:
    dest: /etc/zookeeper/conf/zoo.cfg
    insertafter: EOF
    line: "server.{{ hostvars[item]['zkid'] }}={{ hostvars[item]['ansible_eth0']['ipv4']['address'] }}:2888:3888"
  with_items: "{{ groups['tag_zookeeper_server_' + consul_domain] }}"
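The task above loops over every host tagged as a ZooKeeper server and appends one `server.<id>=<ip>:2888:3888` entry per host (port 2888 carries follower-to-leader traffic, 3888 leader election; each server's `myid` file must hold the matching id). A minimal sketch of the same generation logic, assuming a hypothetical input shape that maps hostnames to their `zkid` and IP address:

```python
from typing import Dict, List

def zookeeper_server_lines(hosts: Dict[str, Dict[str, str]]) -> List[str]:
    """Build zoo.cfg 'server.<id>=<ip>:2888:3888' lines, ordered by ZooKeeper id.

    `hosts` maps hostname -> {'zkid': ..., 'ip': ...}; this shape is an assumption
    standing in for the hostvars lookups in the playbook.
    """
    lines = []
    for name in sorted(hosts, key=lambda h: int(hosts[h]["zkid"])):
        entry = hosts[name]
        lines.append(f"server.{entry['zkid']}={entry['ip']}:2888:3888")
    return lines

inventory = {  # illustrative addresses only
    "zk-b": {"zkid": "2", "ip": "10.0.0.12"},
    "zk-a": {"zkid": "1", "ip": "10.0.0.11"},
}
print("\n".join(zookeeper_server_lines(inventory)))
```

Generating the entries from the inventory means adding a ZooKeeper host is just a matter of tagging it; nobody edits `zoo.cfg` by hand.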
18. Service Discovery in Mesos
• No control over where a service or render runs
• Services may move hosts
• Can't guarantee hosts will have the same IP
• Options:
  - Mesos-DNS
  - Homegrown (etcd, etc.)
  - Consul
19. Mesos and Consul
• What is Consul?
• Every host runs an agent
• All DNS lookups on a host go to its agent
• Consul servers run outside the Mesos cluster
• mesos-consul automates service registration
• Consul can also be used for services outside the cluster
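With every DNS lookup going to the local agent, clients find a service by name rather than by IP. Consul's DNS interface answers names of the form `[tag.]<service>.service[.<datacenter>].<domain>`, where the domain defaults to `consul`. A minimal sketch that builds such a name; `consul_service_name` is a hypothetical helper:

```python
from typing import Optional

def consul_service_name(service: str, tag: Optional[str] = None,
                        datacenter: Optional[str] = None, domain: str = "consul") -> str:
    """Build a Consul DNS name: [tag.]service.service[.datacenter].domain."""
    parts = []
    if tag:
        parts.append(tag)  # optional tag filter, e.g. 'v2'
    parts += [service, "service"]
    if datacenter:
        parts.append(datacenter)
    parts.append(domain)
    return ".".join(parts)

print(consul_service_name("docker-registry"))                 # docker-registry.service.consul
print(consul_service_name("docker-registry", "v2", "dc1"))    # v2.docker-registry.service.dc1.consul
```

On a host whose resolver forwards the `consul` domain to the agent (the agent's DNS interface listens on port 8600 by default), `socket.gethostbyname(consul_service_name("docker-registry"))` would return the registry's current address wherever it is running.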
20. Example - Static service outside the cluster
$ ssh -i mykey.pem username@172.100.121.100
$ docker run -d -p 5000:5000 --restart=always \
    -e REGISTRY_STORAGE_S3_ACCESSKEY -e REGISTRY_STORAGE_S3_SECRETKEY \
    -e REGISTRY_STORAGE_S3_REGION -e REGISTRY_STORAGE_S3_BUCKET \
    -e REGISTRY_STORAGE=s3 registry:2.1
$ curl -H "Content-Type: application/json" -X PUT \
    -d '{ "Name": "docker-registry", "Tags": ["docker-registry", "v2"], "Port": 5000 }' \
    http://127.0.0.1:8500/v1/agent/service/register
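The `curl` call is a plain HTTP `PUT` of a JSON service definition to the local agent (default HTTP address `127.0.0.1:8500`). A minimal sketch of the same registration using only the Python standard library; `build_registration` is a hypothetical helper, and the request is built but not sent so the example runs without a Consul agent:

```python
import json
import urllib.request

CONSUL_AGENT = "http://127.0.0.1:8500"  # local agent's default HTTP address

def build_registration(name: str, port: int, tags: list) -> urllib.request.Request:
    """Build a PUT request that registers a service with the local Consul agent."""
    payload = json.dumps({"Name": name, "Tags": tags, "Port": port}).encode("utf-8")
    return urllib.request.Request(
        f"{CONSUL_AGENT}/v1/agent/service/register",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = build_registration("docker-registry", 5000, ["docker-registry", "v2"])
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it to a running agent.
```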
21. Example - Static service outside the cluster (with Ansible)
- name: Run docker registry container
  docker:
    name: docker-registry
    image: registry:2.1
    state: started
    ports:
      - "5000:5000"
    restart_policy: always
    env:
      # S3 credential, region and bucket values are omitted on the slide
      REGISTRY_STORAGE_S3_ACCESSKEY:
      REGISTRY_STORAGE_S3_SECRETKEY:
      REGISTRY_STORAGE_S3_REGION:
      REGISTRY_STORAGE_S3_BUCKET:
      REGISTRY_STORAGE: s3

- name: Register registry with consul
  uri:
    url: http://127.0.0.1:8500/v1/agent/service/register
    method: PUT
    body:
      Name: docker-registry
      Tags:
        - docker-registry
        - v2
      Port: 5000
    body_format: json
30. Cloud Storage Pros and Cons
• Managed
• No more tape archives/backups
But...
• Getting data into the cloud is expensive
• Getting data into the cloud is slooow
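To put "slooow" in numbers: upload time is just data size divided by usable bandwidth. A worked sketch with hypothetical figures (100 TB over a 1 Gbit/s link running flat out, decimal units); these are not ILM's actual volumes or link speeds:

```python
def transfer_days(terabytes: float, gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Days needed to upload `terabytes` (decimal TB) over a `gbit_per_s` link.

    `efficiency` is the usable fraction of the nominal link rate.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (gbit_per_s * 1e9 * efficiency)
    return seconds / 86400

print(round(transfer_days(100, 1.0), 2))  # 9.26
```

Even at 100% link utilisation, 100 TB takes about 9 days (8 x 10^5 seconds); at a more realistic utilisation the figure only grows.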
Is there another way?
31. Work in Progress...
• Applications need a POSIX filesystem interface
• Can we cache cloud storage?
  - EFS
  - Avere
  - Homegrown
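Whatever provides the POSIX interface, the core idea of a cache in front of cloud storage is read-through with eviction: serve hot data locally and fetch misses from the object store. A minimal sketch of just that policy (least-recently-used eviction, capacity counted in objects), assuming a hypothetical `fetch` callable standing in for an object-storage GET; it says nothing about how EFS, Avere or a homegrown layer actually work:

```python
from collections import OrderedDict
from typing import Callable

class ReadThroughCache:
    """Keep recently read objects locally; fetch misses from the backing store."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int):
        self._fetch = fetch          # hypothetical object-storage GET
        self._capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.misses += 1
        data = self._fetch(key)
        self._items[key] = data
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used object
        return data

store = {"a": b"1", "b": b"2", "c": b"3"}
cache = ReadThroughCache(store.__getitem__, capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.read(key)
print(cache.misses)  # 4: only the second read of 'a' is a hit
```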
Can we create content entirely in the cloud?
33. Can we create content entirely in the Cloud?
• Applications require OpenGL
• OpenGL requires hardware
• Hardware needs drivers
Can we do this in Docker?
34. Dockerising OpenGL Applications
• NVIDIA drivers inside the container must match the host version exactly
• The driver inside the container must not install the kernel module
• The container requires access to the GPU device and the X server
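Since the user-space driver in the image must match the host's kernel module exactly, a container entrypoint could check this before launching the application. A minimal sketch, assuming the host version is read from `/proc/driver/nvidia/version` and that its first line contains `Kernel Module  <version>` (the format is an assumption; the sample text below is illustrative):

```python
import re
from typing import Optional

def host_driver_version(proc_text: str) -> Optional[str]:
    """Extract the kernel module version from /proc/driver/nvidia/version text."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None

def driver_matches(proc_text: str, container_version: str) -> bool:
    # Libraries inside the image must match the host kernel module exactly.
    return host_driver_version(proc_text) == container_version

sample = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.63  Sat Nov  7 21:25:42 PST 2015"
print(host_driver_version(sample))  # 352.63
# On a real host: open("/proc/driver/nvidia/version").read() instead of `sample`.
```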
35. Running an OpenGL Docker application
docker run \
  -it \
  -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
  --device=/dev/dri/card0 \
  --device=/dev/nvidia0 \
  --device=/dev/nvidiactl \
  -e DISPLAY
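The device list above is hard-coded for one GPU; on hosts with different GPU counts it can be discovered instead. A minimal sketch that assembles the same `docker run` arguments, assuming the GPU nodes appear as `/dev/dri/card*` and `/dev/nvidia*`; `opengl_docker_args` and the image name `my-gl-app` are hypothetical:

```python
import glob
from typing import List

def discover_gpu_devices() -> List[str]:
    """Return /dev/dri/card* and /dev/nvidia* nodes present on this host (may be empty)."""
    return sorted(glob.glob("/dev/dri/card*")) + sorted(glob.glob("/dev/nvidia*"))

def opengl_docker_args(image: str, devices: List[str]) -> List[str]:
    """Assemble `docker run` arguments for an X11/OpenGL container."""
    args = ["docker", "run", "-it",
            "-v", "/tmp/.X11-unix:/tmp/.X11-unix:rw",  # share the host's X socket
            "-e", "DISPLAY"]                            # pass the host's DISPLAY through
    args += [f"--device={dev}" for dev in devices]      # expose each GPU device node
    args.append(image)
    return args

print(" ".join(opengl_docker_args("my-gl-app", ["/dev/dri/card0", "/dev/nvidia0", "/dev/nvidiactl"])))
```

The resulting list can be passed straight to `subprocess.run(...)`, which avoids shell-quoting issues.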
36. Scheduling a VFX app on Mesos in the cloud
• Must use custom Mesos resources/attributes so that these apps are only scheduled on GPU machines
• Cloud machines have no monitor attached
• Remote desktop apps will forward GL calls to the client machine
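Mesos agents can advertise attributes as `key:value` pairs separated by semicolons (for example `gpu:true`), and a framework can then refuse offers from agents that lack them. A minimal sketch of that filtering on the framework side, assuming a hypothetical `gpu:true` attribute and treating every value as plain text (real Mesos attributes may also be scalars, ranges or sets):

```python
from typing import Dict, List

def parse_attributes(spec: str) -> Dict[str, str]:
    """Parse a 'key:value;key:value' attribute string into a dict."""
    attrs = {}
    for pair in filter(None, spec.split(";")):
        key, _, value = pair.partition(":")
        attrs[key.strip()] = value.strip()
    return attrs

def matching_hosts(agent_attributes: Dict[str, str], required: Dict[str, str]) -> List[str]:
    """Return agents whose attributes include every required key/value pair."""
    return [
        host for host, spec in sorted(agent_attributes.items())
        if all(parse_attributes(spec).get(k) == v for k, v in required.items())
    ]

agents = {"node1": "rack:r1;gpu:true", "node2": "rack:r1", "node3": "gpu:true"}
print(matching_hosts(agents, {"gpu": "true"}))  # ['node1', 'node3']
```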
37. Using VirtualGL
• Intercepts GLX calls on the host
• Calls are redirected to a second (local) X server that has access to the GPU
• 3D rendering is done on the GPU, and the output is forwarded to the 2D (VNC) X server