ILM - Pipeline in the cloud
Who are we?
Jim Vanns
Aaron Carey
Production Engineers at ILM London
VFX Pipeline in the Cloud
Experiments with Mesos and Docker
Nomenclature, glossary and other big words
• VFX: Visual Effects
• Pipeline: Data -> Process -> Data, repeat!
• Show: Film
• Sequence: A thematically linked series of (continuous) scenes!
• Shot: An uninterrupted portion of the sequence
What is a VFX pipeline?
Film Scan
Roto
3D
FX
Comp
Lighting
What is a VFX pipeline?
What VFX isn't
• Rendering and sims are our 'Big Data'
• We're not crunching analytics in real time
• Rendering != MapReduce
• Apps run on hardware, not in a browser
• We're not here to rewrite a renderer (not yet...)
Where does the cloud meet VFX?
What's in it for us?
• Reducing Capital Expenditure
• Potentially reducing overheads
• Flexibility
• Giving power back to developers
VFX Studio Infrastructure
• Render Farm
• Database
• Storage
• Workstations
Render Farm
First, what is rendering!?
• Take a virtual 3D representation of a scene
  – 3D models
  – Textures
  – Light sources
  – Static backgrounds (plates)
• Place a virtual camera in the scene
• Compute the 2D image that the camera will see
Rendering in the cloud
• Low-hanging fruit
• Already happening
• Typical farm: 30-50k procs
• Managed by specialist software (Tractor, Deadline, in-house, etc.)
• VFX has been doing clustered computing for decades
What's next?
Mesos
• Open-source framework for scheduling
• Already used at massive scale
• NOT a job scheduler
• We can concentrate on the scheduling logic
• Support for task isolation/containment (e.g. Docker)
Automating our Mesos cluster with Docker and Ansible
• Goals: Quick - Easy - Repeatable
• Didn't want to spend time fighting our config manager (or each other)
• Be able to deploy a virtual studio from scratch in under an hour (including provisioning, building software, deploying, configuration)
• Run multiple versions of the infrastructure at the same time (in the same availability zone/network)
• If something is typed in the terminal, we want to automate and version it
Docker + Ansible was the answer
Automating our Mesos cluster with Ansible
• Heavily using tags and variables in Ansible
• Cloud agnostic: some modification of GCE inventory and launch modules
• Example: creating a multi-host dynamic Zookeeper configuration

- name: Append the zookeeper server entries
  lineinfile:
    dest: /etc/zookeeper/conf/zoo.cfg
    insertafter: EOF
    line: "server.{{ hostvars[item]['zkid'] }}={{ hostvars[item]['ansible_eth0']['ipv4']['address'] }}:2888:3888"
  with_items: "{{ groups['tag_zookeeper_server_' + consul_domain] }}"
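
Against a hypothetical three-node Zookeeper group, the task above would append entries like these to /etc/zookeeper/conf/zoo.cfg (the IDs and addresses here are illustrative, not real hosts):

server.1=10.240.0.11:2888:3888
server.2=10.240.0.12:2888:3888
server.3=10.240.0.13:2888:3888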
Service Discovery in Mesos
• No control over where a service or render runs
• Services may move hosts
• Can't guarantee hosts will have same IP
• Options:
  – Mesos-DNS
  – Homegrown (etcd etc.)
  – Consul
Mesos and Consul
• What is Consul?
• Every host runs an agent
• All DNS lookups on a host go to its agent (example below)
• Consul servers sit outside the Mesos cluster
• Mesos-Consul automates service registration
• Can be used for services outside the cluster
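
A sketch of what that lookup looks like on a host, assuming the agent's default DNS port (8600) and the docker-registry service registered in the next example:

# Ask the local agent directly for a service's address and port
$ dig @127.0.0.1 -p 8600 docker-registry.service.consul SRV +short
# With the host resolver forwarding *.consul to the agent (e.g. via dnsmasq),
# plain hostname lookups resolve too
$ getent hosts docker-registry.service.consul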
Example - Static service outside the cluster
$ ssh -i mykey.pem username@172.100.121.100
$ docker run -d -p 5000:5000 --restart=always \
    -e REGISTRY_STORAGE_S3_ACCESSKEY -e REGISTRY_STORAGE_S3_SECRETKEY \
    -e REGISTRY_STORAGE_S3_REGION -e REGISTRY_STORAGE_S3_BUCKET \
    -e REGISTRY_STORAGE=s3 registry:2.1
$ curl -H "Content-Type: application/json" -X PUT \
    -d '{ "Name": "docker-registry", "Tags": ["docker-registry", "v2"], "Port": 5000 }' \
    http://127.0.0.1:8500/v1/agent/service/register
Example - Static service outside the cluster
- name: Run docker registry container
  docker:
    name: docker-registry
    image: registry:2.1
    state: started
    ports:
      - "5000:5000"
    restart_policy: always
    env:
      REGISTRY_STORAGE_S3_ACCESSKEY:
      REGISTRY_STORAGE_S3_SECRETKEY:
      REGISTRY_STORAGE_S3_REGION:
      REGISTRY_STORAGE_S3_BUCKET:
      REGISTRY_STORAGE: s3

- name: Register registry with consul
  uri:
    url: http://127.0.0.1:8500/v1/agent/service/register
    method: PUT
    body: '{
      "Name": "docker-registry",
      "Tags": ["docker-registry", "v2"],
      "Port": 5000
    }'
    body_format: json
Example - Launching a service on marathon
- name: Submit maya container to marathon
  hosts: "tag_build_docker_{{ consul_domain }}"
  gather_facts: False
  tasks:
    - name: Submit maya job to marathon
      uri:
        url: http://marathon:8080/v2/apps
        method: POST
        status_code: 201,409
        body: '{
          "args": [],
          "container": {
            "type": "DOCKER",
            "docker": {
              "network": "BRIDGE",
              "portMappings": [
                { "containerPort": 5901, "hostPort": 0, "protocol": "tcp" }
              ],
              "image": "docker-registry:5000/studio-local-base/maya",
              "forcePullImage": true,
              "parameters": [
                { "key": "env",    "value": "DISPLAY" },
                { "key": "device", "value": "/dev/dri/card0" },
                { "key": "device", "value": "/dev/nvidia0" },
                { "key": "device", "value": "/dev/nvidiactl" }
              ]
            },
            "volumes": [
              {
                "containerPath": "/tmp/.X11-unix/X0",
                "hostPath": "/tmp/.X11-unix/X0",
                "mode": "RW"
              }
            ]
          },
          "id": "maya",
          "instances": 1,
          "cpus": 4,
          "mem": 8024,
          "constraints": [
            ["gfx", "CLUSTER", "gpu"]
          ]
        }'
        body_format: json
Studio Services
Studio Service Structure
Studio Service Deployment
Database
• Sites (e.g. London, San Francisco, Singapore, etc.)
• Departments
• Shows (film)
• Sequences
• Shots
• Tasks
• Assets
• Data
Modelling studio relationships
Challenges
• New technologies
  – Graph database
  – Query language/APIs
  – Distributed storage engine
• Complexity (both in the data modelling and the system)
• Adoption/Approval
Storage
Cloud Storage Pros and Cons
• Managed
• No more tape archives/backups
But...
• Getting data into the cloud is expensive
• Getting data into the cloud is slooow
Is there another way?
Work in Progress...
• Applications need a POSIX filesystem interface
• Can we cache cloud storage?
  – EFS
  – Avere
  – Homegrown (sketched below)
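
As one sketch of the homegrown route, an S3 bucket can be exposed through a POSIX-ish mount with a local disk cache. s3fs-fuse is our illustration here, not necessarily what a studio would ship; the bucket name and paths are hypothetical:

# Mount the bucket, caching object data on local disk
$ s3fs studio-assets /mnt/assets -o use_cache=/var/cache/s3fs -o allow_other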
Can we create content entirely in the cloud?
Workstations
Can we create content entirely in the Cloud?
• Applications require OpenGL
• OpenGL requires hardware
• Hardware needs drivers
Can we do this in Docker?
Dockerising OpenGL Applications
• NVIDIA drivers must match the host version exactly (install sketch below)
• Driver inside the container must not install a kernel module
• Container requires access to the GPU device and X server
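
A minimal sketch of the driver install inside the image; the installer version is hypothetical, but it must match the host's kernel module exactly:

# User-space libraries only; the host already provides the kernel module
$ sh NVIDIA-Linux-x86_64-352.63.run --silent --no-kernel-module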
Running an OpenGL Docker application
docker run \
  -it \
  -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
  --device=/dev/dri/card0 \
  --device=/dev/nvidia0 \
  --device=/dev/nvidiactl \
  -e DISPLAY \
  docker-registry:5000/studio-local-base/maya
Scheduling a VFX app on Mesos in the cloud
• Must use custom Mesos resources/attributes to only schedule on GPU machines (agent flags sketched below)
• Cloud machines have no monitor
• Remote desktop apps will forward GL calls to the client machine
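
A sketch of how a GPU agent might be labelled at startup, assuming the stock --attributes flag (the master URL is illustrative); the gfx:gpu pair is what the Marathon constraint ["gfx", "CLUSTER", "gpu"] shown earlier matches against:

$ mesos-slave \
    --master=zk://zk1:2181,zk2:2181,zk3:2181/mesos \
    --containerizers=docker,mesos \
    --attributes='gfx:gpu'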
Using VirtualGL
• Intercepts GLX calls on the host
• Calls forwarded to a 2nd (local) X server
• GPU computation is done on the GPU and the output forwarded to the 2D (VNC) X server
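
A sketch of the wiring, assuming TurboVNC provides the 2D X server on :1 and the NVIDIA-driven 3D X server (configured on the next slide) runs on :0:

# Start the 2D (VNC) X server the artist actually connects to
$ /opt/TurboVNC/bin/vncserver :1
# Run the app on :1, with VirtualGL redirecting GLX rendering to :0
$ DISPLAY=:1 vglrun -d :0 maya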
3D X server setup
/etc/X11/xorg.conf
Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GRID K520"
BusID "PCI:0:3:0"
EndSection
Section "Screen"
Identifier "Screen0"
Device "Device0"
Monitor "Monitor0"
DefaultDepth 24
Option "UseDisplayDevice"
"None"
SubSection "Display"
Depth 24
EndSubSection
EndSection
The BusID "PCI:0:3:0" above corresponds to the NVIDIA device at 00:03.0 in the lspci output:

$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K520] (rev a1)
00:1f.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
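
A config along these lines can also be generated rather than hand-written (a sketch, assuming the stock nvidia-xconfig tool), and the headless 3D X server then started on :0:

$ nvidia-xconfig --busid=PCI:0:3:0 --use-display-device=None
$ X :0 &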
Demo
We're Hiring

Editor's Notes

  1. So... this is kind of how a VFX studio does work (show diagram and run through it)
  2. Open source is great!