Distributed Erlang systems aim to be decentralized, distributed, homogeneous, and fault tolerant. Nodes use only local data; there is no global state and no reliance on physical time. Cluster membership is handled through a configuration file or a gossip protocol. Load balancing uses techniques such as consistent hashing. Liveness is checked with built-in Erlang tools plus application-level mechanisms. Soft state spread via gossip protocols provides an eventually-consistent alternative to global state. Shipping code with an embedded runtime and upgrading by rebooting nodes keep such systems manageable in operation.
1. Distributed Erlang Systems In Operation
Andy Gross <andy@basho.com>, @argv0
VP, Engineering
Basho Technologies
Erlang Factory SF 2010
2. Architectural Goals
• Decentralized (no masters).
• Distributed (asynchronous, nodes use only local data).
• Homogeneous (all nodes can do anything).
• Fault tolerant (emergent goal).
• Observable.
3. Anti-Goals
• Global state:
  • pg2 / hot data in mnesia
  • globally registered names
• Distributed transactions
• Reliance on physical time
4. Compromise your Goals
• Decentralized (no masters).
• Distributed (nodes use only local data).
• Homogeneous (all nodes can do anything).
• No distributed transactions/global state.
• No reliance on physical time.
5. Systems Design
• Cluster Membership
• Load balancing/naming/resource allocation (consistent-hashing sketch below)
• Liveness checking
• Soft Global State
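The load-balancing and naming piece is covered by the Dynamo and consistent-hashing references on the resources slide. As a rough illustration only (this is not Riak's ring code; the module name, function, and ring layout are assumptions made for this sketch), a key can be hashed onto a fixed ring of positions, each claimed by a node:

```erlang
%% Minimal consistent-hashing sketch (illustrative only).
-module(ring_sketch).
-export([owner/2]).

%% Ring is a list of {Position, Node} pairs sorted by Position, covering
%% the hash space 0..2^27-1 (the default range of erlang:phash2/1).
owner(Key, Ring) ->
    H = erlang:phash2(Key),
    case [Node || {Pos, Node} <- Ring, Pos >= H] of
        [Owner | _] -> Owner;               %% first position at or after the hash
        []          -> element(2, hd(Ring)) %% wrap around to the start of the ring
    end.
```

For example, with Ring = [{0, 'a@h1'}, {1 bsl 25, 'b@h2'}, {1 bsl 26, 'c@h3'}], a key whose hash lands above 2^26 maps to 'a@h1' through the wrap-around branch; adding or removing a node only moves the keys adjacent to its positions.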
6. Cluster Membership
• Option 1: Use a configuration file:
  • Requires out-of-band sync of the configuration file across machines.
  • Not "elastic" enough for some use-cases.
• Option 2: Contact a seed node to join and use a gossip protocol to propagate membership state (sketch below).
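A minimal sketch of Option 2, assuming a locally configured seed node name and a hypothetical registered gossip_server process on each node (neither appears in the talk):

```erlang
%% Sketch of joining a cluster through a seed node.
-module(cluster_join).
-export([join/1]).

%% SeedNode is a node name atom such as 'some_app@10.0.0.1', read from
%% local configuration rather than a cluster-wide file.
join(SeedNode) ->
    case net_adm:ping(SeedNode) of
        pong ->
            %% Ask the seed's (hypothetical) membership server to gossip
            %% the updated member list out to the rest of the cluster.
            gen_server:cast({gossip_server, SeedNode}, {join, node()}),
            ok;
        pang ->
            {error, {seed_unreachable, SeedNode}}
    end.
```

Once the seed is reachable, distributed Erlang's transitive connections plus the gossip exchange spread the new member list to the rest of the cluster without any shared configuration file.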
8. Liveness Checking
• nodes() and net_adm:ping() operations can be too low-level.
• Sometimes you'd like to divert traffic from a node at the application level while keeping distributed Erlang up.
• Use net_kernel:monitor_nodes() and an app-level mechanism for liveness (sketch below).
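One shape such an app-level mechanism could take; the slide only names net_kernel:monitor_nodes(), so the module name, the gen_server, and the ordsets state are choices made for this sketch:

```erlang
%% Application-level liveness monitor built on net_kernel:monitor_nodes/1.
-module(liveness_mon).
-behaviour(gen_server).
-export([start_link/0, up_nodes/0, set_down/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Nodes this application currently considers usable.
up_nodes() -> gen_server:call(?MODULE, up_nodes).

%% App-level override: divert traffic from Node without dropping the
%% distributed Erlang connection to it.
set_down(Node) -> gen_server:cast(?MODULE, {set_down, Node}).

init([]) ->
    ok = net_kernel:monitor_nodes(true),
    {ok, ordsets:from_list([node() | nodes()])}.

handle_call(up_nodes, _From, Up) -> {reply, ordsets:to_list(Up), Up}.

handle_cast({set_down, Node}, Up) -> {noreply, ordsets:del_element(Node, Up)}.

%% Distributed Erlang reports connections; the application decides what
%% "up" means on top of that.
handle_info({nodeup, Node}, Up)   -> {noreply, ordsets:add_element(Node, Up)};
handle_info({nodedown, Node}, Up) -> {noreply, ordsets:del_element(Node, Up)};
handle_info(_Other, Up)           -> {noreply, Up}.

terminate(_Reason, _Up) -> ok.
code_change(_OldVsn, Up, _Extra) -> {ok, Up}.
```

Callers pick targets from up_nodes/0 instead of nodes(), so a node can be drained with set_down/1 while its distributed Erlang connection stays up.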
9. Soft State/Gossip Protocols
• An eventually-consistent alternative to global state.
• Nodes make changes, gossip to another node.
• Nodes receive changes, merge with local state, gossip to another node (sketch below).
• Requires up-front thought about data structures and dealing with slightly-stale data.
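A bare-bones sketch of one gossip round and a merge, assuming state shaped as a dict of version-stamped values and a hypothetical gossip_server process on each peer; the slide does not prescribe either:

```erlang
%% Gossip-round sketch with a simple "higher version wins" merge.
-module(gossip_sketch).
-export([gossip_round/1, merge/2]).

%% State is a dict of Key -> {Version, Value}; Version is a counter bumped
%% by whichever node makes a change.
gossip_round(State) ->
    case nodes() of
        []    -> ok;
        Peers ->
            Peer = lists:nth(rand:uniform(length(Peers)), Peers),
            %% The receiving node's (hypothetical) gossip_server calls
            %% merge/2 with its own local state and gossips onward.
            gen_server:cast({gossip_server, Peer}, {gossip, node(), State})
    end.

%% Merge remote state into local state: for each key keep the entry with
%% the higher version, so slightly-stale data converges over time.
merge(Local, Remote) ->
    dict:merge(fun(_Key, {V1, _} = A, {V2, _} = B) ->
                       case V1 >= V2 of true -> A; false -> B end
               end, Local, Remote).
```

Because merge/2 keeps the higher version per key on both sides, it does not matter whose state arrives first; repeated rounds converge every node toward the same view, which is exactly the slightly-stale-but-eventually-consistent trade-off the slide describes.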
10. Running Your System
• Shipping code
• Upgrading code
• Debugging your own systems
• Living with other people's systems
11. Shipping Code
• Don't rely on working Erlang on end-user machines (many Linux distros are broken or out of date).
• Ship code with an embedded runtime and libraries.
• Put version/build info in code (sketch below).
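One simple way to make that version/build info queryable from a running node; the module name and the placeholder values below are invented for this sketch, and in practice a build step would inject the real ones:

```erlang
%% Sketch only: values are placeholders, normally filled in by the build
%% (e.g. a rebar template or a generated header).
-module(build_info).
-export([version/0]).

-define(VERSION, "x.y.z").           %% placeholder release version
-define(BUILD,   "build-id-here").   %% placeholder VCS revision / build id

version() ->
    [{version, ?VERSION},
     {build,   ?BUILD},
     {otp,     erlang:system_info(otp_release)}].
```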
12. Upgrading Code
• Hot code loading for small, emergency fixes (sketch below).
• For new releases, reboot the node.
• Why not .appups?
  • Systems I've worked on have changed/evolved too fast.
  • A reboot is a good test of resiliency.
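For the emergency-fix case, something like the following from one node's shell is enough; my_fixed_module is a hypothetical name, the new .beam is assumed to already be on every node's code path, and this exact invocation is not from the talk:

```erlang
%% Soft-purge the old version everywhere (refuses if processes are still
%% running old code), then load the fixed module on every connected node.
rpc:multicall([node() | nodes()], code, soft_purge, [my_fixed_module]),
rpc:multicall([node() | nodes()], code, load_file, [my_fixed_module]).
```

If the fixed .beam exists only locally, code:get_object_code/1 plus an rpc:multicall of code:load_binary/3 can ship the bytes to the other nodes instead.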
13. Debugging Running Systems
• Remote Erlang shells are awesome, except when distributed Erlang dies (it happens).
• run_erl (or even screen(1)) gives you a backdoor for when -remsh fails.
• rebar (http://hg.basho.com/rebar) makes this easy.
• What if you don't have access to the box?
14. OPS - Other People's Systems
• Your Erlang, their enterprise firewalls.
• The Erlang shell is powerful, but scary.
• Provide a debugging module (sketch below).
• Get data out via HTTP/SMTP/SNMP.
• Use disk_log/report_browser.
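A debugging module for this situation might look like the sketch below (the module and function names are made up here); it returns a plain term that can be logged, mailed back, or exposed over whatever HTTP/SMTP/SNMP channel gets through the firewall:

```erlang
%% Sketch of a customer-facing debugging module: collects basic node
%% facts as a proplist so support can read them without shell access.
-module(debug_dump).
-export([dump/0]).

dump() ->
    [{node,          node()},
     {otp_release,   erlang:system_info(otp_release)},
     {uptime_ms,     element(1, erlang:statistics(wall_clock))},
     {process_count, erlang:system_info(process_count)},
     {memory,        erlang:memory()},
     {connected,     nodes()},
     {applications,  application:which_applications()}].
```

On the log side, SASL's report browser pairs well with this: rb:start([{report_dir, Dir}]) followed by rb:list() works against a directory of error logs copied off the customer's machine.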
15. Questions?
"You know you have [a distributed system] when the crash of a computer you've never heard of stops you from getting any work done."
- Leslie Lamport
16. Resources
• unsplit: http://github.com/uwiger/unsplit
• gen_leader: http://github.com/KirinDave/gen_leader_revival
• Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
• Hans Svensson, Distributed Erlang Application Pitfalls and Recipes: http://www.erlang.org/workshop/2007/proceedings/06svenss.ppt
• Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web: http://bit.ly/LewinConsistentHashing