Frank from Openminds discusses their DevOps challenge competition where the prize is beer. He explains how they use software-defined networking (SDN) and BGP routing to load balance traffic across multiple active nodes. Each load balancer has a service IP and health script to announce when it is handling or withdrawing the IP address. This allows the network to know which load balancers are active and distribute traffic accordingly.
Slide 12
root@loadbalancer-001:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet 37.72.160.20/32 scope global lo
On each loadbalancer:
- service ip
- health script
- BGP software
Slide 13
Am I healthy? Tell the network I know how to handle 37.72.160.20
Am I unhealthy? Tell the network I withdraw my knowledge about 37.72.160.20
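The health script on this slide could look roughly like the sketch below. It is not the script used at Openminds. It assumes ExaBGP as the BGP software (the slides do not name one), which can run a helper process and read text commands such as `announce route ... next-hop self` from that process's standard output. The function names and the TCP check against local port 80 are hypothetical, chosen only for illustration.

```python
import socket
import sys
import time

SERVICE_IP = "37.72.160.20"  # service IP from the slide, bound to lo


def is_healthy(host="127.0.0.1", port=80, timeout=1.0):
    """Hypothetical check: can we open a TCP connection to the local service?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def bgp_command(healthy, announced):
    """Return the command for a state change, or None if nothing changed."""
    if healthy and not announced:
        return f"announce route {SERVICE_IP}/32 next-hop self"
    if not healthy and announced:
        return f"withdraw route {SERVICE_IP}/32 next-hop self"
    return None


def run(interval=2.0):
    """Check health forever and report state changes on stdout."""
    announced = False
    while True:
        healthy = is_healthy()
        command = bgp_command(healthy, announced)
        if command is not None:
            # Assumption: the BGP daemon started this script and reads stdout.
            sys.stdout.write(command + "\n")
            sys.stdout.flush()
            announced = healthy
        time.sleep(interval)
```

Calling `run()` from the helper process keeps the route announced only while the check passes. Sending a command only on a state change avoids repeating the same announcement every interval.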
Slide 14
Active nodes announce virtual IP + priority to multiple BGP routers
The network knows which loadbalancers are up and running
Hi, I am Frank. I am a co-founder and Operations-boss at Openminds.
Openminds is a managed hosting company based here in Gent. We sponsor DevOpsdays.
We run a game, a DevOps challenge, at our booth. No marketing, no recruiting. Just a game we created for you to have fun and exercise your brain. Brag about how many points you have on Twitter. And it's BaaS compliant: you can win Beer as a Service.
So let's talk redundancy and failover today.
We all fixed that problem, right? Just add more servers and load balance requests. Right?
Problem: how do we failover or scale the loadbalancers?
Just add more. Sure, but how to handle failover? How do we avoid the SPOF?
DNS round-robin? No, not really. This only works for some load spreading; it is not reliable enough for failover (it is client dependent).
The classical answer is to use Keepalived, IPVS, Heartbeat or some other VRRP-like system.
This works well, but has its limitations.
They all have the same issues:
They only work in a single layer 2 domain: so only nearby networks, and forget failovers to another datacenter.
Timing is essential, so high CPU load leads to flaps.
These solutions are usually based on multicast and UDP, so there is no guarantee your election packets will actually arrive.
A very important problem is that the networking infrastructure is blind to the failover. These solutions rely on ARP flooding to tell the network the failover occurred.
I have a ton of great load-balancer failure stories. Come talk to me if you really want to hear them.
Now is there a better way?
Let's talk about SDN. SDN is the network guys' part of DevOps (Dev NOps? DevOps-en?)
SDN in a greenfield implementation is great! Unicorns! Rainbows! World peace!
But most of us have to work in a legacy environment (even if legacy is just 6 months ago).
But we can still use SDN concepts on existing networks:
BGP is an excellent networking protocol to use in such solutions.
How does it work? Each web-node has a virtual service IP attached to its local loopback interface.
Each node runs a health check and, if all is fine, announces a route to its virtual IP to a neighbouring network device (or better: to a few network devices).
Active nodes announce not only the service IP or virtual IP they carry, but also a priority.
The network devices see all online nodes and choose one (based on priority etc.).
If a web node goes offline, or declares itself unhealthy, the router already knows the backup one.
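The router's side of this can be illustrated with a toy model. This is only a sketch of the selection idea, not a BGP implementation: the `Router` class and the node names are hypothetical, and "lower number wins" is an arbitrary choice made here (in real BGP the priority would be expressed through route attributes such as MED or local preference).

```python
class Router:
    """Toy model of a router's view of one service IP (not real BGP)."""

    def __init__(self):
        self.routes = {}  # node name -> priority (lower wins in this sketch)

    def announce(self, node, priority):
        """A healthy node tells the router it can handle the service IP."""
        self.routes[node] = priority

    def withdraw(self, node):
        """A node withdraws its route (or its BGP session dropped)."""
        self.routes.pop(node, None)

    def best(self):
        """Return the node traffic is sent to, or None if nobody announces."""
        if not self.routes:
            return None
        # Tie-break on the node name so the choice is deterministic.
        return min(self.routes, key=lambda node: (self.routes[node], node))


router = Router()
router.announce("loadbalancer-001", priority=10)
router.announce("loadbalancer-002", priority=20)
print(router.best())  # loadbalancer-001

router.withdraw("loadbalancer-001")  # offline, or declared itself unhealthy
print(router.best())  # loadbalancer-002
```

Because the backup route is already in the router's table, failover is just a matter of picking the next best entry. No election between the nodes is needed.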
There are a few advantages to this approach. The biggest one is that, from a protocol standpoint, it's much, much easier.
Failover is faster and (more important than speed) more reliable, as no ARP floods are needed.
BGP is proven technology.
There is no multicast, no UDP, no single layer 2 domain. No Dragons, no cry!
Multiple datacenters? No problem! (As long as you have one network in place)
Are there disadvantages? Sure there are.
BGP is something you probably haven't done before, so you'll need to learn it.
Your equipment needs to be able to handle it (most $500+ switches can; otherwise use a Linux/BSD box).
Service or virtual IPs need to come from a separate subnet, used only for virtual IPs.
Other cool stuff you can do with these ideas:
Build your own 8.8.8.8-style services (great for DNS, time).
Announce active DNS recursors in multiple places in your network, each with the same VIP.
Clients will then connect to the closest one, but fail over to another one if it goes down.
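The "closest one, with failover" behaviour can be sketched as follows. This is a simplified illustration under stated assumptions, not how routers compute paths: the site names, client names and distance values are hypothetical, and a single number stands in for whatever metric the network uses to decide which announcer is closest.

```python
def nearest_recursor(distances, active):
    """Pick the active site with the lowest distance, or None if none is up."""
    candidates = {site: d for site, d in distances.items() if site in active}
    if not candidates:
        return None
    # Tie-break on the site name so the choice is deterministic.
    return min(candidates, key=lambda site: (candidates[site], site))


# Hypothetical distances from two clients to two sites announcing the same VIP.
distances = {
    "client-a": {"site-1": 1, "site-2": 3},
    "client-b": {"site-1": 4, "site-2": 2},
}
active = {"site-1", "site-2"}

print(nearest_recursor(distances["client-a"], active))  # site-1
print(nearest_recursor(distances["client-b"], active))  # site-2

active.discard("site-1")  # the recursor at site-1 withdraws its route
print(nearest_recursor(distances["client-a"], active))  # site-2
```

Each client uses the same address throughout. Only the network's choice of destination changes when a site withdraws its announcement.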