
Demystifying etcd failure scenarios for Kubernetes
By William Caban
@williamcaban

Kubernetes Control-Plane & etcd

Kubernetes Architectures
[Figure: four cluster layouts labeled A, B, C and D, with nodes marked S and W. Panel titles: Multi Node Cluster, Compact Cluster, All-in-One K8s, Multi Node Cluster.]
K8s Control Plane (Supervisor role): kube-apiserver, kube-scheduler, kube-controller-manager, cloud-controller-manager
Other components shown: container runtime, kubelet

Etcd Redundancy vs Performance

Failure Tolerance    Required Active Quorum Size
x 2                  3
x 1                  2
x 0                  2
x 0                  1

Axes: Write Performance (High to Low), Required Active Quorum Size (Low to High), Redundancy (Low to High)
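The failure tolerance and quorum values on this slide follow from majority arithmetic. The sketch below reproduces them, assuming the four cases correspond to clusters of 5, 3, 2 and 1 members; those sizes are inferred from the numbers, not stated on the slide, and the function names are illustrative.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum_size(members)


# Assumed cluster sizes; they are inferred from the slide's values.
for members in (5, 3, 2, 1):
    print(f"{members} member(s): quorum={quorum_size(members)}, "
          f"tolerates {failure_tolerance(members)} failure(s)")
```

Note that going from 1 to 2 members raises the quorum without adding any failure tolerance, which is why even-sized clusters are usually avoided.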

The life of a write on etcd
1. No leader
2. The election & vote
3. Leader coordinates the writes
4. For "Set Foo=bar", the Leader writes a log entry: Foo=bar
5. Leader replicates "Foo=bar" to the follower nodes
6. Leader waits for a majority to write the entry before committing it
7. Leader notifies followers that the entry is committed
8. Leader sends regular role notifications to followers
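Steps 4 to 7 can be sketched as a toy simulation. This is not etcd's implementation: the `Node` class and `leader_write` function are hypothetical names, elections and heartbeats are left out, and catching up a follower that missed earlier entries is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list = field(default_factory=list)   # replicated log entries
    commit_index: int = 0                     # number of committed entries
    reachable: bool = True                    # False simulates a failed member

    def append(self, entry: str) -> bool:
        """Persist an entry and acknowledge it, if this node is reachable."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


def leader_write(leader: Node, followers: list, entry: str) -> bool:
    """Log the entry, replicate it, commit on majority, notify followers."""
    cluster_size = 1 + len(followers)
    majority = cluster_size // 2 + 1

    leader.log.append(entry)                          # step 4: leader's log
    acks = 1                                          # the leader's own copy
    acks += sum(f.append(entry) for f in followers)   # step 5: replicate
    if acks < majority:                               # step 6: wait for majority
        return False                                  # entry stays uncommitted
    leader.commit_index = len(leader.log)
    for f in followers:                               # step 7: notify followers
        if f.reachable:
            f.commit_index = leader.commit_index
    return True


a, b, c = Node("A"), Node("B"), Node("C")
print(leader_write(c, [a, b], "Foo=bar"))   # all three members reachable
b.reachable = False
a.reachable = False
print(leader_write(c, [a, b], "Foo=baz"))   # leader alone has no majority
```

With both followers down, the second write is logged by the leader but never committed, which is the behavior behind the "Majority failure" case later in the deck.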

Writing to etcd via a Leader
[Figure: an etcd client sends (write "foo=bar") to node C (Leader); nodes A and B are Followers. The numbered steps in the diagram carry these labels: "Wait while I work…"; "Write to my Raft log"; "Send to Followers"; "Send acknowledgement" (from each Follower); "Wait for ack"; "Ack to client"; "Send acknowledgement to client and close session".]

Writing to etcd via a Follower
[Figure: an etcd client sends (write "foo=bar") to a Follower; nodes A and B are Followers and C is the Leader. The Follower responds: "I'm not the leader. Let me forward that to "C"." The write then goes through the numbered steps as (proxied write requests).]
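The forwarding behavior in this diagram can be sketched as follows. The `Member` class and the shared `cluster` dictionary are hypothetical, and replication to the other members is omitted so that only the proxy step is shown.

```python
class Member:
    """Toy cluster member that forwards writes to the current leader."""

    def __init__(self, name: str, cluster: dict):
        self.name = name
        self.cluster = cluster      # shared view: {"leader": str, "members": dict}
        self.raft_log = []

    def handle_write(self, entry: str) -> str:
        leader_name = self.cluster["leader"]
        if leader_name is None:
            raise RuntimeError("no leader elected; the write cannot proceed")
        if leader_name != self.name:
            # "I'm not the leader. Let me forward that to the leader."
            return self.cluster["members"][leader_name].handle_write(entry)
        self.raft_log.append(entry)     # only the leader appends directly
        return f"{entry} accepted by leader {self.name}"


cluster = {"leader": "C", "members": {}}
for name in ("A", "B", "C"):
    cluster["members"][name] = Member(name, cluster)

# The client talks to follower A; the write still lands on leader C.
print(cluster["members"]["A"].handle_write("foo=bar"))
```

The extra hop is why a write sent to a follower pays client-to-follower latency on top of follower-to-leader latency.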

Why the Critical ETCD Timers?
• Critical etcd timer settings:
  • HEARTBEAT_INTERVAL (100ms)
    • Frequency with which the Leader notifies Followers that it is still the Leader
  • ELECTION_TIMEOUT (1000ms)
    • How long a Follower node waits without hearing a heartbeat before attempting to become Leader itself
Best Practices
Heartbeat Interval
• < max(RTT) between members
• Too low increases CPU and network usage
• Too high leads to a high election timeout
  • slower to detect failures and recover
Election Timeout
• 10 times the HEARTBEAT_INTERVAL
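The 10x relationship between the two timers can be sketched as a small helper. The function names are illustrative; `--heartbeat-interval` and `--election-timeout` are the etcd command-line flags for these settings, both expressed in milliseconds.

```python
def election_timeout_ms(heartbeat_ms: int, multiplier: int = 10) -> int:
    """Election timeout derived from the heartbeat interval (10x rule)."""
    if heartbeat_ms <= 0:
        raise ValueError("heartbeat interval must be positive")
    return heartbeat_ms * multiplier


def etcd_timer_flags(heartbeat_ms: int) -> str:
    """Render the timer pair as etcd command-line flags (milliseconds)."""
    return (f"--heartbeat-interval={heartbeat_ms} "
            f"--election-timeout={election_timeout_ms(heartbeat_ms)}")


print(etcd_timer_flags(100))   # the default values quoted on the slide
print(etcd_timer_flags(500))   # a slower link: failures take 5 s to detect
```

Raising the heartbeat interval for a high-latency link raises the election timeout with it, which is the "slower to detect failures and recover" trade-off noted above.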

Why the Hardware Specifications?

              CPU              RAM             DISK
MINIMUM       2 to 4 cores     8 GB            < 30ms latency
PRODUCTION    8 to 16 cores    16GB to 64GB    < 10ms latency

Introducing the Magic Latency Formula for ETCD latency profiles…
Effective Latency = Disk Latency + Max(Jitter(Disk Latency)) + Network RTT + Max(Network Jitter)
Note: To maintain etcd stability at scale, the Effective Latency must be well below the Election Timeout
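As a worked example of the formula, the sketch below adds up the four terms and compares the result with the election timeout. The sample measurements are made up, and the `safety_fraction` threshold is a hypothetical reading of "well below"; the slide does not give a number.

```python
def effective_latency_ms(disk_ms, disk_jitter_ms, rtt_ms, network_jitter_ms):
    """Disk latency + max disk jitter + network RTT + max network jitter."""
    return disk_ms + max(disk_jitter_ms) + rtt_ms + max(network_jitter_ms)


def is_well_below(effective_ms, election_timeout_ms=1000, safety_fraction=0.1):
    """True when effective latency uses at most `safety_fraction` of the timeout.

    The 0.1 default is an assumption for illustration only.
    """
    return effective_ms <= election_timeout_ms * safety_fraction


# Made-up measurements, all in milliseconds.
latency = effective_latency_ms(
    disk_ms=8, disk_jitter_ms=[1, 4, 2],
    rtt_ms=15, network_jitter_ms=[3, 10],
)
print(latency)                  # 8 + 4 + 15 + 10 = 37
print(is_well_below(latency))   # 37 ms against a 1000 ms election timeout
```

Because the formula takes the maximum jitter rather than the average, a single slow disk or a noisy cross-site link dominates the result.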

Myth Collection 1
Myth: We can use a stretched control-plane for Kubernetes:
• without impact on performance
• for high availability
• as a highly available Kubernetes design
What happens with failures?
• High Network Latency
• High Disk Latency
• Client to Leader Latency
• Cross-site Disconnection
• Kube-apiserver transaction rate?
• Memory utilization due to etcd fragmentation?

Myth Collection 2
Myth: We can use backups of etcd to:
• Restore Kubernetes for disaster recovery
• Rollback Kubernetes
• Recover the applications running in the cluster
What happens with failures?
• Cluster identity?
• Certificates?
• ETCD peer certificates?
• ETCD identity?
• Persistent storage?
• API Schema Version?
[Figure labels: Manifest and other K8s objects; Container image; PersistentVolumeClaim; PersistentVolume; CSI-enabled storage backend; VS. Kubernetes Application Stack (Pods, Manifests, Storage mappings, etc)]

ETCD Failure Modes
https://etcd.io/docs/v3.5/op-guide/failures/
Leader failure
Follower failure
Majority failure
Network Partition
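The majority failure and network partition cases both come down to whether any group of members that can still reach each other holds a majority. The sketch below is a toy check with a hypothetical function name; members missing from every group are treated as failed.

```python
def side_with_quorum(cluster_size: int, group_sizes: list):
    """Return the index of the group that keeps quorum, or None if none does.

    `group_sizes` lists how many members can still reach each other on each
    side of a partition. Members not counted in any group are considered failed.
    """
    if sum(group_sizes) > cluster_size:
        raise ValueError("groups contain more members than the cluster has")
    majority = cluster_size // 2 + 1
    for index, size in enumerate(group_sizes):
        if size >= majority:
            return index
    return None


print(side_with_quorum(5, [3, 2]))   # partition: the 3-member side keeps serving
print(side_with_quorum(4, [2, 2]))   # even split: neither side has a majority
print(side_with_quorum(3, [1]))      # majority failure: 2 of 3 members are down
```

At most one group can hold a majority, so a partition never leaves two sides accepting writes at the same time.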

What to Remember about etcd?

Enjoy the rest of the event!
Image by https://www.opsramp.com/guides/why-kubernetes/who-made-kubernetes/


[KCD GT 2023] Demystifying etcd failure scenarios for Kubernetes.pdf
