Peter Schuurman (@pwschuurman)
Software Engineer / Google
2022-08-12
K8s Cluster Upgrade Strategies
Best Practices for your Stateful Workload
Agenda
 Why Upgrade?
 Stateful Workloads and Upgrades
 Nodepool Upgrade Strategies
 Control Plane Upgrade Strategies
 Upgrade Strategy and Workload Selection
Why Upgrade: Kubernetes Version Lifecycle
source
Why Upgrade: Modern and Protected
 Version Skew: The Kubernetes Version Skew Policy maintains
support for 2 node minor versions
 New Features: New features are introduced in upcoming
Kubernetes versions, e.g. StatefulSet MaxUnavailable was
introduced in 1.24
 Security Compliance: Organizations following compliance
protocols (PCI, HIPAA, FedRAMP) are required to apply
security patches within 30 days of availability
 Patch Support: Kubernetes minor versions are maintained
for 1 year
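The version skew point can be made concrete with a small check. This is only a sketch: the function names are hypothetical, the default of 2 minor versions is the figure quoted on the slide, and the code assumes nodes may lag the control plane but never run ahead of it.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Parse a version such as 'v1.24.3' or '1.24' into (major, minor)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def node_within_skew(control_plane: str, node: str, max_skew: int = 2) -> bool:
    """Return True if the node is at most `max_skew` minor versions behind.

    `max_skew=2` mirrors the figure on the slide; it is an assumption here,
    not a value read from any cluster.
    """
    cp_major, cp_minor = parse_minor(control_plane)
    node_major, node_minor = parse_minor(node)
    if cp_major != node_major:
        return False
    # Nodes may lag the control plane, but must not be newer than it.
    return 0 <= cp_minor - node_minor <= max_skew


if __name__ == "__main__":
    for node_version in ("v1.24.1", "v1.22.9", "v1.21.14"):
        print(node_version, node_within_skew("v1.24.3", node_version))
```

A check like this could gate a control plane upgrade: if any node pool would fall outside the skew window afterwards, upgrade that pool first.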
Why Upgrade: Modern Applications
MariaDB has modernized their architecture by bringing
SkySQL to the cloud on Kubernetes. Built using the
Kubernetes operator pattern, MariaDB leverages
resiliency and maintains high availability during
upgrades.
We have been using containers for many years ... Our goal
was to simplify the implementation and focus less on
lower-level infrastructure, dependencies and instance
life-cycle. With Kubernetes, our engineers could leverage
the strong momentum from the open source community
to drive infrastructure logic and security. (Reference)
Why Upgrade: Upgrade Dimensions
Application Developer
Kubernetes Administrator
Cloud Platform
Why Upgrade: Upgrade Dimensions
 Application Compatibility: Ensuring your application is compatible
with an upgraded Kubernetes version (node or control plane)
 Nodes: Upgrading the operating system, dependent libraries and
Kubernetes software of your cluster's data plane
 Control Plane: Upgrading the operating system and Kubernetes
software of your cluster's orchestration layer
Why Upgrade: Key Concerns
 Application Availability
 Cost
 Speed
Nodes: Surge Upgrades
 Application Availability: Suitable for fault-tolerant workloads.
Control availability by specifying node maxUnavailable
 Cost: Cost effective
 Speed: Increase upgrade velocity with parallel node surge
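To illustrate how node maxUnavailable and parallel surge interact, here is a minimal sketch. It assumes a simplified model in which up to `max_surge + max_unavailable` nodes are upgraded at the same time. The function names are hypothetical, and real node pool controllers may batch differently.

```python
def plan_surge_batches(nodes, max_surge, max_unavailable):
    """Split nodes into upgrade batches.

    Simplified assumption: each batch upgrades up to
    max_surge + max_unavailable nodes in parallel.
    """
    batch_size = max_surge + max_unavailable
    if batch_size < 1:
        raise ValueError("max_surge + max_unavailable must be at least 1")
    return [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]


def min_ready_nodes(total_nodes, max_unavailable):
    """Lower bound on nodes still serving workloads during the upgrade."""
    return max(total_nodes - max_unavailable, 0)


if __name__ == "__main__":
    pool = ["node-1", "node-2", "node-3", "node-4", "node-5"]
    for batch in plan_surge_batches(pool, max_surge=1, max_unavailable=1):
        print("upgrade together:", batch)
    print("nodes kept ready:", min_ready_nodes(len(pool), max_unavailable=1))
```

Raising `max_surge` shortens the upgrade without reducing serving capacity, while raising `max_unavailable` shortens it at the expense of availability.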
Nodes: Blue/Green Upgrades
 Application Availability: Granular
control during migration
 Cost: Increased cost with resource
pre-provisioning
 Speed: Slow and controlled
Node Upgrade Takeaways
 Application Availability
  Surge Upgrades: Rollback scenarios may take more time
  Blue/Green Upgrades: High degree of application availability
 Cost
  Surge Upgrades: Lower cost, upgraded node creation occurs just in time
  Blue/Green Upgrades: Higher cost, upgraded nodes are pre-provisioned
 Speed
  Surge Upgrades: Nodes can be upgraded in batches for increased speed
  Blue/Green Upgrades: Higher control over node migration reduces speed
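The cost difference between the two node strategies comes down to how many nodes exist at the peak of the upgrade. The sketch below assumes that a blue/green upgrade pre-provisions a full-size replacement pool and that a surge upgrade only ever adds `max_surge` extra nodes. Both function names are hypothetical.

```python
def peak_nodes_surge(pool_size, max_surge):
    """Peak node count when surge nodes are created just in time."""
    return pool_size + max_surge


def peak_nodes_blue_green(pool_size):
    """Peak node count when a full replacement pool is pre-provisioned."""
    return 2 * pool_size


if __name__ == "__main__":
    size = 10
    print("surge peak:", peak_nodes_surge(size, max_surge=1))
    print("blue/green peak:", peak_nodes_blue_green(size))
```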
Control Plane: Upgrades
 Kubernetes maintains API versions with each minor release
 API schema may change with new minor versions
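One practical consequence is that some API versions stop being served in a later minor release. The sketch below scans already-parsed manifests (plain dictionaries) for a few such cases. The table is a small illustrative subset, not a complete list, and the function name is hypothetical.

```python
# Illustrative subset only: (apiVersion, kind) -> minor release (1.x) in which
# that API version stopped being served.
REMOVED_IN_MINOR = {
    ("extensions/v1beta1", "Ingress"): 22,
    ("batch/v1beta1", "CronJob"): 25,
    ("policy/v1beta1", "PodDisruptionBudget"): 25,
}


def find_removed_apis(manifests, target_minor):
    """Return (name, apiVersion, kind) for objects unusable at 1.<target_minor>."""
    removed = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        removed_in = REMOVED_IN_MINOR.get(key)
        if removed_in is not None and target_minor >= removed_in:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            removed.append((name, key[0], key[1]))
    return removed


if __name__ == "__main__":
    manifests = [
        {"apiVersion": "policy/v1beta1", "kind": "PodDisruptionBudget",
         "metadata": {"name": "db-pdb"}},
        {"apiVersion": "apps/v1", "kind": "StatefulSet",
         "metadata": {"name": "db"}},
    ]
    for finding in find_removed_apis(manifests, target_minor=25):
        print("needs migration before upgrade:", finding)
```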
Control Plane: Surge Upgrade
 Application Availability: HA control plane setups limit disruptions. Kubernetes minor
version rollback is not supported
 Cost: Cost effective
 Speed: Fast
Control Plane: Blue/Green Upgrade
 Application Availability: Granular control over application upgrade. Safe minor version
rollback
 Cost: Increased cost over in-place upgrades with cluster pre-provisioning
 Speed: Slow and controlled
Control Plane: Blue/Green Upgrade
 KEP-3335: Introduces building blocks to the StatefulSet API to enable StatefulSet
replicas to be moved across clusters.
 With Kubernetes Multi-Cluster Services (KEP-1645), applications can maintain
connectivity
 Demo
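The replica move enabled by KEP-3335 can be pictured as two StatefulSets, one per cluster, each owning a contiguous slice of ordinals. The sketch below only models that bookkeeping. It assumes the start ordinal field proposed in the KEP (written here as `start`) together with the usual replica count, and it moves replicas one at a time from the highest ordinal down. It does not call any Kubernetes API, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    """Ordinals owned by one cluster's StatefulSet."""
    start: int     # first ordinal (the start ordinal proposed in KEP-3335)
    replicas: int  # number of replicas in this cluster

    def ordinals(self):
        return list(range(self.start, self.start + self.replicas))


def migration_steps(total_replicas):
    """Return (source, destination) slices for each step of the move.

    Assumption: replicas move one at a time, highest ordinal first, so the
    two slices never overlap and together always cover every ordinal.
    """
    steps = []
    for moved in range(total_replicas + 1):
        source = Slice(start=0, replicas=total_replicas - moved)
        destination = Slice(start=total_replicas - moved, replicas=moved)
        steps.append((source, destination))
    return steps


if __name__ == "__main__":
    for source, destination in migration_steps(3):
        print("source:", source.ordinals(), "destination:", destination.ordinals())
```

Because each pod keeps its ordinal, and therefore its identity, across clusters, a cross-cluster service layer such as Multi-Cluster Services can keep peers reachable while the move is in progress.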
Control Plane Upgrade Takeaways
 Application Availability
  Surge Upgrades: Rollback is not possible
  Blue/Green Upgrades: Applications can be rolled back to a cluster with a known compatible Control Plane
 Cost
  Surge Upgrades: Lower cost, upgraded control plane creation occurs just in time
  Blue/Green Upgrades: Higher cost, cluster pre-provisioned
 Speed
  Surge Upgrades: Control Plane upgrade is fast and scales sub-linearly as cluster size increases
  Blue/Green Upgrades: Upgrade speed scales with application migration speed
Takeaways
 Trade-off between business requirements: application availability, speed and cost
 Modern applications update consistently and often
 Kubernetes has the tools to support safe stateful upgrades today, and the community is
building new tools to increase this margin of safety
