Journey for Provisioning 20k Over
Rbd Volumes to Kubernetes
With Openstack #2
  ろ襴讌 
レ 揃   揃 煙
Contents
1 _ NAVER Rbd  
 
2 _ Kubernetes Plugin
Requirements
Public Rbd Plugin
Flexvolume Plugin
3 _ NAVER RBD Plugin
Multi Tenancy Support
Multi Map
Failure Handling
Storage Watcher
Volume QoS
Filesystem Trim
Monitoring
1 _ NAVER Rbd
 
Kubernetes
NAVER Ceph Cluster
Elasticsearch Farm
NAVER In-House PaaS Platform
Jenkins Farm
Kafka Farm
......
 
24
Clusters
10PB
Largest Cluster Capacity
5PB
Total Used
26PB
Total Clusters Capacity
30K
Total RBD Volumes
 
Persistent Volume for Container
in Container Orchestrator, Kubernetes
VOL1 VOL2 VOL3
2 _ Kubernetes Plugin
Requirements · Public Rbd Plugin · Flexvolume Plugin
Requirements
- Keystone support for multi-tenancy / authentication
- Cinder support for dynamic volume provisioning / management
- QoS support to ensure performance on a public cluster
- Easy to manage, customize, and deploy
Public Rbd Plugin
Kubernetes In-tree Provisioner / Ceph CSI Plugin
- Ceph standalone
- Version dependency
- Multi-tenancy not supported
- Customization / upgrade issues
https://kubernetes.io/ko/docs/concepts/storage/volumes/#rbd
https://github.com/ceph/ceph-csi
Flexvolume Plugin
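- Deprecated, but...
- When a storage feature missing from the in-tree plugins is needed, a custom driver can be developed and added to the Kubernetes cluster
- The Kubelet on each worker node executes the driver binary and communicates through the JSON response
- Can be built as a driver-only setup or with a provisioner (PVC/PV provisioning)
- Implementations from many vendors are published as open source (https://github.com/search?q=flexvolume)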
VOLUME
Flexvolume Plugin
- Fully customizable
- Simple architecture
Attach
- Check if volume is multi-mapped
- Check volume status
- RBD map option
- ...
MountDevice
- Mount option
Mount
Unmount, UnmountDevice
Expandfs
[Diagram: a simple driver binary in the Kubelet plugin directory on Node1, Node2, and Node3]
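
The call convention above (the Kubelet executes a driver binary and reads a JSON response) can be sketched as follows. This is a minimal illustration, not the NAVER driver: the operations handled and the messages are assumptions, and a real driver would perform the RBD map and mount work where the placeholder comments are.

```python
#!/usr/bin/env python3
"""Minimal Flexvolume-style driver sketch (illustrative only)."""
import json
import sys


def handle(argv):
    """Dispatch one driver call and return the response as a dict."""
    if not argv:
        return {"status": "Failure", "message": "no operation given"}
    op = argv[0]
    if op == "init":
        # Advertise that this driver implements attach/detach.
        return {"status": "Success", "capabilities": {"attach": True}}
    if op in ("mount", "unmount"):
        # Placeholder: a real driver would mount or unmount here.
        return {"status": "Success"}
    # Calls the driver does not implement are reported as unsupported.
    return {"status": "Not supported", "message": "unsupported operation: " + op}


if __name__ == "__main__":
    # The Kubelet reads the JSON response from standard output.
    print(json.dumps(handle(sys.argv[1:])))
```

Keeping the dispatch in a plain function makes the driver easy to test without a Kubelet.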
3 _ NAVER Rbd Plugin
Multi Tenancy Support · Multi Map · Failure Handling · Storage Watcher · Filesystem Trim · Volume QoS · Monitoring
Multi Tenancy Support
[Diagram: users' Kubernetes namespaces (Blog, Cafe, Band) shown alongside the matching OpenStack tenants (Blog Project, Cafe Project, Band Project)]
Multi Tenancy Support
Create each RBD volume in the proper tenant, for users who have the right access permissions
- Cronjob
  - An OpenStack ceph role can manage a specific tenant's volumes
  - The Cronjob assigns the ceph role to newly created tenants
- Provisioner
  - Each Namespace has a tenant ID annotation
  - The Provisioner dynamically creates the volume in the user's tenant
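
The Provisioner step above can be pictured as a lookup from the Namespace annotation to the tenant that will own the volume. The sketch below is an assumption-heavy illustration: the annotation key `example.com/openstack-tenant-id` and the request fields are hypothetical, since the slides do not name them.

```python
# Hypothetical annotation key; the slides do not name the real one.
TENANT_ANNOTATION = "example.com/openstack-tenant-id"


def tenant_for_namespace(namespace):
    """Return the tenant id annotated on a Namespace object (given as a dict)."""
    annotations = namespace.get("metadata", {}).get("annotations") or {}
    tenant_id = annotations.get(TENANT_ANNOTATION)
    if not tenant_id:
        raise ValueError("namespace has no tenant id annotation")
    return tenant_id


def volume_request(namespace, name, size_gb):
    """Build the parameters a provisioner might pass to its volume API."""
    return {
        "tenant_id": tenant_for_namespace(namespace),
        "name": name,
        "size": size_gb,
    }
```

Failing loudly when the annotation is missing keeps a volume from being created in the wrong tenant.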
Multi Map
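Multi-mapping can cause filesystem corruption
- Simultaneous reads / writes from multiple Pods can corrupt the filesystem
- Filesystem repair (xfs_repair) becomes necessary, and log zeroing can lose data
- Ceph version: Luminous
- Before the Ceph driver attaches a volume, is there a way to check whether that volume is already in use?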
Multi Map
RBD Watcher & Lock
$ rbd status ${pool}/${image id}
- Prints watcher information for a mapped volume
- A watching client exists == the volume is already mapped
$ rbd lock add ${pool}/${image id} ${lock id}
$ rbd lock ls ${pool}/${image id}
- A lock can be added explicitly
- The hostname is used as the lock ID
Multi Map
RBD Watcher & Lock
Attach
- Check watcher (for read-write volumes)
- Check lock (for read-write volumes)
- Map
- Add lock for the RBD image
MountDevice
Mount
Unmount, UnmountDevice
Expandfs
- If a watcher or lock already exists, block the new attach and handle it as an error
則 觜   蟆曙  覦
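
The watcher and lock checks in the Attach step can be sketched as below. This is only an illustration under stated assumptions: it presumes `rbd status` and `rbd lock ls` accept `--format json`, that the status output carries a `watchers` list, and that an image without locks yields an empty JSON collection. The function names are hypothetical.

```python
import json
import subprocess


def rbd_json(*args):
    """Run an rbd subcommand and parse its JSON output."""
    out = subprocess.run(
        ["rbd", *args, "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def can_attach(watchers, locks, read_write=True):
    """Allow a read-write attach only if no watcher and no lock exists."""
    if not read_write:
        return True
    return len(watchers) == 0 and len(locks) == 0


def check_image(pool, image):
    # Assumption: the JSON shape of `rbd lock ls` may differ between
    # releases, so only the number of entries is inspected here.
    status = rbd_json("status", f"{pool}/{image}")
    locks = rbd_json("lock", "ls", f"{pool}/{image}")
    return can_attach(status.get("watchers", []), locks)
```

If `can_attach` returns False, the driver would refuse the attach and report an error instead of mapping the image a second time.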
Failure Handling
[Diagram: Kubernetes cluster with a kubelet on each of Node1, Node2, and Node3; Pod 1 on Node1 uses VOL1 from the Ceph cluster, mapped as /dev/rbd0]
Failure Handling
[Diagram: the same cluster, with Pod 1 now shown on both Node1 and Node2]
- What if the Kubelet on Node1 goes down?
- Depending on policy, Pod 1 is rescheduled to Node2
- What about the volume?
!! Orphaned Volume
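Failure Handling
[Diagram: the same cluster; /dev/rbd0 on Node1 is marked "Black Listed" and Pod 1 on Node2 has its own /dev/rbd0]
- A failure is detected on Node1
- If the failure persists for a set time, Node1 is registered in the OSD blacklist
- The volume on Node1 becomes a garbage device, and the volume can be newly mounted on Node2
- Garbage devices are cleaned up later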
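Storage Watcher Overview
[Diagram: Node1 runs the kubelet, Pod 1 (VOL1, /dev/rbd0), and Storage Watcher; VOL2 is an orphaned volume; volumes come from the Ceph cluster]
- A sidecar container that manages auxiliary features such as node / volume anomaly detection, monitoring, volume trim, and QoS
- Deployed as a DaemonSet, in the same Pod as the driver
  - Detector
  - Collector
  - Limiter
  - TrimExecutor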
Filesystem Trim
Filesystem trim is required to use available capacity more efficiently
[Diagram: on Node1, Storage Watcher and Pod 1 share /dev/rbd0, backed by the Ceph cluster]
Filesystem Trim
$ fstrim -v ${mountpoint}
- The Storage Watcher TrimExecutor runs filesystem trim periodically
- Trim generates I/O, and Storage Watcher runs as a DaemonSet on every node, so trim execution times are spread out (hostname hash)
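
One way to spread trim runs by hostname hash is to map each hostname to a stable offset inside a scheduling window. The sketch below is an assumption, not the TrimExecutor's actual logic: the SHA-256 hash, the one-day window, and the function name are all illustrative choices.

```python
import hashlib


def trim_offset_minutes(hostname, window_minutes=24 * 60):
    """Map a hostname to a stable offset inside the trim window."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and fold it into the window.
    return int.from_bytes(digest[:8], "big") % window_minutes
```

Because the offset depends only on the hostname, each node trims at the same time on every cycle, while different nodes tend to land on different minutes.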
Volume QoS
- Per-volume (PV) iops / bps limits are under active discussion in Kubernetes
- https://github.com/kubernetes/enhancements/pull/1907
- QoS is needed on the shared (public) Ceph cluster
Volume QoS
Linux Control Groups
https://selfish-developer.com/entry/Cgroup-Control-Group
- A Linux kernel feature that limits and isolates the resources (CPU, memory, disk I/O, network, etc.) of processes
- Per-process device IOPS / throughput can be limited
Volume QoS
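[Diagram: on Node1, the Storage Watcher Limiter sets limits for Pod 1's device /dev/rbd0]
1. Get the list of Pods on the node
2. Get the list of containers in each Pod
3. From the container config file (config.v2.json), get the mount points / devices in use
4. Get the devices' major / minor numbers
5. Set iops / throughput limits on the container's processes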
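
Steps 4 and 5 can be sketched with the cgroup v1 blkio throttle files, which take entries of the form `major:minor limit`. That the Limiter uses cgroup v1 blkio is an assumption here (the slides only say Linux control groups), and the function names and the cgroup directory argument are hypothetical.

```python
import os


def device_numbers(device_path):
    """Return (major, minor) of a block device such as /dev/rbd0."""
    rdev = os.stat(device_path).st_rdev
    return os.major(rdev), os.minor(rdev)


def throttle_line(major, minor, limit):
    """Format one entry for a blkio.throttle.* file: 'major:minor limit'."""
    return f"{major}:{minor} {limit}"


def apply_limits(cgroup_dir, major, minor, iops, bps):
    """Write read/write IOPS and bytes-per-second limits (cgroup v1 blkio)."""
    limits = {
        "blkio.throttle.read_iops_device": iops,
        "blkio.throttle.write_iops_device": iops,
        "blkio.throttle.read_bps_device": bps,
        "blkio.throttle.write_bps_device": bps,
    }
    for filename, limit in limits.items():
        with open(os.path.join(cgroup_dir, filename), "w") as f:
            f.write(throttle_line(major, minor, limit))
```

In use, `cgroup_dir` would be the blkio cgroup directory of the container found in steps 1 to 3, and `device_numbers("/dev/rbd0")` would supply the major and minor numbers.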
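Monitoring
+ Storage Watcher Collector
- Node exporter provides device performance and usage metrics
- Storage Watcher provides metadata metrics describing which Pod, in which Namespace, is using each device
- The two metrics are joined to produce the final result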
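
The join of the two metric sets can be illustrated in a few lines. This sketch assumes both sides share `node` and `device` labels; the label names and sample fields are hypothetical, and in practice the join would more likely be done in the monitoring system's query language.

```python
def join_metrics(device_metrics, ownership):
    """Attach pod/namespace labels to device samples, keyed by (node, device)."""
    owners = {(o["node"], o["device"]): o for o in ownership}
    joined = []
    for sample in device_metrics:
        owner = owners.get((sample["node"], sample["device"]))
        if owner is None:
            continue  # no known Pod is using this device
        joined.append({**sample, "pod": owner["pod"], "namespace": owner["namespace"]})
    return joined
```

Devices without an ownership record, such as local disks, drop out of the result, leaving only per-Pod volume metrics.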
Thanks! End of Documents.
