Jabayo on Kubeflow
Dudaji Co., Ltd.
2. Project 螳
Jabayo on Kubeflow
伎 襴 觜
2. Project 螳 - () Jabayo
-  伎  襴 觜, Video understanding 覿
- 覓伎誤ク / 企Π伎  襦
- 危蟲 煙 蟲 螻 
- 譴 覯れ 蠍一覿 R&D 螻殊 + NIPA (IPA) 讌 (螻煙 企)
2. Project 螳 - () Jabayo
磯
ML Toolkit for Kubernetes
2. Project 螳 - on Kubeflow
Kubeflow components
2. Project 螳 - on Kubeflow
Jupyter Notebook Easy GPU provisioning
Katib Hyperparameter tuning
Pipeline ML Workflow  ()
Fairing building, training, and deploying ML models in a hybrid cloud env
TFJob, PyTorchJob CRD
Seldon, TF Serving Model serving
Kubeflow 0.5 Release!
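- Installing and configuring Kubeflow is easier with kfctl
- The Kubeflow hub UI and the Notebook UI were changed to be easier to use
- Volumes can now be mounted in notebooks!
- Fairing: a library for easily submitting KF components from local or cloud environments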
2. Project 螳 - on Kubeflow
DL team Infra team Output
0 day -ろ 覈 覲
-朱 蟲
-朱襴 一危 
-static GPU 
-Kubernetes on prem
-Kubeflow install
-Worker node drain
-Jupyter notebook
-Two stream  覈
-Kubernetes / Kubeflow on prem infra
-Storage(MINIO)
#1 -覈 覲 襭
-れ启
-貉れろ 一危 覲 覦 讌
-Kubeflow 襴觀
-Katib
-Pipeline
-Scaffold
-Custom dataset & models
#2 -Katib  危狩朱誤

-覃碁Ν 螳煙 襴 

-Pipeline - 貉れろ 一危一
襦
-Kubeflow 襴觀
-Fairing
-Seldon
-Custom dataset & models
-Data pipeline
#3 -Watch 覃碁Ν, 豢襦 
-Fairing (ろ)
-Multi-cluster 襴觀
-Storage sync 襴觀
(Rok)
-Watch pipeline run
#4 -覈  / 觚  -open-rok
3. 120 days log (3~)
- 覈 覲 (Two stream )
- 朱襴一危 + 覈 螳覦
- 覯 螳 覓語 覓語 願屋
- Worker node drain
- 譟伎煙 豕
- Kubeflow 曙 覈 郁規 レ 譯殊 襦
- Kubernetes(Kubeflow) on prem infra with storage (minio)
4. Share - 0 day (3)
DL team Infra team Output
0 day -ろ 覈 覲
-朱 蟲
-朱襴 一危 
-static GPU 
-Kubernetes on prem
-Kubeflow install
-Worker node drain
-Jupyter notebook
-覈誤覲 (Two stream )
-Kube / Kubeflow on prem infra with
storage(minio)
DL team Infra team Output
0 day -ろ 覈 覲
-朱 蟲
-朱襴 一危 
-static GPU 
-Kubernetes on prem
-Kubeflow install
-Worker node drain
-Jupyter notebook
-Two stream  覈
-Kubernetes / Kubeflow on prem infra
-Storage(MINIO)
#1 -覈 覲 襭
-れ启
-貉れろ 一危 覲 覦 讌
-Kubeflow 襴觀
-Katib
-Pipeline
-Jupyter notebook
-Scaffold
-Custom dataset & models
#2 -Katib  危狩朱誤

-覃碁Ν 螳煙 襴 

-Pipeline - 貉れろ 一危一
襦
-Kubeflow 襴觀
-Fairing
-Seldon
-Custom dataset & models
-Data pipeline
-Kubeflow review report 2 (showcase)
#3 -Watch 覃碁Ν, 豢襦 
-Fairing (ろ)
-Multi-cluster 襴觀
-Storage sync 襴觀
(Rok)
-Watch pipeline run
#4 -覈  / 觚  -open-rok
(Index) 120 days log (4)
- Jupyter notebook
- 譴觜 襭 & 
- Scaffold
- DL team - Infra team螳  , DL team螳  覦 螻旧  螳讌
- Scaffold & quick start
- 貉れろ 一危
- 朱 一危
-  蠍 譟
4. Share - sprint #1
DL team Infra team Output
#1 -覈 覲 襭
-れ启
-貉れろ 一危 覲 覦 讌
-Kubeflow 襴觀
-Katib
-Pipeline
-Jupyter notebook
-Scaffold
-Custom dataset & models
[Open-infradays 2019 Korea] jabayo on Kubeflow
1.1. Jupyter Notebook : Easy GPU Node provisioning
TensorFlow + Fairing + kubectl = Kubeflow Jupyter Notebook
れ襷 覃 至  豌覯讌 企
4. Share - sprint #1
1.2. Scaffold
-   朱 .
- 覈 ろ 覦朱 磯Μ 襷蟆 朱 貉れろ
http://drivendata.github.io/cookiecutter-data-science/
谿瑚)  git-flow 貊襴觀 伎 伎 :)
4. Share - sprint #1
1.2. Scaffold
4. Share - sprint #1
DL team Infra team Output
0 day -ろ 覈 覲
-朱 蟲
-朱襴 一危 
-static GPU 
-Kubernetes on prem
-Kubeflow install
-Worker node drain
-Jupyter notebook
-Two stream  覈
-Kubernetes / Kubeflow on prem infra
-Storage(MINIO)
#1 -覈 覲 襭
-れ启
-貉れろ 一危 覲 覦 讌
-Kubeflow 襴觀
-Katib
-Pipeline
-Scaffold
-Custom dataset & models
-Kubeflow review report 1 (showcase)
#2 -Katib  危狩朱誤

-Pipeline 一危一 襦
-Kubeflow 襴觀
-Fairing
-Seldon
-Katib
-Data pipeline (one-off)
-Custom dataset & models
#3 -Watch 覃碁Ν, 豢襦 
-Fairing (ろ)
-Multi-cluster 襴觀
-Storage sync 襴觀
(Rok)
-Watch pipeline run
#4 -覈  / 觚  -open-rok
(Index) 120 days log (5)
4. Share - sprint #2
DL team Infra team Output
#2 -Katib  危狩朱誤

-Pipeline 一危一 襦
-Kubeflow 襴觀
-Fairing
-Seldon
-Katib
-Data pipeline (one-off)
-Custom dataset & models
- Hyperparameter tuning with Katib
- Data pipeline  (One-off run )
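2.1. Katib overview: hyperparameter tuning
- What is a hyperparameter?
A value set before model training and adjusted to get the best result
Learning rate, dropout rate, number of hidden layers, batch size, activation function, etc.
- Katib ("secretary") started out modeled on Google Vizier ("prime minister")
- Based on a search algorithm, it runs train jobs until the goal is reached and collects the results.
Several search algorithms are provided: Grid, Random, Bayesian, Hyperband, nasrl
- StudyJob is a custom Kubernetes resource, so a study can be defined in YAML against its CRD
- No model code change: only the log output needs to change so the Metric Collector can collect it
- Runs two containers, Worker and Metric Collector, and provides a UI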
4. Share - sprint #2
2.1. Katib how it works: the MetricCollector pod collects logs (log format: key=value)
4. Share - sprint #2
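As a rough illustration of the key=value log convention named on the MetricCollector slide, the Python sketch below prints metrics in that form and reads them back with a regular expression. The metric names (val_acc, loss), the space-separated layout, and both helper functions are assumptions made for illustration; the slides do not show the exact format Katib's collector expects.

```python
import re

# Assumed layout: space-separated name=number pairs on one log line.
# The slide only states that the log format is key=value.
METRIC_PATTERN = re.compile(
    r"(?P<name>[A-Za-z_][\w\-]*)=(?P<value>-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def format_metrics(epoch, metrics):
    """Render one log line, e.g. 'epoch=3 val_acc=0.7700 loss=0.4100'."""
    parts = [f"epoch={epoch}"] + [f"{k}={v:.4f}" for k, v in metrics.items()]
    return " ".join(parts)


def parse_metrics(line, wanted):
    """Return {name: float} for the wanted metric names found in a log line."""
    found = {}
    for match in METRIC_PATTERN.finditer(line):
        if match.group("name") in wanted:
            found[match.group("name")] = float(match.group("value"))
    return found


# Hypothetical metric names, chosen only for this example.
line = format_metrics(3, {"val_acc": 0.77, "loss": 0.41})
print(line)
print(parse_metrics(line, {"val_acc"}))
```

In a training script, only the logging side (format_metrics here) would be added; the parsing side stands in for whatever the collector does with the pod log.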
2.1. Katib StudyJob Apply
- Goal: validation accuracy 0.5
- requestcount 30: the GPU node (worker) runs 30 times
- Parameter settings
batch_size, learning_rate, dropout, optimizer
- Search Algorithm : random
- requestNumber: 2 -> 2 nodes (workers) at a time
=> 60 trials run in total
 螻 願啓 !
4. Share - sprint #2
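The StudyJob settings above can be mimicked in plain Python to show where the 60 trials come from. This is only a sketch of random search, not Katib's implementation; the value ranges in SEARCH_SPACE and the function names are hypothetical, since the slide lists the four parameter names but not their ranges.

```python
import random

# Hypothetical search space: lists are categorical, tuples are (low, high) ranges.
SEARCH_SPACE = {
    "batch_size": [16, 32, 64],
    "learning_rate": (1e-4, 1e-1),
    "dropout": (0.1, 0.5),
    "optimizer": ["sgd", "adam", "rmsprop"],
}


def sample(space, rng):
    """Draw one configuration uniformly at random from the search space."""
    config = {}
    for name, domain in space.items():
        if isinstance(domain, list):
            config[name] = rng.choice(domain)
        else:
            low, high = domain
            config[name] = rng.uniform(low, high)
    return config


def random_search(space, request_count, request_number, seed=0):
    """Yield request_count rounds, each with request_number suggestions."""
    rng = random.Random(seed)
    for _ in range(request_count):
        yield [sample(space, rng) for _ in range(request_number)]


# requestcount = 30 rounds, requestNumber = 2 suggestions per round.
rounds = list(random_search(SEARCH_SPACE, request_count=30, request_number=2))
total_trials = sum(len(batch) for batch in rounds)
print(total_trials)  # 30 rounds x 2 suggestions per round = 60
```

Each suggestion in a round would become one training job, which is why two workers are busy at a time.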
2.1. Katib Dashboard
4. Share - sprint #2
2.1. Katib hyperparameter tuning
- 0.5  ろ : 0.35
optimizer螳 SGD  蟆曙  val-acc 觜
Dropout 0.3 0.4 蟆曙  val-acc 觜
- れ 朱誤 
optimizer襯 SGD襦 螻 ろ
dropout 豕螳 0.3朱 
...
0.48 -> 0.66 -> 0.77
4. Share - sprint #2
So, are there no other tools?
2.1. Katib hyperparameter tuning
So, are there no other tools?
- Keras Tuner (https://github.com/keras-team/keras-tuner)
- Hyperas (https://github.com/maxpumperla/hyperas)
- Kopt (https://github.com/Avsecz/kopt)
4. Share - sprint #2
Keras Tuner
Hyperas
2.1. Katib hyperparameter tuning
- 貎覯ろ一 豺伎 
- 蠍一ヾ 覈語れ  ...
4. Share - sprint #2
Keras Tuner
Hyperas
2.1. Katib hyperparameter tuning
- 豺危磯 on kubernetes!
-   Hyperparameter ル朱襷 
- Existing model source only needs its log output changed
(just add a Keras log callback!)
4. Share - sprint #2
2.1. Katib review
GOOD!
- hyperparameter tuning tool WITHOUT CODE CHANGE!
- 螻 蠍磯るΜ螻 螻 蠍磯るΜ螻 襯  譯手鍵 覓語 豌願螳 豢 -> るジ   
 !
- Random Search襷朱 蟯谿  螳ロ.
- Train 豌企ゼ Katib ろ蟆 .
BAD!
- UI螳   豺伎  : 螻牛 Rest API襯 蟆  ク.
-  螳 蟆螻襴讀 讌讌襷 讌 Grid, Random 襷螻 蠍郁 曙螳 .
- Hyperparameter 覯襯 int, double, categorical, discrete襯 讌讌襷 Pair襦 讌   蟆

- 覈 Metric 襷 れ  
- UI襯 牛 燕 StudyJob 朱 Metric Collector 讌讌 螳 .
4. Share - sprint #2
2.2. Pipeline : ML Pipelines for Kubeflow
Cloud native ML Workflow 燕 ()
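- Based on Argo
- REST API supported
- Pipeline SDK provided
With the pipeline SDK installed in a Jupyter Notebook, pipelines can be written and deployed
- UI provided for management
- Organized into Pipelines, Experiments, and Runs
- One-off and Recurring runs supported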
4. Share - sprint #2
2.2. Pipeline - One-off Job 1. 一危一 燕 Pipeline (1) Pipeline
4. Share - sprint #2
Compile
create_pipeline.py
$ dsl-compile --py create_pipeline.py --output create_pipeline.tar.gz
Upload
create_pipeline.tar.gz
2.2. Pipeline - One-off Job 1. 一危一 燕 Pipeline (2) Run
4. Share - sprint #2
Run  Run ろ
2.2. Pipeline review
GOOD!
- SDK襯 螻給蠍 覓語 所 pipeline 煙 螳ロ.
- recurring run (cron) 螳ロ蠍 覓語 れ  螳ロ.
-  ろ螳 螻 Docker container願鍵 覓語 危殊 豌伎 dependency
.
BAD!
- 螻牛伎朱  豌企 GCP蠍一朱   GCP 蟆曙 朱  ろ
企れ. るジ 貉危碁る 襷谿螳讌
- 焔 危殊碁   蠍 覓語 覲蟆曙 襦 燕伎 . API語 覈螻
coupling 伎 覿ク.
- Recurring Run ろ 襭 Run 蟯襴 .
- 蠏碁讌 朱 譬襭 , 豌螳 貉企螳 蟆
- Cause Exception. getKubeletContainers failed: rpc error
- Node status : Not Ready
4. Share - sprint #2
DL team Infra team Output
0 day -ろ 覈 覲
-朱 蟲
-朱襴 一危 
-static GPU 
-Kubernetes on prem
-Kubeflow install
-Worker node drain
-Jupyter notebook
-Two stream  覈
-Kubernetes / Kubeflow on prem infra
-Storage(MINIO)
#1 -覈 覲 襭
-れ启
-貉れろ 一危 覲 覦 讌
-Kubeflow 襴觀
-Katib
-Pipeline
-Scaffold
-Custom dataset & models
-Kubeflow review report 1 (showcase)
#2 -Katib  危狩朱誤

-Pipeline 一危一 襦
-Kubeflow 襴觀
-Fairing
-Seldon
-Katib
-Data pipeline (one-off)
-Custom dataset & models
#3 -Watch 覃碁Ν, 豢襦 
-Fairing (ろ)
-Multi-cluster 襴觀
-Storage sync 襴觀
(Rok)
-Watch pipeline run
#4 -覈  / 觚  -open-rok
(Index) 120 days log (6)
DL team Infra team Output
#3 -Pipeline - 覈 覦壱 覦 豢襦

-Fairing (ろ)
-Multi-cluster 襴觀
-Storage sync 襴觀
(Rok)
-Watch pipeline run
4. Share - sprint #3
- Watch pipeline run
- 覓 覿ク Katib UI 螳 讀
- watch + action + slack ui
- Fairing
- ろ
- on prem contribute 螻覩
- GPU scale out 讀, 覃壱企ろ 襴豺
4. Share - sprint #3
3.1. Recurring run - Watch Katib (accuracy)
4. Share - sprint #3
3.1. Recurring run - Watch Katib (dataset)
3.1. Recurring run - Demo
4. Share - sprint #3
3.2. Fairing : Python SDK for building, training, and deploying ML models
A library for training and deploying ML models in various environments (local, cloud)
4. Share - sprint #3
Check fairing-example.html
3.2. Fairing : Python SDK for building, training, and deploying ML models
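Fairing on Prem
- Converts the train model you wrote into a Docker image and submits it as a train job
- Docker images are built with kaniko
- kaniko takes only GCS, S3, Local, and Git as the build context source for an image
(S3 works with AWS only), so on prem the image has to be built from a local directory
- But in the current stable Fairing release, the cluster context source can only be GCS or S3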
4. Share - sprint #3
3.2. Fairing
4. Share - sprint #3
Only GCS_Context and S3_Context are supported.
3.2. Fairing
OnPrem Context support
- A storage to hold the context -> why not the Minio already in the cluster?
- Upload with the Minio client! -> kaniko uses the PV mounted from the minio bucket as a local:// context
5. Roadmap - Fairing on prem context support
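To make the idea of a build context concrete, here is a minimal Python sketch that packs a Dockerfile and a training script into a gzip tarball in memory, which is the kind of artifact that could then be uploaded to a MinIO bucket. The file contents, the base image, and the build_context_tar helper are hypothetical. The upload itself is left out because it needs a running server, and the slides do not say whether the context is stored packed or unpacked.

```python
import io
import tarfile


def build_context_tar(files):
    """Pack {path: text} into an in-memory gzip tarball and return its bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for path, text in sorted(files.items()):
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Hypothetical context: a Dockerfile plus the training script it runs.
context = {
    "Dockerfile": (
        "FROM python:3.7\n"
        "COPY train.py /app/train.py\n"
        'CMD ["python", "/app/train.py"]\n'
    ),
    "train.py": "print('val_acc=0.5')\n",
}
payload = build_context_tar(context)

# An upload of `payload` to an object store would go here (assumption, not shown).
# Reading the archive back confirms what the image builder would receive.
with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as tar:
    names = tar.getnames()
print(names)
```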
3.2. Fairing
Discuss fairing On-prem on kubeflow fairing slack channel
5. Roadmap - Fairing on prem context support
on prem
GCP
control cluster
GCS
NAS
sync
open-rok
Custom schedule
No code change
No code change
worker cluster
5. Roadmap - Katib cloud bursting support
Appendix - Kubecon Europe 2019 (Spain)
# Stateful(storage), Security, Multi cluster
We are hiring!
Dudaji - AI Service Accelerator
- Kubeflow, Kubernetes, Cloud native ecosystem
- AI Service, AI toolkits
- Open source project襯 牛 炎概 襦襯 蠖蠑碁...
shhong@dudaji.com
