Kubernetes can orchestrate and manage container workloads through components like Pods, Deployments, DaemonSets, and StatefulSets. It schedules containers across a cluster based on resource needs and availability. Services enable discovery and network access to Pods, while ConfigMaps and Secrets allow injecting configuration and credentials into applications.
2. Kubernetes resource scheduling
Terminology:
- Allocatable: what is available at node
- Used: what is already being used from node (called RequestedResource)
- Requests: what is requested by container(s) for the pod
(Diagram: the scheduler keeps track of Used; kubelets on Worker 1..N send Allocatable resources for their nodes; a pod (container) spec carrying the container Requests arrives at the scheduler as a scheduling request.)
3. Resources
All resources (allocatable, used, and requests) are represented as a ResourceList, which is simply a list of key-value pairs, e.g.
memory : 64GiB
cpu : 8
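As a rough illustration (hypothetical simplified types, not the actual Kubernetes api.ResourceList definition), such a list can be modeled as a plain map from resource name to quantity:

package main

import "fmt"

// ResourceName and ResourceList are simplified stand-ins for the real
// Kubernetes types: a ResourceList is just resource name -> quantity.
type ResourceName string
type ResourceList map[ResourceName]int64

func main() {
	allocatable := ResourceList{
		"memory": 64 << 30, // 64 GiB expressed in bytes
		"cpu":    8,        // 8 cores (the real scheduler uses milli-CPUs)
	}
	fmt.Println(allocatable)
}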
4. Simple scheduling
1. Find worker nodes that can fit a pod spec
plugin/pkg/scheduler/algorithm/predicates
2. Prioritize list of nodes
plugin/pkg/scheduler/algorithm/priorities
3. Try to schedule the pod on a node; the node may have an additional admission policy, so the pod may still fail
4. If it fails, try the next node on the list
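A condensed sketch of that flow (hypothetical helper names; the real predicate and priority code lives under plugin/pkg/scheduler):

package main

// Node and Pod are simplified placeholders for the real API objects.
type Node struct{ Name string }
type Pod struct{ Name string }

// podFitsOnNode, prioritizeNodes, and bindPod stand in for the predicate
// plugins, priority plugins, and the binding step; binding may still be
// rejected by node-level admission.
func podFitsOnNode(pod *Pod, node *Node) bool         { return true }
func prioritizeNodes(pod *Pod, nodes []*Node) []*Node { return nodes }
func bindPod(pod *Pod, node *Node) error              { return nil }

// schedule mirrors the simple flow on the slide: filter nodes with
// predicates, prioritize the survivors, then try them in order until
// one accepts the pod.
func schedule(pod *Pod, nodes []*Node) *Node {
	feasible := []*Node{}
	for _, n := range nodes {
		if podFitsOnNode(pod, n) { // step 1: predicates
			feasible = append(feasible, n)
		}
	}
	for _, n := range prioritizeNodes(pod, feasible) { // step 2: priorities
		if err := bindPod(pod, n); err == nil { // step 3: try to schedule
			return n
		}
		// step 4: node admission failed, try the next candidate
	}
	return nil
}

func main() {
	_ = schedule(&Pod{Name: "p"}, []*Node{{Name: "worker-1"}, {Name: "worker-2"}})
}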
5. Find nodes that fit
For simple scheduling, a node will NOT fit if:
Allocatable < Request + Used
Example:
if allocatable.MilliCPU < podRequest.MilliCPU+nodeInfo.RequestedResource().MilliCPU {
    predicateFails = append(predicateFails, NewInsufficientResourceError(api.ResourceCPU,
        podRequest.MilliCPU, nodeInfo.RequestedResource().MilliCPU, allocatable.MilliCPU))
}
if allocatable.Memory < podRequest.Memory+nodeInfo.RequestedResource().Memory {
    predicateFails = append(predicateFails, NewInsufficientResourceError(api.ResourceMemory,
        podRequest.Memory, nodeInfo.RequestedResource().Memory, allocatable.Memory))
}
if allocatable.NvidiaGPU < podRequest.NvidiaGPU+nodeInfo.RequestedResource().NvidiaGPU {
    predicateFails = append(predicateFails, NewInsufficientResourceError(api.ResourceNvidiaGPU,
        podRequest.NvidiaGPU, nodeInfo.RequestedResource().NvidiaGPU, allocatable.NvidiaGPU))
}
6. Why do we need modifications?
Only allows for constraints like the following in the pod spec:
- Need 4 GPUs
Does NOT allow for constraints like the following in the pod spec:
- Need 4 GPUs with minimum memory 12GiB, OR
- Need 2 GPUs with minimum memory 4GiB and 2 GPUs with 12GiB
- Need 2 GPUs interconnected via NVLink (peer-to-peer for high-speed inter-GPU communication)
7. Solution 1
Label nodes and use node selector
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
However, this is not optimal in cases with heterogeneous configurations.
For example, one machine may have GPUs of several types, some with large amounts of memory and some with small.
If a label is used, we don't know which GPUs will get assigned; thus only the minimally performant GPU can be used to label the node.
Also, even in homogeneous configurations, the kubelet running on the worker nodes needs to do the bookkeeping and keep track of which GPUs are in use.
8. Solution 2: Group Scheduler
Define a richer syntax on ResourceLists to allow such constraints to be scheduled.
Example:
Instead of:
NvidiaGPU: 2
Use something like the following (now the memory for each GPU is clearly specified):
Gpu/0/cards: 1
Gpu/0/memory: 12GiB
Gpu/1/cards: 1
Gpu/1/memory: 6GiB
The "cards" key is present to prevent sharing of GPU cards.
9. Example GPU with NVLink
For 4 GPUs arranged in two groups, where the GPUs within each group are connected to one another via NVLink:
GpuGrp/0/Gpu/0/cards: 1
GpuGrp/0/Gpu/0/memory: 12GiB
GpuGrp/0/Gpu/1/cards: 1
GpuGrp/0/Gpu/1/memory: 12GiB
GpuGrp/1/Gpu/2/cards: 1
GpuGrp/1/Gpu/2/memory: 8GiB
GpuGrp/1/Gpu/3/cards: 1
GpuGrp/1/Gpu/3/memory: 8GiB
(Diagram: GpuGrp0 contains Gpu0 and Gpu1; GpuGrp1 contains Gpu2 and Gpu3; each group is linked internally via NVLink.)
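A small sketch (hypothetical helper, not code from the talk) of how such hierarchical keys could be generated from a topology description like the one above:

package main

import "fmt"

// gpu describes one device; group identifies its NVLink group.
type gpu struct {
	group, index int
	memoryGiB    int64
}

// resourceKeys expands a GPU list into the hierarchical
// GpuGrp/<group>/Gpu/<index>/... keys used by the group scheduler.
func resourceKeys(gpus []gpu) map[string]int64 {
	rl := map[string]int64{}
	for _, g := range gpus {
		prefix := fmt.Sprintf("GpuGrp/%d/Gpu/%d", g.group, g.index)
		rl[prefix+"/cards"] = 1
		rl[prefix+"/memory"] = g.memoryGiB
	}
	return rl
}

func main() {
	node := []gpu{{0, 0, 12}, {0, 1, 12}, {1, 2, 8}, {1, 3, 8}}
	for k, v := range resourceKeys(node) {
		fmt.Println(k, v)
	}
}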
10. Group scheduler
All resource lists (allocatable, used, and requests) are specified in this manner.
Scheduling can no longer simply compare values with the same key to check fit,
e.g.: allocatable[memory] < used[memory] + requested[memory]
Example:
Allocatable:
GpuGrp/0/Gpu/0/cards: 1
GpuGrp/0/Gpu/0/memory: 12GiB
GpuGrp/0/Gpu/1/cards: 1
GpuGrp/0/Gpu/1/memory: 12GiB
GpuGrp/1/Gpu/2/cards: 1
GpuGrp/1/Gpu/2/memory: 8GiB
GpuGrp/1/Gpu/3/cards: 1
GpuGrp/1/Gpu/3/memory: 8GiB
Requested (two GPUs with minimum memory 10GiB each; NVLink not required):
GpuGrp/A/Gpu/0/cards: 1
GpuGrp/A/Gpu/0/memory: 10GiB
GpuGrp/B/Gpu/1/cards: 1
GpuGrp/B/Gpu/1/memory: 10GiB
Here only the GPUs in GpuGrp/0 (12GiB each) can satisfy the 10GiB minimum, so a plain key-by-key comparison cannot determine fit; request keys have to be matched to allocatable keys.
11. Group scheduler
The group scheduler uses hierarchical group allocation with arbitrary scorers to accomplish both checking for fit and allocation.
An allocation is a string-to-string key-value list which specifies a mapping from Requests to Allocatable.
Allocatable:
GpuGrp/0/Gpu/0/cards: 1
GpuGrp/0/Gpu/0/memory: 12GiB
GpuGrp/0/Gpu/1/cards: 1
GpuGrp/0/Gpu/1/memory: 12GiB
GpuGrp/1/Gpu/2/cards: 1
GpuGrp/1/Gpu/2/memory: 8GiB
GpuGrp/1/Gpu/3/cards: 1
GpuGrp/1/Gpu/3/memory: 8GiB
Requested (two GPUs with minimum memory 10GiB each; NVLink not required):
GpuGrp/A/Gpu/0/cards: 1
GpuGrp/A/Gpu/0/memory: 10GiB
GpuGrp/B/Gpu/1/cards: 1
GpuGrp/B/Gpu/1/memory: 10GiB
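For the example above, the resulting allocation could look like the following sketch (key names follow the slides; the concrete mapping is illustrative):

package main

import "fmt"

func main() {
	// AllocateFrom-style mapping: request key -> allocatable key.
	// Only the GPUs in GpuGrp/0 have >= 10GiB, so both requested GPUs
	// map into that group even though the request did not ask for NVLink.
	allocation := map[string]string{
		"GpuGrp/A/Gpu/0/cards":  "GpuGrp/0/Gpu/0/cards",
		"GpuGrp/A/Gpu/0/memory": "GpuGrp/0/Gpu/0/memory",
		"GpuGrp/B/Gpu/1/cards":  "GpuGrp/0/Gpu/1/cards",
		"GpuGrp/B/Gpu/1/memory": "GpuGrp/0/Gpu/1/memory",
	}
	for req, alloc := range allocation {
		fmt.Println(req, "->", alloc)
	}
}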
13. Main Modifications: scheduler side
1. Addition of an AllocateFrom field in the pod specification. This is a list of key-value pairs which specifies a mapping from Requests to Allocatable.
pkg/api/types.go
2. Addition of group scheduler code
plugin/pkg/scheduler/algorithm/predicates/grpallocate.go
plugin/pkg/scheduler/algorithm/scorer
3. Modification of the scheduler to write the pod update after scheduling and to call the group allocator
plugin/pkg/scheduler/generic_scheduler.go
plugin/pkg/scheduler/scheduler.go
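Conceptually, the new field attaches the allocation mapping to the container spec; a hypothetical simplified version (illustrative names only, the real change is in pkg/api/types.go) might look like:

package main

// ResourceLocation is a single request-key -> allocatable-key pair.
// These names are illustrative; they are not the exact upstream types.
type ResourceLocation struct {
	Request     string
	Allocatable string
}

// ContainerSketch shows where an AllocateFrom list would live:
// alongside the usual resource Requests in the container spec.
type ContainerSketch struct {
	Name         string
	Requests     map[string]int64
	AllocateFrom []ResourceLocation // written by the scheduler after group allocation
}

func main() {
	_ = ContainerSketch{
		Name:     "trainer",
		Requests: map[string]int64{"GpuGrp/A/Gpu/0/cards": 1, "GpuGrp/A/Gpu/0/memory": 10},
		AllocateFrom: []ResourceLocation{
			{Request: "GpuGrp/A/Gpu/0/cards", Allocatable: "GpuGrp/0/Gpu/0/cards"},
		},
	}
}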
14. Kubelet modifications
The existing multi-GPU code makes the kubelet do the work of keeping track of which GPUs are available, and it uses /dev/nvidia* to see the number of devices; both of these are hacks.
With the addition of the AllocateFrom field, the scheduler decides which GPUs to use and keeps track of which ones are in use.
15. Main Modifications: kubelet side
1. Use of AllocateFrom to decide which GPUs to use
2. Use of nvidia-docker-plugin to find GPUs (instead of looking at /dev/nvidia*)
   This is also needed to get richer information such as GPU memory, GPU type, and topology information (e.g. NVLink)
3. Use of nvidia-docker-plugin to find the correct location for the nvidia drivers inside the container (in conjunction with the nvidia-docker driver)
4. Allow specification of the driver when specifying a mount; this is needed to use the nvidia-docker driver
16. Integration with community
Eventual goal
(Diagram of the eventual goal: the scheduler keeps track of Used; kubelets on Worker 1..N send Allocatable resources for their nodes but know nothing about GPUs; device plugins (e.g. GPU) tell the kubelet which resources to advertise and report resource usage / docker params; a pod (container) spec with its container Requests arrives as a scheduling request; the scheduler asks a scheduler extender for fit, and the extender performs group allocation and writes an update to the pod spec with the allocation.)
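A minimal sketch of the extender side of that flow, assuming a JSON-over-HTTP filter endpoint with simplified argument/result payloads (field and endpoint names here are assumptions, not the exact scheduler extender API):

package main

import (
	"encoding/json"
	"net/http"
)

// Simplified stand-ins for the extender's filter request and response.
type extenderArgs struct {
	Pod       map[string]interface{} `json:"pod"`
	NodeNames []string               `json:"nodenames"`
}
type extenderFilterResult struct {
	NodeNames   []string          `json:"nodenames"`
	FailedNodes map[string]string `json:"failedNodes"`
}

// filter would check group-allocation fit for each candidate node and,
// for nodes that fit, compute the AllocateFrom mapping to be written
// back to the pod spec (omitted in this sketch).
func filter(w http.ResponseWriter, r *http.Request) {
	var args extenderArgs
	if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	result := extenderFilterResult{FailedNodes: map[string]string{}}
	for _, node := range args.NodeNames {
		// placeholder for the hierarchical group-allocation fit check
		result.NodeNames = append(result.NodeNames, node)
	}
	json.NewEncoder(w).Encode(result)
}

func main() {
	http.HandleFunc("/filter", filter)
	http.ListenAndServe(":8888", nil)
}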
17. Needed in Kubernetes core
We will need a few things in order to achieve separation from core, which will allow for directly using the latest Kubernetes binaries:
- Resource Class, scheduled for v1.9, will allow for non-identity mappings between requests and allocatable
- Device plugins and native Nvidia GPU support are v1.13 for now
https://docs.google.com/a/google.com/spreadsheets/d/1NWarIgtSLsq3izc5wOzV7ItdhDNRd-6oBVawmvs-LGw
18. Other future Kubernetes/Scheduler work
- Pod placement using other constraints, such as pod-level constraints or higher (e.g. multiple pods for distributed training)
- For example, networking constraints for distributed training when scheduling
- Container networking for faster cross-pod communication (e.g. using RDMA / IB)