
Optimizing AI and Machine Learning Workloads in Kubernetes 🚀

Kubernetes is well suited for all sorts of containerized workloads, from services to jobs to stateful applications. But what about AI and machine learning workloads that require GPUs? Kubernetes supports these as well, but there are a lot of nuances.


This post covers how Kubernetes supports GPUs, including scheduling, oversubscription and time sharing, and security/isolation. We will also discuss how each of the three major public cloud providers supports these capabilities, and how to ensure that your GPU nodes are used only by GPU workloads.


Device Plugins

Let's start by looking at the mechanism by which Kubernetes supports GPUs. Natively, Kubernetes doesn't know anything about GPUs. Instead, it provides an extension mechanism called device plugins. The device plugin framework allows third parties to advertise additional resources that are available on a node, such as GPUs, InfiniBand adapters, and so on.


A device plugin, usually implemented as a DaemonSet, registers itself with the node's kubelet and advertises the schedulable resources available on the node. The kubelet passes this information to the API server, where it is used by the Kubernetes scheduler to place workloads on nodes that have the resources requested by each container.


Device Plugin Data Flow
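
Once a device plugin has registered, the resource it advertises shows up in the node's capacity and allocatable fields. Below is a trimmed sketch of what a node object might look like after the Nvidia device plugin registers a single GPU (all values are illustrative):

status:
  capacity:
    cpu: "8"
    memory: 32Gi
    nvidia.com/gpu: "1"   # advertised by the device plugin
  allocatable:
    cpu: 7910m
    memory: 30Gi
    nvidia.com/gpu: "1"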

Requesting GPUs from a Workload

Now that we understand how Kubernetes knows about GPUs, let's talk about how a container can request one. A workload can request GPUs in a similar fashion to CPU or memory, but with a twist. Unlike CPU, which is natively supported by Kubernetes, GPUs (and device plugin resources in general) only support limits (you can supply a request, but if you do, you must also supply a limit and the two values must be equal). The limit must also be an integer; fractional limits aren't allowed.


Let's take a look at an example pod. In this case, the pod requests one Nvidia GPU. The scheduler will attempt to find a node that has an unallocated Nvidia GPU and place the pod on it.

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 600; done;"]
    resources:
      requests:
        cpu: 100m
        memory: 500Mi
      limits:
        memory: 1000Mi
        nvidia.com/gpu: 1   # GPUs can only be set as an integer limit; the request is implied to equal it

Oversubscription & Time Sharing

CPU time sharing is handled natively by the Linux kernel using cgroups, which the container runtime configures based on your requests and limits - see our post on how to set your CPU requests and limits (and why to avoid CPU limits).


GPU time sharing is supported for Nvidia GPUs through two mechanisms:

  1. Multi-Instance GPU (MIG) partitioning, available on Nvidia A100 and H100 GPUs, splits a physical GPU into partitions that each get their own compute and memory units. You configure how many partitions you want, and the device plugin then exposes each partition as its own schedulable GPU. This is supported by AWS, Azure, and GCP.

  2. For GPUs that can't be partitioned, time sharing is supported through time slicing: Nvidia's GPU scheduler alternates workloads on the GPU in time slices (a sample device plugin configuration is sketched after this list). This is only supported by AWS and GCP.
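
If you run Nvidia's Kubernetes device plugin yourself, time slicing is enabled through its sharing configuration. The sketch below follows the config format documented for the plugin and assumes you manage the plugin's config (the replica count of 4 is just an example); managed offerings such as GKE expose the same capability through node pool settings instead:

version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4   # each physical GPU is advertised as 4 schedulable nvidia.com/gpu resources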


While this means that oversubscribing GPUs is possible, you must be careful, as your workloads may be starved. Unlike with CPU, there is no Completely Fair Scheduler (CFS) and no cgroup priorities, so GPU time can only be split equally between workloads.


Security/Isolation

Unlike with CPUs, there is currently no process or memory isolation when workloads share a GPU through time slicing. All workloads scheduled on the GPU share its memory, so you should only share a GPU between workloads you trust.


Creating GPU nodes

Now that we know how to request GPUs, you may be wondering how to create nodes with GPUs and how to install the device plugin. This varies by Kubernetes provider, and we will cover the three major providers below.


AWS

AWS supports creating node groups with any EC2 GPU instance type. You can choose from two options:

  1. Run the EKS-optimized accelerated Amazon Linux AMI, which comes with the Nvidia drivers installed. In this case, you will need to install the Nvidia device plugin yourself (a node group sketch follows this list).

  2. Run Nvidia's GPU Operator on the node group, which handles driver and device plugin installation for you. In this case, upgrades are manual.
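
As a sketch of option 1, a GPU node group can be created with eksctl using a config like the one below (the cluster name, region, and instance type are placeholders); you would then install the Nvidia device plugin on top:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster      # placeholder cluster name
  region: us-east-1     # placeholder region
managedNodeGroups:
- name: gpu-nodes
  instanceType: g4dn.xlarge   # any EC2 GPU instance type
  desiredCapacity: 1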

Azure

Azure supports creating node pools with three options:

  1. Create a GPU node pool, which automatically includes the GPU drivers, but requires you to install the Nvidia device plugin yourself.

  2. Use the AKS GPU image (preview), which includes both the GPU drivers and the Nvidia device plugin. In this case, upgrades are manual.

  3. Run Nvidia's GPU Operator on the node pool, which handles everything for you.


GCP

GKE supports creating node pools with two options; either way, pods can target a specific GPU type, as sketched after the list:

  1. Let Google manage GPU driver installation along with the device plugin. Using this option also allows GKE to automatically upgrade your nodes.

  2. Manage the GPU driver and the device plugin yourself.
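
GKE labels GPU nodes with the accelerator type they carry, so a pod can target a specific GPU model with a node selector. A minimal sketch (the accelerator value nvidia-tesla-t4 is just an example):

apiVersion: v1
kind: Pod
metadata:
  name: my-gke-gpu-pod
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4   # example GPU type
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
    resources:
      limits:
        nvidia.com/gpu: 1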


Protecting GPU nodes from non-GPU workloads

Finally, now that you have created GPU nodes, you will want to protect them from the non-GPU workloads running on your cluster. You can achieve this with taints and tolerations. When creating node pools or node groups, apply a taint to the GPU nodes. GKE does this automatically for you if your cluster also has non-GPU node pools; the other providers don't, so you will need to apply the taint yourself.
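
As a sketch, a taint on a GPU node looks like the excerpt below (the key and value shown match the taint GKE applies automatically; on other providers you choose them when creating the node group or pool):

spec:
  taints:
  - key: nvidia.com/gpu
    value: present
    effect: NoSchedule   # pods without a matching toleration won't be scheduled here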


For your pods, you will want to add a toleration for the taint so that they can be scheduled on GPU nodes. The example below adds a toleration for a taint with the key "nvidia.com/gpu", which allows this pod to run on Nvidia GPU nodes.


apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 600; done;"]
    resources:
      requests:
        cpu: 100m
        memory: 500Mi
      limits:
        memory: 1000Mi
        nvidia.com/gpu: 1
  tolerations:          # tolerations are a pod-level field, not part of the container
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"

As your AI and machine learning workloads continue to grow, hopefully you will consider running them on Kubernetes rather than the more expensive, proprietary cloud provider options.


Have you tried to run GPU workloads on Kubernetes? What has worked well? What issues have you run into?
