Schedule GPUs

FEATURE STATE:

Kubernetes includes stable support for managing AMD and NVIDIA GPUs (graphics processing units) across different nodes in your cluster, using device plugins.

This page describes how users can consume GPUs, and outlines some of the limitations in the implementation.

Kubernetes implements device plugins to let Pods access specialized hardware features such as GPUs.

Note: This section links to third party projects that provide functionality required by Kubernetes. The Kubernetes project authors aren’t responsible for these projects, which are listed alphabetically. To add a project to this list, read the content guide before submitting a change.

Once you have installed the device plugin for your GPU vendor, your cluster exposes a custom schedulable resource such as amd.com/gpu or nvidia.com/gpu.

You can consume these GPUs from your containers by requesting the custom GPU resource, the same way you request cpu or memory. However, there are some limitations in how you specify the resource requirements for custom devices.

GPUs are only supposed to be specified in the limits section, which means:

You can specify GPU in both limits and requests, but these two values must be equal.
You cannot specify GPU requests without specifying limits.
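
As a sketch of these rules, a container's resources stanza that sets both fields has to use the same value in each. The resource name nvidia.com/gpu is used here only as an example; substitute whatever name your device plugin advertises.

```yaml
# Fragment of a container spec, not a complete manifest.
resources:
  requests:
    nvidia.com/gpu: 1   # optional; if present, it must equal the limit
  limits:
    nvidia.com/gpu: 1   # required whenever a GPU request is present
```

Leaving out the requests entry and keeping only the limit is the most common form.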

Here’s an example manifest for a Pod that requests a GPU:
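
The following is a minimal sketch that assumes the NVIDIA device plugin is installed. The Pod name, container name, and image are hypothetical placeholders, not values from a real registry.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-gpu-pod            # hypothetical name
spec:
  restartPolicy: OnFailure
  containers:
    - name: example-gpu-container  # hypothetical name
      image: "registry.example/example-gpu-workload:v1"  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1        # request one GPU; use amd.com/gpu on AMD nodes
```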

If different nodes in your cluster have different types of GPUs, then you can use node labels and node selectors to schedule Pods to appropriate nodes.
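
For example, suppose you have labelled each node with the type of accelerator it carries, for instance by running kubectl label nodes with a label such as accelerator=example-gpu-x100. A Pod could then select matching nodes roughly as sketched below. The label key, label value, Pod name, and image are all hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-gpu-pod            # hypothetical name
spec:
  restartPolicy: OnFailure
  nodeSelector:
    accelerator: example-gpu-x100  # hypothetical label applied to the node beforehand
  containers:
    - name: example-gpu-container  # hypothetical name
      image: "registry.example/example-gpu-workload:v1"  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1        # the GPU itself is still requested through limits
```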

A label key such as accelerator is just an example; you can use any label key you prefer.

If you’re using AMD GPU devices, you can deploy Node Labeller. Node Labeller is a controller that automatically labels your nodes with GPU device properties.

At the moment, that controller can add labels for:

Device ID (-device-id)
VRAM Size (-vram)
Number of SIMDs (-simd-count)
Number of Compute Units (-cu-count)
GPU Family, as a two-letter acronym (-family)
- SI - Southern Islands
- CI - Sea Islands
- KV - Kaveri
- VI - Volcanic Islands
- CZ - Carrizo
- RV - Raven

With the Node Labeller in use, you can specify the GPU type in the Pod spec:
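
A minimal sketch using node affinity follows. The label key beta.amd.com/gpu.family.VI is an assumption about how Node Labeller names the GPU family label; check the labels actually present on your nodes (for example with kubectl get nodes --show-labels) before relying on it. The Pod name and image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-amd-gpu-pod        # hypothetical name
spec:
  restartPolicy: OnFailure
  containers:
    - name: example-gpu-container  # hypothetical name
      image: "registry.example/example-gpu-workload:v1"  # placeholder image
      resources:
        limits:
          amd.com/gpu: 1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # Assumed label key for the Volcanic Islands (VI) family;
              # verify it against the labels on your own nodes.
              - key: beta.amd.com/gpu.family.VI
                operator: Exists
```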

This ensures that the Pod will be scheduled to a node that has the GPU type you specified.