Schedule GPUs

    FEATURE STATE:

    Kubernetes includes stable support for managing AMD and NVIDIA GPUs (graphics processing units) across different nodes in your cluster, using device plugins.

    This page describes how users can consume GPUs, and outlines some of the limitations in the implementation.

    Kubernetes implements device plugins to let Pods access specialized hardware features such as GPUs.

    As an administrator, you have to install GPU drivers from the corresponding hardware vendor on the nodes and run the corresponding device plugin from the GPU vendor. Refer to your GPU vendor's documentation for instructions on both steps.

    Once you have installed the plugin, your cluster exposes a custom schedulable resource such as amd.com/gpu or nvidia.com/gpu.

    You can consume these GPUs from your containers by requesting the custom GPU resource, the same way you request cpu or memory. However, there are some limitations in how you specify the resource requirements for custom devices.

    GPUs are only supposed to be specified in the limits section, which means:

    • You can specify GPU limits without specifying requests, because Kubernetes will use the limit as the request value by default.
    • You cannot specify GPU requests without specifying limits.
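    A minimal sketch of a Pod that consumes one GPU by setting it under limits only, assuming the NVIDIA device plugin is installed so the cluster exposes nvidia.com/gpu. The Pod name gpu-example and the image registry.example/gpu-workload:latest are hypothetical placeholders. The script only writes the manifest to a file; the kubectl apply step is left as a comment because it needs access to a cluster.

```shell
# Write a minimal Pod manifest that requests one GPU.
# Assumption: the NVIDIA device plugin exposes the nvidia.com/gpu resource.
# The Pod name and image below are hypothetical placeholders.
cat > gpu-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpu-example
      image: registry.example/gpu-workload:latest
      resources:
        # GPUs go under limits; no requests entry is needed because
        # Kubernetes uses the limit as the request value by default.
        limits:
          nvidia.com/gpu: 1
EOF

# Submit the Pod (requires kubectl configured against your cluster):
# kubectl apply -f gpu-pod.yaml
```

    Because only limits is set, Kubernetes fills in the GPU request with the same value. On AMD nodes, the resource name would be amd.com/gpu instead.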

    If different nodes in your cluster have different types of GPUs, then you can use node labels and node selectors to schedule Pods to appropriate nodes.

    For example:

    # Label your nodes with the accelerator type they have.
    kubectl label nodes node1 accelerator=example-gpu-x100

    That label key is just an example; you can use a different label key if you prefer.
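    Once a node carries that label, a Pod can target it with a nodeSelector. The sketch below assumes the accelerator=example-gpu-x100 label from the previous command and the nvidia.com/gpu resource; the Pod name and image are hypothetical placeholders. As before, it only writes the manifest, and applying it is left as a comment.

```shell
# Write a Pod manifest that is only schedulable onto nodes labelled
# accelerator=example-gpu-x100 (the example label applied above).
# The Pod name and image are hypothetical placeholders.
cat > gpu-pod-x100.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example-x100
spec:
  restartPolicy: OnFailure
  # The scheduler only considers nodes whose labels match every entry here.
  nodeSelector:
    accelerator: example-gpu-x100
  containers:
    - name: gpu-example
      image: registry.example/gpu-workload:latest
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

# Submit the Pod (requires kubectl configured against your cluster):
# kubectl apply -f gpu-pod-x100.yaml
```

    If you chose a different label key, use that key in the nodeSelector instead of accelerator.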

    If you’re using AMD GPU devices, you can deploy Node Labeller. Node Labeller is a controller that automatically labels your nodes with GPU device properties.