GPU Support
kOps will also install a RuntimeClass `nvidia`. As the nvidia runtime is not the default runtime, you will need to add `runtimeClassName: nvidia`
to the spec of any Pod you want to run GPU workloads in. The RuntimeClass also configures the appropriate node selectors and tolerations to run on GPU Nodes.
The taint on GPU Nodes will prevent you from accidentally scheduling other workloads on them.
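For example, a GPU workload Pod might look like the following sketch; the Pod name, container image, and resource limit are illustrative placeholders, not values prescribed by kOps:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload          # placeholder name
spec:
  runtimeClassName: nvidia    # use the RuntimeClass installed by kOps
  containers:
  - name: app
    image: my-gpu-app:latest  # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1     # request one GPU from the device plugin
```

The `nvidia.com/gpu` resource limit ensures the Pod is only scheduled on a node where the device plugin has advertised a free GPU.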
Due to the cost of GPU instances, you want to minimize the number of pods running on them. Therefore, start by provisioning a regular cluster following the getting started documentation.
OpenStack does not support enabling the containerd configuration at the cluster level; it needs to be done in the instance group:
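A minimal sketch of such an instance group, assuming a cluster named `mycluster.example.com` and placeholder image and machine type:

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: mycluster.example.com  # placeholder cluster name
  name: gpu-nodes
spec:
  role: Node
  machineType: g1.large       # placeholder GPU flavor
  image: ubuntu-22.04         # placeholder image
  minSize: 1
  maxSize: 1
  containerd:
    nvidiaGPU:
      enabled: true           # the setting OpenStack requires at instance group level
```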
- After the new GPU nodes come up, you should see them in `kubectl get nodes`.
- The `kube-system` namespace should have an `nvidia-device-plugin-daemonset` pod provisioned on the GPU node(s).
- If you see `nvidia.com/gpu` in the output of `kubectl describe node`, everything should work.
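As a final smoke test, you can run `nvidia-smi` in a Pod that uses the nvidia RuntimeClass; this is a sketch, and the Pod name and CUDA image tag are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi-test       # placeholder name
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia    # use the RuntimeClass installed by kOps
  containers:
  - name: nvidia-smi
    image: nvidia/cuda:12.4.1-base-ubuntu22.04  # assumed image tag
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
```

`kubectl logs nvidia-smi-test` should then show the GPU(s) detected by the driver.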