Managing nodes

    To make configuration changes to a cluster or machine pool, you must create a custom resource definition (CRD) or kubeletConfig object. OKD uses the Machine Config Controller to watch for changes introduced through the CRD and apply them to the cluster.

    Because the fields in a kubeletConfig object are passed directly to the kubelet from upstream Kubernetes, the validation of those fields is handled directly by the kubelet itself. Please refer to the relevant Kubernetes documentation for the valid values for these fields. Invalid values in the kubeletConfig object can render cluster nodes unusable.

    Procedure

    1. Obtain the label associated with the static CRD, Machine Config Pool, for the type of node you want to configure. Perform one of the following steps:

      1. Check current labels of the desired machine config pool.

        For example:

        $ oc get machineconfigpool --show-labels

        Example output

        NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   LABELS
        master   rendered-master-e05b81f5ca4db1d249a1bf32f9ec24fd   True      False      False      operator.machineconfiguration.openshift.io/required-for-upgrade=
        worker   rendered-worker-f50e78e1bc06d8e82327763145bfcf62   True      False      False
      2. Add a custom label to the desired machine config pool.

        For example:

        $ oc label machineconfigpool worker custom-kubelet=enabled
    2. Create a kubeletconfig custom resource (CR) for your configuration change.

      For example:

      Sample configuration for a custom-config CR

      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: custom-config # (1)
      spec:
        machineConfigPoolSelector:
          matchLabels:
            custom-kubelet: enabled # (2)
        kubeletConfig: # (3)
          podsPerCore: 10
          maxPods: 250
          systemReserved:
            cpu: 2000m
            memory: 1Gi

      (1) Assign a name to the CR.
      (2) Specify the label to apply the configuration change. This is the label you added to the machine config pool.
      (3) Specify the new values you want to change.
    3. Create the CR object.

      $ oc create -f <file-name>

      For example:

      $ oc create -f master-kube-config.yaml

    Most kubelet configuration options can be set by the user. The following options cannot be overwritten:

    • CgroupDriver

    • ClusterDNS

    • ClusterDomain

    • StaticPodPath

    If a single node contains more than 50 images, pod scheduling might be imbalanced across nodes. This is because the list of images on a node is shortened to 50 by default. You can disable the image limit by editing the KubeletConfig object and setting the value of nodeStatusMaxImages to -1.
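    As a minimal sketch of the nodeStatusMaxImages change described above, the following shell function prints a KubeletConfig manifest that sets the value to -1. The function name render_image_limit_config and the CR name disable-image-limit are hypothetical, and the selector assumes the custom-kubelet=enabled label from the earlier procedure is already on the machine config pool.

```shell
#!/usr/bin/env bash
# Print a KubeletConfig manifest that disables the node image-list limit.
# "disable-image-limit" is a hypothetical CR name; the selector assumes the
# custom-kubelet=enabled label was already added to the machine config pool.
render_image_limit_config() {
  cat <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: disable-image-limit
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: enabled
  kubeletConfig:
    nodeStatusMaxImages: -1
EOF
}

# Usage (needs cluster access, so it is left commented out):
#   render_image_limit_config > disable-image-limit.yaml
#   oc create -f disable-image-limit.yaml
```

    Printing the manifest to standard output keeps the sketch free of side effects; redirect it to a file only when you are ready to create the object.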

    Configuring control plane nodes as schedulable

    You can configure control plane nodes to be schedulable, meaning that new pods are allowed for placement on the control plane nodes. By default, control plane nodes are not schedulable.

    You can set the control plane nodes to be schedulable, but you must retain the worker nodes.

    You can deploy OKD with no worker nodes on a bare metal cluster. In this case, the control plane nodes are marked schedulable by default.

    You can allow or disallow control plane nodes to be schedulable by configuring the mastersSchedulable field.

    Procedure

    1. Edit the schedulers.config.openshift.io resource.

      $ oc edit schedulers.config.openshift.io cluster
    2. Configure the mastersSchedulable field: set it to true to allow control plane nodes to be schedulable, or to false to disallow it.
    3. Save the file to apply the changes.
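    If you prefer a non-interactive change over oc edit, the same field could be set with a JSON merge patch. The sketch below only builds the patch body; the helper name masters_schedulable_patch is hypothetical, and the oc patch call in the usage comment is an assumed alternative to the documented procedure, not part of it.

```shell
#!/usr/bin/env bash
# Build the JSON merge patch for spec.mastersSchedulable.
# Accepts only "true" or "false"; anything else is rejected with status 64.
masters_schedulable_patch() {
  case "$1" in
    true|false)
      printf '{"spec":{"mastersSchedulable":%s}}\n' "$1"
      ;;
    *)
      echo "usage: masters_schedulable_patch true|false" >&2
      return 64
      ;;
  esac
}

# Usage (needs cluster access, so it is left commented out):
#   oc patch schedulers.config.openshift.io cluster --type merge \
#     -p "$(masters_schedulable_patch true)"
```

    Validating the argument before building the patch avoids sending a malformed value to the cluster.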

    OKD allows you to enable and disable an SELinux boolean on a Fedora CoreOS (FCOS) node. The following procedure explains how to modify SELinux booleans on nodes using the Machine Config Operator (MCO). This procedure uses container_manage_cgroup as the example boolean. You can modify this value to whichever boolean you need.

    Prerequisites

    • You have installed the OpenShift CLI (oc).

    Procedure

    1. Create a new YAML file with a MachineConfig object, displayed in the following example:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: 99-worker-setsebool
      spec:
        config:
          ignition:
            version: 3.2.0
          systemd:
            units:
            - contents: |
                [Unit]
                Description=Set SELinux booleans
                Before=kubelet.service

                [Service]
                Type=oneshot
                ExecStart=/sbin/setsebool container_manage_cgroup=on
                RemainAfterExit=true

                [Install]
                WantedBy=multi-user.target graphical.target
              enabled: true
              name: setsebool.service
    2. Create the new MachineConfig object by running the following command:

      $ oc create -f 99-worker-setsebool.yaml

    Applying any changes to the MachineConfig object causes all affected nodes to gracefully reboot after the change is applied.
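    Because the procedure can be reused for any SELinux boolean, the manifest can be generated from a template. The following is a sketch under that assumption: render_setsebool_mc is a hypothetical helper that prints the same MachineConfig shown above with the boolean name and state substituted.

```shell
#!/usr/bin/env bash
# Print a MachineConfig that sets one SELinux boolean on worker nodes.
#   $1 = boolean name (for example, container_manage_cgroup)
#   $2 = on | off (defaults to on)
render_setsebool_mc() {
  local boolean="$1" state="${2:-on}"
  case "$state" in
    on|off) ;;
    *) echo "state must be on or off" >&2; return 64 ;;
  esac
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-setsebool
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Set SELinux booleans
          Before=kubelet.service

          [Service]
          Type=oneshot
          ExecStart=/sbin/setsebool ${boolean}=${state}
          RemainAfterExit=true

          [Install]
          WantedBy=multi-user.target graphical.target
        enabled: true
        name: setsebool.service
EOF
}

# Usage (the oc step needs cluster access, so it is left commented out):
#   render_setsebool_mc container_manage_cgroup on > 99-worker-setsebool.yaml
#   oc create -f 99-worker-setsebool.yaml
```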

    Adding kernel arguments to nodes

    In some special cases, you might want to add kernel arguments to a set of nodes in your cluster. This should only be done with caution and clear understanding of the implications of the arguments you set.

    Improper use of kernel arguments can result in your systems becoming unbootable.

    Examples of kernel arguments you could set include:

    • nosmt: Disables symmetric multithreading (SMT) in the kernel. Multithreading allows multiple logical threads for each CPU. You could consider nosmt in multi-tenant environments to reduce risks from potential cross-thread attacks. By disabling SMT, you essentially choose security over performance.

    • systemd.unified_cgroup_hierarchy: Configures the version of Linux control group that is installed on your nodes: cgroup v1 or cgroup v2. cgroup v2 is the next version of the kernel control group and offers multiple improvements. However, it can have some unwanted effects on your nodes.

      cgroup v2 is enabled by default. To disable cgroup v2, use the systemd.unified_cgroup_hierarchy=0 kernel argument, as shown in the following procedure.

    See Kernel.org kernel parameters for a list and descriptions of kernel arguments.

    In the following procedure, you create a MachineConfig object that identifies:

    • A set of machines to which you want to add the kernel argument. In this case, machines with a worker role.

    • Kernel arguments that are appended to the end of the existing kernel arguments.

    • A label that indicates where in the list of machine configs the change is applied.

    Prerequisites

    • You have administrative privileges on a working OKD cluster.

    Procedure

    1. List existing MachineConfig objects for your OKD cluster to determine how to label your machine config:

      $ oc get MachineConfig

      Example output

      NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
      00-master                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      00-worker                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      01-master-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      01-master-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      01-worker-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      01-worker-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      99-master-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      99-master-ssh                                                                                 3.2.0             40m
      99-worker-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      99-worker-ssh                                                                                 3.2.0             40m
      rendered-master-23e785de7587df95a4b517e0647e5ab7   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      rendered-worker-5d596d9293ca3ea80c896a1191735bb1   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    2. Create a MachineConfig object file that identifies the kernel argument (for example, 05-worker-kernelarg-selinuxpermissive.yaml):

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker # (1)
        name: 05-worker-kernelarg-selinuxpermissive # (2)
      spec:
        config:
          ignition:
            version: 3.2.0
        kernelArguments:
          - enforcing=0 # (3)
          - systemd.unified_cgroup_hierarchy=0 # (4)

      (1) Applies the new kernel arguments only to machines with the worker role.
      (2) Named to indicate where the change fits in the list of machine configs (05) and what it does.
      (3) Appends the enforcing=0 kernel argument.
      (4) Disables cgroup v2.
    3. Create the new machine config:

      $ oc create -f 05-worker-kernelarg-selinuxpermissive.yaml
    4. Check the machine configs to see that the new one was added:

      $ oc get MachineConfig

      Example output

      NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
      00-master                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      00-worker                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      01-master-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      01-master-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      01-worker-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      01-worker-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      05-worker-kernelarg-selinuxpermissive                                                         3.2.0             105s
      99-master-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      99-master-ssh                                                                                 3.2.0             40m
      99-worker-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      99-worker-ssh                                                                                 3.2.0             40m
      rendered-master-23e785de7587df95a4b517e0647e5ab7   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
      rendered-worker-5d596d9293ca3ea80c896a1191735bb1   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    5. Check the nodes:

      $ oc get nodes

      Example output

      NAME                           STATUS                     ROLES    AGE   VERSION
      ip-10-0-136-161.ec2.internal   Ready                      worker   28m   v1.26.0
      ip-10-0-136-243.ec2.internal   Ready                      master   34m   v1.26.0
      ip-10-0-141-105.ec2.internal   Ready,SchedulingDisabled   worker   28m   v1.26.0
      ip-10-0-142-249.ec2.internal   Ready                      master   34m   v1.26.0
      ip-10-0-153-11.ec2.internal    Ready                      worker   28m   v1.26.0
      ip-10-0-153-150.ec2.internal   Ready                      master   34m   v1.26.0

      You can see that scheduling on each worker node is disabled as the change is being applied.

    6. Check that the kernel argument worked by going to one of the worker nodes and listing the kernel command line arguments (in /proc/cmdline on the host):

      $ oc debug node/ip-10-0-141-105.ec2.internal

      Example output

      Starting pod/ip-10-0-141-105ec2internal-debug ...
      To use host binaries, run `chroot /host`
      sh-4.2# cat /host/proc/cmdline
      BOOT_IMAGE=/ostree/rhcos-... console=tty0 console=ttyS0,115200n8
      rootflags=defaults,prjquota rw root=UUID=fd0... ostree=/ostree/boot.0/rhcos/16...
      sh-4.2# exit

      You should see the enforcing=0 argument added to the other kernel arguments.
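    The manual check above can also be scripted. The following sketch assumes you have captured the contents of /proc/cmdline in a string; has_karg is a hypothetical helper that reports whether an exact argument appears in it.

```shell
#!/usr/bin/env bash
# Return 0 if the kernel command line ($2) contains the exact argument ($1),
# matched as a whole space-delimited word; return 1 otherwise.
has_karg() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (needs cluster access, so it is left commented out; <node-name> is a
# placeholder for one of your worker nodes):
#   cmdline="$(oc debug node/<node-name> -- chroot /host cat /proc/cmdline)"
#   has_karg enforcing=0 "$cmdline" && echo "enforcing=0 is set"
```

    Matching on whole words avoids false positives from arguments that merely contain the same text, such as a longer argument that ends in enforcing=0.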

    Enabling swap memory use on nodes is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

    For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

    You can enable swap memory use for OKD workloads on a per-node basis.

    Enabling swap memory can negatively impact workload performance and out-of-resource handling. Do not enable swap memory on control plane nodes.

    To enable swap memory, create a custom resource (CR) to set the swapBehavior parameter. You can set limited or unlimited swap memory:

    • Limited: Use the LimitedSwap value to limit how much swap memory workloads can use. Any workloads on the node that are not managed by OKD can still use swap memory. The LimitedSwap behavior depends on whether the node is running with Linux control groups version 1 (cgroup v1) or version 2 (cgroup v2):

      • cgroup v1: OKD workloads can use any combination of memory and swap, up to the pod’s memory limit, if set.

      • cgroup v2: OKD workloads cannot use swap memory.

    • Unlimited: Use the UnlimitedSwap value to allow workloads to use as much swap memory as they request, up to the system limit.

    Because the kubelet will not start in the presence of swap memory without this configuration, you must enable swap memory in OKD before enabling swap memory on the nodes. If there is no swap memory present on a node, enabling swap memory in OKD has no effect.

    Prerequisites

    • You have a running OKD cluster that uses version 4.10 or later.

    • You are logged in to the cluster as a user with administrative privileges.

    • You have enabled the TechPreviewNoUpgrade feature set on the cluster (see Nodes → Working with clusters → Enabling features using feature gates).

      Enabling the TechPreviewNoUpgrade feature set cannot be undone and prevents minor version updates. These feature sets are not recommended on production clusters.

    • If cgroup v2 is enabled on a node, you must enable swap accounting on the node, by setting the swapaccount=1 kernel argument.

    Procedure

    1. Apply a custom label to the machine config pool where you want to allow swap memory.

      $ oc label machineconfigpool worker kubelet-swap=enabled
    2. Create a custom resource (CR) to enable and configure swap settings.

      • Set the failSwapOn parameter to false to enable swap memory use on the associated nodes, or to true to disable swap memory use.
      • Set the swapBehavior parameter to specify the swap memory behavior. If unspecified, the default is LimitedSwap.
    3. Enable swap memory on the machines.
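    The CR for step 2 might look like the output of the following sketch. The field names failSwapOn and memorySwap.swapBehavior are taken from the upstream Kubernetes KubeletConfiguration API rather than from the text above, and the function name render_swap_config and the CR name swap-config are hypothetical. The selector matches the kubelet-swap=enabled label applied in step 1.

```shell
#!/usr/bin/env bash
# Print a KubeletConfig manifest that enables swap for pools labeled
# kubelet-swap=enabled.
#   $1 = LimitedSwap | UnlimitedSwap (defaults to LimitedSwap)
# "swap-config" is a hypothetical CR name.
render_swap_config() {
  local behavior="${1:-LimitedSwap}"
  case "$behavior" in
    LimitedSwap|UnlimitedSwap) ;;
    *) echo "behavior must be LimitedSwap or UnlimitedSwap" >&2; return 64 ;;
  esac
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: swap-config
spec:
  machineConfigPoolSelector:
    matchLabels:
      kubelet-swap: enabled
  kubeletConfig:
    failSwapOn: false
    memorySwap:
      swapBehavior: ${behavior}
EOF
}

# Usage (the oc step needs cluster access, so it is left commented out):
#   render_swap_config LimitedSwap > swap-config.yaml
#   oc create -f swap-config.yaml
```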

    Migrating control plane nodes from one RHOSP host to another

    You can run a script that moves a control plane node from one OpenStack node to another.

    Prerequisites

    • The environment variable OS_CLOUD refers to a clouds entry that has administrative credentials in a clouds.yaml file.

    • The environment variable KUBECONFIG refers to a configuration that contains administrative OKD credentials.

    Procedure

    • From a command line, run the following script:

      #!/usr/bin/env bash

      set -Eeuo pipefail

      if [ $# -lt 1 ]; then
        echo "Usage: '$0 node_name'"
        exit 64
      fi

      # Check for admin OpenStack credentials
      openstack server list --all-projects >/dev/null || { >&2 echo "The script needs OpenStack admin credentials. Exiting"; exit 77; }

      # Check for admin OpenShift credentials
      oc adm top node >/dev/null || { >&2 echo "The script needs OpenShift admin credentials. Exiting"; exit 77; }

      set -x

      declare -r node_name="$1"
      declare server_id
      server_id="$(openstack server list --all-projects -f value -c ID -c Name | grep "$node_name" | cut -d' ' -f1)"
      readonly server_id

      # Drain the node
      oc adm cordon "$node_name"
      oc adm drain "$node_name" --delete-emptydir-data --ignore-daemonsets --force

      # Power off the server
      oc debug "node/${node_name}" -- chroot /host shutdown -h 1

      # Verify the server is shut off
      until openstack server show "$server_id" -f value -c status | grep -q 'SHUTOFF'; do sleep 5; done

      # Migrate the node
      openstack server migrate --wait "$server_id"

      # Confirm the resize that completes the migration
      openstack server resize confirm "$server_id"

      # Wait for the resize confirm to finish
      until openstack server show "$server_id" -f value -c status | grep -q 'SHUTOFF'; do sleep 5; done

      # Restart the VM
      openstack server start "$server_id"

      # Wait for the node to show up as Ready:
      until oc get node "$node_name" | grep -q "^${node_name}[[:space:]]\+Ready"; do sleep 5; done

      # Uncordon the node
      oc adm uncordon "$node_name"

      # Wait for cluster operators to stabilize

    If the script completes, the control plane machine is migrated to a new OpenStack node.