Recommended single-node OpenShift cluster configuration for vDU application workloads


    OKD enables low latency processing for applications running on commercial off-the-shelf (COTS) hardware by using several technologies and specialized hardware devices:

    Real-time kernel for RHCOS

    Ensures workloads are handled with a high degree of process determinism.

    CPU isolation

    Avoids CPU scheduling delays and ensures CPU capacity is available consistently.

    NUMA-aware topology management

    Aligns memory and huge pages with CPU and PCI devices to pin guaranteed container memory and huge pages to the non-uniform memory access (NUMA) node. Pod resources for all Quality of Service (QoS) classes stay on the same NUMA node. This decreases latency and improves performance of the node.

    Huge pages memory management

    Using huge page sizes improves system performance by reducing the amount of system resources required to access page tables.

    Precision timing synchronization using PTP

    Allows synchronization between nodes in the network with sub-microsecond accuracy.

    Running vDU application workloads requires a bare-metal host with sufficient resources to run OKD services and production workloads.

    One vCPU is equivalent to one physical core when simultaneous multithreading (SMT), or Hyper-Threading, is not enabled. When enabled, use the following formula to calculate the corresponding ratio:

    • (threads per core × cores) × sockets = vCPUs
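    For example, a single-socket server with 52 physical cores and SMT enabled provides (2 threads per core × 52 cores) × 1 socket = 104 vCPUs. This matches the 104-CPU host assumed by the reserved (0-1,52-53) and isolated (2-51,54-103) CPU sets used in the examples later in this document.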

    The server must have a Baseboard Management Controller (BMC) when booting with virtual media.

    Bare-metal hosts require the firmware to be configured before the host can be provisioned. The firmware configuration is dependent on the specific hardware and the particular requirements of your installation.

    Procedure

    1. Set the UEFI/BIOS Boot Mode to UEFI.

    2. In the host boot sequence order, set Hard drive first.

    3. Apply the specific firmware configuration for your hardware. The following table describes a representative firmware configuration for an Intel Xeon Skylake or Intel Cascade Lake server, based on the Intel FlexRAN 4G and 5G baseband PHY reference design.

      The exact firmware configuration depends on your specific hardware and network requirements. The following sample configuration is for illustrative purposes only.

      Table 2. Sample firmware configuration for an Intel Xeon Skylake or Cascade Lake server
      Firmware setting                       Configuration

      CPU Power and Performance Policy       Performance
      Uncore Frequency Scaling               Disabled
      Performance P-limit                    Disabled
      Enhanced Intel SpeedStep® Tech         Enabled
      Intel Configurable TDP                 Enabled
      Configurable TDP Level                 Level 2
      Intel® Turbo Boost Technology          Enabled
      Energy Efficient Turbo                 Disabled
      Hardware P-States                      Disabled
      Package C-State                        C0/C1 state
      C1E                                    Disabled
      Processor C6                           Disabled

    Enable global SR-IOV and VT-d settings in the firmware for the host. These settings are relevant to bare-metal environments.

    Connectivity prerequisites for managed cluster networks

    Before you can install and provision a managed cluster with the GitOps Zero Touch Provisioning (ZTP) pipeline, the managed cluster host must meet the following networking prerequisites:

    • There must be bi-directional connectivity between the GitOps ZTP container in the hub cluster and the Baseboard Management Controller (BMC) of the target bare-metal host.

    • The managed cluster must be able to resolve and reach the API and *.apps hostnames of the hub cluster. Here is an example of the API and *.apps hostnames of the hub cluster:

      • api.hub-cluster.internal.domain.com

      • console-openshift-console.apps.hub-cluster.internal.domain.com

    • The hub cluster must be able to resolve and reach the API and *.apps hostnames of the managed cluster (a name-resolution check is sketched after this list). Here is an example of the API and *.apps hostnames of the managed cluster:

      • api.sno-managed-cluster-1.internal.domain.com

      • console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com
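    You can check both directions of name resolution with standard DNS tools. The following is a minimal sketch that uses dig with the example hostnames above; substitute your own domain names and run each check from a host that uses the relevant cluster's DNS servers:

      $ dig +short api.hub-cluster.internal.domain.com
      $ dig +short console-openshift-console.apps.hub-cluster.internal.domain.com
      $ dig +short api.sno-managed-cluster-1.internal.domain.com
      $ dig +short console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com

    Each query should return the address of the corresponding API or ingress endpoint. Reachability of the resolved addresses and of the BMC still needs to be verified separately.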

    Workload partitioning configures OKD services, cluster management workloads, and infrastructure pods to run on a reserved number of host CPUs.

    To configure workload partitioning with GitOps Zero Touch Provisioning (ZTP), you specify cluster management CPU resources with the cpuset field of the SiteConfig custom resource (CR) and the reserved field of the group PolicyGenTemplate CR. The GitOps ZTP pipeline uses these values to populate the required fields in the workload partitioning MachineConfig CR (cpuset) and the PerformanceProfile CR (reserved) that configure the single-node OpenShift cluster.

    For maximum performance, ensure that the reserved and isolated CPU sets do not share CPU cores across NUMA zones.

    • The workload partitioning MachineConfig CR pins the OKD infrastructure pods to a defined cpuset configuration.

    • The PerformanceProfile CR pins the systemd services to the reserved CPUs.
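    The following sketch shows where these two values are set, assuming the standard GitOps ZTP example file layout. The names (example-sno, example-node1.example.com, group-du-sno, ztp-group) are placeholders, and all unrelated required fields are omitted:

      # SiteConfig CR excerpt: cpuset feeds the workload partitioning MachineConfig
      apiVersion: ran.openshift.io/v1
      kind: SiteConfig
      metadata:
        name: example-sno
      spec:
        clusters:
        - clusterName: example-sno
          nodes:
          - hostName: example-node1.example.com
            cpuset: "0-1,52-53"
      ---
      # Group PolicyGenTemplate CR excerpt: reserved feeds the PerformanceProfile
      apiVersion: ran.openshift.io/v1
      kind: PolicyGenTemplate
      metadata:
        name: group-du-sno
        namespace: ztp-group
      spec:
        sourceFiles:
        - fileName: PerformanceProfile.yaml
          policyName: "config-policy"
          spec:
            cpu:
              reserved: "0-1,52-53"
              isolated: "2-51,54-103"

    The cpuset and reserved values must be identical, as described in the workload partitioning section that follows.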

    Additional resources

    • For the recommended single-node OpenShift workload partitioning configuration, see .

    The ZTP pipeline applies the following custom resources (CRs) during cluster installation. These configuration CRs ensure that the cluster meets the feature and performance requirements necessary for running a vDU application.

    When using the GitOps ZTP plugin and SiteConfig CRs for cluster deployment, the following MachineConfig CRs are included by default.

    Use the SiteConfig extraManifests filter to alter the CRs that are included by default. For more information, see .

    Single-node OpenShift clusters that run DU workloads require workload partitioning. This limits the cores allowed to run platform services, maximizing the CPU cores available for application payloads.

    Workload partitioning can only be enabled during cluster installation. You cannot disable workload partitioning post-installation. However, you can reconfigure workload partitioning by updating the cpu value that you define in the performance profile, and in the related MachineConfig custom resource (CR).

    • The base64-encoded CR that enables workload partitioning contains the CPU set that the management workloads are constrained to. Encode host-specific values for crio.conf and kubelet.conf in base64, as shown in the encoding example after the file contents below. Adjust the content to match the CPU set that is specified in the cluster performance profile; it must also match the number of cores in the cluster host.

      Recommended workload partitioning configuration

    • When configured in the cluster host, the contents of /etc/crio/crio.conf.d/01-workload-partitioning should look like this:

      [crio.runtime.workloads.management]
      activation_annotation = "target.workload.openshift.io/management"
      annotation_prefix = "resources.workload.openshift.io"
      resources = { "cpushares" = 0, "cpuset" = "0-1,52-53" } (1)
      (1) The cpuset value varies based on the installation. If Hyper-Threading is enabled, specify both threads for each core. The cpuset value must match the reserved CPUs that you define in the spec.cpu.reserved field in the performance profile.
    • When configured in the cluster, the contents of /etc/kubernetes/openshift-workload-pinning should look like this:

      {
        "management": {
          "cpuset": "0-1,52-53" (1)
        }
      }
      (1) The cpuset must match the cpuset value in /etc/crio/crio.conf.d/01-workload-partitioning.
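    To produce the base64 strings that are embedded in the workload partitioning MachineConfig CR, you can encode local copies of the two files shown above. This is a minimal sketch; the local file names are arbitrary, and the output of each command is pasted into the corresponding source: data:text/plain;charset=utf-8;base64, field of the MachineConfig CR:

      $ base64 -w0 01-workload-partitioning
      $ base64 -w0 openshift-workload-pinning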

    Verification

    Check that the applications and cluster system CPU pinning is correct. Run the following commands:

    1. Open a remote shell connection to the managed cluster:

      $ oc debug node/example-sno-1
    2. Check that the user applications CPU pinning is correct:

      sh-4.4# pgrep ovn | while read i; do taskset -cp $i; done

      Example output

      pid 8481's current affinity list: 0-3
      pid 8726's current affinity list: 0-3
      pid 9088's current affinity list: 0-3
      pid 9945's current affinity list: 0-3
      pid 10387's current affinity list: 0-3
      pid 12123's current affinity list: 0-3
      pid 13313's current affinity list: 0-3
    3. Check that the system applications CPU pinning is correct:

      sh-4.4# pgrep systemd | while read i; do taskset -cp $i; done

      Example output

      pid 1's current affinity list: 0-3
      pid 938's current affinity list: 0-3
      pid 962's current affinity list: 0-3
      pid 1197's current affinity list: 0-3

    Reduced platform management footprint

    To reduce the overall management footprint of the platform, a MachineConfig custom resource (CR) is required that places all Kubernetes-specific mount points in a new namespace separate from the host operating system. The following base64-encoded example MachineConfig CR illustrates this configuration.

    Recommended container mount namespace configuration

    SCTP

    Stream Control Transmission Protocol (SCTP) is a key protocol used in RAN applications. This MachineConfig object adds the SCTP kernel module to the node to enable this protocol.

    Recommended SCTP configuration

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: load-sctp-module
    spec:
      config:
        ignition:
          version: 2.2.0
        storage:
          files:
          - contents:
              source: data:,
              verification: {}
            filesystem: root
            mode: 420
            path: /etc/modprobe.d/sctp-blacklist.conf
          - contents:
              source: data:text/plain;charset=utf-8,sctp
            filesystem: root
            mode: 420
            path: /etc/modules-load.d/sctp-load.conf
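    After the node reboots with this MachineConfig applied, you can confirm that the SCTP module is available. A minimal check, assuming the example node name used elsewhere in this document:

      $ oc debug node/example-sno-1
      sh-4.4# chroot /host
      sh-4.4# lsmod | grep sctp

    The sctp module should be listed. If it is not, run modprobe sctp to load it manually.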

    Accelerated container startup

    The following MachineConfig CR configures core OpenShift processes and containers to use all available CPU cores during system startup and shutdown. This accelerates the system recovery during initial boot and reboots.

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: 04-accelerated-container-startup-master
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,#!/bin/bash
#
# Temporarily reset the core system processes's CPU affinity to be unrestricted to accelerate startup and shutdown
#
# The defaults below can be overridden via environment variables
#

# The default set of critical processes whose affinity should be temporarily unbound:
CRITICAL_PROCESSES=${CRITICAL_PROCESSES:-"systemd ovs crio kubelet NetworkManager conmon dbus"}

# Default wait time is 600s = 10m:
MAXIMUM_WAIT_TIME=${MAXIMUM_WAIT_TIME:-600}

# Default steady-state threshold = 2%
# Allowed values:
#  4  - absolute pod count (+/-)
#  4% - percent change (+/-)
#  -1 - disable the steady-state check
STEADY_STATE_THRESHOLD=${STEADY_STATE_THRESHOLD:-2%}

# Default steady-state window = 60s
# If the running pod count stays within the given threshold for this time
# period, return CPU utilization to normal before the maximum wait time has
# expires
STEADY_STATE_WINDOW=${STEADY_STATE_WINDOW:-60}

# Default steady-state allows any pod count to be "steady state"
# Increasing this will skip any steady-state checks until the count rises above
# this number to avoid false positives if there are some periods where the
# count doesn't increase but we know we can't be at steady-state yet.
STEADY_STATE_MINIMUM=${STEADY_STATE_MINIMUM:-0}

#######################################################

KUBELET_CPU_STATE=/var/lib/kubelet/cpu_manager_state
FULL_CPU_STATE=/sys/fs/cgroup/cpuset/cpuset.cpus
unrestrictedCpuset() {
  local cpus
  if [[ -e $KUBELET_CPU_STATE ]]; then
      cpus=$(jq -r '.defaultCpuSet' <$KUBELET_CPU_STATE)
  fi
  if [[ -z $cpus ]]; then
    # fall back to using all cpus if the kubelet state is not configured yet
    [[ -e $FULL_CPU_STATE ]] || return 1
    cpus=$(<$FULL_CPU_STATE)
  fi
  echo $cpus
}

restrictedCpuset() {
  for arg in $(</proc/cmdline); do
    if [[ $arg =~ ^systemd.cpu_affinity= ]]; then
      echo ${arg#*=}
      return 0
    fi
  done
  return 1
}

getCPUCount () {
  local cpuset="$1"
  local cpulist=()
  local cpus=0
  local mincpus=2

  if [[ -z $cpuset || $cpuset =~ [^0-9,-] ]]; then
    echo $mincpus
    return 1
  fi

  IFS=',' read -ra cpulist <<< $cpuset

  for elm in "${cpulist[@]}"; do
    if [[ $elm =~ ^[0-9]+$ ]]; then
      (( cpus++ ))
    elif [[ $elm =~ ^[0-9]+-[0-9]+$ ]]; then
      local low=0 high=0
      IFS='-' read low high <<< $elm
      (( cpus += high - low + 1 ))
    else
      echo $mincpus
      return 1
    fi
  done

  # Return a minimum of 2 cpus
  echo $(( cpus > $mincpus ? cpus : $mincpus ))
  return 0
}

resetOVSthreads () {
  local cpucount="$1"
  local curRevalidators=0
  local curHandlers=0
  local desiredRevalidators=0
  local desiredHandlers=0
  local rc=0

  curRevalidators=$(ps -Teo pid,tid,comm,cmd | grep -e revalidator | grep -c ovs-vswitchd)
  curHandlers=$(ps -Teo pid,tid,comm,cmd | grep -e handler | grep -c ovs-vswitchd)

  # Calculate the desired number of threads the same way OVS does.
  # OVS will set these thread count as a one shot process on startup, so we
  # have to adjust up or down during the boot up process. The desired outcome is
  # to not restrict the number of thread at startup until we reach a steady
  # state.  At which point we need to reset these based on our restricted  set
  # of cores.
  # See OVS function that calculates these thread counts:
  # https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-upcall.c#L635
  (( desiredRevalidators=$cpucount / 4 + 1 ))
  (( desiredHandlers=$cpucount - $desiredRevalidators ))


  if [[ $curRevalidators -ne $desiredRevalidators || $curHandlers -ne $desiredHandlers ]]; then

    logger "Recovery: Re-setting OVS revalidator threads: ${curRevalidators} -> ${desiredRevalidators}"
    logger "Recovery: Re-setting OVS handler threads: ${curHandlers} -> ${desiredHandlers}"

    ovs-vsctl set \
      Open_vSwitch . \
      other-config:n-handler-threads=${desiredHandlers} \
      other-config:n-revalidator-threads=${desiredRevalidators}
    rc=$?
  fi

  return $rc
}

resetAffinity() {
  local cpuset="$1"
  local failcount=0
  local successcount=0
  logger "Recovery: Setting CPU affinity for critical processes \"$CRITICAL_PROCESSES\" to $cpuset"
  for proc in $CRITICAL_PROCESSES; do
    local pids="$(pgrep $proc)"
    for pid in $pids; do
      local tasksetOutput
      tasksetOutput="$(taskset -apc "$cpuset" $pid 2>&1)"
      if [[ $? -ne 0 ]]; then
        echo "ERROR: $tasksetOutput"
        ((failcount++))
      else
        ((successcount++))
      fi
    done
  done

  resetOVSthreads "$(getCPUCount ${cpuset})"
  if [[ $? -ne 0 ]]; then
    ((failcount++))
  else
    ((successcount++))
  fi

  logger "Recovery: Re-affined $successcount pids successfully"
  if [[ $failcount -gt 0 ]]; then
    logger "Recovery: Failed to re-affine $failcount processes"
    return 1
  fi
}

setUnrestricted() {
  logger "Recovery: Setting critical system processes to have unrestricted CPU access"
  resetAffinity "$(unrestrictedCpuset)"
}

setRestricted() {
  logger "Recovery: Resetting critical system processes back to normally restricted access"
  resetAffinity "$(restrictedCpuset)"
}

currentAffinity() {
  local pid="$1"
  taskset -pc $pid | awk -F': ' '{print $2}'
}

within() {
  local last=$1 current=$2 threshold=$3
  local delta=0 pchange
  delta=$(( current - last ))
  if [[ $current -eq $last ]]; then
    pchange=0
  elif [[ $last -eq 0 ]]; then
    pchange=1000000
  else
    pchange=$(( ( $delta * 100) / last ))
  fi
  echo -n "last:$last current:$current delta:$delta pchange:${pchange}%: "
  local absolute limit
  case $threshold in
    *%)
      absolute=${pchange##-} # absolute value
      limit=${threshold%%%}
      ;;
    *)
      absolute=${delta##-} # absolute value
      limit=$threshold
      ;;
  esac
  if [[ $absolute -le $limit ]]; then
    echo "within (+/-)$threshold"
    return 0
  else
    echo "outside (+/-)$threshold"
    return 1
  fi
}

steadystate() {
  local last=$1 current=$2
  if [[ $last -lt $STEADY_STATE_MINIMUM ]]; then
    echo "last:$last current:$current Waiting to reach $STEADY_STATE_MINIMUM before checking for steady-state"
    return 1
  fi
  within $last $current $STEADY_STATE_THRESHOLD
}

waitForReady() {
  logger "Recovery: Waiting ${MAXIMUM_WAIT_TIME}s for the initialization to complete"
  local lastSystemdCpuset="$(currentAffinity 1)"
  local lastDesiredCpuset="$(unrestrictedCpuset)"
  local t=0 s=10
  local lastCcount=0 ccount=0 steadyStateTime=0
  while [[ $t -lt $MAXIMUM_WAIT_TIME ]]; do
    sleep $s
    ((t += s))
    # Re-check the current affinity of systemd, in case some other process has changed it
    local systemdCpuset="$(currentAffinity 1)"
    # Re-check the unrestricted Cpuset, as the allowed set of unreserved cores may change as pods are assigned to cores
    local desiredCpuset="$(unrestrictedCpuset)"
    if [[ $systemdCpuset != $lastSystemdCpuset || $lastDesiredCpuset != $desiredCpuset ]]; then
      resetAffinity "$desiredCpuset"
      lastSystemdCpuset="$(currentAffinity 1)"
      lastDesiredCpuset="$desiredCpuset"
    fi

    # Detect steady-state pod count
    ccount=$(crictl ps | wc -l)
    if steadystate $lastCcount $ccount; then
      ((steadyStateTime += s))
      echo "Steady-state for ${steadyStateTime}s/${STEADY_STATE_WINDOW}s"
      if [[ $steadyStateTime -ge $STEADY_STATE_WINDOW ]]; then
        logger "Recovery: Steady-state (+/- $STEADY_STATE_THRESHOLD) for ${STEADY_STATE_WINDOW}s: Done"
        return 0
      fi
    else
      if [[ $steadyStateTime -gt 0 ]]; then
        echo "Resetting steady-state timer"
        steadyStateTime=0
      fi
    fi
    lastCcount=$ccount
  done
  logger "Recovery: Recovery Complete Timeout"
}

main() {
  if ! unrestrictedCpuset >&/dev/null; then
    logger "Recovery: No unrestricted Cpuset could be detected"
    return 1
  fi

  if ! restrictedCpuset >&/dev/null; then
    logger "Recovery: No restricted Cpuset has been configured.  We are already running unrestricted."
    return 0
  fi

  # Ensure we reset the CPU affinity when we exit this script for any reason
  # This way either after the timer expires or after the process is interrupted
  # via ^C or SIGTERM, we return things back to the way they should be.
  trap setRestricted EXIT

  logger "Recovery: Recovery Mode Starting"
  setUnrestricted
  waitForReady
}

if [[ "${BASH_SOURCE[0]}" = "${0}" ]]; then
  main "${@}"
  exit $?
fi

            mode: 493
            path: /usr/local/bin/accelerated-container-startup.sh
        systemd:
          units:
          - contents: |
              [Unit]
              Description=Unlocks more CPUs for critical system processes during container startup
              [Service]
              Type=simple
              ExecStart=/usr/local/bin/accelerated-container-startup.sh
              # Maximum wait time is 600s = 10m:
              Environment=MAXIMUM_WAIT_TIME=600
              # Steady-state threshold = 2%
              # Allowed values:
              #  4  - absolute pod count (+/-)
              #  4% - percent change (+/-)
              #  -1 - disable the steady-state check
              # Note: '%' must be escaped as '%%' in systemd unit files
              Environment=STEADY_STATE_THRESHOLD=2%%
              # Steady-state window = 120s
              # If the running pod count stays within the given threshold for this time
              # period, return CPU utilization to normal before the maximum wait time has
              # expires
              Environment=STEADY_STATE_WINDOW=120
              # Steady-state minimum = 40
              # Increasing this will skip any steady-state checks until the count rises above
              # this number to avoid false positives if there are some periods where the
              # count doesn't increase but we know we can't be at steady-state yet.
              Environment=STEADY_STATE_MINIMUM=40
              [Install]
              WantedBy=multi-user.target
            enabled: true
            name: accelerated-container-startup.service
          - contents: |
              [Unit]
              Description=Unlocks more CPUs for critical system processes during container shutdown
              DefaultDependencies=no
              [Service]
              Type=simple
              ExecStart=/usr/local/bin/accelerated-container-startup.sh
              # Maximum wait time is 600s = 10m:
              Environment=MAXIMUM_WAIT_TIME=600
              # Steady-state threshold
              # Allowed values:
              #  4  - absolute pod count (+/-)
              #  4% - percent change (+/-)
              #  -1 - disable the steady-state check
              # Note: '%' must be escaped as '%%' in systemd unit files
              Environment=STEADY_STATE_THRESHOLD=-1
              # Steady-state window = 60s
              # If the running pod count stays within the given threshold for this time
              # period, return CPU utilization to normal before the maximum wait time has
              # expires
              Environment=STEADY_STATE_WINDOW=60
              [Install]
              WantedBy=shutdown.target reboot.target halt.target
            enabled: true
            name: accelerated-container-shutdown.service

    Automatic kernel crash dumps with kdump

    kdump is a Linux kernel feature that creates a kernel crash dump when the kernel crashes. kdump is enabled with the following MachineConfig CR:

    Recommended kdump configuration

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: 06-kdump-enable-master
    spec:
      config:
        ignition:
          version: 3.2.0
        systemd:
          units:
          - enabled: true
            name: kdump.service
      kernelArguments:
      - crashkernel=512M
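    You can verify that kdump is active and that the crash kernel memory reservation took effect after the node reboots. A minimal check, assuming the example node name used elsewhere in this document:

      $ oc debug node/example-sno-1
      sh-4.4# chroot /host
      sh-4.4# systemctl is-active kdump
      sh-4.4# grep crashkernel /proc/cmdline

    systemctl should report active, and the kernel command line should include crashkernel=512M.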

    The following ContainerRuntimeConfig custom resources (CRs) configure crun as the default OCI container runtime for control plane and worker nodes. The crun container runtime is fast and lightweight and has a low memory footprint.

    For optimal performance, enable crun for master and worker nodes in single-node OpenShift, three-node OpenShift, and standard clusters. To avoid the cluster rebooting when the CR is applied, apply the change as a GitOps ZTP additional day-0 install-time manifest.

    Recommended ContainerRuntimeConfig CR for control plane nodes

    apiVersion: machineconfiguration.openshift.io/v1
    kind: ContainerRuntimeConfig
    metadata:
      name: enable-crun-master
    spec:
      machineConfigPoolSelector:
        matchLabels:
          pools.operator.machineconfiguration.openshift.io/master: ""
      containerRuntimeConfig:
        defaultRuntime: crun

    Recommended ContainerRuntimeConfig CR for worker nodes

    apiVersion: machineconfiguration.openshift.io/v1
    kind: ContainerRuntimeConfig
    metadata:
      name: enable-crun-worker
    spec:
      machineConfigPoolSelector:
        matchLabels:
          pools.operator.machineconfiguration.openshift.io/worker: ""
      containerRuntimeConfig:
        defaultRuntime: crun
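    After the ContainerRuntimeConfig CRs are applied, the setting is rendered into the CRI-O configuration on the node. A minimal check, assuming the example node name used elsewhere in this document:

      $ oc debug node/example-sno-1
      sh-4.4# chroot /host
      sh-4.4# grep -r default_runtime /etc/crio/

    The output should show default_runtime = "crun" in a generated drop-in file under /etc/crio/crio.conf.d/.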

    When the cluster installation is complete, the ZTP pipeline applies the following custom resources (CRs) that are required to run DU workloads.

    In GitOps ZTP v4.10 and earlier, you configure UEFI secure boot with a MachineConfig CR. This is no longer required in GitOps ZTP v4.11 and later. In v4.11, you configure UEFI secure boot for single-node OpenShift clusters using Performance profile CRs. For more information, see .

    Operator namespaces and Operator groups

    Single-node OpenShift clusters that run DU workloads require the following OperatorGroup and Namespace custom resources (CRs):

    • Local Storage Operator

    • Logging Operator

    • PTP Operator

    • SR-IOV Network Operator

    The following YAML summarizes these CRs:

    Recommended Operator Namespace and OperatorGroup configuration

    apiVersion: v1
    kind: Namespace
    metadata:
      annotations:
        workload.openshift.io/allowed: management
      name: openshift-local-storage
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: openshift-local-storage
      namespace: openshift-local-storage
    spec:
      targetNamespaces:
      - openshift-local-storage
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      annotations:
        workload.openshift.io/allowed: management
      name: openshift-logging
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: cluster-logging
      namespace: openshift-logging
    spec:
      targetNamespaces:
      - openshift-logging
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      annotations:
        workload.openshift.io/allowed: management
      labels:
        openshift.io/cluster-monitoring: "true"
      name: openshift-ptp
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: ptp-operators
      namespace: openshift-ptp
    spec:
      targetNamespaces:
      - openshift-ptp
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      annotations:
        workload.openshift.io/allowed: management
      name: openshift-sriov-network-operator
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: sriov-network-operators
      namespace: openshift-sriov-network-operator
    spec:
      targetNamespaces:
      - openshift-sriov-network-operator

    Operator subscriptions

    Single-node OpenShift clusters that run DU workloads require the following Subscription CRs. The subscription provides the location to download the following Operators:

    • Local Storage Operator

    • Logging Operator

    • PTP Operator

    • SR-IOV Network Operator

    Recommended Operator subscriptions

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: cluster-logging
      namespace: openshift-logging
    spec:
      channel: "stable" (1)
      name: cluster-logging
      source: redhat-operators
      sourceNamespace: openshift-marketplace
      installPlanApproval: Manual (2)
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: local-storage-operator
      namespace: openshift-local-storage
    spec:
      channel: "stable"
      name: local-storage-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
      installPlanApproval: Manual
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: ptp-operator-subscription
      namespace: openshift-ptp
    spec:
      channel: "stable"
      name: ptp-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
      installPlanApproval: Manual
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: sriov-network-operator-subscription
      namespace: openshift-sriov-network-operator
    spec:
      channel: "stable"
      name: sriov-network-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
      installPlanApproval: Manual
    (1) Specifies the channel to get the Operator from.
    (2) Specifies the install plan approval strategy. With Manual, Operator updates are applied only after you approve the pending install plan.
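    Because installPlanApproval is set to Manual, each Operator installation and subsequent update waits for an administrator to approve its install plan. A minimal sketch of checking and approving a pending plan in one of the namespaces; the install plan name is a placeholder:

      $ oc get subscriptions,installplans -n openshift-ptp
      $ oc patch installplan <install-plan-name> -n openshift-ptp --type merge --patch '{"spec":{"approved":true}}'

    Repeat the check in each Operator namespace, or manage approval through your GitOps tooling.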

    Cluster logging and log forwarding

    Single-node OpenShift clusters that run DU workloads require logging and log forwarding for debugging. The following example YAML illustrates the required ClusterLogging and ClusterLogForwarder CRs.

    Recommended cluster logging and log forwarding configuration
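    A minimal sketch of the two CRs, assuming the OpenShift Logging logging.openshift.io/v1 API and a Kafka output, could look like the following; the output name, topic, and URL are placeholders that you replace with your Kafka endpoint:

      apiVersion: logging.openshift.io/v1
      kind: ClusterLogging
      metadata:
        name: instance (1)
        namespace: openshift-logging
      spec:
        managementState: Managed
        collection:
          logs:
            type: fluentd
            fluentd: {}
      ---
      apiVersion: logging.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        name: instance (2)
        namespace: openshift-logging
      spec:
        outputs:
        - type: kafka
          name: kafka-open
          url: tcp://<kafka-server>:9092/<topic> (3)
        pipelines:
        - name: audit-infra-logs
          inputRefs:
          - audit
          - infrastructure
          outputRefs:
          - kafka-open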

    (1) Updates the existing ClusterLogging instance or creates the instance if it does not exist.
    (2) Updates the existing ClusterLogForwarder instance or creates the instance if it does not exist.
    (3) Specifies the URL of the Kafka server where the logs are forwarded to.

    Performance profile

    Single-node OpenShift clusters that run DU workloads require a Node Tuning Operator performance profile to use real-time host capabilities and services.

    In earlier versions of OKD, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for OpenShift applications. In OKD 4.11 and later, this functionality is part of the Node Tuning Operator.

    The following example PerformanceProfile CR illustrates the required cluster configuration.

    Recommended performance profile configuration

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: openshift-node-performance-profile (1)
    spec:
      additionalKernelArgs:
      - "rcupdate.rcu_normal_after_boot=0"
      - "efi=runtime" (2)
      cpu:
        isolated: 2-51,54-103 (3)
        reserved: 0-1,52-53 (4)
      hugepages:
        defaultHugepagesSize: 1G
        pages:
        - count: 32 (5)
          size: 1G (6)
          node: 0 (7)
      machineConfigPoolSelector:
        pools.operator.machineconfiguration.openshift.io/master: ""
      nodeSelector:
        node-role.kubernetes.io/master: ""
      numa:
        topologyPolicy: "restricted"
      realTimeKernel:
        enabled: true (8)
    (1) Ensure that the value for name matches that specified in the spec.profile.data field of TunedPerformancePatch.yaml and the status.configuration.source.name field of validatorCRs/informDuValidator.yaml.
    (2) Configures UEFI secure boot for the cluster host.
    (3) Set the isolated CPUs. Ensure all of the Hyper-Threading pairs match.

    The reserved and isolated CPU pools must not overlap and together must span all available cores. CPU cores that are not accounted for cause an undefined behaviour in the system.

    (4) Set the reserved CPUs. When workload partitioning is enabled, system processes, kernel threads, and system container threads are restricted to these CPUs. All CPUs that are not isolated should be reserved.
    (5) Set the number of huge pages.
    (6) Set the huge page size.
    (7) Set node to the NUMA node where the hugepages are allocated.
    (8) Set enabled to true to install the real-time Linux kernel.
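    After the cluster host reboots with the performance profile applied, you can spot-check the result directly on the node. A minimal check, assuming the example node name used elsewhere in this document:

      $ oc debug node/example-sno-1
      sh-4.4# chroot /host
      sh-4.4# uname -r
      sh-4.4# grep HugePages_Total /proc/meminfo

    The kernel version string should identify a real-time kernel (it contains rt), and HugePages_Total should report the 32 huge pages of 1 G requested in the profile.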

    Single-node OpenShift clusters use Precision Time Protocol (PTP) for network time synchronization. The following example PtpConfig CR illustrates the required PTP slave configuration.

    Recommended PTP configuration

    apiVersion: ptp.openshift.io/v1
    kind: PtpConfig
    metadata:
      name: du-ptp-slave
      namespace: openshift-ptp
    spec:
      profile:
      - interface: ens5f0 (1)
        name: slave
        phc2sysOpts: -a -r -n 24
        ptp4lConf: |
          [global]
          #
          # Default Data Set
          #
          twoStepFlag 1
          slaveOnly 0
          priority1 128
          priority2 128
          domainNumber 24
          #utc_offset 37
          clockClass 248
          clockAccuracy 0xFE
          offsetScaledLogVariance 0xFFFF
          free_running 0
          freq_est_interval 1
          dscp_event 0
          dscp_general 0
          dataset_comparison ieee1588
          G.8275.defaultDS.localPriority 128
          #
          # Port Data Set
          #
          logAnnounceInterval -3
          logSyncInterval -4
          logMinDelayReqInterval -4
          logMinPdelayReqInterval -4
          announceReceiptTimeout 3
          syncReceiptTimeout 0
          delayAsymmetry 0
          fault_reset_interval 4
          neighborPropDelayThresh 20000000
          masterOnly 0
          G.8275.portDS.localPriority 128
          #
          #
          assume_two_step 0
          path_trace_enabled 0
          follow_up_info 0
          hybrid_e2e 0
          inhibit_multicast_service 0
          net_sync_monitor 0
          tc_spanning_tree 0
          tx_timestamp_timeout 1
          unicast_listen 0
          unicast_master_table 0
          unicast_req_duration 3600
          use_syslog 1
          verbose 0
          summary_interval 0
          kernel_leap 1
          check_fup_sync 0
          #
          # Servo Options
          #
          pi_proportional_const 0.0
          pi_integral_const 0.0
          pi_proportional_scale 0.0
          pi_proportional_exponent -0.3
          pi_proportional_norm_max 0.7
          pi_integral_scale 0.0
          pi_integral_exponent 0.4
          pi_integral_norm_max 0.3
          step_threshold 2.0
          first_step_threshold 0.00002
          max_frequency 900000000
          clock_servo pi
          sanity_freq_limit 200000000
          ntpshm_segment 0
          #
          # Transport options
          #
          transportSpecific 0x0
          ptp_dst_mac 01:1B:19:00:00:00
          p2p_dst_mac 01:80:C2:00:00:0E
          udp_ttl 1
          udp6_scope 0x0E
          uds_address /var/run/ptp4l
          #
          # Default interface options
          #
          clock_type OC
          network_transport L2
          delay_mechanism E2E
          time_stamping hardware
          tsproc_mode filter
          delay_filter moving_median
          delay_filter_length 10
          egressLatency 0
          ingressLatency 0
          boundary_clock_jbod 0
          #
          # Clock description
          #
          productDescription ;;
          revisionData ;;
          manufacturerIdentity 00:00:00
          userDescription ;
          timeSource 0xA0
        ptp4lOpts: -2 -s --summary_interval -4
      recommend:
      - match:
        - nodeLabel: node-role.kubernetes.io/master
        priority: 4
        profile: slave
    (1) Sets the interface used to receive the PTP clock signal.
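    You can confirm that the PTP configuration is in effect by checking the linuxptp daemon that the PTP Operator runs on the node. A minimal check; the pod name is a placeholder and the container name assumes the Operator's default linuxptp-daemon-container:

      $ oc get pods -n openshift-ptp
      $ oc logs <linuxptp-daemon-pod> -n openshift-ptp -c linuxptp-daemon-container | grep -i "master offset"

    The ptp4l log lines should show the clock locking to the upstream master with small offsets.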

    Extended Tuned profile

    Single-node OpenShift clusters that run DU workloads require additional performance tuning for high-performance workloads. The following example Tuned CR extends the Tuned profile:

    Recommended extended Tuned profile configuration

    apiVersion: tuned.openshift.io/v1
    kind: Tuned
    metadata:
      name: performance-patch
      namespace: openshift-cluster-node-tuning-operator
    spec:
      profile:
      - data: |
          [main]
          summary=Configuration changes profile inherited from performance created tuned
          include=openshift-node-performance-openshift-node-performance-profile
          [bootloader]
          cmdline_crash=nohz_full=2-51,54-103
          [sysctl]
          kernel.timer_migration=1
          [scheduler]
          group.ice-ptp=0:f:10:*:ice-ptp.*
          [service]
          service.stalld=start,enable
          service.chronyd=stop,disable
        name: performance-patch
      recommend:
      - machineConfigLabels:
          machineconfiguration.openshift.io/role: master
        priority: 19
        profile: performance-patch
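    You can check which Tuned profile is applied to the node through the Profile objects that the Node Tuning Operator maintains. For example:

      $ oc get profile -n openshift-cluster-node-tuning-operator

    The output should list the node with performance-patch as the applied profile.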

    SR-IOV

    Single root I/O virtualization (SR-IOV) is commonly used to enable the fronthaul and the midhaul networks. The following YAML example configures SR-IOV for a single-node OpenShift cluster.

    Recommended SR-IOV configuration

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovOperatorConfig
    metadata:
      name: default
      namespace: openshift-sriov-network-operator
    spec:
      configDaemonNodeSelector:
        node-role.kubernetes.io/master: ""
      disableDrain: true
      enableInjector: true
      enableOperatorWebhook: true
    ---
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: sriov-nw-du-mh
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: openshift-sriov-network-operator
      resourceName: du_mh
      vlan: 150 (1)
    ---
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: sriov-nnp-du-mh
      namespace: openshift-sriov-network-operator
    spec:
      deviceType: vfio-pci (2)
      isRdma: false
      nicSelector:
        pfNames:
        - ens7f0 (3)
      nodeSelector:
        node-role.kubernetes.io/master: ""
      numVfs: 8 (4)
      priority: 10
      resourceName: du_mh
    ---
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: sriov-nw-du-fh
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: openshift-sriov-network-operator
      resourceName: du_fh
      vlan: 140 (5)
    ---
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: sriov-nnp-du-fh
      namespace: openshift-sriov-network-operator
    spec:
      deviceType: netdevice (6)
      isRdma: true
      nicSelector:
        pfNames:
        - ens5f0 (7)
      nodeSelector:
        node-role.kubernetes.io/master: ""
      numVfs: 8 (8)
      priority: 10
      resourceName: du_fh
    (1) Specifies the VLAN for the midhaul network.
    (2) Select either vfio-pci or netdevice, as needed.
    (3) Specifies the interface connected to the midhaul network.
    (4) Specifies the number of VFs for the midhaul network.
    (5) Specifies the VLAN for the fronthaul network.
    (6) Select either vfio-pci or netdevice, as needed.
    (7) Specifies the interface connected to the fronthaul network.
    (8) Specifies the number of VFs for the fronthaul network.
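    After the SR-IOV policies are synchronized, the virtual functions are advertised as extended node resources. A minimal check, assuming the example node name used elsewhere in this document:

      $ oc get sriovnetworknodestates -n openshift-sriov-network-operator
      $ oc get node example-sno-1 -o jsonpath='{.status.allocatable}'

    The allocatable resources should include openshift.io/du_fh and openshift.io/du_mh, each with the 8 VFs configured above.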

    Console Operator

    The console-operator installs and maintains the web console on a cluster. When the node is centrally managed, the Operator is not needed, and disabling it frees resources for application workloads. The following Console custom resource (CR) example disables the console.

    Recommended console configuration

    apiVersion: operator.openshift.io/v1
    kind: Console
    metadata:
      annotations:
        include.release.openshift.io/ibm-cloud-managed: "false"
        include.release.openshift.io/self-managed-high-availability: "false"
        include.release.openshift.io/single-node-developer: "false"
        release.openshift.io/create-only: "true"
      name: cluster
    spec:
      logLevel: Normal
      managementState: Removed
      operatorLogLevel: Normal

    Grafana and Alertmanager

    Single-node OpenShift clusters that run DU workloads need to reduce the CPU resources that the OKD monitoring components consume. The following ConfigMap custom resource (CR) disables Grafana and Alertmanager.

    Recommended cluster monitoring configuration

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        grafana:
          enabled: false
        alertmanagerMain:
          enabled: false
        prometheusK8s:
          retention: 24h

    You can dynamically provision local storage on single-node OpenShift clusters with Logical volume manager storage (LVM Storage).

    The following YAML example configures the storage of the node to be available to OKD applications.

    Recommended LVMCluster configuration

    apiVersion: lvm.topolvm.io/v1alpha1
    kind: LVMCluster
    metadata:
      name: odf-lvmcluster
      namespace: openshift-storage
    spec:
      storage:
        deviceClasses:
        - name: vg1
          deviceSelector: (1)
            paths:
            - /dev/disk/by-path/pci-0000:11:00.0-nvme-1
          thinPoolConfig:
            name: thin-pool-1
            overprovisionRatio: 10
    (1) If no disks are specified in the deviceSelector.paths field, LVM Storage uses all the unused disks in the specified thin pool.
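    You can confirm that LVM Storage has reconciled the configuration and published a storage class for the vg1 device class; the storage class name depends on the operator version (for example, it may be prefixed with lvms-):

      $ oc get lvmcluster -n openshift-storage
      $ oc get storageclass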

    Network diagnostics

    Single-node OpenShift clusters that run DU workloads require fewer inter-pod network connectivity checks to reduce the additional load that these pods create. The following custom resource (CR) disables these checks.

    Recommended network diagnostics configuration