The Node Feature Discovery Operator

    The Node Feature Discovery Operator (NFD) manages the detection of hardware features and configuration in an OKD cluster by labeling the nodes with hardware-specific information. NFD labels the host with node-specific attributes, such as PCI cards, kernel, operating system version, and so on.

    The NFD Operator can be found on the Operator Hub by searching for “Node Feature Discovery”.

    The Node Feature Discovery (NFD) Operator orchestrates all resources needed to run the NFD daemon set. As a cluster administrator, you can install the NFD Operator using the OKD CLI or the web console.

    Installing the NFD Operator using the CLI

    As a cluster administrator, you can install the NFD Operator using the CLI.

    Prerequisites

    • An OKD cluster

    • Install the OpenShift CLI (oc).

    • Log in as a user with cluster-admin privileges.

    Procedure

    1. Create a namespace for the NFD Operator.

      1. Create the following Namespace custom resource (CR) that defines the openshift-nfd namespace, and then save the YAML in the nfd-namespace.yaml file:
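        The Namespace CR for this step is not reproduced in the text. A minimal sketch, assuming no extra labels or annotations are needed on the namespace, might look like this:

        ```yaml
        # Minimal Namespace CR (a sketch; add labels or annotations only if your cluster policy requires them)
        apiVersion: v1
        kind: Namespace
        metadata:
          name: openshift-nfd
        ```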

      2. Create the namespace by running the following command:

          $ oc create -f nfd-namespace.yaml

    2. Install the NFD Operator in the namespace you created in the previous step by creating the following objects:

      1. Create the following OperatorGroup CR and save the YAML in the nfd-operatorgroup.yaml file:

          apiVersion: operators.coreos.com/v1
          kind: OperatorGroup
          metadata:
            generateName: openshift-nfd-
            name: openshift-nfd
            namespace: openshift-nfd
          spec:
            targetNamespaces:
            - openshift-nfd

      2. Create the OperatorGroup CR by running the following command:

          $ oc create -f nfd-operatorgroup.yaml

      3. Run the following command to get the channel value required for the next step:

          $ oc get packagemanifest nfd -n openshift-marketplace -o jsonpath='{.status.defaultChannel}'

        Example output

          4.8

      4. Create the following Subscription CR and save the YAML in the nfd-sub.yaml file:

        Example Subscription

          apiVersion: operators.coreos.com/v1alpha1
          kind: Subscription
          metadata:
            name: nfd
            namespace: openshift-nfd
          spec:
            channel: "4.8"
            installPlanApproval: Automatic
            name: nfd
            source: redhat-operators
            sourceNamespace: openshift-marketplace

      5. Create the subscription object by running the following command:

          $ oc create -f nfd-sub.yaml

      6. Change to the openshift-nfd project:

          $ oc project openshift-nfd

    Verification

    • To verify that the Operator deployment is successful, run:

        $ oc get pods

      Example output

        NAME                                      READY   STATUS    RESTARTS   AGE
        nfd-controller-manager-7f86ccfb58-vgr4x   2/2     Running   0          10m

      A successful deployment shows a Running status.

    Installing the NFD Operator using the web console

    As a cluster administrator, you can install the NFD Operator using the web console.

    Procedure

    1. In the OKD web console, click Operators → OperatorHub.

    2. Choose Node Feature Discovery from the list of available Operators, and then click Install.

    3. On the Install Operator page, select A specific namespace on the cluster, choose the namespace created in the previous section, and then click Install.

    Verification

    To verify that the NFD Operator installed successfully:

    1. Navigate to the Operators → Installed Operators page.

    2. Ensure that Node Feature Discovery is listed in the openshift-nfd project with a Status of InstallSucceeded.

      During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

    Troubleshooting

    If the Operator does not appear as installed, troubleshoot further:

    1. Navigate to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failures or errors under Status.

    2. Navigate to the Workloads → Pods page and check the logs for pods in the openshift-nfd project.

    The Node Feature Discovery (NFD) Operator orchestrates all resources needed to run the Node-Feature-Discovery daemon set by watching for a NodeFeatureDiscovery CR. Based on the NodeFeatureDiscovery CR, the Operator will create the operand (NFD) components in the desired namespace. You can edit the CR to choose another namespace, image, imagePullPolicy, and nfd-worker-conf, among other options.
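    As an illustration of editing the operand settings, the following sketch changes only imagePullPolicy. The value IfNotPresent is an assumption chosen for illustration; the other fields mirror the sample CR used in this document, and the worker configuration is omitted for brevity.

    ```yaml
    apiVersion: nfd.openshift.io/v1
    kind: NodeFeatureDiscovery
    metadata:
      name: nfd-instance
      namespace: openshift-nfd
    spec:
      operand:
        namespace: openshift-nfd
        image: quay.io/openshift/origin-node-feature-discovery:4.8
        # Assumption: any standard Kubernetes image pull policy is accepted here.
        imagePullPolicy: IfNotPresent
    ```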

    As a cluster administrator, you can create a NodeFeatureDiscovery instance using the OKD CLI or the web console.

    Create a NodeFeatureDiscovery instance using the CLI

    As a cluster administrator, you can create a NodeFeatureDiscovery CR instance using the CLI.

    Prerequisites

    • An OKD cluster

    • Install the OpenShift CLI (oc).

    • Log in as a user with cluster-admin privileges.

    Procedure

    1. Create the following NodeFeatureDiscovery CR, and then save the YAML in the NodeFeatureDiscovery.yaml file:

        apiVersion: nfd.openshift.io/v1
        kind: NodeFeatureDiscovery
        metadata:
          name: nfd-instance
          namespace: openshift-nfd
        spec:
          instance: "" # instance is empty by default
          operand:
            namespace: openshift-nfd
            image: quay.io/openshift/origin-node-feature-discovery:4.8
            imagePullPolicy: Always
          workerConfig:
            configData: |
              #core:
              #  labelWhiteList:
              #  noPublish: false
              #  sleepInterval: 60s
              #  sources: [all]
              #  klog:
              #    addDirHeader: false
              #    alsologtostderr: false
              #    logBacktraceAt:
              #    logtostderr: true
              #    skipHeaders: false
              #    stderrthreshold: 2
              #    v: 0
              #    vmodule:
              ##   NOTE: the following options are not dynamically run-time configurable
              ##         and require a nfd-worker restart to take effect after being changed
              #    logDir:
              #    logFile:
              #    logFileMaxSize: 1800
              #    skipLogHeaders: false
              #sources:
              #  cpu:
              #    cpuid:
              ##     NOTE: whitelist has priority over blacklist
              #      attributeBlacklist:
              #        - "BMI1"
              #        - "BMI2"
              #        - "CLMUL"
              #        - "CMOV"
              #        - "CX16"
              #        - "ERMS"
              #        - "HTT"
              #        - "LZCNT"
              #        - "MMX"
              #        - "MMXEXT"
              #        - "NX"
              #        - "POPCNT"
              #        - "RDRAND"
              #        - "RDSEED"
              #        - "RDTSCP"
              #        - "SGX"
              #        - "SSE"
              #        - "SSE2"
              #        - "SSE3"
              #        - "SSE4.1"
              #        - "SSE4.2"
              #        - "SSSE3"
              #      attributeWhitelist:
              #  kernel:
              #    kconfigFile: "/path/to/kconfig"
              #    configOpts:
              #      - "NO_HZ"
              #      - "X86"
              #      - "DMI"
              #  pci:
              #    deviceClassWhitelist:
              #      - "0200"
              #      - "03"
              #      - "12"
              #    deviceLabelFields:
              #      - "class"
              #      - "vendor"
              #      - "device"
              #      - "subsystem_vendor"
              #      - "subsystem_device"
              #  usb:
              #    deviceClassWhitelist:
              #      - "0e"
              #      - "ef"
              #      - "fe"
              #      - "ff"
              #    deviceLabelFields:
              #      - "class"
              #      - "device"
              #  custom:
              #    - name: "my.kernel.feature"
              #      matchOn:
              #        - loadedKMod: ["example_kmod1", "example_kmod2"]
              #    - name: "my.pci.feature"
              #      matchOn:
              #        - pciId:
              #            class: ["0200"]
              #            vendor: ["15b3"]
              #            device: ["1014", "1017"]
              #        - pciId:
              #            vendor: ["8086"]
              #            device: ["1000", "1100"]
              #    - name: "my.usb.feature"
              #      matchOn:
              #        - usbId:
              #            class: ["ff"]
              #            vendor: ["03e7"]
              #            device: ["2485"]
              #        - usbId:
              #            class: ["fe"]
              #            vendor: ["1a6e"]
              #            device: ["089a"]
              #    - name: "my.combined.feature"
              #      matchOn:
              #        - pciId:
              #            vendor: ["15b3"]
              #            device: ["1014", "1017"]
              #          loadedKMod: ["vendor_kmod1", "vendor_kmod2"]
          customConfig:
            configData: |
              #    - name: "more.kernel.features"
              #      matchOn:
              #      - loadedKMod: ["example_kmod3"]
              #    - name: "more.features.by.nodename"
              #      value: customValue
              #      matchOn:
              #      - nodename: ["special-.*-node-.*"]

    2. Create the NodeFeatureDiscovery CR instance by running the following command:

        $ oc create -f NodeFeatureDiscovery.yaml

    Verification

    • To verify that the instance is created, run:

        $ oc get pods

      Example output

        NAME                                      READY   STATUS    RESTARTS   AGE
        nfd-controller-manager-7f86ccfb58-vgr4x   2/2     Running   0          11m
        nfd-master-hcn64                          1/1     Running   0          60s
        nfd-master-lnnxx                          1/1     Running   0          60s
        nfd-master-mp6hr                          1/1     Running   0          60s
        nfd-worker-vgcz9                          1/1     Running   0          60s
        nfd-worker-xqbws                          1/1     Running   0          60s

      A successful deployment shows a Running status.

    Create a NodeFeatureDiscovery CR using the web console

    Procedure

    1. Navigate to the Operators → Installed Operators page.

    2. Find Node Feature Discovery and locate the box under Provided APIs.

    3. Click Create instance.

    4. Edit the values of the NodeFeatureDiscovery CR.

    5. Click Create.

    The core section contains common configuration settings that are not specific to any particular feature source.
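    These settings belong to the nfd-worker configuration, which the sample CR passes through spec.workerConfig.configData. A minimal sketch, assuming the same CR name and namespace as the sample CR and showing only the core section uncommented (operand settings are omitted for brevity):

    ```yaml
    apiVersion: nfd.openshift.io/v1
    kind: NodeFeatureDiscovery
    metadata:
      name: nfd-instance
      namespace: openshift-nfd
    spec:
      workerConfig:
        configData: |
          core:
            sleepInterval: 60s   # default interval between detection passes
            sources: [all]       # default: every feature source enabled
    ```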

    core.sleepInterval

    core.sleepInterval specifies the interval between consecutive passes of feature detection or re-detection, and thus also the interval between node re-labeling. A non-positive value implies infinite sleep interval; no re-detection or re-labeling is done.

    This value is overridden by the deprecated --sleep-interval command line flag, if specified.

    Example usage

      core:
        sleepInterval: 60s

    The default value is 60s.
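
    As a sketch of a non-default setting, the following lengthens the interval to five minutes. The value 5m is an illustrative assumption and presumes the field accepts the same duration notation as the 60s default.

    ```yaml
    core:
      # Re-run feature detection, and therefore re-labeling, every five minutes.
      sleepInterval: 5m
      # A non-positive value such as 0 would instead disable re-detection and re-labeling.
    ```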

    core.sources

    core.sources specifies the list of enabled feature sources. A special value all enables all feature sources.

    This value is overridden by the deprecated --sources command line flag, if specified.

    Default: [all]

    Example usage

      core:
        sources:
          - system
          - custom

    core.labelWhiteList

    core.labelWhiteList specifies a regular expression for filtering feature labels based on the label name. Non-matching labels are not published.

    The regular expression is matched only against the basename part of the label, that is, the part of the name after '/'. The label prefix, or namespace, is omitted.

    This value is overridden by the deprecated --label-whitelist command line flag, if specified.

    Default: null

    Example usage

      core:
        labelWhiteList: '^cpu-cpuid'
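
    To make the matching rule concrete, the sketch below annotates which labels the example expression would keep. The label names in the comments are illustrative assumptions, not output captured from a cluster.

    ```yaml
    core:
      # Matched against the basename only, so the feature.node.kubernetes.io/ prefix is ignored.
      #   feature.node.kubernetes.io/cpu-cpuid.AVX512F
      #     basename "cpu-cpuid.AVX512F" matches, so the label is published
      #   feature.node.kubernetes.io/pci-0300_1a03.present
      #     basename "pci-0300_1a03.present" does not match, so the label is not published
      labelWhiteList: '^cpu-cpuid'
    ```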

    core.noPublish

    Setting core.noPublish to true disables all communication with the nfd-master. It is effectively a dry run flag; nfd-worker runs feature detection normally, but no labeling requests are sent to nfd-master.

    This value is overridden by the --no-publish command line flag, if specified.

    Example usage

      core:
        noPublish: true

    The default value is false.

    core.klog

    The following options specify the logger configuration, most of which can be dynamically adjusted at run-time.

    The logger options can also be specified using command line flags, which take precedence over any corresponding config file options.

    core.klog.addDirHeader

    If set to true, core.klog.addDirHeader adds the file directory to the header of the log messages.

    Default: false

    Run-time configurable: yes

    core.klog.alsologtostderr

    Log to standard error as well as files.

    Default: false

    Run-time configurable: yes

    core.klog.logBacktraceAt

    When logging hits line file:N, emit a stack trace.

    Default: empty

    Run-time configurable: yes

    core.klog.logDir

    If non-empty, write log files in this directory.

    Default: empty

    Run-time configurable: no

    core.klog.logFile

    If not empty, use this log file.

    Default: empty

    Run-time configurable: no

    core.klog.logFileMaxSize

    core.klog.logFileMaxSize defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited.

    Default: 1800

    Run-time configurable: no

    core.klog.logtostderr

    Log to standard error instead of files.

    Default: true

    Run-time configurable: yes

    core.klog.skipHeaders

    If core.klog.skipHeaders is set to true, avoid header prefixes in the log messages.

    Default: false

    Run-time configurable: yes

    core.klog.skipLogHeaders

    If core.klog.skipLogHeaders is set to true, avoid headers when opening log files.

    Default: false

    Run-time configurable: no

    core.klog.stderrthreshold

    Logs at or above this threshold go to stderr.

    Default: 2

    Run-time configurable: yes

    core.klog.v

    core.klog.v is the number for the log level verbosity.

    Default: 0

    Run-time configurable: yes

    core.klog.vmodule

    core.klog.vmodule is a comma-separated list of pattern=N settings for file-filtered logging.

    Default: empty

    Run-time configurable: yes
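
    The logger options combine under core.klog. The following sketch raises verbosity while keeping the other listed defaults; the value 2 for v is an illustrative assumption.

    ```yaml
    core:
      klog:
        v: 2                # more verbose than the default of 0
        logtostderr: true   # default: log to standard error instead of files
        addDirHeader: true  # add the file directory to each log header
        skipHeaders: false  # default: keep header prefixes
    ```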

    The sources section contains feature source specific configuration parameters.

    sources.cpu.cpuid.attributeBlacklist

    Prevent publishing cpuid features listed in this option.

    This value is overridden by sources.cpu.cpuid.attributeWhitelist, if specified.

    Default: [BMI1, BMI2, CLMUL, CMOV, CX16, ERMS, F16C, HTT, LZCNT, MMX, MMXEXT, NX, POPCNT, RDRAND, RDSEED, RDTSCP, SGX, SGXLC, SSE, SSE2, SSE3, SSE4.1, SSE4.2, SSSE3]

    Example usage

      sources:
        cpu:
          cpuid:
            attributeBlacklist: [MMX, MMXEXT]

    sources.cpu.cpuid.attributeWhitelist

    Only publish the cpuid features listed in this option.

    sources.cpu.cpuid.attributeWhitelist takes precedence over sources.cpu.cpuid.attributeBlacklist.

    Default: empty

    Example usage

      sources:
        cpu:
          cpuid:
            attributeWhitelist: [AVX512BW, AVX512CD, AVX512DQ, AVX512F, AVX512VL]
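
    The precedence rule can be sketched by setting both options at once. The feature names are taken from the examples in this section; combining them this way is only an illustration.

    ```yaml
    sources:
      cpu:
        cpuid:
          # Has no effect while attributeWhitelist is set, because the whitelist takes precedence.
          attributeBlacklist: [MMX, MMXEXT]
          # Only these cpuid features are published.
          attributeWhitelist: [AVX512F, AVX512VL]
    ```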

    sources.kernel.kconfigFile

    sources.kernel.kconfigFile is the path of the kernel config file. If empty, NFD runs a search in the well-known standard locations.

    Default: empty

    Example usage

      sources:
        kernel:
          kconfigFile: "/path/to/kconfig"

    sources.kernel.configOpts

    sources.kernel.configOpts represents kernel configuration options to publish as feature labels.

    Default: [NO_HZ, NO_HZ_IDLE, NO_HZ_FULL, PREEMPT]

    Example usage

      sources:
        kernel:
          configOpts: [NO_HZ, X86, DMI]

    sources.pci.deviceClassWhitelist

    sources.pci.deviceClassWhitelist is a list of PCI device class IDs for which to publish a label. It can be specified as a main class only (for example, 03) or as a full class-subclass combination (for example, 0300). The former implies that all subclasses are accepted. The format of the labels can be further configured with deviceLabelFields.

    Default: ["03", "0b40", "12"]

    Example usage

      sources:
        pci:
          deviceClassWhitelist: ["0200", "03"]

    sources.pci.deviceLabelFields

    sources.pci.deviceLabelFields is the set of PCI ID fields to use when constructing the name of the feature label. Valid fields are class, vendor, device, subsystem_vendor, and subsystem_device.

    Default: [class, vendor]

    Example usage

      sources:
        pci:
          deviceLabelFields: [class, vendor, device]

    With the example config above, NFD would publish labels such as feature.node.kubernetes.io/pci-<class-id>_<vendor-id>_<device-id>.present=true.
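
    As a worked instance of that label format, the sketch below combines both PCI options. The class, vendor, and device IDs are borrowed from the commented custom rules in the sample CR and are used here purely for illustration.

    ```yaml
    sources:
      pci:
        deviceClassWhitelist: ["0200"]
        deviceLabelFields: [class, vendor, device]
    # For a device with class 0200, vendor 15b3, and device 1017, the pattern
    # pci-<class-id>_<vendor-id>_<device-id>.present would yield:
    #   feature.node.kubernetes.io/pci-0200_15b3_1017.present=true
    ```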

    sources.usb.deviceClassWhitelist

    sources.usb.deviceClassWhitelist is a list of USB device class IDs for which to publish a feature label. The format of the labels can be further configured with deviceLabelFields.

    Default: ["0e", "ef", "fe", "ff"]

    Example usage

      sources:
        usb:
          deviceClassWhitelist: ["ef", "ff"]

    sources.usb.deviceLabelFields

    sources.usb.deviceLabelFields is the set of USB ID fields from which to compose the name of the feature label. Valid fields are class, vendor, and device.

    Default: [class, vendor, device]

    Example usage

      sources:
        usb:
          deviceLabelFields: [class, vendor]

    With the example config above, NFD would publish labels such as feature.node.kubernetes.io/usb-<class-id>_<vendor-id>.present=true.

    sources.custom

    sources.custom is the list of rules to process in the custom feature source to create user-specific labels.

    Default: empty
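
    A sketch of what such rules can look like, using uncommented forms of two rules from the sample CR. The rule names and kernel module names are placeholders, not features that exist on any particular cluster.

    ```yaml
    sources:
      custom:
        # Rule based on a list of loaded kernel modules.
        - name: "my.kernel.feature"
          matchOn:
            - loadedKMod: ["example_kmod1", "example_kmod2"]
        # Rule based on PCI vendor and device IDs.
        - name: "my.pci.feature"
          matchOn:
            - pciId:
                vendor: ["15b3"]
                device: ["1014", "1017"]
    ```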