Requesting CRI-O and Kubelet profiling data by using the Node Observability Operator

    The following workflow outlines how to query the profiling data by using the Node Observability Operator:

    1. Install the Node Observability Operator in the OKD cluster.

    2. Create a NodeObservability custom resource to enable the CRI-O profiling on the worker nodes of your choice.

    3. Run the profiling query to generate the profiling data.

    The Node Observability Operator is not installed in OKD by default. You can install the Node Observability Operator by using the OKD CLI or the web console.

    Installing the Node Observability Operator using the CLI

    You can install the Node Observability Operator by using the OpenShift CLI (oc).

    Prerequisites

    • You have installed the OpenShift CLI (oc).

    • You have access to the cluster with cluster-admin privileges.

    Procedure

    1. Confirm that the Node Observability Operator is available by running the following command:

      $ oc get packagemanifests -n openshift-marketplace node-observability-operator

      Example output

      NAME                          CATALOG             AGE
      node-observability-operator   Red Hat Operators   9h
    2. Create the node-observability-operator namespace by running the following command:

      $ oc new-project node-observability-operator
    3. Create an OperatorGroup object YAML file:

      cat <<EOF | oc apply -f -
      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: node-observability-operator
        namespace: node-observability-operator
      spec:
        targetNamespaces: []
      EOF
    4. Create a Subscription object YAML file to subscribe a namespace to an Operator:

      cat <<EOF | oc apply -f -
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: node-observability-operator
        namespace: node-observability-operator
      spec:
        channel: alpha
        name: node-observability-operator
        source: redhat-operators
      EOF

    Verification

    1. View the install plan name by running the following command:

      $ oc -n node-observability-operator get sub node-observability-operator -o yaml | yq '.status.installplan.name'

      Example output

      install-dt54w
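    2. Verify the install plan status by running the following command:

      $ oc -n node-observability-operator get ip <install_plan_name> -o yaml | yq '.status.phase'

      <install_plan_name> is the install plan name that you obtained from the output of the previous command.

      Example output

      COMPLETE
    3. Verify that the Node Observability Operator deployment is running by running the following command:

      $ oc get deploy -n node-observability-operator

      Example output

      NAME   READY   UP-TO-DATE   AVAILABLE   AGE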
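
    The install plan check above can also be polled from a script. The following is a minimal sketch, not part of the documented procedure: the function names, the retry count, and the delay are illustrative assumptions, and it reads the same fields as the commands above through the jsonpath output of oc instead of yq.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, retry count, and delay are assumptions.

NS="node-observability-operator"

# Print the install plan name recorded on the Subscription.
install_plan_name() {
  oc -n "${NS}" get sub node-observability-operator \
    -o jsonpath='{.status.installplan.name}'
}

# Poll the phase of an install plan; return 0 once it reports completion.
# Usage: wait_for_install_plan <install_plan_name> [attempts] [delay_seconds]
wait_for_install_plan() {
  local name="$1" attempts="${2:-30}" delay="${3:-10}" phase="" i
  for ((i = 1; i <= attempts; i++)); do
    phase="$(oc -n "${NS}" get installplan "${name}" \
      -o jsonpath='{.status.phase}' 2>/dev/null)"
    # Compare case-insensitively, because the example output shows COMPLETE.
    if [ "$(printf '%s' "${phase}" | tr '[:lower:]' '[:upper:]')" = "COMPLETE" ]; then
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if [ "${i}" -lt "${attempts}" ]; then
      sleep "${delay}"
    fi
  done
  echo "install plan ${name} not complete (last phase: ${phase:-unknown})" >&2
  return 1
}
```

    For example, wait_for_install_plan "$(install_plan_name)" returns a nonzero exit status if the install plan does not complete within the allowed attempts.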

    Installing the Node Observability Operator using the web console

    You can install the Node Observability Operator from the OKD web console.

    Prerequisites

    • You have access to the cluster with cluster-admin privileges.

    • You have access to the OKD web console.

    Procedure

    1. Log in to the OKD web console.

    2. In the Administrator’s navigation panel, expand Operators → OperatorHub.

    3. In the All items field, enter Node Observability Operator and select the Node Observability Operator tile.

    4. Click Install.

    5. On the Install Operator page, configure the following settings:

      1. In the Update channel area, click alpha.

      2. In the Installation mode area, click A specific namespace on the cluster.

      3. From the Installed Namespace list, select node-observability-operator.

      4. In the Update approval area, select Automatic.

      5. Click Install.

    Verification

    1. In the Administrator’s navigation panel, expand Operators → Installed Operators.

    2. Verify that the Node Observability Operator is listed in the Operators list.
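
    The same check can be approximated from the CLI by listing the ClusterServiceVersion (CSV) objects in the namespace. This is a hedged sketch: the helper names are hypothetical, and it assumes that the CSV name starts with node-observability-operator and that a phase of Succeeded indicates a finished installation.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the CSV name prefix are assumptions.

# Print "name phase" for each ClusterServiceVersion in the namespace.
# Usage: list_csv_phases [namespace]
list_csv_phases() {
  oc get csv -n "${1:-node-observability-operator}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}'
}

# Return 0 if a CSV whose name starts with node-observability-operator
# reports the phase Succeeded.
# Usage: operator_installed [namespace]
operator_installed() {
  list_csv_phases "$1" | awk '
    $1 ~ /^node-observability-operator/ && $2 == "Succeeded" { found = 1 }
    END { exit !found }'
}
```

    Running operator_installed && echo "installed" prints a message only when a matching CSV has succeeded.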

    Creating the Node Observability custom resource

    You must create and run the NodeObservability custom resource (CR) before you run the profiling query. When you run the NodeObservability CR, it creates the necessary machine config and machine config pool CRs to enable the CRI-O profiling on the worker nodes matching the nodeSelector.

    The CRI-O unix socket of the node is mounted on the agent pod, which allows the agent to communicate with CRI-O to run the pprof request. Similarly, the kubelet-serving-ca certificate chain is mounted on the agent pod, which allows secure communication between the agent and node’s kubelet endpoint.

    Prerequisites

    • You have installed the Node Observability Operator.

    • You have installed the OpenShift CLI (oc).

    • You have access to the cluster with cluster-admin privileges.

    Procedure

    1. Log in to the OKD CLI by running the following command:

      $ oc login -u kubeadmin https://<HOSTNAME>:6443
    2. Switch back to the node-observability-operator namespace by running the following command:

      $ oc project node-observability-operator
    3. Create a CR file named nodeobservability.yaml that contains the following text:

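      apiVersion: nodeobservability.olm.openshift.io/v1alpha2
      kind: NodeObservability
      metadata:
        name: cluster # the later commands and the NodeObservabilityRun resource refer to this name
      spec:
        nodeSelector:
          kubernetes.io/hostname: <node_hostname> # host name of the worker node to profile
        type: crio-kubelet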
    4. Run the NodeObservability CR by running the following command:

      $ oc apply -f nodeobservability.yaml

      Example output

      nodeobservability.olm.openshift.io/cluster created
    5. Review the status of the NodeObservability CR by running the following command:

      $ oc get nob/cluster -o yaml | yq '.status.conditions'

      Example output

      conditions:
        conditions:
        - lastTransitionTime: "2022-07-05T07:33:54Z"
          message: 'DaemonSet node-observability-ds ready: true NodeObservabilityMachineConfig
            ready: true'
          reason: Ready
          type: Ready

      The NodeObservability CR run is complete when the reason is Ready and the status is True.
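
    The CR creation and the readiness check can be sketched as two small shell helpers. This is an illustration under stated assumptions, not the documented procedure: the function names are hypothetical, the host name key is assumed to sit under spec.nodeSelector (as the nodeSelector mentioned earlier suggests), and the Ready condition is read with the jsonpath output of oc instead of yq.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the spec.nodeSelector placement are assumptions.

# Write a NodeObservability CR that targets one node.
# Usage: write_nodeobservability_cr <node_hostname> [output_file]
write_nodeobservability_cr() {
  local hostname="$1" out="${2:-nodeobservability.yaml}"
  cat > "${out}" <<EOF
apiVersion: nodeobservability.olm.openshift.io/v1alpha2
kind: NodeObservability
metadata:
  name: cluster
spec:
  nodeSelector:
    kubernetes.io/hostname: ${hostname}
  type: crio-kubelet
EOF
}

# Return 0 if the Ready condition of nob/cluster has the status True.
nob_is_ready() {
  local status
  status="$(oc get nob/cluster \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')"
  [ "${status}" = "True" ]
}
```

    A typical sequence would be write_nodeobservability_cr <node_hostname>, then oc apply -f nodeobservability.yaml, and then calling nob_is_ready in a loop until it succeeds.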

    Running the profiling query

    To run the profiling query, you must create a NodeObservabilityRun resource. The profiling query is a blocking operation that fetches CRI-O and Kubelet profiling data for a duration of 30 seconds. After the profiling query is complete, you must retrieve the profiling data from the /run/node-observability directory in the container file system. The lifetime of the data is bound to the agent pod through the emptyDir volume, so you can access the profiling data only while the agent pod is in the running status.

    Prerequisites

    • You have installed the Node Observability Operator.

    • You have created the NodeObservability custom resource (CR).

    • You have access to the cluster with cluster-admin privileges.

    Procedure

    1. Create a NodeObservabilityRun resource file named nodeobservabilityrun.yaml that contains the following text:

      apiVersion: nodeobservability.olm.openshift.io/v1alpha2
      kind: NodeObservabilityRun
      metadata:
        name: nodeobservabilityrun
      spec:
        nodeObservabilityRef:
          name: cluster
    2. Trigger the profiling query by running the NodeObservabilityRun resource:

      $ oc apply -f nodeobservabilityrun.yaml
    3. Review the status of the NodeObservabilityRun by running the following command:

      $ oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq '.status.conditions'

      Example output

    4. Retrieve the profiling data from the container’s /run/node-observability path by running the following bash script:

      # Loop over every agent pod listed in the status of the run.
      for a in $(oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq '.status.agents[].name'); do
        echo "agent ${a}"
        mkdir -p "/tmp/${a}"
        # Copy each .pprof file from the agent container to /tmp/<agent_name>.
        for p in $(oc exec "${a}" -c node-observability-agent -- bash -c "ls /run/node-observability/*.pprof"); do
          f="$(basename "${p}")"
          echo "copying ${f} to /tmp/${a}/${f}"
          oc exec "${a}" -c node-observability-agent -- cat "${p}" > "/tmp/${a}/${f}"
        done
      done
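
    After the copy finishes, it can be useful to confirm that every profile actually arrived. The following sketch assumes the /tmp/<agent_name> layout used by the script above; the check_profiles helper is hypothetical and only inspects local files.

```shell
#!/usr/bin/env bash
# Sketch only: check_profiles is a hypothetical helper, not part of the Operator.

# Print the size of every .pprof file in a directory.
# Return 1 if the directory holds no .pprof files or if any of them is empty.
# Usage: check_profiles <directory>
check_profiles() {
  local dir="$1" f size count=0 bad=0
  for f in "${dir}"/*.pprof; do
    # Skip the literal pattern when the glob matches nothing.
    [ -e "${f}" ] || continue
    count=$((count + 1))
    size="$(wc -c < "${f}" | tr -d ' ')"
    echo "${f}: ${size} bytes"
    [ "${size}" -gt 0 ] || bad=1
  done
  [ "${count}" -gt 0 ] && [ "${bad}" -eq 0 ]
}
```

    For example, check_profiles "/tmp/<agent_name>" lists the copied files and fails if a copy produced an empty file, which can happen if the agent pod stopped running before the data was retrieved.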