Verifying node health

    You can review cluster node health status, resource consumption statistics, and node logs. Additionally, you can query kubelet status on individual nodes.

    Prerequisites

    • You have access to the cluster as a user with the cluster-admin role.

    • You have installed the OpenShift CLI (oc).

    Procedure

    • List the name, status, and role for all nodes in the cluster:

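      $ oc get nodes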
    • Summarize CPU and memory usage for each node within the cluster:

      $ oc adm top nodes
    • Summarize CPU and memory usage for a specific node:

      $ oc adm top node my-node

    Prerequisites

    • You have installed the OpenShift CLI (oc).

    Procedure

    1. The kubelet is managed using a systemd service on each node. Review the kubelet’s status by querying the kubelet systemd service within a debug pod.

      1. Start a debug pod for a node:

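        $ oc debug node/my-node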
      2. Set /host as the root directory within the debug shell. The debug pod mounts the host’s root file system in /host within the pod. By changing the root directory to /host, you can run binaries contained in the host’s executable paths:

        # chroot /host
      3. Check whether the kubelet systemd service is active on the node:

        # systemctl is-active kubelet
      4. Output a more detailed kubelet.service status summary:

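        # systemctl status kubelet.service
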
    You can gather journald unit logs and other logs within /var/log on individual cluster nodes.

    Prerequisites

    • You have access to the cluster as a user with the cluster-admin role.

    • You have installed the OpenShift CLI (oc).

    • You have SSH access to your hosts.

    Procedure

    1. Query kubelet journald unit logs from OKD cluster nodes. The following example queries control plane nodes only:

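      $ oc adm node-logs --role=master -u kubelet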
    2. Collect logs from specific subdirectories under /var/log/ on cluster nodes.

      1. Retrieve a list of logs contained within a /var/log/ subdirectory. The following example lists files in /var/log/openshift-apiserver/ on all control plane nodes:

        $ oc adm node-logs --role=master --path=openshift-apiserver
      2. Inspect a specific log within a /var/log/ subdirectory. The following example outputs /var/log/openshift-apiserver/audit.log contents from all control plane nodes:

      3. If the API is not functional, review the logs on each node using SSH instead. The following example tails /var/log/openshift-apiserver/audit.log:

        $ ssh core@<master-node>.<cluster_name>.<base_domain> sudo tail -f /var/log/openshift-apiserver/audit.log