Observability — Monitoring FAQ

    The KubeSphere monitoring engine is powered by Prometheus. For debugging purposes, you may want to access the built-in Prometheus service through a NodePort. Run the following command to change the service type to NodePort:

    Note

    To access the Prometheus console, you may need to open relevant ports and configure port forwarding rules depending on your environment.

    Host port 9100 conflict caused by the node exporter

    If another process is already occupying host port 9100, the node exporter in kubesphere-monitoring-system will keep crashing. To resolve the conflict, either terminate that process or assign another available port to the node exporter.
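Before choosing between the two options, it can help to confirm that port 9100 is actually taken on the affected node. The following is a minimal sketch, assuming Python 3 is available on the host; the helper name `port_is_free` is hypothetical and not part of KubeSphere. It only reports whether a TCP listener can bind to the port, not which process holds it.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind to host:port, False otherwise."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # bind() raises OSError (address already in use) when another
            # socket is already bound to the same address and port.
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # 9100 is the host port used by the node exporter.
    state = "free" if port_is_free(9100) else "already in use"
    print(f"Host port 9100 is {state}")
```

Run the script on each node where the node exporter Pod is crashing. If the port is reported as in use while the node exporter is not running, another process is holding it.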

    If you have deployed Prometheus Operator on your own, make sure it is removed before you install KubeSphere. Otherwise, the two operators may conflict, because the Prometheus Operator built into KubeSphere may select duplicate ServiceMonitor objects.

    How to change the monitoring data retention period

    Run the following command to edit the maximum retention period. Go to the retention field and set the desired retention period (7d by default).

      First, make sure the --bind-address flag of kube-scheduler and kube-controller-manager is set to 0.0.0.0 (the default) rather than 127.0.0.1. Prometheus may need to access these components from other hosts.

      Second, check that endpoint objects exist for kube-scheduler and kube-controller-manager. If they are missing, create them manually by creating Services that select the target Pods.

      No monitoring data for the last few minutes

      Check your network plugin and make sure that the IP address ranges of your hosts do not overlap with the Pod network CIDR. It is strongly recommended that you install Kubernetes with KubeKey.
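The overlap check described above can be sketched with Python's standard `ipaddress` module. This is only an illustration: the helper name `find_overlaps` is hypothetical, and the CIDR values are example inputs that you would replace with the host networks and Pod network CIDR of your own cluster.

```python
import ipaddress


def find_overlaps(host_cidrs, pod_cidr):
    """Return the host networks that overlap with the Pod network CIDR."""
    pod_net = ipaddress.ip_network(pod_cidr, strict=False)
    return [
        cidr
        for cidr in host_cidrs
        if ipaddress.ip_network(cidr, strict=False).overlaps(pod_net)
    ]


if __name__ == "__main__":
    # Example values only; substitute the ranges used in your environment.
    hosts = ["192.168.0.0/24", "10.233.64.0/20"]
    pods = "10.233.64.0/18"
    conflicts = find_overlaps(hosts, pods)
    if conflicts:
        print("Overlapping host networks:", ", ".join(conflicts))
    else:
        print("No overlap between host networks and the Pod network CIDR")
```

An empty result means that none of the listed host networks intersects the Pod network. Any network that is returned should be treated as a likely cause of the missing data.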

      Chinese readers may refer to the related discussion in the KubeSphere China forum for more information.

      Prometheus produces an error log: opening storage failed, no such file or directory

      If the Prometheus Pod in kubesphere-monitoring-system is crashing and produces the following error log, your Prometheus data may be corrupt and may need to be deleted manually before Prometheus can recover.

      Exec into the Prometheus Pod (if possible), and remove the block directory /prometheus/01EM0016F8FB33J63RNHFMHK3: