Managing metrics

    In OKD 4.13, cluster components are monitored by scraping metrics exposed through service endpoints. You can also configure metrics collection for user-defined projects.

    You can define the metrics that you want to provide for your own workloads by using Prometheus client libraries at the application level.
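    To illustrate what those libraries produce, the following stdlib-only Python sketch renders a counter in the Prometheus text exposition format. In a real workload you would use a Prometheus client library (for example, prometheus_client) rather than formatting metrics by hand; this sketch only shows the shape of the output:

    ```python
    # Minimal sketch of the Prometheus text exposition format that client
    # libraries emit on the /metrics endpoint. The metric name and samples
    # mirror the example output shown later in this section.

    def render_counter(name, help_text, samples):
        """Render a counter metric in Prometheus text exposition format."""
        lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
        for labels, value in samples:
            label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
            lines.append(f"{name}{{{label_str}}} {value}")
        return "\n".join(lines)

    exposition = render_counter(
        "http_requests_total",
        "Count of all HTTP requests",
        [({"code": "200", "method": "get"}, 4),
         ({"code": "404", "method": "get"}, 2)],
    )
    print(exposition)
    ```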

    In OKD, metrics are exposed through an HTTP service endpoint under the /metrics canonical name. You can list all available metrics for a service by running a curl query against http://<endpoint>/metrics. For instance, you can expose a route to the prometheus-example-app example service and then query that route to view all of its available metrics:
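    Such a query might look like the following sketch, where <example_app_route> is a placeholder for the host of the exposed route:

    ```shell
    $ curl http://<example_app_route>/metrics
    ```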

    Example output

    # HELP http_requests_total Count of all HTTP requests
    # TYPE http_requests_total counter
    http_requests_total{code="200",method="get"} 4
    http_requests_total{code="404",method="get"} 2
    # HELP version Version information about this binary
    # TYPE version gauge
    version{version="v0.1.0"} 1

    Setting up metrics collection for user-defined projects

    You can create a ServiceMonitor resource to scrape metrics from a service endpoint in a user-defined project. This assumes that your application uses a Prometheus client library to expose metrics to the /metrics canonical name.

    This section describes how to deploy a sample service in a user-defined project and then create a ServiceMonitor resource that defines how that service should be monitored.

    To test monitoring of a service in a user-defined project, you can deploy a sample service.

    Procedure

    1. Create a YAML file for the service configuration. In this example, it is called prometheus-example-app.yaml.

    2. Add the following deployment and service configuration details to the file:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: ns1
      ---
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          app: prometheus-example-app
        name: prometheus-example-app
        namespace: ns1
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: prometheus-example-app
        template:
          metadata:
            labels:
              app: prometheus-example-app
          spec:
            containers:
            - image: ghcr.io/rhobs/prometheus-example-app:0.4.1
              imagePullPolicy: IfNotPresent
              name: prometheus-example-app
      ---
      apiVersion: v1
      kind: Service
      metadata:
        labels:
          app: prometheus-example-app
        name: prometheus-example-app
        namespace: ns1
      spec:
        ports:
        - port: 8080
          protocol: TCP
          targetPort: 8080
          name: web
        selector:
          app: prometheus-example-app
        type: ClusterIP

      This configuration deploys a service named prometheus-example-app in the user-defined ns1 project. This service exposes the custom version metric.

    3. Apply the configuration to the cluster:

      $ oc apply -f prometheus-example-app.yaml

      It takes some time to deploy the service.

    4. Check that the pod is running:

      $ oc get pod -n ns1

      Example output

      NAME                                      READY   STATUS    RESTARTS   AGE
      prometheus-example-app-7857545cb7-sbgwq   1/1     Running   0          81m

    Specifying how a service is monitored

    To use the metrics exposed by your service, you must configure OKD monitoring to scrape metrics from the /metrics endpoint. You can do this using a ServiceMonitor custom resource definition (CRD) that specifies how a service should be monitored, or a PodMonitor CRD that specifies how a pod should be monitored. The former requires a Service object, while the latter does not, allowing Prometheus to directly scrape metrics from the metrics endpoint exposed by a pod.
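    For comparison, a PodMonitor resource for the same sample workload might look like the following sketch. It uses the monitoring.coreos.com/v1 API and assumes that the pod's container port is named web:

    ```yaml
    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: prometheus-example-monitor
      namespace: ns1
    spec:
      podMetricsEndpoints:
      - interval: 30s
        port: web
      selector:
        matchLabels:
          app: prometheus-example-app
    ```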

    This procedure shows you how to create a ServiceMonitor resource for a service in a user-defined project.

    Prerequisites

    • You have access to the cluster as a user with the cluster-admin role or the monitoring-edit role.

    • You have enabled monitoring for user-defined projects.

    • For this example, you have deployed the prometheus-example-app sample service in the ns1 project.
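    For reference, monitoring for user-defined projects is enabled through the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace; a minimal sketch:

    ```yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true
    ```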

    Procedure

    1. Create a YAML file for the ServiceMonitor resource configuration. In this example, the file is called example-app-service-monitor.yaml. Add the following configuration details to the file:

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        labels:
          k8s-app: prometheus-example-monitor
        name: prometheus-example-monitor
        namespace: ns1
      spec:
        endpoints:
        - interval: 30s
          port: web
          scheme: http
        selector:
          matchLabels:
            app: prometheus-example-app

      This defines a ServiceMonitor resource that scrapes the metrics exposed by the prometheus-example-app sample service, which includes the version metric.

    2. Apply the configuration to the cluster:

      $ oc apply -f example-app-service-monitor.yaml

      It takes some time to deploy the ServiceMonitor resource.

    3. Check that the ServiceMonitor resource is created:

      $ oc -n ns1 get servicemonitor

      Example output

      NAME                         AGE
      prometheus-example-monitor   81m

    Getting a list of available metrics

    As a cluster administrator or as a user with view permissions for all projects, you can view a list of metrics available in a cluster and output the list in JSON format.

    Prerequisites

    • You are a cluster administrator, or you have access to the cluster as a user with the cluster-monitoring-view role.

    • You have installed the OKD CLI (oc).

    • You have obtained the OKD API route for Thanos Querier.

    • You are able to get a bearer token by using the oc whoami -t command.

    Procedure

    1. If you have not obtained the OKD API route for Thanos Querier, run the following command:

      $ oc get routes -n openshift-monitoring thanos-querier -o jsonpath='{.status.ingress[0].host}'
    2. Retrieve a list of metrics in JSON format from the Thanos Querier API route by running the following command. This command uses oc to authenticate with a bearer token.
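      A sketch of such a query, assuming the standard Prometheus label-values API exposed by Thanos Querier, where <thanos_querier_route> is the host returned by the previous step:

      ```shell
      $ curl -k -H "Authorization: Bearer $(oc whoami -t)" "https://<thanos_querier_route>/api/v1/label/__name__/values"
      ```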