Configuring built-in monitoring with Prometheus

    Prometheus is an open-source systems monitoring and alerting toolkit. The Prometheus Operator creates, configures, and manages Prometheus clusters running on Kubernetes-based clusters, such as OKD.

    Helper functions exist in the Operator SDK by default to automatically set up metrics in any generated Go-based Operator for use on clusters where the Prometheus Operator is deployed.

    As an Operator author, you can publish custom metrics by using the global Prometheus registry from the controller-runtime/pkg/metrics library.


    • Go-based Operator generated using the Operator SDK

    • Prometheus Operator, which is deployed by default on OKD clusters


    1. In your Operator SDK project, uncomment the following line in the config/default/kustomization.yaml file:

    2. Create a custom controller class to publish additional metrics from the Operator. The following example declares the widgets and widgetFailures collectors as global variables, and then registers them with the init() function in the controller’s package:

      controllers/memcached_controller_test_metrics.go file

      package controllers

      import (
          "github.com/prometheus/client_golang/prometheus"
          "sigs.k8s.io/controller-runtime/pkg/metrics"
      )

      var (
          // widgets counts every widget that the Operator processes.
          widgets = prometheus.NewCounter(
              prometheus.CounterOpts{
                  Name: "widgets_total",
                  Help: "Number of widgets processed",
              },
          )
          // widgetFailures counts widgets that failed processing.
          widgetFailures = prometheus.NewCounter(
              prometheus.CounterOpts{
                  Name: "widget_failures_total",
                  Help: "Number of failed widgets",
              },
          )
      )

      func init() {
          // Register custom metrics with the global prometheus registry
          metrics.Registry.MustRegister(widgets, widgetFailures)
      }
    3. Record to these collectors from any part of the reconcile loop in the main controller class, which determines the business logic for the metric:

      controllers/memcached_controller.go file

      func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
          ...
          ...
          // Add metrics
          widgets.Inc()
          widgetFailures.Inc()

          return ctrl.Result{}, nil
      }
    4. Build and push the Operator:

      $ make docker-build docker-push IMG=<registry>/<user>/<image_name>:<tag>
    5. Deploy the Operator:

      $ make deploy IMG=<registry>/<user>/<image_name>:<tag>
    6. Create role and role binding definitions to allow the service monitor of the Operator to be scraped by the Prometheus instance of the OKD cluster.

      Roles must be assigned so that service accounts have the permissions to scrape the metrics of the namespace:

      config/prometheus/role.yaml role

      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: prometheus-k8s-role
        namespace: <operator_namespace>
      rules:
        - apiGroups:
            - ""
          resources:
            - endpoints
            - pods
            - services
            - nodes
            - secrets
          verbs:
            - get
            - list

      config/prometheus/rolebinding.yaml role binding

      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: prometheus-k8s-rolebinding
        namespace: memcached-operator-system
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: prometheus-k8s-role
      subjects:
        - kind: ServiceAccount
          name: prometheus-k8s
          namespace: openshift-monitoring
    7. Apply the roles and role bindings for the deployed Operator:

      $ oc apply -f config/prometheus/role.yaml
      $ oc apply -f config/prometheus/rolebinding.yaml
    8. Set the labels for the namespace that you want to scrape, which enables OpenShift cluster monitoring for that namespace:


    • Query and view the metrics in the OKD web console. You can use the names that were set in the custom controller class, for example widgets_total and widget_failures_total.
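
    The reconcile example in step 3 increments both counters on every pass. In practice, the failure counter is usually incremented only when processing fails. The following is a minimal sketch of that business logic, not the Operator SDK's own implementation. It uses only the Go standard library: the counter type is a hypothetical stand-in for a Prometheus counter, and processWidget and reconcile are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// counter is a hypothetical stand-in for a Prometheus counter. It mirrors
// only the Inc() method that the reconcile loop uses.
type counter struct{ n int64 }

func (c *counter) Inc()         { atomic.AddInt64(&c.n, 1) }
func (c *counter) Value() int64 { return atomic.LoadInt64(&c.n) }

var (
	widgets        counter
	widgetFailures counter
)

// processWidget is a hypothetical placeholder for the Operator's real work.
// It fails when the widget name is empty.
func processWidget(name string) error {
	if name == "" {
		return errors.New("widget name is empty")
	}
	return nil
}

// reconcile counts every processed widget, and counts a failure only when
// processing returns an error.
func reconcile(name string) error {
	widgets.Inc()
	if err := processWidget(name); err != nil {
		widgetFailures.Inc()
		return err
	}
	return nil
}

func main() {
	_ = reconcile("widget-a") // succeeds
	_ = reconcile("")         // fails
	fmt.Println(widgets.Value(), widgetFailures.Value()) // prints: 2 1
}
```

    With real collectors, the same branching applies: call widgets.Inc() for each processed item and widgetFailures.Inc() only on the error path.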

    As an Operator author creating Ansible-based Operators, you can use the Operator SDK’s osdk_metrics module to expose custom Operator and Operand metrics, emit events, and support logging.


    • Ansible-based Operator generated using the Operator SDK

    • Prometheus Operator, which is deployed by default on OKD clusters


    1. Generate an Ansible-based Operator, specifying a domain for the API group:

      $ operator-sdk init \
          --plugins=ansible \
          --domain=<domain>
    2. Create a metrics API. This example uses a kind named Testmetrics:

      $ operator-sdk create api \
          --group metrics \
          --version v1 \
          --kind Testmetrics \
          --generate-role
    3. Edit the roles/testmetrics/tasks/main.yml file and use the osdk_metrics module to create custom metrics for your Operator project:

      Example roles/testmetrics/tasks/main.yml file

      ---
      # tasks file for Memcached
      - name: start k8sstatus
        k8s:
          definition:
            kind: Deployment
            apiVersion: apps/v1
            metadata:
              name: '{{ ansible_operator_meta.name }}-memcached'
              namespace: '{{ ansible_operator_meta.namespace }}'
            spec:
              replicas: "{{size}}"
              selector:
                matchLabels:
                  app: memcached
              template:
                metadata:
                  labels:
                    app: memcached
                spec:
                  containers:
                  - name: memcached
                    command:
                    - memcached
                    - -m=64
                    - -o
                    - modern
                    - -v
                    ports:
                      - containerPort: 11211

      - osdk_metric:
          name: my_thing_counter
          counter: {}

      - osdk_metric:
          name: my_counter_metric
          description: Add 3.14 to the counter
          counter:
            increment: yes

      - osdk_metric:
          name: my_gauge_metric
          description: Create my gauge and set it to 2.
          gauge:
            set: 2

      - osdk_metric:
          name: my_histogram_metric
          description: Observe my histogram
          histogram:
            observe: 2

      - osdk_metric:
          name: my_summary_metric
          description: Observe my summary
          summary:
            observe: 2
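
    The tasks above use three kinds of operations: increment on a counter, set on a gauge, and observe on a histogram or summary. The following Go sketch models what each operation does to the stored value under standard Prometheus semantics. It is a simplified illustration, not the osdk_metric implementation: the counter, gauge, and histogram types are hypothetical, and the sketch assumes the tasks run twice.

```go
package main

import "fmt"

// counter only ever goes up; each increment adds 1.
type counter struct{ value float64 }

func (c *counter) inc() { c.value++ }

// gauge holds the most recently set value.
type gauge struct{ value float64 }

func (g *gauge) set(v float64) { g.value = v }

// histogram tracks how many observations were made and their sum.
// Bucket boundaries are omitted to keep the sketch short. A summary
// tracks a count and a sum in the same way.
type histogram struct {
	count uint64
	sum   float64
}

func (h *histogram) observe(v float64) {
	h.count++
	h.sum += v
}

func main() {
	var c counter
	var g gauge
	var h histogram

	// Assumption: the task list runs twice, for example across two
	// reconcile passes.
	for i := 0; i < 2; i++ {
		c.inc()
		g.set(2)
		h.observe(2)
	}

	fmt.Println(c.value, g.value, h.count, h.sum) // prints: 2 2 2 4
}
```

    The counter grows with every run, while the gauge stays at the value that was last set, which is why counters suit event totals and gauges suit current state.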


    1. Run your Operator on a cluster. For example, to use the “run as a deployment” method:

      1. Build the Operator image and push it to a registry:

        $ make docker-build docker-push IMG=<registry>/<user>/<image_name>:<tag>
      2. Install the custom resource definitions (CRDs) on the cluster:

        $ make install
      3. Deploy the Operator:

        $ make deploy IMG=<registry>/<user>/<image_name>:<tag>
    2. Create a Testmetrics custom resource (CR):

      1. Define the CR spec:

        Example config/samples/metrics_v1_testmetrics.yaml file

        apiVersion: metrics.<domain>/v1
        kind: Testmetrics
        metadata:
          name: testmetrics-sample
        spec:
          size: 1
      2. Create the object:

        $ oc create -f config/samples/metrics_v1_testmetrics.yaml
    3. Get the pod details:

      $ oc get pods

      Example output

      NAME                                        READY   STATUS    RESTARTS   AGE
      ansiblemetrics-controller-manager-<id>      2/2     Running   0          149m
      testmetrics-sample-memcached-<id>           1/1     Running   0          147m
    4. Get the endpoint details:

      $ oc get ep

      Example output

      ansiblemetrics-controller-manager-metrics-service 150m
    5. Request a custom metrics token:

      $ token=`oc create token prometheus-k8s -n openshift-monitoring`
    6. Check the metrics values:

      1. Check the my_counter_metric value:

        $ oc exec ansiblemetrics-controller-manager-<id> -- curl -k -H "Authorization: Bearer $token" '<metrics_endpoint_url>' | grep my_counter

        Example output

        # HELP my_counter_metric Add 3.14 to the counter
        # TYPE my_counter_metric counter
        my_counter_metric 2
      2. Check the my_gauge_metric value:

        $ oc exec ansiblemetrics-controller-manager-<id> -- curl -k -H "Authorization: Bearer $token" '<metrics_endpoint_url>' | grep gauge

        Example output

        # HELP my_gauge_metric Create my gauge and set it to 2.
      3. Check the my_histogram_metric and my_summary_metric values:
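
    The grep checks in this step can also be scripted. The following Go sketch reads metrics output in the Prometheus text format and returns the value of a named sample, skipping the HELP and TYPE comment lines. It is a minimal illustration that assumes samples without labels. The metricValue function is a hypothetical helper and is not part of the Operator SDK.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// metricValue scans Prometheus text-format output and returns the value of
// the first sample whose name matches exactly. Lines that start with '#'
// (HELP and TYPE metadata) are skipped. Samples with labels are not handled.
func metricValue(body, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != name {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, false
		}
		return v, true
	}
	return 0, false
}

func main() {
	// Sample input copied from the my_counter_metric example output.
	body := "# HELP my_counter_metric Add 3.14 to the counter\n" +
		"# TYPE my_counter_metric counter\n" +
		"my_counter_metric 2\n"

	v, ok := metricValue(body, "my_counter_metric")
	fmt.Println(v, ok) // prints: 2 true
}
```

    Exact name matching avoids the partial matches that a pattern such as grep gauge can produce when several metric names share a substring.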