This document is a quick guide to setting up monitoring for Longhorn.

Longhorn natively exposes metrics in Prometheus format on a REST endpoint.

You can use any collecting tool, such as Prometheus or Telegraf, to scrape these metrics and then visualize the collected data with a tool such as Grafana.

See Longhorn Metrics for Monitoring for available metrics.

The monitoring system uses Prometheus for collecting data and alerting, and Grafana for visualizing/dashboarding the collected data.

  • Prometheus server, which scrapes and stores time-series data from the Longhorn metrics endpoints. The Prometheus server is also responsible for generating alerts based on configured rules and the collected data; it then sends those alerts to an Alertmanager.
  • Alertmanager, which manages those alerts, including silencing, inhibition, aggregation, and sending out notifications via methods such as email, on-call notification systems, and chat platforms.
  • Grafana, which queries the Prometheus server for data and draws dashboards for visualization.

The picture below describes the detailed architecture of the monitoring system.

There are two components in the picture above that have not been mentioned yet:

  • Longhorn Backend service is a service pointing to the set of Longhorn manager pods. Longhorn’s metrics are exposed in the Longhorn manager pods at the endpoint http://LONGHORN_MANAGER_IP:PORT/metrics (a quick way to check this endpoint is shown after this list).
  • Prometheus Operator makes running Prometheus on top of Kubernetes very easy. The operator watches three custom resources: ServiceMonitor, Prometheus, and Alertmanager. When you create those custom resources, the Prometheus Operator deploys and manages the Prometheus server and Alertmanager with the user-specified configurations.
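
To quickly confirm that the metrics endpoint responds, you can query one Longhorn manager pod directly from a node or from any pod inside the cluster. This is a minimal sketch; the longhorn-system namespace and the manager port 9500 are assumptions based on a default Longhorn installation:

    # List the Longhorn manager pods and their IPs
    kubectl -n longhorn-system get pods -o wide | grep longhorn-manager

    # Fetch metrics from one manager pod (9500 is the assumed manager port)
    curl http://<LONGHORN_MANAGER_POD_IP>:9500/metrics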

Installation

This document uses the default namespace for the monitoring system. To install in a different namespace, change the field namespace: <OTHER_NAMESPACE> in the manifests.
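
Each of the manifests in the steps below is created the same way: save the YAML to a file and apply it with kubectl, for example:

    kubectl apply -f <MANIFEST_FILE>.yaml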

To install the Prometheus Operator, follow the instructions in the Prometheus Operator documentation.
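
One way to install the operator is to apply the bundle manifest from the project repository. This is a sketch; the URL and branch are assumptions, so check the Prometheus Operator quickstart for the currently recommended method and version:

    kubectl create -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml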

  1. Create a ServiceMonitor for Longhorn manager.

    The Longhorn ServiceMonitor carries the label name: longhorn-prometheus-servicemonitor, which is selected by the Prometheus custom resource created later in this guide so that the Prometheus server can discover all Longhorn manager pods and their endpoints.
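
    A minimal sketch of such a ServiceMonitor is shown below. The values in spec are assumptions based on a default Longhorn installation (manager pods labeled app: longhorn-manager in the longhorn-system namespace, with the metrics port named manager); adjust them to match your cluster.

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: longhorn-prometheus-servicemonitor
        namespace: default
        labels:
          name: longhorn-prometheus-servicemonitor
      spec:
        selector:
          matchLabels:
            app: longhorn-manager
        namespaceSelector:
          matchNames:
          - longhorn-system
        endpoints:
        - port: manager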

  2. Create a highly available Alertmanager deployment with 3 instances.

      apiVersion: monitoring.coreos.com/v1
      kind: Alertmanager
      metadata:
        name: longhorn
        namespace: default
      spec:
        replicas: 3
  3. The Alertmanager instances will not start unless a valid configuration is given. See Prometheus - Configuration for more explanation. The example below configures notifications via email and Slack.

      global:
        resolve_timeout: 5m
      route:
        group_by: [alertname]
        receiver: email_and_slack
      receivers:
      - name: email_and_slack
        email_configs:
        - to: <the email address to send notifications to>
          from: <the sender address>
          smarthost: <the SMTP host through which emails are sent>
          # SMTP authentication information.
          auth_username: <the username>
          auth_identity: <the identity>
          auth_password: <the password>
          headers:
            subject: 'Longhorn-Alert'
          text: |-
            {{ range .Alerts }}
            *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
            *Description:* {{ .Annotations.description }}
            *Details:*
            {{ range .Labels.SortedPairs }} *{{ .Name }}:* `{{ .Value }}`
            {{ end }}
            {{ end }}
        slack_configs:
        - api_url: <the Slack webhook URL>
          channel: <the channel or user to send notifications to>
          text: |-
            {{ range .Alerts }}
            *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
            *Description:* {{ .Annotations.description }}
            *Details:*
            {{ range .Labels.SortedPairs }} *{{ .Name }}:* `{{ .Value }}`
            {{ end }}
            {{ end }}

    Save the above Alertmanager config in a file called alertmanager.yaml and create a secret from it using kubectl.

    Alertmanager instances require the secret resource name to follow the format alertmanager-<ALERTMANAGER_NAME>. In the previous step, the name of the Alertmanager is longhorn, so the secret name must be alertmanager-longhorn.

      $ kubectl create secret generic alertmanager-longhorn --from-file=alertmanager.yaml -n default
  4. To be able to view the web UI of the Alertmanager, expose it through a Service. A simple way to do this is to use a Service of type NodePort.

      apiVersion: v1
      kind: Service
      metadata:
        name: alertmanager-longhorn
        namespace: default
      spec:
        type: NodePort
        ports:
        - name: web
          nodePort: 30903
          port: 9093
          protocol: TCP
          targetPort: web
        selector:
          alertmanager: longhorn

    After creating the above service, you can access the web UI of Alertmanager via a Node’s IP and the port 30903.

  5. Create a PrometheusRule custom resource to define alert conditions; a sample rule is sketched below. See the Longhorn documentation for more examples of Longhorn alert rules.

    See Prometheus - Alerting rules for more information.
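
    As an illustration, the following sketch defines a rule that fires when a volume has been more than 90% full for five minutes. The labels prometheus: longhorn and role: alert-rules must match the ruleSelector of the Prometheus custom resource created later in this guide; the metric names longhorn_volume_actual_size_bytes and longhorn_volume_capacity_bytes are assumptions taken from the Longhorn metrics reference, so verify them against your Longhorn version.

      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: prometheus-longhorn-rules
        namespace: default
        labels:
          prometheus: longhorn
          role: alert-rules
      spec:
        groups:
        - name: longhorn.rules
          rules:
          - alert: LonghornVolumeUsageCritical
            # Percentage of used space relative to volume capacity
            expr: 100 * (longhorn_volume_actual_size_bytes / longhorn_volume_capacity_bytes) > 90
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: Longhorn volume capacity is over 90% used.
              description: Longhorn volume {{$labels.volume}} on {{$labels.node}} has been over 90% used for more than 5 minutes.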

  6. If RBAC authorization is activated, create a ServiceAccount, ClusterRole, and ClusterRoleBinding for the Prometheus pods.

      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: prometheus
        namespace: default
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: prometheus
      rules:
      - apiGroups: [""]
        resources:
        - nodes
        - services
        - endpoints
        - pods
        verbs: ["get", "list", "watch"]
      - apiGroups: [""]
        resources:
        - configmaps
        verbs: ["get"]
      - nonResourceURLs: ["/metrics"]
        verbs: ["get"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: prometheus
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: prometheus
      subjects:
      - kind: ServiceAccount
        name: prometheus
        namespace: default
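
    You can spot-check the permissions by impersonating the Prometheus ServiceAccount with kubectl auth can-i:

      kubectl auth can-i list endpoints --as=system:serviceaccount:default:prometheus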
  7. Create a Prometheus custom resource. Notice that we select the Longhorn ServiceMonitor and Longhorn rules in the spec.

      apiVersion: monitoring.coreos.com/v1
      kind: Prometheus
      metadata:
        name: longhorn
        namespace: default
      spec:
        replicas: 2
        serviceAccountName: prometheus
        alerting:
          alertmanagers:
          - namespace: default
            name: alertmanager-longhorn
            port: web
        serviceMonitorSelector:
          matchLabels:
            name: longhorn-prometheus-servicemonitor
        ruleSelector:
          matchLabels:
            prometheus: longhorn
            role: alert-rules
  8. To be able to view the web UI of the Prometheus server, expose it through a Service, as was done for Alertmanager above. After creating that Service, you can access the web UI of the Prometheus server via a Node’s IP and the port 30904.
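
    In the sketch below, the name prometheus-longhorn matches the datasource URL used in the Grafana ConfigMap later in this guide; the selector prometheus: longhorn is an assumption about the label the Prometheus Operator puts on the pods of the Prometheus named longhorn, so verify the pod labels in your cluster.

      apiVersion: v1
      kind: Service
      metadata:
        name: prometheus-longhorn
        namespace: default
      spec:
        type: NodePort
        ports:
        - name: web
          nodePort: 30904
          port: 9090
          protocol: TCP
          targetPort: web
        selector:
          prometheus: longhorn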

  9. Create Grafana datasource ConfigMap.

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: grafana-datasources
        namespace: default
      data:
        prometheus.yaml: |-
          {
            "apiVersion": 1,
            "datasources": [
              {
                "editable": true,
                "name": "prometheus-longhorn",
                "orgId": 1,
                "type": "prometheus",
                "url": "http://prometheus-longhorn.default.svc:9090",
                "version": 1
              }
            ]
          }
  10. Create Grafana Deployment.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: grafana
        namespace: default
        labels:
          app: grafana
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: grafana
        template:
          metadata:
            name: grafana
            labels:
              app: grafana
          spec:
            containers:
            - name: grafana
              image: grafana/grafana:7.1.5
              ports:
              - name: grafana
                containerPort: 3000
              resources:
                limits:
                  memory: "500Mi"
                  cpu: "300m"
                requests:
                  memory: "500Mi"
                  cpu: "200m"
              volumeMounts:
              - mountPath: /var/lib/grafana
                name: grafana-storage
              - mountPath: /etc/grafana/provisioning/datasources
                name: grafana-datasources
                readOnly: false
            volumes:
            - name: grafana-storage
              emptyDir: {}
            - name: grafana-datasources
              configMap:
                defaultMode: 420
                name: grafana-datasources
  11. Create Grafana Service.

      apiVersion: v1
      kind: Service
      metadata:
        name: grafana
        namespace: default
      spec:
        selector:
          app: grafana
        type: ClusterIP
        ports:
        - port: 3000
          targetPort: 3000
  12. Expose Grafana on NodePort 32000.

      kubectl -n default patch svc grafana --type='json' -p '[{"op":"replace","path":"/spec/type","value":"NodePort"},{"op":"add","path":"/spec/ports/0/nodePort","value":32000}]'
  13. Access the Grafana dashboard using any node IP on port 32000.

  14. Set up the Longhorn dashboard.

    Once inside Grafana, import the prebuilt Longhorn example dashboard.

    You should see the Longhorn dashboard on successful setup.