This document is a quick guide to setting up monitoring for Longhorn.
Longhorn natively exposes metrics in Prometheus text format on a REST endpoint.
You can use any collecting tool, such as Prometheus or Telegraf, to scrape these metrics and then visualize the collected data with a tool such as Grafana.
See Longhorn Metrics for Monitoring for available metrics.
The monitoring system uses Prometheus for collecting data and alerting, and Grafana for visualizing/dashboarding the collected data. It contains the following components:
- Prometheus server, which scrapes and stores time-series data from the Longhorn metrics endpoints. The Prometheus server is also responsible for generating alerts based on configured rules and collected data, and it sends those alerts to an Alertmanager.
- Alertmanager, which then manages those alerts, including silencing, inhibition, aggregation, and sending out notifications via methods such as email, on-call notification systems, and chat platforms.
- Grafana, which queries the Prometheus server for data and draws dashboards for visualization.
The picture below describes the detailed architecture of the monitoring system.
There are two components in the picture above that have not been mentioned yet:
- Longhorn Backend service is a service pointing to the set of Longhorn manager pods. Longhorn's metrics are exposed by the Longhorn manager pods at the endpoint http://LONGHORN_MANAGER_IP:PORT/metrics.
- Prometheus Operator makes running Prometheus on top of Kubernetes very easy. The operator watches 3 custom resources: ServiceMonitor, Prometheus, and Alertmanager. When you create those custom resources, the Prometheus Operator deploys and manages the Prometheus server and Alertmanager with the user-specified configurations.
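To quickly verify that the metrics endpoint is reachable, you can port-forward a Longhorn manager pod and query it. This is only a sketch: the longhorn-system namespace, the longhorn-manager DaemonSet name, and port 9500 are assumptions based on a default Longhorn installation and may differ in your cluster.

# Assumption: default Longhorn install in longhorn-system, manager listening on port 9500.
$ kubectl -n longhorn-system port-forward ds/longhorn-manager 9500:9500
# In another terminal:
$ curl -s http://127.0.0.1:9500/metrics | head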
Installation
This document uses the default namespace for the monitoring system. To install on a different namespace, change the field namespace: <OTHER_NAMESPACE> in the manifests.
Install the Prometheus Operator by following the instructions in the Prometheus Operator documentation.
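As a rough sketch, one common way to install the operator is to apply its bundled manifest from the upstream repository; the exact URL and branch below are assumptions, so check the Prometheus Operator documentation for the current installation method.

# Assumed bundle location; verify against the Prometheus Operator docs before applying.
$ kubectl create -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml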
Create a ServiceMonitor for Longhorn manager.
The Longhorn ServiceMonitor is later referenced by the Prometheus custom resource so that the Prometheus server can discover all Longhorn manager pods and their endpoints.
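A minimal ServiceMonitor sketch is shown below. The label name: longhorn-prometheus-servicemonitor matches the serviceMonitorSelector of the Prometheus custom resource created later; the app: longhorn-manager selector, the longhorn-system namespace, and the manager port name are assumptions based on a default Longhorn installation and may need adjusting.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus-servicemonitor
  namespace: default
  labels:
    name: longhorn-prometheus-servicemonitor
spec:
  # Assumption: the Longhorn backend service carries the label app: longhorn-manager
  # and exposes the metrics port under the name "manager".
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
    - longhorn-system
  endpoints:
  - port: manager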
Create a highly available Alertmanager deployment with 3 instances.
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: longhorn
  namespace: default
spec:
  replicas: 3
The Alertmanager instances will not start unless a valid configuration is given. See Prometheus - Configuration for more explanation. The example configuration below sends notifications via email and Slack:
global:
  resolve_timeout: 5m
route:
  group_by: [alertname]
  receiver: email_and_slack
receivers:
- name: email_and_slack
  email_configs:
  - to: <the email address to send notifications to>
    from: <the sender address>
    smarthost: <the SMTP host through which emails are sent>
    # SMTP authentication information.
    auth_username: <the username>
    auth_identity: <the identity>
    auth_password: <the password>
    headers:
      subject: 'Longhorn-Alert'
    text: |-
      {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      *Description:* {{ .Annotations.description }}
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{ end }}
      {{ end }}
  slack_configs:
  - api_url: <the Slack webhook URL>
    channel: <the channel or user to send notifications to>
    text: |-
      {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      *Description:* {{ .Annotations.description }}
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{ end }}
      {{ end }}
Save the above Alertmanager config in a file called alertmanager.yaml and create a secret from it using kubectl. Alertmanager instances require the secret resource name to follow the format alertmanager-<ALERTMANAGER_NAME>. In the previous step, the name of the Alertmanager is longhorn, so the secret name must be alertmanager-longhorn:
$ kubectl create secret generic alertmanager-longhorn --from-file=alertmanager.yaml -n default
To be able to view the web UI of the Alertmanager, expose it through a Service. A simple way to do this is to use a Service of type NodePort.
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-longhorn
  namespace: default
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30903
    port: 9093
    protocol: TCP
    targetPort: web
  selector:
    alertmanager: longhorn
After creating the above service, you can access the web UI of Alertmanager via a Node’s IP and the port 30903.
Create a PrometheusRule custom resource to define alert conditions. See the Longhorn documentation for more examples of Longhorn alert rules, and see Prometheus - Alerting rules for more information.
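Below is an illustrative PrometheusRule sketch, not an official rule set. It assumes the Longhorn metrics longhorn_volume_actual_size_bytes and longhorn_volume_capacity_bytes are available in your Longhorn version; adjust the expression and threshold to your needs. The labels prometheus: longhorn and role: alert-rules match the ruleSelector of the Prometheus custom resource created later.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-longhorn-rules
  namespace: default
  labels:
    prometheus: longhorn
    role: alert-rules
spec:
  groups:
  - name: longhorn.rules
    rules:
    - alert: LonghornVolumeUsageCritical
      # Illustrative expression; assumes these Longhorn metric names exist in your version.
      expr: 100 * (longhorn_volume_actual_size_bytes / longhorn_volume_capacity_bytes) > 90
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: Longhorn volume capacity is over 90% used.
        description: Longhorn volume {{$labels.volume}} on {{$labels.node}} has been over 90% used for more than 5 minutes.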
If RBAC authorization is activated, create a ServiceAccount, ClusterRole, and ClusterRoleBinding for the Prometheus pods.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
Create a Prometheus custom resource. Notice that we select the Longhorn service monitor and Longhorn rules in the spec.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: longhorn
  namespace: default
spec:
  replicas: 2
  serviceAccountName: prometheus
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-longhorn
      port: web
  serviceMonitorSelector:
    matchLabels:
      name: longhorn-prometheus-servicemonitor
  ruleSelector:
    matchLabels:
      prometheus: longhorn
      role: alert-rules
To be able to view the web UI of the Prometheus server, expose it through a Service. A simple way to do this is to use a Service of type NodePort.
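A minimal Service sketch is shown below. The service name prometheus-longhorn and port 9090 match the Grafana datasource URL used later, and nodePort 30904 matches the port mentioned in the next sentence; the selector prometheus: longhorn assumes the Prometheus Operator labels the Prometheus pods with the name of the Prometheus custom resource.

apiVersion: v1
kind: Service
metadata:
  name: prometheus-longhorn
  namespace: default
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30904
    port: 9090
    protocol: TCP
    targetPort: web
  # Assumption: pods created by the Prometheus Operator carry the label prometheus: longhorn.
  selector:
    prometheus: longhorn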
After creating the above service, you can access the web UI of the Prometheus server via a Node’s IP and the port 30904.
Create Grafana datasource ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: default
data:
  prometheus.yaml: |-
    {
      "apiVersion": 1,
      "datasources": [
        {
          "editable": true,
          "name": "prometheus-longhorn",
          "orgId": 1,
          "type": "prometheus",
          "url": "http://prometheus-longhorn.default.svc:9090",
          "version": 1
        }
      ]
    }
Create Grafana Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: default
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      name: grafana
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:7.1.5
        ports:
        - name: grafana
          containerPort: 3000
        resources:
          limits:
            memory: "500Mi"
            cpu: "300m"
          requests:
            memory: "500Mi"
            cpu: "200m"
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-storage
        - mountPath: /etc/grafana/provisioning/datasources
          name: grafana-datasources
          readOnly: false
      volumes:
      - name: grafana-storage
        emptyDir: {}
      - name: grafana-datasources
        configMap:
          defaultMode: 420
          name: grafana-datasources
Create Grafana Service.
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: default
spec:
  selector:
    app: grafana
  type: ClusterIP
  ports:
  - port: 3000
    targetPort: 3000
Expose Grafana on NodePort 32000:
$ kubectl -n default patch svc grafana --type='json' -p '[{"op":"replace","path":"/spec/type","value":"NodePort"},{"op":"replace","path":"/spec/ports/0/nodePort","value":32000}]'
Access the Grafana dashboard using any node IP on port 32000.
Set up the Longhorn dashboard.
Once inside Grafana, import the prebuilt Longhorn example dashboard.
You should see the following dashboard upon successful setup: