This document is a quick guide to setting up monitoring for Longhorn.
Longhorn natively exposes metrics in Prometheus text format on a REST endpoint.
You can use any collecting tool, such as Prometheus or Telegraf, to scrape these metrics and then visualize the collected data with a tool such as Grafana.
See Longhorn Metrics for Monitoring for available metrics.
The monitoring system uses Prometheus for collecting data and alerting, and Grafana for visualizing/dashboarding the collected data. It contains the following components:
- Prometheus server, which scrapes and stores time-series data from the Longhorn metrics endpoints. The Prometheus server is also responsible for generating alerts based on configured rules and collected data, and it sends those alerts to an Alertmanager.
- Alertmanager, which then manages those alerts, including silencing, inhibition, aggregation, and sending out notifications via methods such as email, on-call notification systems, and chat platforms.
- Grafana, which queries the Prometheus server for data and draws dashboards for visualization.
The picture below describes the detailed architecture of the monitoring system.
There are two components in the picture above that have not been mentioned yet:
- Longhorn Backend service is a service pointing to the set of Longhorn manager pods. Longhorn's metrics are exposed by the Longhorn manager pods at the endpoint http://LONGHORN_MANAGER_IP:PORT/metrics.
- Prometheus Operator makes running Prometheus on top of Kubernetes very easy. The operator watches 3 custom resources: ServiceMonitor, Prometheus, and Alertmanager. When you create those custom resources, the Prometheus Operator deploys and manages the Prometheus server and Alertmanager with the user-specified configurations.
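To quickly verify that the metrics endpoint is reachable, you can port-forward a Longhorn manager pod and query it. This is only a sketch: the longhorn-system namespace, the longhorn-manager DaemonSet name, and port 9500 are assumptions based on a default Longhorn installation and may differ in your cluster.

# Assumption: default Longhorn install in longhorn-system, manager listening on port 9500.
$ kubectl -n longhorn-system port-forward ds/longhorn-manager 9500:9500
# In another terminal:
$ curl -s http://127.0.0.1:9500/metrics | head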
Installation
This document uses the default namespace for the monitoring system. To install on a different namespace, change the field namespace: <OTHER_NAMESPACE> in the manifests.
Install the Prometheus Operator by following the instructions in the Prometheus Operator documentation.
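As a rough sketch, one common way to install the operator is to apply its bundled manifest from the upstream repository; the exact URL and branch below are assumptions, so check the Prometheus Operator documentation for the current installation method.

# Assumed bundle location; verify against the Prometheus Operator docs before applying.
$ kubectl create -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml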
Create a ServiceMonitor for Longhorn manager.
The Longhorn ServiceMonitor is later referenced by the Prometheus custom resource so that the Prometheus server can discover all Longhorn manager pods and their endpoints.
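A minimal ServiceMonitor sketch is shown below. The label name: longhorn-prometheus-servicemonitor matches the serviceMonitorSelector of the Prometheus custom resource created later; the app: longhorn-manager selector, the longhorn-system namespace, and the manager port name are assumptions based on a default Longhorn installation and may need adjusting.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus-servicemonitor
  namespace: default
  labels:
    name: longhorn-prometheus-servicemonitor
spec:
  # Assumption: the Longhorn backend service carries the label app: longhorn-manager
  # and exposes the metrics port under the name "manager".
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
    - longhorn-system
  endpoints:
  - port: manager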
Create a highly available Alertmanager deployment with 3 instances.
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: longhorn
  namespace: default
spec:
  replicas: 3
The Alertmanager instances will not start unless a valid configuration is given. See Prometheus - Configuration for more explanation. The example configuration below sends notifications via email and Slack:
global:
  resolve_timeout: 5m
route:
  group_by: [alertname]
  receiver: email_and_slack
receivers:
- name: email_and_slack
  email_configs:
  - to: <the email address to send notifications to>
    from: <the sender address>
    smarthost: <the SMTP host through which emails are sent>
    # SMTP authentication information.
    auth_username: <the username>
    auth_identity: <the identity>
    auth_password: <the password>
    headers:
      subject: 'Longhorn-Alert'
    text: |-
      {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      *Description:* {{ .Annotations.description }}
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{ end }}
      {{ end }}
  slack_configs:
  - api_url: <the Slack webhook URL>
    channel: <the channel or user to send notifications to>
    text: |-
      {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      *Description:* {{ .Annotations.description }}
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{ end }}
      {{ end }}
Save the above Alertmanager config in a file called alertmanager.yaml and create a secret from it using kubectl. Alertmanager instances require the secret resource name to follow the format alertmanager-<ALERTMANAGER_NAME>. In the previous step, the name of the Alertmanager is longhorn, so the secret name must be alertmanager-longhorn:
$ kubectl create secret generic alertmanager-longhorn --from-file=alertmanager.yaml -n default
To be able to view the web UI of the Alertmanager, expose it through a Service. A simple way to do this is to use a Service of type NodePort.
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-longhorn
  namespace: default
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30903
    port: 9093
    protocol: TCP
    targetPort: web
  selector:
    alertmanager: longhorn
After creating the above service, you can access the web UI of Alertmanager via a Node’s IP and the port 30903.
Create a PrometheusRule custom resource to define alert conditions. See the Longhorn documentation for more examples of Longhorn alert rules, and see Prometheus - Alerting rules for more information.
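Below is an illustrative PrometheusRule sketch, not an official rule set. It assumes the Longhorn metrics longhorn_volume_actual_size_bytes and longhorn_volume_capacity_bytes are available in your Longhorn version; adjust the expression and threshold to your needs. The labels prometheus: longhorn and role: alert-rules match the ruleSelector of the Prometheus custom resource created later.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-longhorn-rules
  namespace: default
  labels:
    prometheus: longhorn
    role: alert-rules
spec:
  groups:
  - name: longhorn.rules
    rules:
    - alert: LonghornVolumeUsageCritical
      # Illustrative expression; assumes these Longhorn metric names exist in your version.
      expr: 100 * (longhorn_volume_actual_size_bytes / longhorn_volume_capacity_bytes) > 90
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: Longhorn volume capacity is over 90% used.
        description: Longhorn volume {{$labels.volume}} on {{$labels.node}} has been over 90% used for more than 5 minutes.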
If RBAC authorization is activated, create a ServiceAccount, ClusterRole, and ClusterRoleBinding for the Prometheus pods.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
Create a Prometheus custom resource. Notice that we select the Longhorn service monitor and Longhorn rules in the spec.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: longhorn
  namespace: default
spec:
  replicas: 2
  serviceAccountName: prometheus
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-longhorn
      port: web
  serviceMonitorSelector:
    matchLabels:
      name: longhorn-prometheus-servicemonitor
  ruleSelector:
    matchLabels:
      prometheus: longhorn
      role: alert-rules
To be able to view the web UI of the Prometheus server, expose it through a Service. A simple way to do this is to use a Service of type NodePort.
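A minimal Service sketch is shown below. The service name prometheus-longhorn and port 9090 match the Grafana datasource URL used later, and nodePort 30904 matches the port mentioned in the next sentence; the selector prometheus: longhorn assumes the Prometheus Operator labels the Prometheus pods with the name of the Prometheus custom resource.

apiVersion: v1
kind: Service
metadata:
  name: prometheus-longhorn
  namespace: default
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30904
    port: 9090
    protocol: TCP
    targetPort: web
  # Assumption: pods created by the Prometheus Operator carry the label prometheus: longhorn.
  selector:
    prometheus: longhorn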
After creating the above service, you can access the web UI of the Prometheus server via a Node’s IP and the port 30904.
Create Grafana datasource ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: default
data:
  prometheus.yaml: |-
    {
      "apiVersion": 1,
      "datasources": [
        {
          "editable": true,
          "name": "prometheus-longhorn",
          "orgId": 1,
          "type": "prometheus",
          "url": "http://prometheus-longhorn.default.svc:9090",
          "version": 1
        }
      ]
    }
Create Grafana Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: default
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      name: grafana
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:7.1.5
        ports:
        - name: grafana
          containerPort: 3000
        resources:
          limits:
            memory: "500Mi"
            cpu: "300m"
          requests:
            memory: "500Mi"
            cpu: "200m"
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-storage
        - mountPath: /etc/grafana/provisioning/datasources
          name: grafana-datasources
          readOnly: false
      volumes:
      - name: grafana-storage
        emptyDir: {}
      - name: grafana-datasources
        configMap:
          defaultMode: 420
          name: grafana-datasources
Create Grafana Service.
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: default
spec:
  selector:
    app: grafana
  type: ClusterIP
  ports:
  - port: 3000
    targetPort: 3000
Expose Grafana on NodePort 32000:
$ kubectl -n default patch svc grafana --type='json' -p '[{"op":"replace","path":"/spec/type","value":"NodePort"},{"op":"replace","path":"/spec/ports/0/nodePort","value":32000}]'
Access the Grafana dashboard using any node IP on port 32000.
Set up the Longhorn dashboard.
Once inside Grafana, import the prebuilt Longhorn example dashboard.
You should see the following dashboard upon successful setup: