Latest package download locations

Prometheus

1. Download Prometheus
2. Extract the package
3. Set up the Prometheus service

Move the extracted directory to /usr/local/ and rename it to prometheus:

    $ mv prometheus-2.6.0.linux-amd64 /usr/local/prometheus

Create the systemd unit file:

    $ vim /usr/lib/systemd/system/prometheus.service
    [Unit]
    Description=Prometheus: the monitoring system
    Documentation=http://prometheus.io/docs/
    [Service]
    ExecStart=/usr/local/prometheus/prometheus \
    --config.file=/usr/local/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus \
    --web.console.templates=/usr/local/prometheus/consoles \
    --web.console.libraries=/usr/local/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090 --web.external-url=
    Restart=always
    StartLimitInterval=0
    RestartSec=10
    [Install]
    WantedBy=multi-user.target

Create the directory where monitoring data is stored:

    $ mkdir /var/lib/prometheus

4. Start Prometheus

    $ systemctl daemon-reload
    $ systemctl enable prometheus
    $ systemctl start prometheus

5. Check the listening port

Prometheus listens on port 9090. Once it has started, netstat shows the port's listening state:

    $ netstat -antpu | grep 9090
    tcp 0 0 127.0.0.1:33270 127.0.0.1:9090 ESTABLISHED 6426/prometheus
    tcp6 0 0 :::9090 :::* LISTEN 6426/prometheus
    tcp6 0 0 ::1:9090 ::1:51821 ESTABLISHED 6426/prometheus
    tcp6 0 0 ::1:51821 ::1:9090 ESTABLISHED 6426/prometheus
    tcp6 0 0 127.0.0.1:9090 127.0.0.1:33270 ESTABLISHED 6426/prometheus
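
If netstat is not installed (on CentOS it comes from the net-tools package), the same ports can be probed from the client side. The sketch below is a minimal illustration assuming Python 3 on the host; `port_is_listening` and the `SERVICES` table are hypothetical names. A successful connect only proves that something accepts connections on the port, not which process owns it.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only shows that *something* accepts connections on the port;
    netstat -p is still what tells you which process owns it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


# Ports used in this guide (the services may run on other hosts in your setup).
SERVICES = {"prometheus": 9090, "ceph_exporter": 9128,
            "grafana": 3000, "alertmanager": 9093}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "LISTEN" if port_is_listening("127.0.0.1", port) else "closed"
        print(f"{name:<14}{port:>6}  {state}")
```

To check another machine, replace 127.0.0.1 with its address.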

6. Access it from a browser

Once Prometheus is running, its status and configuration pages can be viewed in a browser at http://$IP:9090.
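
Besides the web UI, Prometheus exposes an HTTP API. An instant query against `/api/v1/query` returns JSON whose `data.result` is a list of series, each with a `metric` label map and a `[timestamp, "value"]` pair. The following is a sketch assuming Python 3 and the built-in `up` metric, which is 1 for a target whose last scrape succeeded and 0 otherwise; the helper names are illustrative only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: dict) -> dict:
    """Map (job, instance) of each series in a vector result to its value."""
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    result = {}
    for series in body["data"]["result"]:
        labels = series["metric"]
        # "value" is [unix_timestamp, "sample value as a string"]
        key = (labels.get("job", ""), labels.get("instance", ""))
        result[key] = float(series["value"][1])
    return result


def instant_query(base: str, promql: str, timeout: float = 5.0) -> dict:
    """Run one instant query and return the parsed vector."""
    with urlopen(query_url(base, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

For example, `instant_query("http://192.168.1.10:9090", "up")` lists every scrape target with 1.0 (up) or 0.0 (down).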

Ceph_exporter

1. Install the Go environment

    $ yum -y install golang

2. Check the Go environment variables

    $ go env
    GOARCH="amd64"
    GOBIN=""
    GOCACHE="/root/.cache/go-build"
    GOEXE=""
    GOFLAGS=""
    GOHOSTARCH="amd64"
    GOHOSTOS="linux"
    GOOS="linux"
    GOPATH="/root/go"
    GOPROXY=""
    GORACE=""
    GOROOT="/usr/lib/golang"
    GOTMPDIR=""
    GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
    GCCGO="gccgo"
    CC="gcc"
    CXX="g++"
    CGO_ENABLED="1"
    GOMOD=""
    CGO_CFLAGS="-g -O2"
    CGO_CPPFLAGS=""
    CGO_CXXFLAGS="-g -O2"
    CGO_FFLAGS="-g -O2"
    CGO_LDFLAGS="-g -O2"
    PKG_CONFIG="pkg-config"
    GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build359765015=/tmp/go-build -gno-record-gcc-switches"

3. Set the Go environment variables

Add the following lines to /etc/profile.d/go.sh, then load the file:

    export GOROOT=/usr/lib/golang
    export GOBIN=$GOROOT/bin
    export GOPATH=/root/go
    export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
    $ source /etc/profile.d/go.sh

4. Download and build Ceph_exporter

5. Create the Ceph_exporter service

Copy the compiled binary into ~/go/bin/ and create the systemd unit file:

    $ mkdir -p ~/go/bin/
    $ cp ~/go/src/github.com/digitalocean/ceph_exporter/ceph_exporter ~/go/bin/
    $ vim /usr/lib/systemd/system/ceph_exporter.service
    [Unit]
    Description=Prometheus's ceph metrics exporter
    [Service]
    User=root
    Group=root
    ExecStart=/root/go/bin/ceph_exporter
    [Install]
    WantedBy=multi-user.target
    Alias=ceph_exporter.service

6. Start Ceph_exporter

    $ systemctl daemon-reload
    $ systemctl enable ceph_exporter
    $ systemctl start ceph_exporter

7. Check the listening port

Ceph_exporter listens on port 9128; check its listening state with netstat:

    $ netstat -antpu | grep 9128
    tcp6 0 0 :::9128 :::* LISTEN 6839/ceph_exporter

8. Update the Prometheus configuration

Add the Ceph_exporter endpoint to the Prometheus scrape configuration:

    $ vim /usr/local/prometheus/prometheus.yml
    scrape_configs:
      - job_name: 'ceph'
        honor_labels: true
        static_configs:
          - targets: ['192.168.1.10:9128']
            labels:
              instance: Ceph test cluster
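
Prometheus scrapes each target over HTTP at `/metrics` by default (the job above does not override `metrics_path`) and receives plain text in the Prometheus exposition format: `# HELP` and `# TYPE` comment lines followed by samples of the form `name{label="value",...} value`. The parser below is a simplified sketch for inspecting what ceph_exporter returns; `parse_metrics` is a hypothetical helper, not a complete implementation of the format.

```python
import re

# metric_name{label="value",...} value [timestamp]
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Parse text-format samples into (name, labels, value) tuples.

    Simplified: HELP/TYPE comments are skipped, timestamps are ignored,
    and escape sequences inside label values are kept as written.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = _SAMPLE.match(line)
        if m is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = m.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

For example, `parse_metrics(urlopen("http://192.168.1.10:9128/metrics").read().decode())` (with `from urllib.request import urlopen`) lists the samples the ceph job will ingest.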

9. Restart Prometheus

    $ systemctl restart prometheus

10. Verify in the browser

Grafana

1. Download the package

    $ wget https://dl.grafana.com/oss/release/grafana-5.4.3-1.x86_64.rpm

Download links for the latest packages for other systems are available on the Grafana website.

2. Install Grafana

    $ yum -y install grafana-5.4.3-1.x86_64.rpm

3. Start Grafana

    $ systemctl enable grafana-server
    $ systemctl start grafana-server

4. Check the listening port

Grafana listens on port 3000; check its listening state with netstat:

    $ netstat -antpu | grep 3000
    tcp6 0 0 :::3000 :::* LISTEN 7147/grafana-server

5. Log in from a browser

The address is http://$IP:3000.

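6. Configure the dashboard

Click Add data source and choose Prometheus as the type. Set the URL to the Prometheus address, http://$IP:9090.

Import a dashboard using template ID 917. If the server cannot reach the Internet, download the template from the Grafana website and import it manually.

Then check the monitoring status on the dashboard.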
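
The data source can also be created without the UI through Grafana's HTTP API, by POSTing JSON to `/api/datasources`. The sketch below only builds the request; it assumes Grafana's default `admin`/`admin` login (adjust it if you have changed the password) and the http://$IP:9090 Prometheus URL used above. The data source name `Prometheus` and the helper `datasource_request` are arbitrary, illustrative choices.

```python
import base64
import json
from urllib.request import Request


def datasource_request(grafana: str, prometheus_url: str,
                       user: str = "admin", password: str = "admin") -> Request:
    """Build (but do not send) a POST /api/datasources request for Grafana."""
    body = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,  # the same URL entered in the UI
        "access": "proxy",      # Grafana's backend queries Prometheus
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        f"{grafana.rstrip('/')}/api/datasources",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Pass the returned object to `urllib.request.urlopen` to send it. Importing dashboard 917 is still done through the UI.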

Alertmanager

1. Install Alertmanager
2. Create the service

    $ vim /usr/lib/systemd/system/alertmanager.service
    [Unit]
    Description=Prometheus: the alerting system
    Documentation=http://prometheus.io/docs/
    After=prometheus.service
    [Service]
    ExecStart=/usr/bin/alertmanager --config.file=/usr/local/prometheus/alertmanager.yml
    Restart=always
    StartLimitInterval=0
    RestartSec=10
    [Install]
    WantedBy=multi-user.target

3. Start Alertmanager

    $ systemctl enable alertmanager
    $ systemctl start alertmanager

4. Check the listening port

Alertmanager listens on port 9093; check its listening state with netstat:

    $ netstat -antpu | grep 9093
    tcp6 0 0 :::9093 :::* LISTEN 7381/alertmanager

5. Add the Alertmanager endpoint to the Prometheus configuration

    $ vim /usr/local/prometheus/prometheus.yml
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ["192.168.1.10:9093"]

6. Restart Prometheus

    $ systemctl restart prometheus

Configure DingTalk alerts

1. Set up the webhook

    $ mkdir -p /usr/lib/golang/src/github.com/timonwong/
    $ cd /usr/lib/golang/src/github.com/timonwong/
    $ git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
    $ cd prometheus-webhook-dingtalk
    $ make
    $ nohup ./prometheus-webhook-dingtalk --ding.profile="webhook=https://oapi.dingtalk.com/robot/send?access_token=8fe12c1a58b0769d7fcbf6ebf3bcd2cfcba825f2c45b4b39055890fd705df543" &> /var/log/dingding.log &
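
prometheus-webhook-dingtalk sits between Alertmanager and DingTalk: it accepts Alertmanager's webhook calls and forwards them to the robot URL given in `--ding.profile`. Before wiring up Alertmanager, it can help to confirm that the robot token works by sending a message to DingTalk directly. A DingTalk custom robot accepts a JSON body such as `{"msgtype": "text", "text": {"content": "..."}}`; the sketch below builds such a request in Python, with `dingtalk_text_request` a hypothetical helper. If the robot has a keyword security setting enabled, the content must contain that keyword.

```python
import json
from urllib.request import Request

DINGTALK_SEND = "https://oapi.dingtalk.com/robot/send"


def dingtalk_text_request(access_token: str, content: str) -> Request:
    """Build (but do not send) a text message for a DingTalk custom robot."""
    body = {"msgtype": "text", "text": {"content": content}}
    return Request(
        f"{DINGTALK_SEND}?access_token={access_token}",
        # ensure_ascii=False keeps Chinese text readable in the JSON body
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req)`; DingTalk replies with JSON containing `errcode`, where 0 means success.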

2. Add the webhook receiver

    $ vim /usr/local/prometheus/alertmanager.yml
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
    receivers:
      - name: 'web.hook'
        webhook_configs:
          - url: 'http://192.168.1.10:8060/dingtalk/webhook/send'
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
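
The `inhibit_rules` entry mutes a `warning` alert while a `critical` alert with the same `alertname`, `dev` and `instance` labels is firing. The function below is a simplified model of that check, not Alertmanager's own code. A label that is absent on both alerts counts as equal, which matches how Alertmanager treats the `equal` list.

```python
SOURCE_MATCH = {"severity": "critical"}
TARGET_MATCH = {"severity": "warning"}
EQUAL = ("alertname", "dev", "instance")


def _matches(labels: dict, matchers: dict) -> bool:
    """True if every matcher label has exactly the given value."""
    return all(labels.get(k) == v for k, v in matchers.items())


def is_inhibited(target: dict, firing: list,
                 source_match: dict = SOURCE_MATCH,
                 target_match: dict = TARGET_MATCH,
                 equal: tuple = EQUAL) -> bool:
    """Would `target` be muted by some alert in `firing`?"""
    if not _matches(target, target_match):
        return False
    return any(
        _matches(source, source_match)
        # missing on both sides -> None == None -> treated as equal
        and all(source.get(k) == target.get(k) for k in equal)
        for source in firing
    )
```

Note that the two rules in ceph.yml set only a `product` label and no `severity`, so with this configuration the inhibit rule never applies to them.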

3. Add the alert rule file

    $ vim /usr/local/prometheus/prometheus.yml
    rule_files:
      - /usr/local/prometheus/ceph.yml

4. Configure the alert rules

    $ vim /usr/local/prometheus/ceph.yml
    groups:
      - name: ceph-rule
        rules:
          - alert: Ceph OSD Down
            expr: ceph_osd_down > 0
            for: 2m
            labels:
              product: Ceph test cluster
            annotations:
              Warn: "{{$labels.instance}}: {{ $value }} OSD(s) down"
              Description: "{{$labels.instance}}: {{ $labels.osd }} is currently {{ $labels.status }}"
          - alert: Cluster space usage
            expr: ceph_cluster_used_bytes / ceph_cluster_capacity_bytes * 100 > 80
            for: 2m
            labels:
              product: Ceph test cluster
            annotations:
              Warn: "{{$labels.instance}}: cluster is running low on space"
              Description: "{{$labels.instance}}: current space usage is {{ $value }}%"
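
Each rule is evaluated periodically. For the space rule, the expression is used bytes divided by capacity, times 100, compared with 80; it must stay true on every evaluation for 2 minutes (`for: 2m`) before the alert moves from pending to firing, and a single false evaluation resets it. The sketch below models that arithmetic and state machine as a simplified illustration; `usage_percent` and `ForTracker` are hypothetical names, and real Prometheus only advances the state at its rule evaluation interval.

```python
from typing import Optional


def usage_percent(used_bytes: float, capacity_bytes: float) -> float:
    """Same arithmetic as: ceph_cluster_used_bytes / ceph_cluster_capacity_bytes * 100."""
    return used_bytes / capacity_bytes * 100


class ForTracker:
    """Simplified model of a rule's `for:` clause (inactive -> pending -> firing)."""

    def __init__(self, for_seconds: float = 120.0):  # for: 2m
        self.for_seconds = for_seconds
        self.active_since: Optional[float] = None

    def evaluate(self, now: float, condition: bool) -> str:
        if not condition:
            self.active_since = None  # one false evaluation resets the timer
            return "inactive"
        if self.active_since is None:
            self.active_since = now   # condition just became true
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


if __name__ == "__main__":
    rule = ForTracker()
    used, capacity = 850 * 2**30, 1000 * 2**30  # hypothetical: 850 GiB of 1000 GiB
    for t in range(0, 181, 60):                  # one evaluation per minute
        print(t, rule.evaluate(t, usage_percent(used, capacity) > 80))
```

With 850 GiB used out of 1000 GiB, usage is 85%, so the condition holds and the alert fires once two minutes have passed since it first became true.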
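
5. Restart Prometheus and Alertmanager so the configuration takes effect
6. Verify in DingTalk

After one OSD is stopped, DingTalk receives an alert.

After the OSD is started again, a recovery notification arrives.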
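
Both notifications come from the same route. Alertmanager POSTs a JSON payload to the webhook with a top-level `status` of `firing` or `resolved`, the `groupLabels` (here `alertname`, per `group_by`), and an `alerts` list whose entries carry their own `status`, `labels` and `annotations`. Recovery messages are sent because `send_resolved` defaults to true for webhook receivers. The function below is a hypothetical sketch of turning such a payload into text; the real message layout is decided by prometheus-webhook-dingtalk's templates, not by this code.

```python
def format_notification(payload: dict) -> str:
    """Render an Alertmanager webhook payload as a short plain-text message."""
    alertname = payload.get("groupLabels", {}).get("alertname", "")
    lines = [f"[{payload['status'].upper()}] {alertname}"]
    for alert in payload.get("alerts", []):
        annotations = alert.get("annotations", {})
        # Prefer the Warn annotation defined in ceph.yml, fall back to Description.
        text = annotations.get("Warn") or annotations.get("Description", "")
        lines.append(f"- {alert.get('status', '')}: {text}")
    return "\n".join(lines)
```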
