[Repost] Prometheus + Grafana + Alertmanager: Usage Summary



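Prometheus is an open-source monitoring and alerting system with a built-in time-series database. Grafana is commonly used alongside it to visualize the data.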

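1. Monitoring System Architecture

1.1 Core Components
- Prometheus Server: scrapes metrics and stores them as time series; it also provides querying and manages alert rule configuration.
- Exporters: data collectors, for example node_exporter for machine metrics and the MongoDB exporter for MongoDB metrics.
- Alertmanager: manages alert notifications.
- Grafana: displays monitoring data as charts.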
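
To make the pull model concrete, here is a minimal sketch of what an exporter does: it serves current values as plain text over HTTP, and Prometheus Server scrapes that endpoint on a schedule. This is a hypothetical illustration built on the JDK's built-in `HttpServer`, not how node_exporter is implemented; the class name `ToyExporter`, the metric `demo_uptime_seconds`, and the port in the usage note are all assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical toy exporter: exposes one gauge in the Prometheus text format.
class ToyExporter {

    // Renders a single gauge sample preceded by its HELP and TYPE metadata lines.
    static String render(String name, String help, double value) {
        return "# HELP " + name + " " + help + "\n"
             + "# TYPE " + name + " gauge\n"
             + name + " " + value + "\n";
    }

    // Starts an HTTP server that answers requests to /metrics with a fresh sample.
    static HttpServer start(int port) throws IOException {
        long startedAt = System.currentTimeMillis();
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            double uptime = (System.currentTimeMillis() - startedAt) / 1000.0;
            byte[] body = render("demo_uptime_seconds", "Seconds since the exporter started.", uptime)
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            start(Integer.parseInt(args[0]));   // serve until the process is killed
        } else {
            // Dry run: print what a scrape would return.
            System.out.print(render("demo_uptime_seconds", "Seconds since the exporter started.", 1.5));
        }
    }
}
```

Started with a port argument such as `9101`, the sketch could be scraped by adding `host:9101` as a target in a Prometheus scrape job; without arguments it only prints the payload.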

2. Installing the Base Components

Since this setup is for learning and experimentation, we use Docker to bring up the environment quickly.

2.1 Install Node Exporter
- docker-compose-node-export.yml

  ```yaml
  version: '3'
  services:
    node-exporter:
      image: prom/node-exporter
      container_name: node-exporter
      hostname: node-exporter
      restart: always
      ports:
        - "9100:9100"
  ```

2.2 Install Alertmanager
- docker-compose-alertmanager.yml

  ```yaml
  version: '3'
  services:
    alertmanager:
      image: prom/alertmanager
      container_name: alertmanager
      hostname: alertmanager
      restart: always
      volumes:
        - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
      ports:
        - "9093:9093"
  ```
- alertmanager.yml

  ```yaml
  global:
    smtp_smarthost: 'smtp.qq.com:25'          # QQ Mail SMTP server
    smtp_from: '793272861@qq.com'             # sender address
    smtp_auth_username: '793272861@qq.com'    # SMTP username, i.e. your mailbox address
    smtp_auth_password: '****************'    # SMTP password
    smtp_require_tls: false                   # do not require TLS
  route:
    group_by: ['alertname']
    group_wait: 10s
    group_interval: 10s
    repeat_interval: 10m
    receiver: live-monitoring
  receivers:
    - name: 'live-monitoring'
      email_configs:
        - to: '793272861@qq.com'              # recipient address
  ```

2.3 Install Prometheus
- docker-compose-prometheus.yml

  ```yaml
  version: '3'
  services:
    prometheus:
      image: prom/prometheus
      container_name: prometheus
      hostname: prometheus
      restart: always
      volumes:
        - /data/docker_file/prometheus/data:/prometheus
        - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
      ports:
        - "9090:9090"
  ```
- prometheus.yml

  ```yaml
  # my global config
  global:
    scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
    evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
    # scrape_timeout is set to the global default (10s).

  # Alertmanager configuration
  alerting:
    alertmanagers:
      - static_configs:
          - targets: ['alertmanager:9093']

  # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
  rule_files:
    # - "first_rules.yml"
    # - "second_rules.yml"

  # A scrape configuration containing exactly one endpoint to scrape:
  # Here it's Prometheus itself.
  # Scrape jobs: poll each target periodically to pull its metrics
  scrape_configs:
    - job_name: 'prometheus'
      static_configs:
        - targets: ['prometheus:9090']
    - job_name: 'node-exporter'
      scrape_interval: 5s
      static_configs:
        - targets: ['node-exporter:9100']
  ```
- Prometheus service discovery
  - Automatic service discovery can be implemented with Consul
- Access: http://localhost:9090/
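
Besides the web UI, the same server answers queries over its HTTP API, which is also what Grafana talks to. The following is a minimal sketch, assuming Prometheus is reachable at `http://localhost:9090` as configured above; the class and method names are hypothetical. It sends the instant query `up` to the `/api/v1/query` endpoint and prints the raw JSON response.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical helper: runs an instant query against the Prometheus HTTP API.
class PromQuery {

    // Builds the /api/v1/query URI with the PromQL expression URL-encoded.
    static URI queryUri(String baseUrl, String promql) {
        String encoded = URLEncoder.encode(promql, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v1/query?query=" + encoded);
    }

    // Sends the query and returns the raw JSON response body.
    static String query(String baseUrl, String promql) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(queryUri(baseUrl, promql)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // "up" is 1 for every target whose most recent scrape succeeded.
            System.out.println(query("http://localhost:9090", "up"));
        } catch (IOException e) {
            System.err.println("Prometheus not reachable: " + e.getMessage());
        }
    }
}
```

If both scrape jobs from prometheus.yml are healthy, the response should list one `up` series per target.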

2.4 Install Grafana
- docker-compose-grafana.yml

  ```yaml
  version: '3'
  services:
    grafana:
      image: grafana/grafana
      container_name: grafana
      hostname: grafana
      restart: always
      environment:
        - GF_SECURITY_ADMIN_PASSWORD=admin
      volumes:
        - /data/docker_file/grafana/data:/var/lib/grafana
        - /data/docker_file/grafana/log:/var/log/grafana
      ports:
        - "3000:3000"
  ```
- Add a data source (Prometheus)
- Access: http://localhost:3000/ (default username: admin, password: admin)

2.5 Docker Compose Script

```yaml
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - monitor

  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    networks:
      - monitor

  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
    networks:
      - monitor

  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor

networks:
  monitor:
    driver: bridge
```

3. Configuring Grafana Dashboards

Grafana pulls data from Prometheus with PromQL queries and renders it in panels. A Grafana dashboard is made up of individual panels.

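3.1 Download Grafana Dashboard Files

Ready-made Grafana dashboard files can be downloaded from the official site and imported into our Grafana instance for immediate use.

Recommended Grafana dashboards
Importing a Grafana dashboard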

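3.2 Add or Modify Grafana Panels (Extension)

The Spring Boot 2.1 Statistics dashboard from the official site does not chart requests to third-party services. Using it as an example, we add a Client Request Count panel and a Client Response Time panel for those requests.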

Client Request Count

```promql
irate(http_client_requests_seconds_count{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])
```

Note: the meter in the application must be named http.client.requests.
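
The reason the meter name matters is that the series names in the query are derived from it. The sketch below is a simplified, hypothetical illustration of that mapping for a timer (dots become underscores, the base unit `seconds` is appended, and the timer is exported as count, sum, and max series); it is not Micrometer's actual naming code, and the class and method names are assumptions.

```java
import java.util.List;

// Simplified, hypothetical sketch of how a timer's meter name maps to Prometheus series names.
class TimerSeriesNames {

    // Dots become underscores, and a timer carries its base unit (seconds) as a suffix.
    static String baseName(String meterName) {
        String snake = meterName.replace('.', '_');
        return snake.endsWith("_seconds") ? snake : snake + "_seconds";
    }

    // A timer shows up as a count series, a sum series, and a max series.
    static List<String> seriesNames(String meterName) {
        String base = baseName(meterName);
        return List.of(base + "_count", base + "_sum", base + "_max");
    }

    public static void main(String[] args) {
        System.out.println(seriesNames("http.client.requests"));
    }
}
```

With a different meter name, the `http_client_requests_seconds_count` series used in the panel query would not exist and the panel would stay empty.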

Client Response Time

```promql
irate(http_client_requests_seconds_sum{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])
/
irate(http_client_requests_seconds_count{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])
```
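
The arithmetic behind the two panel queries can be sketched in a few lines. `irate` looks at the last two samples of a counter inside the range and divides their difference by the time between them; dividing the rate of the `_sum` series by the rate of the `_count` series then gives the average time per request. The class name and the sample numbers below are made up for illustration.

```java
// Hypothetical sketch of the arithmetic behind the two panels.
class PanelMath {

    // Per-second rate from the last two counter samples.
    // If the counter went down, it is treated as a reset and the last value is the increase.
    static double irate(double previous, double last, double intervalSeconds) {
        double increase = last < previous ? last : last - previous;
        return increase / intervalSeconds;
    }

    // Average response time = rate of total seconds spent / rate of request count.
    // With no requests in the interval both rates are 0 and the result is NaN.
    static double averageSeconds(double sumPrev, double sumLast,
                                 double countPrev, double countLast,
                                 double intervalSeconds) {
        return irate(sumPrev, sumLast, intervalSeconds) / irate(countPrev, countLast, intervalSeconds);
    }

    public static void main(String[] args) {
        // Two scrapes 15 s apart: 30 more requests that took 6 s in total.
        System.out.println(irate(100, 130, 15));                       // requests per second
        System.out.println(averageSeconds(20.0, 26.0, 100, 130, 15));  // seconds per request
    }
}
```

Because only the last two samples are used, `irate` reacts quickly to spikes; the `[5m]` range only bounds how far back those two samples may lie.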

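4. Integrating Micrometer with Spring Boot

Metrics

Micrometer provides vendor-neutral interfaces for timers, gauges, counters, distribution summaries, and long task timers. It has a dimensional data model: when paired with a dimensional monitoring system, a specific named metric can be accessed efficiently and drilled into across its dimensions.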
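
A dimensional data model means that a metric is identified by its name plus a set of tags, and every distinct tag combination is its own series that can be read alone or aggregated. The toy registry below illustrates the idea using only the JDK; it is not Micrometer's implementation, and the class name, metric name, and tags are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical toy registry illustrating a dimensional data model; not Micrometer's implementation.
class DimensionalCounters {

    private final Map<String, LongAdder> series = new ConcurrentHashMap<>();

    // Each distinct (name, tags) combination gets its own key, e.g. "orders.created{channel=web}".
    private static String key(String name, Map<String, String> tags) {
        return name + new TreeMap<>(tags);   // TreeMap sorts the tags so their order does not matter
    }

    void increment(String name, Map<String, String> tags) {
        series.computeIfAbsent(key(name, tags), k -> new LongAdder()).increment();
    }

    // Drill down: read one exact series.
    long count(String name, Map<String, String> tags) {
        LongAdder adder = series.get(key(name, tags));
        return adder == null ? 0 : adder.sum();
    }

    // Aggregate: sum every series of a metric regardless of its tags.
    long total(String name) {
        return series.entrySet().stream()
                .filter(e -> e.getKey().startsWith(name + "{"))
                .mapToLong(e -> e.getValue().sum())
                .sum();
    }

    public static void main(String[] args) {
        DimensionalCounters counters = new DimensionalCounters();
        counters.increment("orders.created", Map.of("channel", "web"));
        counters.increment("orders.created", Map.of("channel", "web"));
        counters.increment("orders.created", Map.of("channel", "app"));
        System.out.println(counters.count("orders.created", Map.of("channel", "web")));
        System.out.println(counters.total("orders.created"));
    }
}
```

In Prometheus the same drill-down and aggregation are done with label matchers and `sum()` in PromQL rather than in application code.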

4.1 Add Dependencies

```xml
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>${micrometer.version}</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```

4.2 Enable the Prometheus Endpoint

```yaml
spring:
  application:
    name: spring-boot-node
management:
  metrics:
    # 1. Add global tags, which can later be used as variables when querying data
    tags:
      application: ${spring.application.name}
  endpoints:
    web:
      exposure:
        # 2. Expose the prometheus endpoint
        include: 'health,prometheus'
```
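
With this configuration, every sample returned by `/actuator/prometheus` should carry the global `application` tag as a label. As a rough way to check what the endpoint returns, the sketch below parses a response body in the Prometheus text format and picks out the samples of one metric. It is a simplified parser (it ignores optional timestamps and special values such as `+Inf`), the sample body is made up, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified parser for the Prometheus text format returned by /actuator/prometheus.
class ExpositionParser {

    // Returns series (name plus labels) mapped to value for every sample of the given metric.
    static Map<String, Double> samples(String body, String metricName) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String raw : body.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;                              // skip blank lines and HELP/TYPE comments
            }
            int split = line.lastIndexOf(' ');         // the value is the last space-separated field
            if (split < 0) {
                continue;
            }
            String series = line.substring(0, split);
            int brace = series.indexOf('{');
            String name = brace < 0 ? series : series.substring(0, brace);
            if (name.equals(metricName)) {
                result.put(series, Double.parseDouble(line.substring(split + 1)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up response body; a real one would come from GET /actuator/prometheus.
        String body = "# HELP jvm_threads_live_threads The current number of live threads\n"
                + "# TYPE jvm_threads_live_threads gauge\n"
                + "jvm_threads_live_threads{application=\"spring-boot-node\"} 23.0\n";
        System.out.println(samples(body, "jvm_threads_live_threads"));
    }
}
```

The `application` label seen here is what the `$application` variable in the Grafana queries filters on.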

4.3 Monitor Requests to Third-Party Services

OkHttpMetricsEventListener makes it easy to monitor the requests made by an OkHttp client.

Configure the OkHttp client event listener

```java
// meterRegistry and metricsProperties are fields of the enclosing configuration class.
@Bean("okHttpClient")
public OkHttpClient okHttpClient(ConnectionPool connectionPool) {
    return new OkHttpClient.Builder()
            .connectionPool(connectionPool)
            .connectTimeout(5, TimeUnit.SECONDS)
            .readTimeout(10, TimeUnit.SECONDS)
            .eventListener(eventListener())
            .build();
}

/**
 * Event listener backed by OkHttpMetricsEventListener.
 * metricsProperties.getWeb().getClient().getRequestsMetricName() returns the meter name,
 * which is 'http.client.requests' by default.
 *
 * @return the listener that records a timer for every OkHttp call
 */
private EventListener eventListener() {
    return OkHttpMetricsEventListener.builder(
                    meterRegistry,
                    metricsProperties.getWeb().getClient().getRequestsMetricName())
            .build();
}
```

How it works: OkHttpMetricsEventListener.java (excerpt)

```java
public class OkHttpMetricsEventListener extends EventListener {

    /**
     * Header name for URI patterns which will be used for tag values.
     */
    public static final String URI_PATTERN = "URI_PATTERN";

    @Override
    public void callFailed(Call call, IOException e) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.exception = e;
            // The call has finished: record its metrics
            time(state);
        }
    }

    @Override
    public void responseHeadersEnd(Call call, Response response) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.response = response;
            // The call has finished: record its metrics
            time(state);
        }
    }

    private void time(CallState state) {
        String uri = state.response == null ? "UNKNOWN"
                : (state.response.code() == 404 || state.response.code() == 301
                        ? "NOT_FOUND" : urlMapper.apply(state.request));

        // Define tags, which can be used as labels and variables in Prometheus and Grafana
        Iterable<Tag> tags = Tags.concat(extraTags, Tags.of(
                "method", state.request != null ? state.request.method() : "UNKNOWN",
                "uri", uri,
                "status", getStatusMessage(state.response, state.exception),
                "host", state.request != null ? state.request.url().host() : "UNKNOWN"
        ));

        // Record the timer; Prometheus can then pull the data from the
        // /actuator/prometheus endpoint provided by Spring Boot Actuator
        Timer.builder(this.requestsMetricName)
                .tags(tags)
                .description("Timer of OkHttp operation")
                .register(registry)
                .record(registry.config().clock().monotonicTime() - state.startTime, TimeUnit.NANOSECONDS);
    }
}
```
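
The `uri` tag decision in `time()` is worth isolating, because every distinct tag value becomes a separate time series. The stand-alone sketch below mirrors that branch with simplified types (a nullable status code and plain strings instead of OkHttp's `Response` and `Request`); the class name and the example URL mapper are assumptions made for illustration.

```java
import java.util.function.Function;

// Hypothetical stand-alone version of the uri-tag decision in time(); types are simplified.
class UriTagSketch {

    // statusCode is null when no response arrived, for example after a connection failure.
    static String uriTag(Integer statusCode, String requestUrl, Function<String, String> urlMapper) {
        if (statusCode == null) {
            return "UNKNOWN";
        }
        if (statusCode == 404 || statusCode == 301) {
            // Presumably collapsed into one value so arbitrary URLs do not each become a tag value.
            return "NOT_FOUND";
        }
        return urlMapper.apply(requestUrl);
    }

    public static void main(String[] args) {
        // Assumed mapper: replace numeric path segments with a placeholder.
        Function<String, String> mapper = url -> url.replaceAll("/\\d+", "/{id}");
        System.out.println(uriTag(200, "/users/42", mapper));
        System.out.println(uriTag(404, "/users/42", mapper));
        System.out.println(uriTag(null, "/users/42", mapper));
    }
}
```

A mapper that returns raw URLs containing IDs would produce one series per ID, so mapping to a pattern is usually the safer choice.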

4.4 Spring Boot Integration Example
- Spring Boot Node

5. References

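[1] Grafana Dashboards

[2] Setting up a real-time Prometheus + node-exporter + Grafana monitoring platform on CentOS 7.x

[3] Micrometer quick start

[4] Micrometer in practice: a metrics framework for JVM applications

[5] Spring Boot + Prometheus: lessons learned on custom business metrics in microservice development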