Grafana 系列文章(十四):Helm 安装Loki

grafana,系列,文章,十四,helm,安装,loki · 浏览次数 : 465

小编点评

**标题:配置 Grafana Dashboards 在同一个 NS 下** **正文:** 配置 Grafana Dashboards 在同一个 NS 下需要以下步骤: 1. 创建 ConfigMap: (只要打上 grafana_dashboard 这个 label 就会被 Grafana 的 sidecar 自动导入) ```apiVersion: v1kind: ConfigMapmetadata: name: sample-grafana-dashboard labels: grafana_dashboard: \"1\"data: k8s-dashboard.json: |- [...]Grafana 增加 DataSource在同一个 NS 下,创建如下 ConfigMap: (只要打上grafana_datasource 这个 label 就会被 Grafana 的 sidecar 自动导入) ```apiVersion: v1kind: ConfigMapmetadata: name: loki-loki-stack labels: grafana_datasource: '1'data: loki-stack-datasource.yaml: |- ```apiVersion: 1 datasources: - name: Loki type: loki access: proxy url: http://loki:3100 version: 1Traefik 配置 Grafana IngressRoute因为我是用的 Traefik 2, 通过 CRD IngressRoute 配置 Ingress, 配置如下:apiVersion: traefik.containo.us/v1alpha1kind: IngressRoutemetadata: name: grafanaspec: entryPoints: - web - websecure routes: - kind: Rule match: Host(`grafana.ewhisper.cn`) middlewares: - name: hsts-header namespace: kube-system - name: redirectshttps namespace: kube-system services: - name: loki-grafana namespace: monitoring port: 80 tls: {}最终效果如下:🎉🎉🎉📚️参考文档helm-charts/charts at main · grafana/helm-charts (github.com)Grafana 系列文章Grafana 系列文章三人行, 必有我师; 知识共享, 天下为公. **排版:** ``` apiVersion: v1kind: ConfigMapmetadata: name: sample-grafana-dashboard labels: grafana_dashboard: \"1\"data: k8s-dashboard.json: |- [...]Grafana 增加 DataSource在同一个 NS 下,创建如下 ConfigMap: (只要打上grafana_datasource 这个 label 就会被 Grafana 的 sidecar 自动导入) apiVersion: v1kind: ConfigMapmetadata: name: loki-loki-stack labels: grafana_datasource: '1'data: loki-stack-datasource.yaml: |- apiVersion: 1 datasources: - name: Loki type: loki access: proxy url: http://loki:3100 version: 1Traefik 配置 Grafana IngressRoute因为我是用的 Traefik 2, 通过 CRD IngressRoute 配置 Ingress, 配置如下:apiVersion: traefik.containo.us/v1alpha1kind: IngressRoutemetadata: name: grafanaspec: entryPoints: - web - websecure routes: - kind: Rule match: Host(`grafana.ewhisper.cn`) middlewares: - name: hsts-header namespace: kube-system - name: redirectshttps namespace: kube-system services: - name: loki-grafana namespace: monitoring port: 80 tls: {}最终效果如下:🎉🎉🎉📚️参考文档helm-charts/charts at main · grafana/helm-charts (github.com)Grafana 系列文章Grafana 系列文章三人行, 必有我师; 知识共享, 天下为公. ```

正文

前言

写或者翻译这么多篇 Loki 相关的文章了, 发现还没写怎么安装 😓

现在开始介绍如何使用 Helm 安装 Loki.

前提

有 Helm, 并且添加 Grafana 的官方源:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

🐾Warning:

网络受限, 需要保证网络通畅.

部署

架构

Promtail(收集) + Loki(存储及处理) + Grafana(展示)

Loki 架构图

Promtail

  1. 启用 Prometheus Operator Service Monitor 做监控
  2. 增加external_labels - cluster, 以识别是哪个 K8S 集群;
  3. pipeline_stages 改为 cri, 以对 cri 日志做处理(因为我的集群用的 Container Runtime 是 CRI, 而 Loki Helm 默认配置是 docker)
  4. 增加对 systemd-journal 的日志收集:
promtail:
  config:
    snippets:
      pipelineStages:
        - cri: {}

  extraArgs: 
    - -client.external-labels=cluster=ctyun
  # systemd-journal 额外配置:
  # Add additional scrape config
  extraScrapeConfigs:
    - job_name: journal
      journal:
        path: /var/log/journal
        max_age: 12h
        labels:
          job: systemd-journal
      relabel_configs:
        - source_labels: ['__journal__systemd_unit']
          target_label: 'unit'
        - source_labels: ['__journal__hostname']
          target_label: 'hostname'

  # Mount journal directory into Promtail pods
  extraVolumes:
    - name: journal
      hostPath:
        path: /var/log/journal

  extraVolumeMounts:
    - name: journal
      mountPath: /var/log/journal
      readOnly: true

Loki

  1. 启用持久化存储
  2. 启用 Prometheus Operator Service Monitor 做监控
    1. 并配置 Loki 相关 Prometheus Rule 做告警
  3. 因为个人集群日志量较小, 适当调大 ingester 相关配置

Grafana

  1. 启用持久化存储
  2. 启用 Prometheus Operator Service Monitor 做监控
  3. sidecar 都配置上, 方便动态更新 dashboards/datasources/plugins/notifiers;

Helm 安装

通过如下命令安装:

helm upgrade --install loki --namespace=loki --create-namespace grafana/loki-stack -f values.yaml

自定义 values.yaml 如下:

loki:
  enabled: true
  persistence:
    enabled: true
    storageClassName: local-path
    size: 20Gi
  serviceScheme: https
  user: admin
  password: changit!
  config:
    ingester:
      chunk_idle_period: 1h
      max_chunk_age: 4h
    compactor:
      retention_enabled: true
  serviceMonitor:
    enabled: true
    prometheusRule:
      enabled: true
      rules:
        #  Some examples from https://awesome-prometheus-alerts.grep.to/rules.html#loki
        - alert: LokiProcessTooManyRestarts
          expr: changes(process_start_time_seconds{job=~"loki"}[15m]) > 2
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: Loki process too many restarts (instance {{ $labels.instance }})
            description: "A loki process had too many restarts (target {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: LokiRequestErrors
          expr: 100 * sum(rate(loki_request_duration_seconds_count{status_code=~"5.."}[1m])) by (namespace, job, route) / sum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route) > 10
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: Loki request errors (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} and {{ $labels.route }} are experiencing errors\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: LokiRequestPanic
          expr: sum(increase(loki_panic_total[10m])) by (namespace, job) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Loki request panic (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} is experiencing {{ printf \"%.2f\" $value }}% increase of panics\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: LokiRequestLatency
          expr: (histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route!~"(?i).*tail.*"}[5m])) by (le)))  > 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Loki request latency (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf \"%.2f\" $value }}s 99th percentile latency\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

promtail:
  enabled: true
  config:
    snippets:
      pipelineStages:
        - cri: {}  
  extraArgs:
    - -client.external-labels=cluster=ctyun        
  serviceMonitor:
    # -- If enabled, ServiceMonitor resources for Prometheus Operator are created
    enabled: true

  # systemd-journal 额外配置:
  # Add additional scrape config
  extraScrapeConfigs:
    - job_name: journal
      journal:
        path: /var/log/journal
        max_age: 12h
        labels:
          job: systemd-journal
      relabel_configs:
        - source_labels: ['__journal__systemd_unit']
          target_label: 'unit'
        - source_labels: ['__journal__hostname']
          target_label: 'hostname'

  # Mount journal directory into Promtail pods
  extraVolumes:
    - name: journal
      hostPath:
        path: /var/log/journal

  extraVolumeMounts:
    - name: journal
      mountPath: /var/log/journal
      readOnly: true

fluent-bit:
  enabled: false

grafana:
  enabled: true
  adminUser: caseycui
  adminPassword: changit!
  ## Sidecars that collect the configmaps with specified label and stores the included files them into the respective folders
  ## Requires at least Grafana 5 to work and can't be used together with parameters dashboardProviders, datasources and dashboards
  sidecar:
    image:
      repository: quay.io/kiwigrid/k8s-sidecar
      tag: 1.15.6
      sha: ''
    dashboards:
      enabled: true
      SCProvider: true
      label: grafana_dashboard
    datasources:
      enabled: true
      # label that the configmaps with datasources are marked with
      label: grafana_datasource
    plugins:
      enabled: true
      # label that the configmaps with plugins are marked with
      label: grafana_plugin
    notifiers:
      enabled: true
      # label that the configmaps with notifiers are marked with
      label: grafana_notifier
  image:
    tag: 8.3.5
  persistence:
    enabled: true
    size: 2Gi
    storageClassName: local-path
  serviceMonitor:
    enabled: true
  imageRenderer:
    enabled: disable

filebeat:
  enabled: false

logstash:
  enabled: false

安装后的资源拓扑如下:

Loki K8S 资源拓扑

Day 2 配置(按需)

Grafana 增加 Dashboards

在同一个 NS 下, 创建如下 ConfigMap: (只要打上grafana_dashboard 这个 label 就会被 Grafana 的 sidecar 自动导入)

apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-grafana-dashboard
  labels:
     grafana_dashboard: "1"
data:
  k8s-dashboard.json: |-
  [...]

Grafana 增加 DataSource

在同一个 NS 下, 创建如下 ConfigMap: (只要打上grafana_datasource 这个 label 就会被 Grafana 的 sidecar 自动导入)

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-loki-stack
  labels:
    grafana_datasource: '1'
data:
  loki-stack-datasource.yaml: |-
    apiVersion: 1
    datasources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki:3100
      version: 1

Traefik 配置 Grafana IngressRoute

因为我是用的 Traefik 2, 通过 CRD IngressRoute 配置 Ingress, 配置如下:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - kind: Rule
      match: Host(`grafana.ewhisper.cn`)
      middlewares:
        - name: hsts-header
          namespace: kube-system
        - name: redirectshttps
          namespace: kube-system
      services:
        - name: loki-grafana
          namespace: monitoring
          port: 80
  tls: {}

最终效果

如下:

Grafana Explore Logs

🎉🎉🎉

📚️参考文档

Grafana 系列文章

Grafana 系列文章

三人行, 必有我师; 知识共享, 天下为公. 本文由东风微鸣技术博客 EWhisper.cn 编写.

与Grafana 系列文章(十四):Helm 安装Loki相似的内容:

Grafana 系列文章(十四):Helm 安装Loki

前言 写或者翻译这么多篇 Loki 相关的文章了, 发现还没写怎么安装 😓 现在开始介绍如何使用 Helm 安装 Loki. 前提 有 Helm, 并且添加 Grafana 的官方源: helm repo add grafana https://grafana.github.io/helm-cha

Grafana 系列文章(十):为什么应该使用 Loki

👉️URL: https://grafana.com/blog/2020/09/09/all-the-non-technical-advantages-of-loki-reduce-costs-streamline-operations-build-better-teams/ 📝Descript

Grafana 系列文章(十一):Loki 中的标签如何使日志查询更快更方便

👉️URL: https://grafana.com/blog/2020/04/21/how-labels-in-loki-can-make-log-queries-faster-and-easier/ 📝Description: 关于标签在 Loki 中如何真正发挥作用,你需要知道的一切。它可

Grafana 系列文章(十二):如何使用Loki创建一个用于搜索日志的Grafana仪表板

概述 创建一个简单的 Grafana 仪表板, 以实现对日志的快速搜索. 有经验的直接用 Grafana 的 Explore 功能就可以了. 但是对于没有经验的人, 他们如何能有一个已经预设了简单的标签搜索的仪表板,以帮助一些团队在排除故障时快速找到他们正在寻找的东西。虽然 Explore 很适合这

Grafana 系列文章(十三):如何用 Loki 收集查看 Kubernetes Events

前情提要 IoT 边缘集群基于 Kubernetes Events 的告警通知实现 IoT 边缘集群基于 Kubernetes Events 的告警通知实现(二):进一步配置 概述 在分析 K8S 集群问题时,Kubernetes Events 是超级有用的。 Kubernetes Events 可

Grafana 系列文章(十五):Exemplars

Exemplars 简介 Exemplar 是用一个特定的 trace,代表在给定时间间隔内的度量。Metrics 擅长给你一个系统的综合视图,而 traces 给你一个单一请求的细粒度视图;Exemplar 是连接这两者的一种方式。 假设你的公司网站正经历着流量的激增。虽然超过百分之八十的用户能够

Grafana系列-统一展示-8-ElasticSearch日志快速搜索仪表板

系列文章 Grafana 系列文章 概述 我们是基于这篇文章: Grafana 系列文章(十二):如何使用 Loki 创建一个用于搜索日志的 Grafana 仪表板, 创建一个类似的, 但是基于 ElasticSearch 的日志快速搜索仪表板. 最终完整效果如下: 📝Notes: 其实我基于 E

Grafana 系列文章(一):基于 Grafana 的全栈可观察性 Demo

📚️Reference: https://github.com/grafana/intro-to-mlt 这是关于 Grafana 中可观察性的三个支柱的一系列演讲的配套资源库。 它以一个自我封闭的 Docker 沙盒的形式出现,包括在本地机器上运行和实验所提供的服务所需的所有组件。 Grafan

Grafana 系列文章(二):使用 Grafana Agent 和 Grafana Tempo 进行 Tracing

👉️URL: https://grafana.com/blog/2020/11/17/tracing-with-the-grafana-cloud-agent-and-grafana-tempo/ ✍Author: Robert Fratto • 17 Nov 2020 📝Description

Grafana 系列文章(三):Tempo-使用 HTTP 推送 Spans

👉️URL: https://grafana.com/docs/tempo/latest/api_docs/pushing-spans-with-http/ 📝Description: 有时,使用追踪系统是令人生畏的,因为它似乎需要复杂的应用程序仪器或 span 摄取管道,以便 ... 有时,使