For nginx alerting rules, see: Awesome Prometheus alerts | Collection of alerting rules

In practice, the rules listed there may need minor changes before they can be used:

groups:
- name: Nginx
  rules:
  - alert: NginxHighHttp4xxErrorRate
    expr: sum(rate(nginx_server_requests{code="4xx", host=~"[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+.?"}[1m])) by (host, instance) / sum(rate(nginx_server_requests[1m])) by (host, instance) * 100 > 30
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: Nginx high HTTP 4xx error rate (instance {{ $labels.instance }})
      description: "Too many HTTP requests with status 4xx (> 30%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: NginxHighHttp5xxErrorRate
    expr: sum(rate(nginx_server_requests{code="5xx", host=~"[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+.?"}[1m])) by (host, instance) / sum(rate(nginx_server_requests[1m])) by (host, instance) * 100 > 30
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: Nginx high HTTP 5xx error rate (instance {{ $labels.instance }})
      description: "Too many HTTP requests with status 5xx (> 30%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: NginxLatencyHigh
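    # NOTE: histogram_quantile() only works on bucket series that carry an le label,
    # and le must be kept in the by (...) clause. Verify that your exporter exposes
    # this metric as a histogram; otherwise the expression returns no data.
    expr: histogram_quantile(0.99, sum(rate(nginx_server_requestMsec[2m])) by (host, node)) > 3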
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Nginx latency high (instance {{ $labels.instance }})
      description: "Nginx p99 latency is higher than 3 seconds\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
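
As a rough illustration of what the two error-rate expressions above compute, the sketch below reproduces the arithmetic in plain Python. The per-code request rates are made-up sample numbers, and `error_percentage` is a hypothetical helper, not part of Prometheus or of any exporter. It also shows one thing worth checking against your own /metrics output: if the exporter exposes an aggregate series such as code="total" next to the per-class series (an assumption here, not something the rules above confirm), a denominator with no code filter counts every request twice and halves the reported percentage.

```python
def error_percentage(rates, code, denominator_codes=None):
    """Return the share of `code`, in percent, of the summed denominator rates.

    rates: mapping of code label -> requests per second (sample values).
    denominator_codes: labels to include in the denominator; None means
    every series, which mirrors an unfiltered sum(rate(...)) in PromQL.
    """
    if denominator_codes is None:
        denominator_codes = rates.keys()
    total = sum(rates[c] for c in denominator_codes)
    if total == 0:
        return 0.0  # PromQL would yield NaN for 0/0; 0.0 keeps the sketch simple
    return rates[code] / total * 100


# Hypothetical per-second rates for one (host, instance) pair.
# The "total" entry is an ASSUMPTION about how the exporter labels its series.
sample = {"1xx": 0.0, "2xx": 50.0, "3xx": 10.0, "4xx": 40.0, "5xx": 0.0, "total": 100.0}

unfiltered = error_percentage(sample, "4xx")
against_total = error_percentage(sample, "4xx", denominator_codes=["total"])

print(f"unfiltered denominator:   {unfiltered:.1f}%")
print(f"code=total denominator:   {against_total:.1f}%")
```

With these sample numbers the unfiltered denominator yields 20% and the code="total" denominator yields 40%, so the same traffic would fall on different sides of the 30% threshold. If your exporter does expose such an aggregate series, restricting the denominator to it (or excluding it) is one way to keep the ratio meaningful.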

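The rules above change the matchers for the 4xx and 5xx alerts. The nginx_server_requests metric is collected by installing nginx-vts-exporter, not by nginx-lua-prometheus as in the article. In addition, the host label is filtered with the regular expression [a-zA-Z0-9][-a-zA-Z0-9]{0,62}(.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+.? so that only domain names are selected and unwanted values such as * or _ are left out. Note that the dots in this pattern are unescaped, so each one matches any character; writing them as \\. inside the PromQL string restricts them to literal dots.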
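
To see which host values the matcher actually lets through, the following sketch runs the pattern against a few hand-picked sample values in Python. It assumes that `re.fullmatch` is a close enough stand-in for Prometheus label matching, which anchors the regex at both ends; Prometheus itself uses Go's RE2-based engine, and for a pattern this simple the two should agree. `STRICT_PATTERN` and the sample hosts are illustrative additions, not part of the original rules.

```python
import re

# Pattern copied from the rules above (dots left unescaped, as in the source).
HOST_PATTERN = r"[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+.?"

# Illustrative variant with literal dots. Inside a PromQL double-quoted
# string each backslash would have to be doubled (\\.).
STRICT_PATTERN = r"[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?"


def matches(pattern, host):
    """Return True if the whole host string matches the pattern."""
    # Prometheus anchors label regexes at both ends; fullmatch mimics that.
    return re.fullmatch(pattern, host) is not None


# Hand-picked sample values, including the "*" and "_" server names
# that the rules are meant to skip.
candidates = ["example.com", "www.example.com.", "*", "_", "a_b", "localhost"]

for host in candidates:
    print(f"{host!r:20} source={matches(HOST_PATTERN, host)} "
          f"strict={matches(STRICT_PATTERN, host)}")
```

Both patterns reject * and _, which is the stated goal. The unescaped pattern, however, also accepts values such as a_b or localhost, because each bare dot can stand in for any character; the escaped variant accepts only dotted names.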