Problem:

nginx is monitored with nginx-module-vts, but no monitoring data shows up in prometheus or grafana. The nginx error.log contains a large number of shm_add_node::ngx_slab_alloc_locked() errors, roughly like this:

2024/04/16 18:37:58 [error] 239430#0: *784097 shm_add_node::ngx_slab_alloc_locked() failed: used_size[886746], used_node[251] while logging request, client: *.*.*.*, server: www.demo.cn, request: "GET /api/get HTTP/1.1", upstream: "http://101.126.76.112:30030/api/get", host: "www.demo.cn"
2024/04/16 18:37:58 [error] 239430#0: *784097 handler::shm_add_server() failed while logging request, client: *.*.*.*, server: www.demo.cn, request: "GET /api/get HTTP/1.1", upstream: "http://101.126.76.112:30030/api/get", host: "www.demo.cn"
2024/04/16 18:37:58 [error] 239430#0: *784097 shm_add_node::ngx_slab_alloc_locked() failed: used_size[886746], used_node[251] while logging request, client: *.*.*.*, server: www.demo.cn, request: "GET /api/get HTTP/1.1", upstream: "http://101.126.76.112:30030/api/get", host: "www.demo.cn"
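The counters in these messages give a rough idea of how full the shared memory zone is. The following is a minimal Python sketch, not part of nginx-module-vts, that extracts used_size and used_node from such a line and compares used_size with the zone size. It assumes that used_size is reported in bytes and that the zone is 1m (1048576 bytes); the function name zone_usage and its zone_bytes parameter are hypothetical.

```python
import re

# Matches the two counters printed by the shm_add_node error message.
PATTERN = re.compile(r"used_size\[(\d+)\], used_node\[(\d+)\]")


def zone_usage(line, zone_bytes=1024 * 1024):
    """Return (used_size, used_node, used_size / zone_bytes), or None if the
    line does not contain the counters.

    Assumption: used_size is in bytes; zone_bytes defaults to a 1m zone.
    """
    m = PATTERN.search(line)
    if m is None:
        return None
    used_size = int(m.group(1))
    used_node = int(m.group(2))
    return used_size, used_node, used_size / zone_bytes


# Sample line shortened from the error.log excerpt above.
sample = (
    "2024/04/16 18:37:58 [error] 239430#0: *784097 "
    "shm_add_node::ngx_slab_alloc_locked() failed: "
    "used_size[886746], used_node[251] while logging request"
)
result = zone_usage(sample)
print(result)
```

Under these assumptions, the sample line works out to roughly 85% of a 1m zone in use, which fits the allocation failing because the zone is nearly exhausted.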

Solution:

The nginx-module-vts documentation at https://github.com/vozlt/nginx-module-vts#vhost_traffic_status_zone describes this directive as follows:

[Screenshot: vhost_traffic_status_zone.png]

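In the http block of the nginx configuration file, change the vhost_traffic_status_zone directive. Its default value is shared:vhost_traffic_status:1m. The default of 1m is too small; after raising it to 32m the errors stopped. If the errors come back later, keep increasing the size (for example doubling it, 32m*2). For example: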

http {
    ...
    vhost_traffic_status_zone shared:vhost_traffic_status:32m;
    ...
}