[Repost] Using NGINX Logging for Application Performance Monitoring


Editor's Summary

**NGINX Logging for Application Performance Monitoring**

NGINX provides built-in timing variables that you can include in log entries for application performance monitoring (APM). All are measured in seconds with millisecond resolution:

* `$request_time`: Full request time.
* `$upstream_connect_time`: Time spent establishing a connection with an upstream server.
* `$upstream_header_time`: Time between establishing a connection to an upstream server and receiving the first byte of the response header.
* `$upstream_response_time`: Time between establishing a connection to an upstream server and receiving the last byte of the response body.

You can use these variables in custom log formats to get request-level insight into application performance.

**Using Splunk to Analyze the NGINX Access Log**

1. Configure NGINX to send its access log entries to Splunk (the article uses syslog).
2. Create charts to visualize request and response times.
3. Drill down into specific pages or requests to identify performance issues.
4. Have the application report its own timing values in response headers, and include those values in the log entries.

**Benefits of Using NGINX Access Logging for APM**

* Easy to set up and use.
* Provides detailed performance insights.
* Can be used for one-off troubleshooting or continuous monitoring.
* Simple to integrate with Splunk or another log-analysis tool.

**Example Use Case**

To investigate slow responses from a page such as `apmtest.php`, configure NGINX to send its access log to Splunk, then chart the request and upstream response times for that page. If the upstream response time is the problem, add application-defined timing values to the log entries to see which internal operation is responsible.

Article

The live activity monitoring dashboard and API in NGINX Plus track many system metrics that you can use to analyze the load and performance of your system. If you need request-level information, the access logging in NGINX and NGINX Plus is very flexible – you can configure which data is logged, selecting from the large number of data points that can be included in a log entry in the form of variables. You can also define customized log formats for different parts of your application.

One interesting use case for taking advantage of the flexibility of NGINX access logging is application performance monitoring (APM). There are certainly many APM tools to choose from and NGINX is not a complete replacement for them, but it’s simple to get detailed visibility into the performance of your applications by adding timing values to your code and passing them as response headers for inclusion in the NGINX access log.

In this post we describe how to feed timing data from NGINX or NGINX Plus to Splunk for analysis, but Splunk is just an example – any good log‑analysis tool can provide similar insights.

Using the NGINX Built‑In Timing Variables

NGINX provides a number of built‑in timing variables that you can include in log entries. All are measured in seconds with millisecond resolution.

  • $request_time – Full request time, starting when NGINX reads the first byte from the client and ending when NGINX sends the last byte of the response body
  • $upstream_connect_time – Time spent establishing a connection with an upstream server
  • $upstream_header_time – Time between establishing a connection to an upstream server and receiving the first byte of the response header
  • $upstream_response_time – Time between establishing a connection to an upstream server and receiving the last byte of the response body

Here is a sample log format called apm that includes these four NGINX timing variables along with some other useful information:

log_format apm '"$time_local" client=$remote_addr '
               'method=$request_method request="$request" '
               'request_length=$request_length '
               'status=$status bytes_sent=$bytes_sent '
               'body_bytes_sent=$body_bytes_sent '
               'referer=$http_referer '
               'user_agent="$http_user_agent" '
               'upstream_addr=$upstream_addr '
               'upstream_status=$upstream_status '
               'request_time=$request_time '
               'upstream_response_time=$upstream_response_time '
               'upstream_connect_time=$upstream_connect_time '
               'upstream_header_time=$upstream_header_time';
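
A log format has no effect until an access_log directive refers to it by name. The fragment below is a minimal sketch of that step, assuming the apm format above is already defined in the http context; the listen port and the log file path are illustrative placeholders, not values from the original setup.

```nginx
# Sketch only. Assumes the "apm" log_format above is defined in the
# enclosing http {} block; the port and file path are placeholders.
server {
    listen 80;

    # Write one entry per request using the apm format.
    access_log /var/log/nginx/apm_access.log apm;
}
```

The access_log directive can also be set at the http or location level, which is how different parts of an application can use different formats.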

For example, let’s say that we are getting complaints about slow response times from an application that has three PHP pages – apmtest.php, apmtest2.php, and apmtest3.php – each of which does a database lookup, analyzes some data, and then writes to the database. To determine the source of the slowness, we generate load to our application and analyze the NGINX access log data. For this example, we have NGINX use syslog to send the access log entries to Splunk.
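
Sending the entries over syslog might look like the sketch below. The syslog: prefix and the server and tag parameters are standard access_log options, but the host name, port, and tag value here are assumptions for illustration. The receiving side (for example, a Splunk network input listening on that port) has to be configured separately.

```nginx
# Sketch only: splunk.example.com:514 and the tag value are hypothetical.
access_log syslog:server=splunk.example.com:514,tag=nginx_apm apm;

# Optionally keep a local copy as well; several access_log directives
# can be specified at the same configuration level.
access_log /var/log/nginx/apm_access.log apm;
```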

We use the following Splunk command to plot the total request time (corresponding to the $request_time variable) for each PHP page:

* | timechart avg(request_time) by request

And we get these results (the x‑axis shows requests and the y‑axis shows the response time in seconds):

[Figure: total response times for the three PHP pages]

From this we see that the total request time for each execution of apmtest2.php and apmtest3.php is relatively consistent, but for apmtest.php there are large fluctuations. To take a closer look at that page, we use this Splunk command to plot the upstream response time and upstream connect time for that page only:

* | regex request="(^.+/apmtest\.php.+$)" | timechart avg(upstream_response_time) avg(upstream_connect_time)

And we get these results:

[Figure: upstream connect time and upstream response time for apmtest.php]

This shows us that the upstream connect time is negligible, so we can focus on the upstream response time as the source of the large and variable total response times.

Using Application‑Defined Timing Values

To drill down, we capture timings in the application itself and include them as response headers, which NGINX then captures in its access log. How granular you want to get is up to you.

To continue with our example, we have the application return the following response headers which record processing time for the indicated internal operations:

  • db_read_time – Database lookup
  • db_write_time – Database write
  • analysis_time – Data analysis
  • other_time – All other types of processing

NGINX captures the timing values from the response headers by creating corresponding variables, which it names by prepending the string $upstream_http_ to the header name (for example, $upstream_http_db_read_time corresponds to db_read_time). You can then include the variables in log entries just like standard NGINX variables.
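
Because these headers exist only to feed the log, you may prefer not to expose them to clients. The sketch below assumes the PHP pages are served through FastCGI at 127.0.0.1:9000, which is an assumption; the original does not say how the application is connected. The fastcgi_hide_header directive stops a header from being passed on to the client, and as far as we understand the $upstream_http_ variables are still populated from the upstream response, so the values remain available for logging. For a proxied application, proxy_hide_header plays the same role.

```nginx
# Sketch only: the FastCGI address is a hypothetical PHP-FPM endpoint.
location ~ \.php$ {
    include       fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass  127.0.0.1:9000;

    # Keep the application timing headers out of client responses.
    # They can still be logged as $upstream_http_db_read_time, and so on.
    fastcgi_hide_header db_read_time;
    fastcgi_hide_header db_write_time;
    fastcgi_hide_header analysis_time;
    fastcgi_hide_header other_time;
}
```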

Here is the previous sample log format extended to include the application header values:

log_format apm 'timestamp="$time_local" client=$remote_addr '
               'request="$request" request_length=$request_length '
               'bytes_sent=$bytes_sent ' 
               'body_bytes_sent=$body_bytes_sent '
               'referer=$http_referer '
               'user_agent="$http_user_agent" '
               'upstream_addr=$upstream_addr '
               'upstream_status=$upstream_status '
               'request_time=$request_time ' 
               'upstream_response_time=$upstream_response_time '
               'upstream_connect_time=$upstream_connect_time '
               'upstream_header_time=$upstream_header_time '
               'app_db_read_time=$upstream_http_db_read_time '
               'app_db_write_time=$upstream_http_db_write_time '
               'app_analysis_time=$upstream_http_analysis_time '
               'app_other_time=$upstream_http_other_time ';

Now we run the test again, this time against apmtest.php only. We run the following Splunk command to plot the four application header values:

* | timechart avg(app_db_read_time), avg(app_db_write_time), avg(app_analysis_time), avg(app_other_time)

And get these results:

[Figure: application-internal processing times for apmtest.php]

The graph shows that data analysis accounts both for the largest portion of the processing time and for the fluctuations in total response time. To drill down further, we can add additional timings to the code. We can also look through the log details to see whether certain types of requests are causing the longer response times.

Conclusion

While there are many sophisticated APM tools available to help you investigate performance issues with your applications, they are often expensive and complex. You might find that a simple approach like the one described here, combining the configurable NGINX access log with a log-analysis tool, is both effective and inexpensive. You can use this approach for one‑off troubleshooting, or you can include application‑level timings in the NGINX access log all the time, to help alert you to issues.
