Nginx tuning
The request timings didn't match the backend app's own statistics, so I suspected something was off on the nginx proxy side. I started with strace, which showed that the recvfrom() syscall was relatively slow, but strace can't dig any deeper than that.
```
strace -s 5000 -rp 1991 2>&1 | awk '$1 > 0.01'
```
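strace needs the PID of a busy nginx worker (1991 in the command above); the `-r` flag prints relative timestamps, so the awk filter keeps only syscalls slower than 10 ms. As a quick sketch, a worker PID can be picked like this:

```
# list nginx worker processes and grab a PID to attach strace to
ps -ef | grep "nginx: worker process" | grep -v grep
# or simply:
pgrep -f "nginx: worker process"
```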
A bit of googling turned up agentzh's collection of small systemtap tools, so I used them directly for the analysis: https://github.com/agentzh/nginx-systemtap-toolkit. There are two parts to look at: nginx's user-space calls and the kernel-space calls. To probe kernel space you must install the kernel-debuginfo package matching your kernel version. The systemtap shipped with the distro is too old, so you have to build version 2.1 or later yourself; I went with the fairly recent 2.2.1, which requires the kernel-devel package to be installed first.
```
wget http://sourceware.org/systemtap/ftp/releases/systemtap-2.2.1.tar.gz
```
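As a rough sketch of the install order on a CentOS/RHEL-style box (package names and the debuginfo repo setup vary by distro, so treat this as an assumption rather than an exact recipe):

```
# kernel headers matching the running kernel (needed to build stap modules)
yum install -y kernel-devel-$(uname -r) gcc elfutils-devel
# kernel-debuginfo is required for kernel-space probing; it usually lives
# in a separate debuginfo repository
debuginfo-install -y kernel-$(uname -r)

# build and install systemtap 2.2.1 from source
tar xzf systemtap-2.2.1.tar.gz
cd systemtap-2.2.1
./configure && make && make install
```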
Once systemtap and the debuginfo packages are installed, run a quick smoke test: stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'. The usage of each tool is documented at https://github.com/agentzh/nginx-systemtap-toolkit; below is a quick demonstration of a few of them.
```
[root@11 nginx-systemtap-toolkit]# ./ngx-active-reqs -p 23187
ERROR: MAXACTION exceeded near keyword at <input>:32:13
Tracing 23187 (/usr/local/nginx/sbin/nginx)...
req "POST /bid?", r=0x11c4510, keepalive=1, spdy=0, host=www.sina.com, status=0, time=0.036s, buffered=0, conn: ssl=0, from=110.75.20.114, reqs=11, err=0, fd=272, buffered=0, sending request to upstream
req "POST /bid?", r=0x10a0fe0, keepalive=1, spdy=0, host=www.sina.com, status=0, time=0.003s, buffered=0, conn: ssl=0, from=110.75.36.5, reqs=6, err=0, fd=56, buffered=0, sending request to upstream
req "POST /bid?", r=0x11c4e50, keepalive=1, spdy=0, host=www.sina.com, status=0, time=0.011s, buffered=0, conn: ssl=0, from=110.75.20.115, reqs=23, err=0, fd=269, buffered=0, sending request to upstream
req "POST /?", r=0x10a14e0, keepalive=1, spdy=0, host=www.sina.com, status=0, time=0.001s, buffered=0, conn: ssl=0, from=101.226.62.84, reqs=26585, err=0, fd=220, buffered=0, sending request to upstream
req "POST /bid?", r=0x11af4f0, keepalive=1, spdy=0, host=www.sina.com, status=0, time=0.014s, buffered=0, conn: ssl=0, from=110.75.36.5, reqs=1, err=0, fd=316, buffered=0, sending request to upstream
req "POST /bid?", r=0x11a2ef0, keepalive=1, spdy=0, host=www.sina.com, status=0, time=0.036s, buffered=0, conn: ssl=0, from=110.75.36.3, reqs=11, err=0, fd=184, buffered=0, sending request to upstream
req "POST /bid?", r=0x10e9f30, keepalive=1, spdy=0, host=www.sina.com, status=0, time=0.017s, buffered=0, conn: ssl=0, from=110.75.36.5, reqs=2, err=0, fd=134, buffered=0, sending request to upstream
req "POST /bid?", r=0x10bac20, keepalive=1, spdy=0, host=www.sina.com, status=0, time=0.019s, buffered=0, conn: ssl=0, from=110.75.20.112, reqs=3, err=0, fd=150, buffered=0, sending request to upstream
req "POST /bid?", r=0x1180a10, keepalive=1, spdy=0, host=www.sina.com, status=0, time=0.006s, buffered=0, conn: ssl=0, from=110.75.36.4, reqs=6, err=0, fd=147, buffered=0, sending request to upstream
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed. [man error::pass5]
```
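The MAXACTION error just means the probe hit systemtap's safety limit on how many statements a single probe handler may execute; it cuts the run short, but the request dump above is still usable. When invoking stap directly, the limit can be raised with a -D define (whether the toolkit wrapper offers a way to pass extra stap options through is something to check against its --help); a minimal sketch:

```
# override systemtap's per-probe statement limit via the MAXACTION macro
stap -DMAXACTION=100000 -v -e 'probe vfs.read { printf("read performed\n"); exit() }'
```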
The output above doesn't reveal any problem, so a deeper, more detailed analysis is needed, mainly by way of flame graphs. Generating a flame graph requires downloading one more tool.
```
git clone git://github.com/brendangregg/FlameGraph
```
Generate the flame graph:
```
./ngx-sample-bt -p 30763 -t 5 -k > b.bt
./stackcollapse-stap.pl ../nginx-systemtap-toolkit/b.bt > b.cbt
./flamegraph.pl b.cbt > b.svg
```
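The -k flag samples kernel-space stacks for 5 seconds; for the user-space half mentioned earlier, the toolkit's README documents a -u flag, so the user-space flame graph should come out of roughly the same pipeline (file names here are just placeholders, and 30763 is assumed to be the same worker PID):

```
# sample user-space (nginx C-land) backtraces from the worker for 5 seconds
./ngx-sample-bt -p 30763 -t 5 -u > u.bt
# fold the stacks and render the SVG with Brendan Gregg's FlameGraph scripts
./stackcollapse-stap.pl u.bt > u.cbt
./flamegraph.pl u.cbt > u.svg
```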
Open the resulting SVG in a browser; each box shows a sampled function name and how much time was spent in it. In my case almost every call took very little time. The graph below shows the time spent in each kernel call: most of it goes to the iptables code paths, but even those are relatively cheap. So there is little left to optimize on the nginx side; the cause has to be found in the backend app.