2014-11-12

nginx worker_connections: I thought it was the number of simultaneous connections per worker, but apparently it isn't

nginx

※ Addendum 2014-12-18 ※

I'm a little surprised that this post picked up Hatena Bookmarks and tweets.
I also received a valuable comment on Twitter.

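I wonder what happens if you raise it a bit more while keeping it at 1024 or below. Maybe the errors come from the load being uneven across workers. For example, 8*768. / “nginx worker_connections: I thought it was the number of simultaneous connections per worker, but…” http://t.co/5aLNa0VeOy
- MATSUMOTO, Ryosuke (@matsumotory) December 16, 2014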

I see.
So I quickly ran a rough test with worker_processes 8, worker_connections 768.
As a result, the "worker_connections are not enough" error no longer seems to appear.
# I say "seems" because ApacheBench now frequently fails with apr_socket_recv: Connection reset by peer (104), so I'm not confident.
# This did not happen when I wrote this post, and I'm testing in the same environment... hmm...

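So it seems I jumped to conclusions. My apologies.

Originally I had summarized this as

worker_connections appears to be the number of simultaneous connections for "nginx as a whole," not "per worker."

but I would now like to restate it as

Because connections can be distributed unevenly across workers, it seems better to set worker_connections with some margin.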
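The revised conclusion can be illustrated with a small model. This is a hypothetical sketch, not nginx code. It assumes each worker enforces its own worker_connections limit, and that connections are split across workers according to the `shares` fractions, which are made-up numbers for illustration. The nginx documentation also notes that worker_connections counts all connections, including those to proxied servers, not only client connections.

```python
from typing import List


def total_capacity(worker_processes: int, worker_connections: int) -> int:
    """Upper bound on connections if they were spread perfectly evenly."""
    return worker_processes * worker_connections


def overflowing_workers(concurrency: int, shares: List[float],
                        worker_connections: int) -> List[int]:
    """Indexes of workers whose share of `concurrency` exceeds their own limit.

    `shares` is an assumed fraction of all connections taken by each worker
    and must sum to 1. Any worker listed here would hit its limit even
    though the total is still below total_capacity().
    """
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return [i for i, share in enumerate(shares)
            if concurrency * share > worker_connections]


if __name__ == "__main__":
    # Hypothetical skew: one of 8 workers grabs 30% of 1024 connections.
    skewed = [0.3] + [0.1] * 7
    print(total_capacity(8, 256))                  # 2048
    print(overflowing_workers(1024, skewed, 256))  # [0]: about 307 > 256
    print(overflowing_workers(1024, skewed, 768))  # []: the margin absorbs the skew
```

Under this model, the tweet's 8 × 768 setting leaves enough headroom per worker for the same skew.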

※ End of addendum ※

While running a benchmark, nginx behaved differently from what I expected, so I looked into it.

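With worker_processes 8, worker_connections 256

What I had understood until now

I thought the maximum number of simultaneous connections nginx can handle was

worker_connections * worker_processes

so in this case it would be

256 * 8 = 2048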

Verification

With this configuration I expected a benchmark with 1024 simultaneous connections to be handled without any trouble, but

[root@client ~]# ab -c 1024 -n 10240 'http://10.100.47.58/10k.dat'
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 10.100.47.58 (be patient)
apr_socket_recv: Connection reset by peer (104)
Total of 544 requests completed

it failed.
In addition, the error log contained

2014/11/12 17:41:55 [alert] 13733#0: 256 worker_connections are not enough

which clearly shows that worker_connections had been exhausted.

With worker_processes 1, worker_connections 2048

[root@hosweb001 ~]# ab -c 1024 -n 10240 'http://10.100.47.58/10k.dat'
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 10.100.47.58 (be patient)
Completed 1024 requests
Completed 2048 requests
Completed 3072 requests
Completed 4096 requests
Completed 5120 requests
Completed 6144 requests
Completed 7168 requests
Completed 8192 requests
Completed 9216 requests
Completed 10240 requests
Finished 10240 requests


Server Software:        nginx/1.7.7
Server Hostname:        10.100.47.58
Server Port:            80

Document Path:          /10k.dat
Document Length:        10240 bytes

Concurrency Level:      1024
Time taken for tests:   1.003 seconds
Complete requests:      10240
Failed requests:        0
Write errors:           0
Total transferred:      107595930 bytes
HTML transferred:       105031680 bytes
Requests per second:    10206.19 [#/sec] (mean)
Time per request:       100.331 [ms] (mean)
Time per request:       0.098 [ms] (mean, across all concurrent requests)
Transfer rate:          104727.19 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    4   5.3      1      21
Processing:     3   20  42.7     13     620
Waiting:        2   17  42.3     12     617
Total:          7   24  44.2     13     630

Percentage of the requests served within a certain time (ms)
  50%     13
  66%     14
  75%     16
  80%     31
  90%     37
  95%     61
  98%     85
  99%    216
 100%    630 (longest request)

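With this configuration the requests complete without problems, so the earlier error was not caused by a shortage of resources such as kernel parameters or ulimit.

Summary

Under my previous understanding, both configurations allow 2048 connections in total, so they should either both succeed or both fail.
Since "worker_processes 8, worker_connections 256" fails while "worker_processes 1, worker_connections 2048" succeeds, worker_connections appears to be the number of simultaneous connections for "nginx as a whole," not "per worker."

It seems I jumped to conclusions. Please see the addendum.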

2014-11-12

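Benchmarking nginx 1.7.7

nginx benchmark

I realized I hadn't been doing proper verification work, so I decided to benchmark nginx.
The version is 1.7.7, the current Mainline version.
Because of the service I'm responsible for, I focused on simultaneous connections. The goal is 10,000 simultaneous connections.
The virtual server used for the test is as follows.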

OS	CPU	Mem
CentOS 6.4 64bit	4 Core	4 GB

Compile options for the benchmark nginx

./configure \
--prefix=/usr/local/nginx-1.7.7 \
--pid-path=/var/run/nginx.pid \
--lock-path=/var/run/nginx.lock \
--user=nginx \
--group=nginx

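I kept the build as plain as possible.

Tuning

I raise the number of simultaneous connections step by step while refining the configuration.

Errors at 600 simultaneous connections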

[root@client ~]# ab -c 600 -n 6000 'http://10.100.47.58/10k.dat'

<<< snip >>>

Complete requests:      6000
Failed requests:        47
   (Connect: 0, Receive: 0, Length: 47, Exceptions: 0)

<<< snip >>>

nginx error log

2014/11/11 18:19:50 [crit] 6448#0: *46675 open() "/usr/local/nginx-1.7.7/html/10k.dat" failed (24: Too many open files), client: 10.100.37.239, server: localhost, request: "GET /10k.dat HTTP/1.0", host: "10.100.47.58"

The process seems to have hit its limit on the number of files it can open.

Checking the limits

[root@benchnginx ~]# ps -aefww | grep 'worker'
nginx     6448  6447  0 18:03 ?        00:00:05 nginx: worker process
root      6504  1928  0 18:21 pts/0    00:00:00 grep worker

[root@benchnginx ~]# cat /proc/6448/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             30507                30507                processes
Max open files            1024                 4096                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       30507                30507                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

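Max open files is 1024, which does not look like enough: each in-flight request for the static file holds roughly one client socket plus one open file, so 600 concurrent requests can need around 1200 descriptors in the single worker.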
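Reading the row by eye works, but a small parser makes it easy to check the limit for every worker. This is a minimal sketch that assumes the column layout shown above (name, soft limit, hard limit, units). The function names are hypothetical. The descriptor estimate is a rough assumption of one client socket plus one open file per in-flight static-file request, and it ignores log files and listening sockets.

```python
from typing import Optional, Tuple


def parse_limit(limits_text: str, name: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (soft, hard) for a row of /proc/<pid>/limits; 'unlimited' -> None."""
    for line in limits_text.splitlines():
        if line.startswith(name + " "):
            soft, hard = line[len(name):].split()[:2]
            return (None if soft == "unlimited" else int(soft),
                    None if hard == "unlimited" else int(hard))
    raise KeyError(name)


def read_limits(pid: int) -> str:
    """Read the limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return f.read()


def rough_fds_needed(concurrency: int) -> int:
    """Assumption: one client socket + one open file per in-flight request."""
    return concurrency * 2


if __name__ == "__main__":
    sample = ("Max open files            1024                 "
              "4096                 files\n")
    soft, hard = parse_limit(sample, "Max open files")
    print(soft, hard, rough_fds_needed(600) > soft)
```

On the server, `parse_limit(read_limits(6448), "Max open files")` would return the soft and hard limits of the worker found with ps above.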

Configuration change

@@ -8,6 +8,7 @@

 #pid        logs/nginx.pid;

+worker_rlimit_nofile  102400;

 events {
     worker_connections  1024;

Set worker_rlimit_nofile in nginx.conf.

Applying the change

[root@benchnginx logs]# /usr/local/nginx-1.7.7/sbin/nginx -s reload

[root@benchnginx logs]# ps -aefww | grep worker
nginx     8856  6447  2 19:18 ?        00:00:14 nginx: worker process
root      9929  1928  0 19:30 pts/0    00:00:00 grep worker

[root@benchnginx logs]# cat /proc/8856/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             30507                30507                processes
Max open files            102400               102400               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       30507                30507                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

I confirmed that the change took effect.

Errors at 1200 simultaneous connections

[root@client ~]# ab -c 1200 -n 12000 'http://10.100.47.58/10k.dat'

<<< snip >>>

Complete requests:      12000
Failed requests:        12000
   (Connect: 0, Receive: 0, Length: 11840, Exceptions: 160)

<<< snip >>>

nginx access log

[root@benchnginx logs]# grep -v ' 200 ' access.log | tail

※ No output

messages

[root@benchnginx logs]# tail /var/log/messages

※ No output

ListenOverflows

[root@benchnginx logs]# cat /proc/net/netstat |
> awk '
>     {
>         if(NR == 1) label = $21
>         if(NR == 2) value = $21
>     }
>     END {
>         printf "%s: %d\n", label, value
>     }
> '
ListenOverflows: 1742

ListenOverflows has been counted → the server apparently could no longer accept all incoming connection requests.
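The awk one-liner depends on ListenOverflows being the 21st field, and that position can change between kernel versions. The sketch below looks the counter up by name instead. It assumes the usual /proc/net/netstat layout of paired lines: a `TcpExt:` line of counter names followed by a `TcpExt:` line of values. ListenOverflows is cumulative since boot, so comparing readings taken before and after a benchmark is more informative than a single value. The function names are illustrative.

```python
from typing import Dict


def parse_netstat(text: str) -> Dict[str, Dict[str, int]]:
    """Map each section (e.g. 'TcpExt') to {counter name: value}."""
    lines = [line for line in text.splitlines() if line.strip()]
    sections: Dict[str, Dict[str, int]] = {}
    # Lines come in pairs: a header line with names, then a line with values.
    for header, values in zip(lines[0::2], lines[1::2]):
        section, names = header.split(":", 1)
        section_v, numbers = values.split(":", 1)
        if section != section_v:
            raise ValueError(f"mismatched lines: {section} / {section_v}")
        sections[section] = dict(zip(names.split(), map(int, numbers.split())))
    return sections


def listen_overflows(path: str = "/proc/net/netstat") -> int:
    """Current ListenOverflows counter (Linux only)."""
    with open(path) as f:
        return parse_netstat(f.read())["TcpExt"]["ListenOverflows"]
```

Calling `listen_overflows()` before and after an ab run and subtracting shows how many overflows that run caused.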

Configuration change

@@ -11,7 +11,7 @@
 worker_rlimit_nofile  4096;

 events {
-    worker_connections  1024;
+    worker_connections  10240;
 }

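I increased worker_connections.
- I had thought this directive was the number of connections per worker, but it seems to be the number for nginx as a whole, so this needs verification.

Reached 10,000 simultaneous connections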

[root@client ~]# ab -c 10000 -n 100000 'http://10.100.47.58/10k.dat'
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 10.100.47.58 (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        nginx/1.7.7
Server Hostname:        10.100.47.58
Server Port:            80

Document Path:          /10k.dat
Document Length:        10240 bytes

Concurrency Level:      10000
Time taken for tests:   14.741 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      1078278186 bytes
HTML transferred:       1052579436 bytes
Requests per second:    6783.58 [#/sec] (mean)
Time per request:       1474.148 [ms] (mean)
Time per request:       0.147 [ms] (mean, across all concurrent requests)
Transfer rate:          71431.50 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1  900 1691.1    370    9572
Processing:    15  458 202.6    492    1500
Waiting:        5  245 148.9    243    1465
Total:         18 1358 1719.1    876   10424

Percentage of the requests served within a certain time (ms)
  50%    876
  66%    904
  75%   1176
  80%   1212
  90%   3758
  95%   4126
  98%   9617
  99%   9695
 100%  10424 (longest request)

This result looks like a good baseline for further tuning.

A little optimization

@@ -1,6 +1,8 @@

 #user  nobody;
-worker_processes  1;
+worker_processes  4;
+
+worker_cpu_affinity 1000 0100 0010 0001;

 #error_log  logs/error.log;
 #error_log  logs/error.log  notice;

I matched the settings to the number of CPU cores: four workers, each pinned to its own CPU.
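The mask string follows a simple pattern, so it can be generated for any core count. This sketch assumes one worker per CPU. nginx reads the rightmost bit of each mask as CPU 0, and the helper reproduces the ordering used above, where the first worker goes on the highest-numbered CPU. The function names are illustrative.

```python
def cpu_affinity_masks(cpus: int) -> str:
    """worker_cpu_affinity value pinning worker i to exactly one CPU.

    Each mask is `cpus` bits wide with a single bit set; the first worker
    gets the leftmost bit, matching '1000 0100 0010 0001' for 4 CPUs.
    """
    if cpus < 1:
        raise ValueError("need at least one CPU")
    return " ".join(format(1 << (cpus - 1 - i), f"0{cpus}b")
                    for i in range(cpus))


def worker_config(cpus: int) -> str:
    """nginx.conf lines for one pinned worker per CPU."""
    return (f"worker_processes  {cpus};\n"
            f"worker_cpu_affinity {cpu_affinity_masks(cpus)};")


if __name__ == "__main__":
    print(worker_config(4))
```

On the server itself, `os.cpu_count()` could supply the core count.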

Results after optimization

[root@client ~]# ab -c 10000 -n 100000 'http://10.100.47.58/10k.dat'
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 10.100.47.58 (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        nginx/1.7.7
Server Hostname:        10.100.47.58
Server Port:            80

Document Path:          /10k.dat
Document Length:        10240 bytes

Concurrency Level:      10000
Time taken for tests:   12.525 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      1077834884 bytes
HTML transferred:       1052139634 bytes
Requests per second:    7983.89 [#/sec] (mean)
Time per request:       1252.523 [ms] (mean)
Time per request:       0.125 [ms] (mean, across all concurrent requests)
Transfer rate:          84036.23 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       78  517 678.3    382    3559
Processing:   125  696 158.7    699    1718
Waiting:      108  359 113.3    358    1502
Total:        279 1213 695.2   1116    4340

Percentage of the requests served within a certain time (ms)
  50%   1116
  66%   1162
  75%   1180
  80%   1185
  90%   1210
  95%   3757
  98%   4075
  99%   4156
 100%   4340 (longest request)

Comparison before and after optimization

Average of 10 runs before optimization

Requests per second:    6706.79 [#/sec] (mean)

Connection Times (ms)
               mean[+/-sd] median max
Connect:        804 1574.1 354     9406
Processing:     510  231.3 561     1916
Waiting:        272  146.0 284     1617
Total:         1313 1617.7 958    10715

Average of 10 runs after optimization

Requests per second:    8700.57 [#/sec] (mean)

Connection Times (ms)
               mean[+/-sd] median max
Connect:        413 319.8   375   3322
Processing:     695 179.3   688   2732
Waiting:        350 130.7   343   2281
Total:         1108 377.4  1093   4489

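Comparing the results:
- Requests per second improved by about 30% (6706.79 → 8700.57).
- Connect time dropped sharply (mean 804 ms → 413 ms).
  - Probably because handling connections with multiple workers reduced the time spent waiting.
  - As a result, the slowest request also went from 10715 ms to 4489 ms.
- Processing and Waiting, on the other hand, got worse (means 510 ms → 695 ms and 272 ms → 350 ms).
  - Perhaps this is the overhead of distributing work across multiple workers?
  - Their standard deviations improved slightly, so processing time seems more stable.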
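The percentages above can be checked with a few lines of arithmetic. This is a minimal sketch. The numbers are the 10-run averages from the two tables, and the helper name is illustrative.

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative = decrease)."""
    return (after - before) / before * 100


# 10-run averages copied from the before/after tables.
METRICS = {
    "Requests per second": (6706.79, 8700.57),
    "Connect mean (ms)": (804, 413),
    "Processing mean (ms)": (510, 695),
    "Total max (ms)": (10715, 4489),
}

if __name__ == "__main__":
    for name, (before, after) in METRICS.items():
        print(f"{name:22} {pct_change(before, after):+6.1f}%")
```

This gives about +29.7% for requests per second, -58.1% for the slowest request, and +36% for mean processing time.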

2014-10-15

Checking for and addressing POODLE, the SSL 3.0 vulnerability

vulnerability nginx Apache ZWS

About the vulnerability

Google's security team explains the SSL 3.0 vulnerability "POODLE" - ZDNet Japan
http://japan.zdnet.com/security/analysis/35055155/

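Checking whether a connection can be made with SSL 3.0

You can check this with the s_client subcommand of the openssl command and its -ssl3 option.

When the connection succeeds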

[root@hogehoge ~]# openssl s_client -connect 127.0.0.1:443 -ssl3 | cat -n

<<< snip >>>

    48  ---
    49  New, TLSv1/SSLv3, Cipher is RC4-SHA
    50  Server public key is 2048 bit
    51  Secure Renegotiation IS NOT supported
    52  Compression: NONE
    53  Expansion: NONE
    54  SSL-Session:
    55      Protocol  : SSLv3
    56      Cipher    : RC4-SHA
    57      Session-ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    58      Session-ID-ctx:
    59      Master-Key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    60      Key-Arg   : None
    61      Krb5 Principal: None
    62      PSK identity: None
    63      PSK identity hint: None
    64      Start Time: 1413353391
    65      Timeout   : 7200 (sec)
    66      Verify return code: 0 (ok)
    67  ---

<<< snip >>>

When the connection fails

[root@hogehoge ~]# openssl s_client -connect 127.0.0.1:443 -ssl3 | cat -n

<<< snip >>>

     8  ---
     9  New, (NONE), Cipher is (NONE)
    10  Secure Renegotiation IS NOT supported
    11  Compression: NONE
    12  Expansion: NONE
    13  SSL-Session:
    14      Protocol  : SSLv3
    15      Cipher    : 0000
    16      Session-ID:
    17      Session-ID-ctx:
    18      Master-Key:
    19      Key-Arg   : None
    20      PSK identity: None
    21      PSK identity hint: None
    22      SRP username: None
    23      Start Time: 1413352529
    24      Timeout   : 7200 (sec)
    25      Verify return code: 0 (ok)
    26  ---

<<< snip >>>

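The two outputs differ in the `Cipher is` summary line, so the check can be scripted. This sketch treats `Cipher is (NONE)` as a refused handshake. The helper names and the 10-second timeout are assumptions. Many newer OpenSSL builds are compiled without SSLv3 and reject `-ssl3`. In that case `check_ssl3` returns False without saying anything about the server, so run it with an openssl that still supports SSLv3.

```python
import subprocess


def ssl3_accepted(s_client_output: str) -> bool:
    """True if the 'Cipher is ...' summary line shows a negotiated cipher.

    A refused handshake prints 'Cipher is (NONE)', as in the output above.
    """
    for line in s_client_output.splitlines():
        if "Cipher is" in line:
            return line.split("Cipher is", 1)[1].strip() != "(NONE)"
    return False  # no summary line at all: treat as not accepted


def check_ssl3(host: str, port: int = 443) -> bool:
    """Run `openssl s_client -ssl3` against host:port and parse the result."""
    result = subprocess.run(
        ["openssl", "s_client", "-connect", f"{host}:{port}", "-ssl3"],
        input="", capture_output=True, text=True, timeout=10,
    )
    return ssl3_accepted(result.stdout + result.stderr)
```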
Remediation

nginx

http://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_protocols
Do not list SSLv3 in the ssl_protocols directive.

ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

Apache

http://httpd.apache.org/docs/2.2/mod/mod_ssl.html#sslprotocol
Disable SSLv3 with the SSLProtocol directive.

SSLProtocol All -SSLv2 -SSLv3

ZWS (Zeus Web Server)

I doubt anyone still uses it, but just in case.
It can be disabled with the tuning!support_ssl3 directive in global.cfg.

tuning!support_ssl3 no

2014-08-12

Load order of bash startup files such as .bash_profile and .bashrc

Linux bash

I always forget the load order of these files, so here's a note.

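Shell

Software that provides an interface for the users of an OS and gives them access to the kernel's services.
It gets its name from being the outer layer (shell) between the inside of the OS (the kernel) and the user.

bash invocation types

Depending on the combination of options, bash has the following invocation types:
- login shell
- interactive shell
The types are not mutually exclusive, so combinations such as an "interactive login shell" or a "non-interactive non-login shell" also exist.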

Login shell

A shell whose argument zero (normally the program name) begins with -.
The -bash in the output below is an example.

[root@www ~]# w
 11:46:55 up 12 days, 21:47,  3 users,  load average: 0.00, 0.01, 0.05
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
root     tty1      30 7月14 12days  0.12s  0.12s -bash

Or a shell started with the --login option.

Interactive shell

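A shell started with all of the following conditions met:
- There are no non-option arguments.
- Standard input and standard error are both connected to a terminal.
  - This can be checked with isatty(3).
- The -c option is not specified.
Or a shell started with the -i option.
When bash is running interactively, PS1 is set and $- contains i.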
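The two definitions can be written down as predicates. This is an illustrative model of the rules described above, not how bash is implemented, and the function and parameter names are made up. Python's `os.isatty()` performs the same check as isatty(3).

```python
import os


def is_login_shell(argv0: str, login_option: bool = False) -> bool:
    """Argument zero starts with '-' (e.g. '-bash'), or --login was given."""
    return argv0.startswith("-") or login_option


def is_interactive_shell(*, non_option_args: bool, stdin_tty: bool,
                         stderr_tty: bool, c_option: bool,
                         i_option: bool = False) -> bool:
    """The conditions listed above; -i forces an interactive shell."""
    if i_option:
        return True
    return (not non_option_args and stdin_tty and stderr_tty
            and not c_option)


if __name__ == "__main__":
    # Apply the same test to this process's own stdin (fd 0) and stderr (fd 2).
    print(is_interactive_shell(non_option_args=False,
                               stdin_tty=os.isatty(0),
                               stderr_tty=os.isatty(2),
                               c_option=False))
```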

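Startup file load order

Login shells (interactive or non-interactive)
- First, /etc/profile is read.
- Next, bash looks for
  ~/.bash_profile
  ~/.bash_login
  ~/.profile
  in that order and reads the first one that exists and is readable.
  Only one of these three files is read.
- The --noprofile option suppresses both of the steps above.
- When a login shell exits, ~/.bash_logout is read if it exists.
Interactive shells that are not login shells
- ~/.bashrc is read.
  - The --norc option suppresses this.
  - The --rcfile file option makes bash read file instead of ~/.bashrc.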
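The rules can be summarized as a small function. This sketch models only the files bash reads directly. It is not bash's code, and the parameter names are hypothetical. The extra lines in the experiment below appear to come from the files sourcing one another. On CentOS the default ~/.bash_profile sources ~/.bashrc, which sources /etc/bashrc. The scripts in /etc/profile.d are run from /etc/profile, and for non-login shells from /etc/bashrc.

```python
import os
from typing import Callable, List, Optional


def startup_files(login: bool, interactive: bool, *,
                  exists: Callable[[str], bool] = os.path.isfile,
                  noprofile: bool = False, norc: bool = False,
                  rcfile: Optional[str] = None) -> List[str]:
    """Files bash itself reads at startup under the rules above.

    Files sourced *from* these files (e.g. ~/.bashrc from ~/.bash_profile)
    are not included.
    """
    files: List[str] = []
    if login:
        if noprofile:
            return files
        if exists("/etc/profile"):
            files.append("/etc/profile")
        # Only the first existing file of the three is read.
        for name in ("~/.bash_profile", "~/.bash_login", "~/.profile"):
            if exists(os.path.expanduser(name)):
                files.append(name)
                break
    elif interactive and not norc:
        rc = rcfile or "~/.bashrc"
        if exists(os.path.expanduser(rc)):
            files.append(rc)
    return files
```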

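Trying it out

I put an echo at the top of each startup file so I could tell when it was called,
then checked the actual load order for each combination of invocation types.
Test environment: CentOS 7

Login shell, interactive

--login makes it a login shell
-i makes it an interactive shell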

[root@www ~]# bash --login -i
called /etc/profile
called /etc/profile.d/*.sh
called ~/.bash_profile
called ~/.bashrc
called /etc/bashrc

Login shell, non-interactive

--login makes it a login shell
-c makes it a non-interactive shell
shopt shows whether it is a login shell

[root@www ~]# bash --login -c shopt | grep -E 'called|login_shell'
called /etc/profile
called ~/.bash_profile
called ~/.bashrc
called /etc/bashrc
login_shell     on

Non-login shell, interactive

-i makes it an interactive shell
shopt shows whether it is a login shell

[root@www ~]# bash -i -c shopt | grep -E 'called|login_shell'
called ~/.bashrc
called /etc/bashrc
called /etc/profile.d/*.sh
login_shell     off

Non-login shell, non-interactive

-c makes it a non-interactive shell
shopt shows whether it is a login shell

[root@www19312ui ~]# bash -c shopt | grep -E 'called|login_shell'
login_shell     off


2014-08-05

Getting used to systemd, the new standard in CentOS 7

Linux CentOS 7 systemd

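One of the big changes in CentOS 7 is systemd. It works very differently from the old SystemV init, so I'm taking notes.

Overview

A replacement for SystemV init / Upstart.
A system and service manager for Linux that is compatible with SystemV init scripts.
Aggressively parallelizes service startup.
Uses socket and D-Bus activation to start services when they are needed, and tracks them with cgroups.
Maintains automount and mount points, and controls services based on their dependencies.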

Units

Units include the following types:
- services (.service)
- mount points (.mount)
- devices (.device)
- sockets (.socket)
When using the systemctl command, specify the full name of the unit file, including its suffix.
- Example
  sshd.socket
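Strictly speaking, systemctl also accepts a short form. If the suffix is omitted, it assumes .service, so `sshd` means `sshd.service`. The sketch below applies that rule when scripting systemctl. The helper names and the suffix list are assumptions made for illustration. `systemctl is-enabled` exits with status 0 for enabled units and also for some other states such as static.

```python
import subprocess
from typing import List

# Common unit types (not an exhaustive list for every systemd version).
UNIT_SUFFIXES = (".service", ".socket", ".device", ".mount", ".automount",
                 ".swap", ".target", ".path", ".timer", ".slice", ".scope")


def full_unit_name(name: str) -> str:
    """Append '.service' when no unit suffix is given, as systemctl does."""
    return name if name.endswith(UNIT_SUFFIXES) else name + ".service"


def systemctl_argv(action: str, unit: str) -> List[str]:
    """Build a systemctl command line that uses the full unit name."""
    return ["systemctl", action, full_unit_name(unit)]


def is_enabled(unit: str) -> bool:
    """Exit status 0 from `systemctl is-enabled` means enabled (or similar)."""
    result = subprocess.run(systemctl_argv("is-enabled", unit),
                            capture_output=True, text=True)
    return result.returncode == 0
```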

Related operations

Task	Command
List active units	# systemctl list-units
List failed units	# systemctl list-units --failed
List installed unit files	# systemctl list-unit-files
Start a unit	# systemctl start <unit>
Stop a unit	# systemctl stop <unit>
Restart a unit	# systemctl restart <unit>
Make a unit reload its configuration	# systemctl reload <unit>
Show a unit's status	# systemctl status <unit>
Check whether a unit starts automatically	# systemctl is-enabled <unit>
Enable automatic start of a unit	# systemctl enable <unit>
Disable automatic start of a unit	# systemctl disable <unit>
Show a unit's manual page	# systemctl help <unit>
Reload systemd's configuration	# systemctl daemon-reload

References

https://wiki.archlinux.org/index.php/Systemd_%28%E6%97%A5%E6%9C%AC%E8%AA%9E%29

2014-03-08

CentOS 6.3: files and directories under /var/tmp disappear

Linux

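On a hot-standby MySQL whose tmpdir was set to /var/tmp/mysql, /var/tmp/mysql disappeared and caused errors, so here's a note.

What happened

When I tried to check a table definition on the hot-standby MySQL, the following error occurred.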

mysql> desc hogehoge;
ERROR 1 (HY000): Can't create/write to file '/var/tmp/mysql/#sql_2b8a_0.MYI' (Errcode: 2)

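I noticed that /var/tmp/mysql, which I had set as tmpdir, was gone.

tmpwatch was deleting it

After some digging, I found that /etc/cron.daily/tmpwatch, which is registered in cron by default, deletes files and directories under /var/tmp whose atime, mtime, and ctime are all more than 30 days old.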

flags=-umc
/usr/sbin/tmpwatch "$flags" 30d /var/tmp

Excerpt from /etc/cron.daily/tmpwatch.
-u checks atime, -m checks mtime, and -c checks ctime.

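Because the hot standby was never used at all, the directory became a target and was deleted. I had set tmpdir to /var/tmp/mysql because I thought separating directories per middleware would be easier to follow, but it would have been better to simply use the default, /tmp.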
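Whether a directory is at risk can be estimated from its timestamps. This is a simplified model that assumes the behavior described above, where atime, mtime, and ctime must all be older than the threshold. tmpwatch's real rules have more details, so treat this as a rough check rather than a reimplementation. The function names are hypothetical.

```python
import os
import time

DAY = 24 * 60 * 60


def would_be_removed(atime: float, mtime: float, ctime: float,
                     now: float, max_age_days: int = 30) -> bool:
    """Simplified model of `tmpwatch -umc 30d`.

    A path qualifies only when atime, mtime and ctime are all older than the
    threshold, i.e. when even the newest of the three is too old.
    """
    cutoff = now - max_age_days * DAY
    return max(atime, mtime, ctime) < cutoff


def check_path(path: str, max_age_days: int = 30) -> bool:
    """Apply the model to a real path, e.g. the MySQL tmpdir."""
    st = os.stat(path)
    return would_be_removed(st.st_atime, st.st_mtime, st.st_ctime,
                            time.time(), max_age_days)
```

If `check_path("/var/tmp/mysql")` returns True, the next daily run would likely remove the directory.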

2014-02-20

nginx proxy_read_timeout: mind the upstream block too

nginx

proxy_read_timeout did not behave the way I expected, so I experimented.

Expected behavior

With nginx configured like this

upstream server_pool_a {
    server web001:80 max_fails=3 fail_timeout=30s;
    server web002:80 max_fails=3 fail_timeout=30s;
}

proxy_read_timeout 60;

I expected that proxying a CGI that loops forever would go like this:

Request sent to web001
↓
No complete response within 60 seconds
↓
504 Gateway Time-out

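Actual behavior

In reality, however, it behaved like this:

Request sent to web001
↓
No complete response within 60 seconds
↓
Request sent to web002
↓
No complete response within 60 seconds
↓
504 Gateway Time-out

I ran a while loop on each upstream server to watch its processes.
You can see that nginx sent the request to web001 and then, 60 seconds (proxy_read_timeout) later, to web002 as well.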

web001

*** earlier output omitted ***

Thu Feb 20 19:33:24 JST 2014

Thu Feb 20 19:33:25 JST 2014
10000    26423 20501  0 19:33 ?        00:00:00 /usr/local/bin/perl chk_loop.cgi

*** later output omitted ***

web002

*** earlier output omitted ***

Thu Feb 20 19:34:23 JST 2014

Thu Feb 20 19:34:24 JST 2014
10000    29708 25141  0 19:34 ?        00:00:00 /usr/local/bin/perl chk_loop.cgi

*** later output omitted ***

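Summary

Once the time set in proxy_read_timeout has elapsed, nginx sends the request to the next server if the upstream contains other servers.
It tries the servers in the upstream one after another, and a 504 Gateway Time-out is returned only after proxy_read_timeout has expired on every server.
Note that a CGI that keeps running until it finishes will, naturally, keep running on every server the request was sent to.
- Also note that what you thought was a single request actually becomes several, producing requests you did not intend.
  - It may be a good idea to make sure the CGI reliably times out (is forcibly terminated) on the upstream servers, and to set proxy_read_timeout longer than that.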
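The worst case is easy to estimate. With N servers in the upstream, the client can wait about N × proxy_read_timeout before the 504, and the hanging CGI ends up running on all N servers. Below is a toy simulation under that assumption, using the server names from the config above; the function name is made up. In nginx, this retrying is governed by the proxy_next_upstream directive, whose default value is `error timeout`.

```python
from typing import List, Tuple


def simulate_timeouts(servers: List[str], proxy_read_timeout: float,
                      start: float = 0.0) -> Tuple[List[Tuple[float, str]], float]:
    """Return ([(time the request reaches each server, server)], time of the 504).

    Assumes every backend hangs, like the infinite-loop CGI, and that nginx
    moves on to the next server each time proxy_read_timeout expires.
    """
    attempts: List[Tuple[float, str]] = []
    t = start
    for server in servers:
        attempts.append((t, server))
        t += proxy_read_timeout
    return attempts, t


if __name__ == "__main__":
    attempts, gateway_timeout_at = simulate_timeouts(["web001", "web002"], 60)
    for at, server in attempts:
        print(f"{at:6.1f}s  request sent to {server}")
    print(f"{gateway_timeout_at:6.1f}s  504 Gateway Time-out")
```

The roughly 60-second gap between the two ps snapshots above matches the second attempt in this model.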
