Ubuntu
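Can't figure out NGINX / Puma timeout (110: Connection timed out)
Every so often my Rails server stops responding. I tail the logs and see nothing useful, just a bunch of requests being started after the last one that completed.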
Using:
- Ubuntu
- Puma 2.15.3
- Ruby 2.2.1p85
- NGINX 1.4.6
- Supervisor (to run and respawn the Rails server and sidekiq)
Rails log:
Completed 200 OK in 179ms (Views: 6.2ms | ActiveRecord: 165.7ms)
Started OPTIONS "/api/v2/users/me" for 209.217.218.34 at 2015-12-16 21:43:47 +0000
Started OPTIONS "/api/v2/users/me" for 209.217.218.34 at 2015-12-16 21:44:21 +0000
Started GET "/" for 209.217.218.34 at 2015-12-16 21:45:11 +0000
Started GET "/" for 209.217.218.34 at 2015-12-16 21:45:27 +0000
Started GET "/" for 209.217.218.34 at 2015-12-16 21:48:13 +0000
Started OPTIONS "/api/v2/users/me" for 209.217.218.34 at 2015-12-16 21:49:19 +0000
Started GET "/" for 209.217.218.34 at 2015-12-16 22:11:58 +0000
The NGINX log shows a clear timeout while reading the response header from upstream:
2015/12/16 21:48:43 [error] 1129#0: *1777 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 209.217.218.34, server: example.com, request: "GET / HTTP/1.1", upstream: "http://unix:/example_site/shared/tmp/sockets/puma.staging.sock/", host: "example.com"
2015/12/16 21:49:49 [error] 1129#0: *1783 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 209.217.218.34, server: example.com, request: "OPTIONS /api/v2/users/me HTTP/1.1", upstream: "http://unix:/example_site/shared/tmp/sockets/puma.staging.sock/api/v2/users/me", host: "example.com", referrer: "http://example.com/"
2015/12/16 22:12:28 [error] 1129#0: *1785 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 209.217.218.34, server: example.com, request: "GET / HTTP/1.1", upstream: "http://unix:/example_site/shared/tmp/sockets/puma.staging.sock/", host: "example.com"
NGINX proxies requests to a Puma server that has 2 workers and 8 threads.
I ran strace on one of the threads and saw this:
[pid 32463] futex(0x7f0314c27a78, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000042>
[pid 32463] clock_gettime(CLOCK_MONOTONIC, {454145, 535389460}) = 0 <0.000034>
[pid 32463] write(16, "p32428\n", 7) = 7 <0.000036>
[pid 32463] clock_gettime(CLOCK_MONOTONIC, {454145, 535541983}) = 0 <0.000013>
[pid 32463] clock_gettime(CLOCK_MONOTONIC, {454145, 535578664}) = 0 <0.000012>
[pid 32463] futex(0x7f0314c27a04, FUTEX_WAIT_BITSET_PRIVATE, 5195, {454150, 535578664}, ffffffff <unfinished ...>
[pid 32471] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000163>
[pid 32471] futex(0x7f03148dc858, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000010>
[pid 32471] clock_gettime(CLOCK_MONOTONIC, {454146, 119592840}) = 0 <0.000005>
[pid 32471] clock_gettime(CLOCK_MONOTONIC, {454146, 119637474}) = 0 <0.000006>
[pid 32471] clock_gettime(CLOCK_MONOTONIC, {454146, 119655894}) = 0 <0.000005>
[pid 32471] futex(0x7f03148dc7e4, FUTEX_WAIT_BITSET_PRIVATE, 25969, {454147, 119655894}, ffffffff <unfinished ...>
[pid 32470] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000142>
[pid 32470] futex(0x7f0314dc94d8, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000013>
[pid 32470] clock_gettime(CLOCK_MONOTONIC, {454146, 531287803}) = 0 <0.000006>
[pid 32470] clock_gettime(CLOCK_MONOTONIC, {454146, 531415055}) = 0 <0.000049>
[pid 32470] clock_gettime(CLOCK_MONOTONIC, {454146, 531502589}) = 0 <0.000037>
[pid 32470] futex(0x7f0314dc9464, FUTEX_WAIT_BITSET_PRIVATE, 25969, {454147, 531502589}, ffffffff <unfinished ...>
[pid 32471] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000166>
[pid 32471] futex(0x7f03148dc858, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000009>
[pid 32471] clock_gettime(CLOCK_MONOTONIC, {454147, 119925263}) = 0 <0.000005>
[pid 32471] clock_gettime(CLOCK_MONOTONIC, {454147, 119965172}) = 0 <0.000014>
[pid 32471] clock_gettime(CLOCK_MONOTONIC, {454147, 119996627}) = 0 <0.000005>
[pid 32471] futex(0x7f03148dc7e4, FUTEX_WAIT_BITSET_PRIVATE, 25971, {454148, 119996627}, ffffffff <unfinished ...>
[pid 32470] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000142>
[pid 32470] futex(0x7f0314dc94d8, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000046>
[pid 32470] clock_gettime(CLOCK_MONOTONIC, {454147, 531936925}) = 0 <0.000033>
[pid 32470] clock_gettime(CLOCK_MONOTONIC, {454147, 532077784}) = 0 <0.000042>
[pid 32470] clock_gettime(CLOCK_MONOTONIC, {454147, 532160473}) = 0 <0.000036>
[pid 32470] futex(0x7f0314dc9464, FUTEX_WAIT_BITSET_PRIVATE, 25971, {454148, 532160473}, ffffffff <unfinished ...>
[pid 32471] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000143>
[pid 32471] futex(0x7f03148dc858, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000010>
[pid 32471] clock_gettime(CLOCK_MONOTONIC, {454148, 120250641}) = 0 <0.000004>
[pid 32471] clock_gettime(CLOCK_MONOTONIC, {454148, 120296848}) = 0 <0.000014>
[pid 32471] clock_gettime(CLOCK_MONOTONIC, {454148, 120329244}) = 0 <0.000005>
[pid 32471] futex(0x7f03148dc7e4, FUTEX_WAIT_BITSET_PRIVATE, 25973, {454149, 120329244}, ffffffff <unfinished ...>
[pid 32470] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000130>
[pid 32470] futex(0x7f0314dc94d8, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000010>
[pid 32470] clock_gettime(CLOCK_MONOTONIC, {454148, 532433899}) = 0 <0.000005>
[pid 32470] clock_gettime(CLOCK_MONOTONIC, {454148, 532527808}) = 0 <0.000015>
[pid 32470] clock_gettime(CLOCK_MONOTONIC, {454148, 532562815}) = 0 <0.000006>
[pid 32470] futex(0x7f0314dc9464, FUTEX_WAIT_BITSET_PRIVATE, 25973, {454149, 532562815}, ffffffff <unfinished ...>
[pid 32471] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000163>
[pid 32471] futex(0x7f03148dc858, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000013>
[pid 32471] clock_gettime(CLOCK_MONOTONIC, {454149, 120641924}) = 0 <0.000006>
[pid 32471] clock_gettime(CLOCK_MONOTONIC, {454149, 120692457}) = 0 <0.000010>
[pid 32471] clock_gettime(CLOCK_MONOTONIC, {454149, 120715154}) = 0 <0.000005>
[pid 32471] futex(0x7f03148dc7e4, FUTEX_WAIT_BITSET_PRIVATE, 25975, {454150, 120715154}, ffffffff <unfinished ...>
[pid 32470] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000141>
[pid 32470] futex(0x7f0314dc94d8, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000009>
nginx.conf
user ubuntu;
worker_processes auto;
pid /run/nginx.pid;
env PATH;

events {
    worker_connections 768;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 70;
    types_hash_max_size 2048;

    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    gzip on;
    gzip_disable "msie6";

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
Site configuration file
upstream app {
    server unix:/example_site/shared/tmp/sockets/puma.production.sock fail_timeout=0;
}

server {
    listen 80;
    listen 443 ssl;
    sendfile off;
    server_name example.com;

    ssl_certificate /etc/nginx/ssl/example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # don't use SSLv3 ref: POODLE

    root /example_site/current/public;
    try_files $uri/index.html $uri @app;

    location @app {
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_read_timeout 30;
        proxy_connect_timeout 60;
        proxy_redirect off;
        proxy_pass http://app;
    }

    error_page 500 502 503 504 /500.html;
    client_max_body_size 4G;
    keepalive_timeout 10;
}
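I'm very new to strace and not sure what I'm looking at here, or whether it is of any value in solving the problem.
Does anyone have an idea what is going on, and what else I can do to figure out what is happening with the timeouts?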
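The nginx error you are seeing comes from nginx closing the connection, either because the request took too long, because the upstream server is not accepting connections as it should, or because your proxy is not forwarding some header that the upstream server needs.
Without seeing your proxy block and the relevant nginx settings, here are a few things to check quickly.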
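Quick things you can do:
- Double-check keepalive_timeout in /etc/nginx/nginx.conf.
- Add a hosts file entry for the upstream server on your local machine and make sure that every connection completes and does not time out.
- Make sure the nginx proxy is forwarding the correct headers to your upstream server.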
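Since the upstream in your error log is a Unix socket rather than a host name, one way to check whether it answers at all is to talk to it directly and bypass nginx. The following is only a minimal sketch: the helper name probe_unix_http, the 5-second timeout, and the Host value are my own choices for illustration, not anything taken from your setup.

```python
import socket


def probe_unix_http(socket_path, path="/", host="example.com", timeout=5.0):
    """Send a bare GET over a Unix socket and return the response status line.

    Raises socket.timeout if the server does not answer within `timeout`
    seconds, which is the symptom nginx is reporting.
    """
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(request)

        data = b""
        # Keep reading until the first response line has arrived.
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk

    return data.split(b"\r\n", 1)[0].decode("latin-1")
```

Calling probe_unix_http("/example_site/shared/tmp/sockets/puma.staging.sock") on the server (the path is the one shown in your nginx error log) should return a status line if Puma is responsive. If it raises socket.timeout while the site is hung, the problem is on the Puma side and not in the proxy configuration.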
Here is a link to the documentation for nginx's proxy_pass directive: Nginx http_proxy_module
Update:
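Try running nginx -t and see whether it reports any problems with your configuration. Now that I have looked at it again, the one thing that stands out is keepalive_timeout.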
Try changing your keepalive_timeout 70; to keepalive_timeout 70s;
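It may also help to line up the timestamps from your two logs. The sketch below uses only timestamps copied from your question. Pairing each Rails "Started" line with the nginx error for the same request path, in order, is my assumption.

```python
from datetime import datetime

# Timestamps copied from the Rails log in the question (requests that never completed).
RAILS_STARTED = [
    "2015-12-16 21:48:13",  # GET /
    "2015-12-16 21:49:19",  # OPTIONS /api/v2/users/me
    "2015-12-16 22:11:58",  # GET /
]

# Timestamps of the matching "upstream timed out" lines in the nginx error log.
NGINX_TIMED_OUT = [
    "2015/12/16 21:48:43",
    "2015/12/16 21:49:49",
    "2015/12/16 22:12:28",
]


def gaps_in_seconds(started, timed_out):
    """Return the seconds between each Rails start and its nginx timeout."""
    gaps = []
    for s, t in zip(started, timed_out):
        began = datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
        failed = datetime.strptime(t, "%Y/%m/%d %H:%M:%S")
        gaps.append(int((failed - began).total_seconds()))
    return gaps


print(gaps_in_seconds(RAILS_STARTED, NGINX_TIMED_OUT))  # [30, 30, 30]
```

Each gap is exactly 30 seconds, which matches the proxy_read_timeout 30; in your site configuration. If the pairing is right, nginx is reaching Puma and then giving up after waiting 30 seconds for a response header, so the place to look is probably what those Puma threads are blocked on, not the connection itself.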