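Intermittent mod_jk client errors when uploading large files
**Note:** There is a TL;DR at the end of this question.
I have a nightly backup script that worked reliably for 6 years. Two months ago it began failing to upload to my archive server (only about one in four attempts succeeds). This started once the backup file grew past roughly 440MB (a successful upload takes 20-40 seconds to complete). Other, smaller files (>200MB) upload successfully every time. The onset of the failures does not coincide with any configuration change or update on the server in question.
Here are the details of the file I am uploading: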
-rw-r--r-- 1 backup backup 486948951 Jun 12 13:20 snapshot-20190612.zip
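The files are uploaded to my server via a curl PUT (see the commands below). The source server is an Amazon Linux instance in Virginia and the destination server is an Amazon Linux instance in Ohio (both in the US East region). The web service handling the upload is an Apache HTTPD 2.4.39 + mod24_ssl/OpenSSL 1.0.2k -> mod_jk 1.2.42 -> Apache Tomcat 8.5.40 -> Artifactory 5.3.2 instance.
I captured the jk_status page before and after one of these errors occurred (before/after shown in an animated GIF to highlight the differences). mod_jk interprets the error as a client error.
My mod_jk workers.properties file:
worker.list=main_ajp13_worker
# Set properties for main_ajp13_worker (ajp13)
worker.main_ajp13_worker.type=ajp13
worker.main_ajp13_worker.host=localhost
worker.main_ajp13_worker.port=8009
Here is the console output from failed uploads (authorization details scrubbed), followed by a successful upload: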
-bash-4.2$ curl -v -k -u backup:PASSWORD -H X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3 -H X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10 -T snapshot-20190612.zip https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
* Trying 123.222.100.132...
* TCP_NODELAY set
* Connected to repo.mydomain.org (123.222.100.132) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: OU=Domain Control Validated; OU=PositiveSSL Wildcard; CN=*.mydomain.org
* start date: May 28 00:00:00 2019 GMT
* expire date: May 27 23:59:59 2021 GMT
* issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
* SSL certificate verify ok.
* Server auth using Basic with user 'backup'
> PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1
> Host: repo.mydomain.org
> Authorization: Basic DELETED_TOKEN
> User-Agent: curl/7.61.1
> Accept: */*
> X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3
> X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10
> Content-Length: 486948951
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
* Closing connection 0
curl: (56) OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104

-bash-4.2$ curl -v -u backup:PASSWORD -H X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3 -H X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10 -T snapshot-20190612.zip https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
* Trying 123.222.100.132...
* TCP_NODELAY set
* Connected to repo.mydomain.org (123.222.100.132) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: OU=Domain Control Validated; OU=PositiveSSL Wildcard; CN=*.mydomain.org
* start date: May 28 00:00:00 2019 GMT
* expire date: May 27 23:59:59 2021 GMT
* issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
* SSL certificate verify ok.
* Server auth using Basic with user 'backup'
> PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1
> Host: repo.mydomain.org
> Authorization: Basic DELETED_TOKEN
> User-Agent: curl/7.61.1
> Accept: */*
> X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3
> X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10
> Content-Length: 486948951
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
* Closing connection 0
curl: (56) OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
On the second attempt (or, more often, the third or fourth), the upload eventually succeeds:
-bash-4.2$ curl -v -u backup:PASSWORD -H X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3 -H X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10 -T snapshot-20190612.zip https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
* Trying 123.222.100.132...
* TCP_NODELAY set
* Connected to repo.mydomain.org (123.222.100.132) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: OU=Domain Control Validated; OU=PositiveSSL Wildcard; CN=*.mydomain.org
* start date: May 28 00:00:00 2019 GMT
* expire date: May 27 23:59:59 2021 GMT
* issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
* SSL certificate verify ok.
* Server auth using Basic with user 'backup'
> PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1
> Host: repo.mydomain.org
> Authorization: Basic DELETED_TOKEN
> User-Agent: curl/7.61.1
> Accept: */*
> X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3
> X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10
> Content-Length: 486948951
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 201 201
< Date: Wed, 12 Jun 2019 13:22:46 GMT
< Server: Apache
< Strict-Transport-Security: max-age=15768000
< X-Artifactory-Id: 9e49bf90cd4b7700:7ba45cf1:16b1db56e4d:-8000
< Location: https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
< Transfer-Encoding: chunked
< Content-Type: application/vnd.org.jfrog.artifactory.storage.itemcreated+json;charset=ISO-8859-1
<
{
  "repo" : "resource-snapshot",
  "path" : "/snapshot/20190612/snapshot-20190612.zip",
  "created" : "2019-06-12T13:23:05.805Z",
  "createdBy" : "backup",
  "downloadUri" : "https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip",
  "mimeType" : "application/zip",
  "size" : "486948951",
  "checksums" : {
    "sha1" : "6b9525bf9bb572ec7f70d0d474cb6533e0091bd3",
    "md5" : "6da0c444ad1b53dc1cf06784e204cb10"
  },
  "originalChecksums" : {
    "sha1" : "6b9525bf9bb572ec7f70d0d474cb6533e0091bd3",
    "md5" : "6da0c444ad1b53dc1cf06784e204cb10"
  },
  "uri" : "https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip"
* Connection #0 to host repo.mydomain.org left intact
}
The server logs are not particularly helpful. Here are the relevant logs from a failed upload:
/var/log/httpd/ssl_request_log
[24/Jun/2019:12:17:45 +0000] 52.1.100.190 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 "PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1" -
/var/log/httpd/repo.mydomain.org-ssl_access_log
140.1.100.101 - - [24/Jun/2019:12:17:45 +0000] "PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1" 400 -
catalina.out
No log entries.
/var/log/httpd/mod_jk.log
These are the log entries with the mod_jk log level set to info:
[Mon Jun 24 12:18:05 2019] [15679:140534117611584] [info] ajp_read_into_msg_buff::jk_ajp_common.c (1553): (main_ajp13_worker) receiving data from client failed. Connection aborted or network problems
[Mon Jun 24 12:18:05 2019] [15679:140534117611584] [info] ajp_process_callback::jk_ajp_common.c (2105): (main_ajp13_worker) Reading from client aborted or client network problems
[Mon Jun 24 12:18:05 2019] [15679:140534117611584] [info] ajp_service::jk_ajp_common.c (2778): (main_ajp13_worker) sending request to tomcat failed (unrecoverable), because of client read error (attempt=1)
[Mon Jun 24 12:18:05 2019] [15679:140534117611584] [info] jk_handler::mod_jk.c (2988): Aborting connection for worker=main_ajp13_worker
Here is the relevant portion of the 2.4 million mod_jk log entries produced at debug level, starting at the HTTP 100 Continue and showing the very beginning of the file upload (you can see the zip file's magic number, 50 4B 03 04):
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0160 32 65 32 00 A0 08 00 09 34 39 36 32 37 38 36 30 - 2e2.....49627860
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0170 33 00 00 06 45 78 70 65 63 74 00 00 0C 31 30 30 - 3...Expect...100
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0180 2D 63 6F 6E 74 69 6E 75 65 00 0A 00 0F 41 4A 50 - -continue....AJP
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0190 5F 52 45 4D 4F 54 45 5F 50 4F 52 54 00 00 05 33 - _REMOTE_PORT...3
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 01a0 32 39 37 30 00 0A 00 0E 41 4A 50 5F 4C 4F 43 41 - 2970....AJP_LOCA
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 01b0 4C 5F 41 44 44 52 00 00 0C 31 37 32 2E 33 31 2E - L_ADDR...172.31.
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 01c0 31 33 2E 39 36 00 0A 00 10 4A 4B 5F 4C 42 5F 41 - 13.96....JK_LB_A
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 01d0 43 54 49 56 41 54 49 4F 4E 00 00 03 41 43 54 00 - CTIVATION...ACT.
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 01e0 0A 00 0F 53 53 4C 5F 43 4C 49 45 4E 54 5F 43 45 - ...SSL_CLIENT_CE
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 01f0 52 54 00 00 0F 53 53 4C 5F 43 4C 49 45 4E 54 5F - RT...SSL_CLIENT_
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0200 43 45 52 54 00 FF 00 00 00 00 00 00 00 00 00 00 - CERT............
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_send_request::jk_ajp_common.c (1782): (main_ajp13_worker) request body to send 496278603 - request body to resend 0
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_send_request::jk_ajp_common.c (1886): (main_ajp13_worker) sending 8186 bytes of request body
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): sending to ajp13 pos=4 len=8192 max=8192
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0000 12 34 1F FC 1F FA 50 4B 03 04 14 00 00 00 08 00 - .4....PK........
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0010 84 61 D8 4E 73 67 93 AF 7C 0A 00 00 E0 B0 00 00 - .a.Nsg..|.......
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0020 0A 00 1C 00 62 61 63 6B 75 70 2E 6C 6F 67 55 54 - ....backup.logUT
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0030 09 00 03 18 BE 10 5D 90 BD 10 5D 75 78 0B 00 01 - ......]...]ux...
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0040 04 1A 00 00 00 04 1A 00 00 00 ED 5C 4D 6F DC C8 - ...........\Mo..
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0050 11 BD E7 57 34 74 4A 80 D8 80 77 17 8B 45 80 1C - ...W4tJ...w..E..
.... [lots of encoded data trimmed out here, the next part shows re-filling the 8k buffer] ....
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1462): received from ajp13 pos=0 len=3 max=8192E D7 E9 E6 DD BB - 8...U?.er.>.....
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1462): 00000 06 1F FA 00 00 00 00 00 00 00 00 00 00 00 00 00 - .................
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): sending to ajp13 pos=4 len=8192 max=8192 2E 40 41 4C 94 - ..$z~YW.?1G.@AL.
[Mon Jun 24 14:37:08 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 0000 12 34 1F FC 1F FA C9 62 3E 48 C5 70 E9 05 E1 8D - .4.....b>H.p....
.... [lots of encoded data trimmed out here, the next part shows where re-filling the 8k buffer fails] ....
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 03e0 2D E5 36 F3 8C 12 8C 51 69 66 21 AE A5 58 BE 6E - -.6....Qif!..X.n
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_connection_tcp_send_message::jk_ajp_common.c (1267): 03f0 57 F3 F4 EC CB E9 49 9B 59 44 AB FE 66 6E 67 DC - W.....I.YD..fng.
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1462): received from ajp13 pos=0 len=3 max=8192
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_connection_tcp_get_message::jk_ajp_common.c (1462): 0000 06 1F FA 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [info] ajp_read_into_msg_buff::jk_ajp_common.c (1553): (main_ajp13_worker) receiving data from client failed. Connection aborted or network problems
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [info] ajp_process_callback::jk_ajp_common.c (2105): (main_ajp13_worker) Reading from client aborted or client network problems
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [info] ajp_service::jk_ajp_common.c (2778): (main_ajp13_worker) sending request to tomcat failed (unrecoverable), because of client read error (attempt=1)
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_reset_endpoint::jk_ajp_common.c (851): (main_ajp13_worker) resetting endpoint with socket 24 (socket shutdown)
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_abort_endpoint::jk_ajp_common.c (821): (main_ajp13_worker) aborting endpoint with socket 24
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] jk_shutdown_socket::jk_connect.c (932): About to shutdown socket 24 [127.0.0.1:38678 -> 127.0.0.1:8009]
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] jk_shutdown_socket::jk_connect.c (1005): shutting down the read side of socket 24 [127.0.0.1:38678 -> 127.0.0.1:8009]
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] jk_shutdown_socket::jk_connect.c (1016): Shutdown socket 24 [127.0.0.1:38678 -> 127.0.0.1:8009] and read 265 lingering bytes in 0 sec.
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [debug] ajp_done::jk_ajp_common.c (3287): recycling connection pool for worker main_ajp13_worker and socket -1
[Mon Jun 24 14:37:28 2019] [17826:139891516274752] [info] jk_handler::mod_jk.c (2988): Aborting connection for worker=main_ajp13_worker
[Mon Jun 24 14:38:24 2019] [17825:139891516274752] [debug] wc_shutdown::jk_worker.c (390): Shutting down worker main_ajp13_worker
[Mon Jun 24 14:38:24 2019] [17825:139891516274752] [debug] jk_shm_close::jk_shm.c (745): Closed shared memory /var/log/httpd/mod_jk.shm.17823 childs=9
[Mon Jun 24 14:38:24 2019] [17877:139891516274752] [debug] wc_shutdown::jk_worker.c (390): Shutting down worker main_ajp13_worker
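Looking at the AJPv13 documentation, we find the explanation for the 8k chunks:
The container asks for more data from the request (if the body was too large to fit in the first packet sent over, or when the request is chunked). The server will send a body packet back with an amount of data which is the minimum of the request_length, the maximum send body size (8186 (8 Kbytes - 6)), and the number of bytes actually left to send from the request body. If there is no more data in the body (i.e. the servlet container is trying to read past the end of the body), the server will send back an "empty" packet, which is a body packet with a payload length of 0. (0x12,0x34,0x00,0x00)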
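To make the quoted passage concrete, here is a minimal Python sketch that decodes the two byte sequences visible in the debug log above: the start of a body packet sent to Tomcat (`12 34 1F FC 1F FA ...`) and the 3-byte request for more data that comes back (`06 1F FA`). The function names are hypothetical, and the field layout (2-byte magic, 2-byte packet length, 2-byte chunk length, all big-endian, with prefix code 6 for the "get body chunk" message) is my reading of the AJPv13 documentation rather than anything taken from the mod_jk source.

```python
import struct

AJP_SERVER_MAGIC = 0x1234   # first two bytes of every packet sent to the container
GET_BODY_CHUNK = 6          # prefix code of the container's "send me more body" message


def parse_body_packet_header(packet):
    """Return (payload_length, chunk_length) for a server-to-container body packet."""
    magic, payload_len = struct.unpack(">HH", packet[:4])
    if magic != AJP_SERVER_MAGIC:
        raise ValueError("not a server-to-container AJP13 packet")
    if payload_len == 0:
        # The "empty" packet 0x12,0x34,0x00,0x00 that signals end of body.
        return 0, 0
    (chunk_len,) = struct.unpack(">H", packet[4:6])
    return payload_len, chunk_len


def parse_get_body_chunk(payload):
    """Return the number of bytes the container asks for."""
    code, requested = struct.unpack(">BH", payload[:3])
    if code != GET_BODY_CHUNK:
        raise ValueError("not a GET_BODY_CHUNK message")
    return requested


# Bytes copied from the debug log: the first body packet, which carries the zip magic number.
first_packet = bytes.fromhex("12 34 1F FC 1F FA 50 4B 03 04")
print(parse_body_packet_header(first_packet))            # (8188, 8186)
print(parse_get_body_chunk(bytes.fromhex("06 1F FA")))   # 8186
```

Read this way, each round trip in the log is Tomcat asking for 8186 bytes and mod_jk answering with a packet of 4 header bytes plus 8188 payload bytes (a 2-byte chunk length and 8186 bytes of data), which matches the `len=8192` in the log.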
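In the log above, I counted the number of times the 8k buffer was refilled (36,856) and found that 301,703,216 bytes (287 MB, or 62%) of the file were transferred before the error occurred. Across repeated attempts this percentage varies widely, but it usually exceeds 300MB, and sometimes all 440MB transfer successfully.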
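The arithmetic behind those figures can be checked with a few lines of Python. This is only a sanity check of the numbers quoted above; it assumes every refill carried the full 8186-byte maximum, and it uses the file size from the directory listing as the denominator (the debug log reports a slightly different body size, 496278603 bytes).

```python
MAX_BODY_CHUNK = 8186        # maximum body bytes per AJP packet (8 Kbytes - 6)
REFILLS = 36_856             # buffer refills counted in the debug log
FILE_SIZE = 486_948_951      # size of snapshot-20190612.zip from the ls output

# Assumption: each counted refill was a full 8186-byte chunk.
bytes_sent = REFILLS * MAX_BODY_CHUNK
mib_sent = bytes_sent / 2**20
percent_sent = 100 * bytes_sent / FILE_SIZE

print(bytes_sent)                 # 301703216
print(f"{mib_sent:.1f} MiB")      # 287.7 MiB
print(f"{percent_sent:.0f}%")     # 62%
```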
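My interpretation of the logs above
The SSL handshake succeeds and the error occurs after the HTTP 100 Continue, so I suspect the file has been partially sent by the time the following messages appear in mod_jk.log:
- receiving data from client failed. Connection aborted or network problems
- sending request to tomcat failed (unrecoverable), because of client read error (attempt=1)
Since catalina.out contains no errors and mod_jk complains about a client read error, I believe this rules out Tomcat itself (and the web application, Artifactory).
Most of the commentary I found via Google about the mod_jk messages above says they indicate transient network errors, but I don't buy it, because these are file transfers between EC2 instances in the same region. I also take this to be unrelated to connection timeouts, since (I assume) a timeout would produce a "connection timed out after n milliseconds" message.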
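Changing the mod_jk and Tomcat configuration
Other things I have tried (the changes were cumulative, and none produced a noticeable change in results):
- Upgraded mod_jk to the latest version (1.2.46)
- Increased the worker packet size from 8,192 (default) to 16,384
- Increased the worker packet size from 16,384 to 32,768
- Increased JkShmSize from 1kB (default) to 1GB (oops, the value is in kB)
- Decreased JkShmSize from 1GB to 1MB
- Returned JkShmSize to the default
- Returned the worker packet size to the default
- Increased the worker retries from 2 (default) to 5
- Decreased reply_timeout from forever (default) to 1000 ms (this actually caused problems with the underlying service during successful uploads)
- Returned the worker retries to the default
- Returned reply_timeout to the default
- Changed socket_keepalive from false (default) to true
- Returned socket_keepalive to the default
- Changed the connector protocol from org.apache.coyote.http11.Http11NioProtocol to org.apache.coyote.http11.Http11Nio2Protocol
- Increased the Tomcat connection timeout from 20 seconds (20000) to 60 seconds (60000)
- Set Tomcat connectionUploadTimeout to 60 seconds (60000)
- Set Tomcat disableUploadTimeout to false
- Returned connectionUploadTimeout and disableUploadTimeout to their defaults
- Forced chunked transfer via -H "Transfer-Encoding: chunked"
- Set the worker properties to: retries=2, socket_timeout=100, reply_timeout=30000
- Returned the worker properties to their defaults.
- Added +DisableReuse to JkOptions.
- Returned JkOptions to the default.
- Created separate workers for the repo and repo-access virtual hosts.
- Removed the non-SSL virtual host entries, leaving only the two SSL virtual hosts.
Note: separating the workers at least isolated the worker messages, so I now know the error occurs entirely on the repo web service and not on the repo-access web service: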
[Sat Jun 29 16:16:55 2019] [6174:139690136819776] [info] ajp_read_into_msg_buff::jk_ajp_common.c (1550): (repo) receiving data from client failed. Connection aborted or network problems
[Sat Jun 29 16:16:55 2019] [6174:139690136819776] [info] ajp_process_callback::jk_ajp_common.c (2102): (repo) Reading from client aborted or client network problems
[Sat Jun 29 16:16:55 2019] [6174:139690136819776] [info] ajp_service::jk_ajp_common.c (2775): (repo) sending request to tomcat failed (unrecoverable), because of client read error (attempt=1)
[Sat Jun 29 16:16:55 2019] [6174:139690136819776] [info] jk_handler::mod_jk.c (2984): Aborting connection for worker=repo
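Changing the HTTP(S) client
I was able to reproduce the problem using wget instead of curl, which means this is not some esoteric issue with the client software: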
-bash-4.2$ wget -v --tries=1 --user=backup --password=PASSWORD --header=X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3 --header=X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10 --method=PUT --body-file=snapshot-20190612.zip https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
--2019-06-24 19:59:16-- https://repo.mydomain.org/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
Resolving repo.mydomain.org (repo.mydomain.org)... 123.222.100.132
Connecting to repo.mydomain.org (repo.mydomain.org)|123.222.100.132|:443... connected.
HTTP request sent, awaiting response... 401 401
Authentication selected: Basic realm="Artifactory Realm"
Reusing existing connection to repo.mydomain.org:443.
Giving up.
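Testing on localhost
I tested uploading from localhost on the archive server itself (https://localhost:443/), which required adding the -k flag to curl to skip certificate hostname verification, and every upload succeeded (in 11 seconds). However, adding --limit-rate 20M causes every upload to fail (it works with --limit-rate 25M).
Bypassing Apache HTTPD, mod_jk, and SSL works
If I bypass Apache HTTPD, mod_jk, and SSL and go directly through port 8080, the upload always succeeds: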
[ec2-user@repo ~]$ curl -v -k --limit-rate 1024K -u backup:PASSWORD -H X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3 -H X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10 -T snapshot-20190612.zip http://localhost:8080/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
* Server auth using Basic with user 'backup'
> PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1
> Host: localhost:8080
> Authorization: Basic DELETED_TOKEN
> User-Agent: curl/7.61.1
> Accept: */*
> X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3
> X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10
> Content-Length: 496278603
> Expect: 100-continue
>
< HTTP/1.1 100
* We are completely uploaded and fine
< HTTP/1.1 201
< Server: Artifactory/5.3.2
...
Bypassing SSL only fails
Going through Apache HTTPD + mod_jk without SSL, I am still able to replicate the problem. This means that the problem has nothing to do with SSL.
[ec2-user@repo ~]$ curl -v -k --limit-rate 1024K -u backup:PASSWORD -H X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3 -H X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10 -T snapshot-20190612.zip http://localhost/resource-snapshot/snapshot/20190612/snapshot-20190612.zip
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
* Server auth using Basic with user 'backup'
> PUT /resource-snapshot/snapshot/20190612/snapshot-20190612.zip HTTP/1.1
> Host: localhost:8080
> Authorization: Basic DELETED_TOKEN
> User-Agent: curl/7.61.1
> Accept: */*
> X-Checksum-Sha1:6b9525bf9bb572ec7f70d0d474cb6533e0091bd3
> X-Checksum-Md5:6da0c444ad1b53dc1cf06784e204cb10
> Content-Length: 496278603
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
Bypassing HTTPD and mod_jk
I tried bypassing Apache HTTPD and mod_jk by uploading directly to Tomcat via port 8443 (still using SSL). These requests succeeded every time. This leads me to believe that the problem is with Apache HTTPD and/or mod_jk, and is not caused by “transient network errors” or a firewall.
TL;DR
The upload failures only occur when going through HTTPD + mod_jk, not when communicating with Tomcat directly. SSL has been eliminated as a potential source of the problem. This is not a network problem because the error is reproducible when uploading from localhost. The issue appears to be related only to the amount of time it takes to transfer the file, not the size of the file.
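To put rough numbers on the "time, not size" observation, the sketch below estimates how long the localhost upload takes at the two --limit-rate settings mentioned earlier, and measures the gap between the first body packet and the abort in the debug log. It assumes curl's `M` suffix means 1024*1024 bytes per second and that the transfer runs at exactly the limit; both are simplifications, and the helper name is hypothetical.

```python
from datetime import datetime

BODY_BYTES = 496_278_603     # Content-Length of the localhost uploads
MIB = 1024 * 1024            # assumption: curl's "M" suffix is 1024*1024 bytes


def transfer_seconds(rate_mib_per_s):
    """Idealised upload duration if the transfer runs exactly at the rate limit."""
    return BODY_BYTES / (rate_mib_per_s * MIB)


# Timestamps copied from the debug-level mod_jk log of one failed upload.
LOG_FORMAT = "%a %b %d %H:%M:%S %Y"
body_started = datetime.strptime("Mon Jun 24 14:37:08 2019", LOG_FORMAT)
aborted = datetime.strptime("Mon Jun 24 14:37:28 2019", LOG_FORMAT)
seconds_until_abort = (aborted - body_started).total_seconds()

print(f"--limit-rate 20M: {transfer_seconds(20):.1f} s")   # fails every time
print(f"--limit-rate 25M: {transfer_seconds(25):.1f} s")   # succeeds
print(f"abort after:      {seconds_until_abort:.0f} s")
```

Under these assumptions the failing rate needs a little under 24 seconds and the working rate a little under 19 seconds, while the logged abort comes 20 seconds after the body starts. That would be consistent with a cutoff of roughly 20 seconds somewhere in the HTTPD + mod_jk path, though this is an inference from the numbers, not something the logs state.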
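My question
Are there any configuration options for HTTPD and/or mod_jk, workers.properties, or even Tomcat that I should look into to eliminate these problems?
Alternatively, is there a way to tell mod_jk to handle these failed buffer refills in a fault-tolerant way, so that the upload can complete successfully?
Have you tried downgrading Apache/mod_ssl to 2.4.38?
We have had file upload problems ever since we upgraded to 2.4.39.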