Too many TCP connections causing disconnects
I have a game server that runs over TCP connections. The server randomly disconnects users, and I suspect it is related to the server's TCP settings.
In the local development environment the code handles more than 8000 concurrent users without any disconnects or errors (on localhost).

But on the actual production server, CentOS 5 64-bit, the disconnects happen independently of the number of concurrent TCP connections.

The server does not seem to be able to handle the throughput.

```
netstat -s -t

IcmpMsg:
    InType0: 31
    InType3: 87717
    InType4: 699
    InType5: 2
    InType8: 1023781
    InType11: 7211
    OutType0: 1023781
    OutType3: 603
Tcp:
    8612766 active connections openings
    14255236 passive connection openings
    12174 failed connection attempts
    319225 connection resets received
    723 connections established
    6351090913 segments received
    6180297746 segments send out
    45791634 segments retransmited
    0 bad segments received.
    1664280 resets sent
TcpExt:
    46244 invalid SYN cookies received
    3745 resets received for embryonic SYN_RECV sockets
    327 ICMP packets dropped because they were out-of-window
    1 ICMP packets dropped because socket was locked
    11475281 TCP sockets finished time wait in fast timer
    140 time wait sockets recycled by time stamp
    1569 packets rejects in established connections because of timestamp
    103783714 delayed acks sent
    6929 delayed acks further delayed because of locked socket
    Quick ack mode was activated 6210096 times
    1806 times the listen queue of a socket overflowed
    1806 SYNs to LISTEN sockets ignored
    1080380601 packets directly queued to recvmsg prequeue.
    31441059 packets directly received from backlog
    5272599307 packets directly received from prequeue
    324498008 packets header predicted
    1143146 packets header predicted and directly queued to user
    3217838883 acknowledgments not containing data received
    1027969883 predicted acknowledgments
    395 times recovered from packet loss due to fast retransmit
    257420 times recovered from packet loss due to SACK data
    5843 bad SACKs received
    Detected reordering 29 times using FACK
    Detected reordering 12 times using SACK
    Detected reordering 1 times using reno fast retransmit
    Detected reordering 809 times using time stamp
    1602 congestion windows fully recovered
    1917 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 8196226
    7850525 congestion windows recovered after partial ack
    139681 TCP data loss events
    TCPLostRetransmit: 26
    10139 timeouts after reno fast retransmit
    2802678 timeouts after SACK recovery
    86212 timeouts in loss state
    273698 fast retransmits
    19494 forward retransmits
    2637236 retransmits in slow start
    33381883 other TCP timeouts
    TCPRenoRecoveryFail: 92
    19488 sack retransmits failed
    7 times receiver scheduled too late for direct processing
    6354641 DSACKs sent for old packets
    333 DSACKs sent for out of order packets
    20615579 DSACKs received
    2724 DSACKs for out of order packets received
    123034 connections reset due to unexpected data
    91876 connections reset due to early user close
    169244 connections aborted due to timeout
    28736 times unabled to send RST due to no memory
IpExt:
    InMcastPkts: 2
```
What stands out to me is that these counters look quite problematic:
```
123034 connections reset due to unexpected data
91876 connections reset due to early user close
28736 times unabled to send RST due to no memory
```
How can I fix these errors? Do I need to do TCP tuning?
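To see whether these particular counters are still climbing while users are being dropped (rather than being old accumulated totals), something like the following could be run during a disconnect episode; this is only a rough sketch built around the same `netstat -s` output shown above:

```sh
# Sample the suspicious counters every 10 seconds and watch which ones grow
# while disconnects are happening.
while true; do
    date
    netstat -s | egrep 'connections reset due to|send RST due to no memory'
    sleep 10
done
```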
**Edit:** some sysctl information:
```
sysctl -A | grep net | grep mem

net.ipv4.udp_wmem_min = 4096
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_mem = 772704 1030272 1545408
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_mem = 196608 262144 393216
net.ipv4.igmp_max_memberships = 20
net.core.optmem_max = 20480
net.core.rmem_default = 129024
net.core.wmem_default = 129024
net.core.rmem_max = 131071
net.core.wmem_max = 131071
```
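For reference, if tuning does turn out to be necessary, these are the knobs that would be changed; a minimal sketch with placeholder numbers (not sizing recommendations for this workload), applied with `sysctl -w` and made persistent via /etc/sysctl.conf:

```sh
# Example values only; correct sizing depends on RAM and workload.
# Raise the hard caps on per-socket buffers (net.core.*_max is currently 131071):
sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_max=4194304

# min / default / max per-socket TCP buffer sizes in bytes:
sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"

# To persist across reboots, add the same lines to /etc/sysctl.conf and reload:
sysctl -p
```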
**Edit:** ethtool information for the two detected Ethernet cards:
```
Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: d
    Link detected: yes

Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: Unknown!
    Duplex: Half
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: d
    Link detected: no
```
Did you increase the FD (file descriptor) limit? You can find some information on that here: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
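In case it has not been raised yet, a minimal sketch of the usual places to check and change it ("gameuser" and the numbers are placeholders, not recommendations):

```sh
# Per-process limit for the account that runs the game server
ulimit -n

# System-wide cap on open file handles
cat /proc/sys/fs/file-max

# Example: raise the per-user limit in /etc/security/limits.conf
# (takes effect on that user's next login):
#   gameuser  soft  nofile  65535
#   gameuser  hard  nofile  65535

# Example: raise the system-wide cap now and persist it across reboots
sysctl -w fs.file-max=200000
echo "fs.file-max = 200000" >> /etc/sysctl.conf
```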
If by "the server randomly disconnects users" you mean that clients are dropped without the expected FIN/ACK or RST exchange, I would tackle the half-duplex interface first, especially if both NICs in your development environment are running full duplex. An interface like eth1 sitting at half duplex while Auto-negotiation is on is usually caused by one of the following:

- Failed auto-negotiation between the switch and the server.
- A switch with auto-negotiation disabled and the port's speed and duplex set explicitly.
I see case #2 more often, though that may simply be because it has been well over a decade since I last ran into a genuine auto-negotiation failure. When one side is set to auto and the other side is hard-coded (or unable to respond), the defined Ethernet auto-negotiation behavior is for the auto side to fall back to half duplex.
Put simply, eth1 being at half duplex means the server will only send or receive on that interface at any one moment, never both at once. The hard-coded side is still in full-duplex mode and will transmit to the server at the same time as it is receiving from the server. The server, however, treats that as a collision, because half duplex assumes a shared collision domain, whereas full duplex eliminates the collision domain. The server then runs the backoff algorithm to schedule a retransmission, and if it keeps seeing what it believes are collisions, it keeps increasing the time it waits before retransmitting the data.
So a half-duplex interface with a full-duplex partner can easily lead to client disconnects, throughput and performance problems, latency spikes, and all kinds of other issues.
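A minimal sketch of how the duplex situation could be checked and corrected with ethtool; the right values depend on how the switch port is configured, so treat these commands as examples rather than a drop-in fix:

```sh
# Inspect the current negotiated state of the suspect interface
ethtool eth1

# If the switch port is hard-coded, either re-enable auto-negotiation on the
# switch, or hard-code BOTH ends to the same speed/duplex, e.g. 100 Mb/s full:
ethtool -s eth1 speed 100 duplex full autoneg off

# If the switch port auto-negotiates (and for gigabit copper, which requires
# auto-negotiation), make sure the NIC does too:
ethtool -s eth1 autoneg on
```

Whichever way it is done, both ends of the link have to agree; changing only the server side of a hard-coded switch port just moves the mismatch.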