Improving TCP performance over a gigabit network with lots of connections and high traffic of small packets

February 19, 2016

I'm trying to improve my TCP throughput over a "gigabit network with lots of connections and high traffic of small packets". My server OS is Ubuntu 11.10 Server 64bit.

About 50,000 clients (and growing) are connected to my server through TCP sockets, all on the same port.

95% of my packets are 1-150 bytes in size (TCP header plus payload). The remaining 5% range from 150 to 4096+ bytes.

With the configuration below, my server can handle up to 30 Mbps of traffic (full duplex).
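With packets this small, the packet rate rather than the byte rate is usually what loads the NIC and the kernel. A back-of-the-envelope sketch in Python (the function name is hypothetical; it assumes every packet has the same size, ignores Ethernet framing overhead, and assumes the 30 Mbps figure applies to each direction) shows how many packets per second that traffic represents:

```python
# Back-of-the-envelope packet-rate estimate.
# Assumptions: all packets have the same size, and Ethernet framing overhead
# (preamble, inter-frame gap, FCS) is ignored.


def packets_per_second(rate_mbps: float, packet_bytes: float) -> float:
    """Packets per second needed to carry rate_mbps at packet_bytes per packet."""
    bits_per_packet = packet_bytes * 8
    return rate_mbps * 1_000_000 / bits_per_packet


if __name__ == "__main__":
    # 30 Mbps per direction, for a 64-byte packet and for the 150-byte
    # upper end of the "95% of packets" bucket described above.
    for size in (64, 150):
        print(f"{size:>4} bytes: {packets_per_second(30, size):,.0f} packets/s")
```

At 150 bytes per packet, 30 Mbps already means 25,000 packets per second in each direction, and smaller packets push the rate higher still.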

Can you suggest best practices for tuning the OS to my needs?

My /etc/sysctl.conf looks like this:

kernel.pid_max = 1000000
net.ipv4.ip_local_port_range = 2500 65000
fs.file-max = 1000000
#
net.core.netdev_max_backlog=3000
net.ipv4.tcp_sack=0
#
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.somaxconn = 2048
#
net.ipv4.tcp_rmem = 4096 87380 16777216 
net.ipv4.tcp_wmem = 4096 65536 16777216
#
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_mem = 50576   64768   98152
#
net.core.wmem_default = 65536
net.core.rmem_default = 65536
net.ipv4.tcp_window_scaling=1
#
net.ipv4.tcp_mem= 98304 131072 196608
#
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_rfc1337 = 1
net.ipv4.ip_forward = 0
net.ipv4.tcp_congestion_control=cubic
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 0
#
net.ipv4.tcp_orphan_retries = 1
net.ipv4.tcp_fin_timeout = 25
net.ipv4.tcp_max_orphans = 8192
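Note that net.ipv4.tcp_mem is assigned twice in this file. When the file is loaded (for example with sysctl -p), the assignments are applied in order, so the second line (98304 131072 196608) is the one that takes effect. A small Python sketch that flags such duplicates (the function name and the simplified parsing rules are assumptions, not part of any tool):

```python
# Flag keys that are assigned more than once in a sysctl.conf-style file.
# Simplified parsing (an assumption): one "key = value" per line, and lines
# starting with '#' or ';' are comments.
from __future__ import annotations

from collections import defaultdict


def duplicate_keys(text: str) -> dict[str, list[tuple[int, str]]]:
    """Map each repeated key to its (line number, normalized value) pairs."""
    seen: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        seen[key.strip()].append((lineno, " ".join(value.split())))
    return {key: hits for key, hits in seen.items() if len(hits) > 1}


if __name__ == "__main__":
    sample = """\
net.ipv4.tcp_mem = 50576   64768   98152
#
net.ipv4.tcp_mem= 98304 131072 196608
"""
    for key, hits in duplicate_keys(sample).items():
        # The last entry is the value that ends up in the kernel.
        print(key, hits)
```

Also keep in mind that tcp_mem is counted in memory pages rather than bytes, so with 4 KiB pages the effective maximum of 196608 pages is 768 MiB shared by all TCP sockets together.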

And here are my limits:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 193045
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1000000
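Resource limits are per process and are inherited when a program starts, so the "open files" value shown by ulimit -a in a login shell is not necessarily the limit the server process actually runs with (for example when it is started from an init script). A minimal Python sketch that checks the limit from inside the process (hypothetical helper names; the 1024-descriptor reserve is an arbitrary assumption):

```python
# Check, from inside the running process, that RLIMIT_NOFILE leaves room for
# the expected number of client sockets. Unix only (uses the resource module).
import resource

# Assumed reserve for listening sockets, log files, pipes and so on.
DEFAULT_RESERVE = 1024


def has_fd_headroom(soft_limit: int, expected_conns: int,
                    reserve: int = DEFAULT_RESERVE) -> bool:
    """True if soft_limit covers expected_conns plus the reserve."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= expected_conns + reserve


def current_soft_nofile() -> int:
    """Soft limit on open file descriptors for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    soft = current_soft_nofile()
    print(f"open files soft limit: {soft}")
    print("room for 50,000 clients:", has_fd_headroom(soft, 50_000))
```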

$$ ADDED $$ My network card is as follows:

$ dmesg | grep Broad
[    2.473081] Broadcom NetXtreme II 5771x 10Gigabit Ethernet Driver bnx2x 1.62.12-0 (2011/03/20)
[    2.477808] bnx2x 0000:02:00.0: eth0: Broadcom NetXtreme II BCM57711E XGb (A0) PCI-E x4 5GHz (Gen2) found at mem fb000000, IRQ 28, node addr d8:d3:85:bd:23:08
[    2.482556] bnx2x 0000:02:00.1: eth1: Broadcom NetXtreme II BCM57711E XGb (A0) PCI-E x4 5GHz (Gen2) found at mem fa000000, IRQ 40, node addr d8:d3:85:bd:23:0c

$$ ADDED 2 $$

ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

$$ ADDED 3 $$

sudo ethtool -S eth0|grep -vw 0
NIC statistics:
     [1]: rx_bytes: 17521104292
     [1]: rx_ucast_packets: 118326392
     [1]: tx_bytes: 35351475694
     [1]: tx_ucast_packets: 191723897
     [2]: rx_bytes: 16569945203
     [2]: rx_ucast_packets: 114055437
     [2]: tx_bytes: 36748975961
     [2]: tx_ucast_packets: 194800859
     [3]: rx_bytes: 16222309010
     [3]: rx_ucast_packets: 109397802
     [3]: tx_bytes: 36034786682
     [3]: tx_ucast_packets: 198238209
     [4]: rx_bytes: 14884911384
     [4]: rx_ucast_packets: 104081414
     [4]: rx_discards: 5828
     [4]: rx_csum_offload_errors: 1
     [4]: tx_bytes: 35663361789
     [4]: tx_ucast_packets: 194024824
     [5]: rx_bytes: 16465075461
     [5]: rx_ucast_packets: 110637200
     [5]: tx_bytes: 43720432434
     [5]: tx_ucast_packets: 202041894
     [6]: rx_bytes: 16788706505
     [6]: rx_ucast_packets: 113123182
     [6]: tx_bytes: 38443961940
     [6]: tx_ucast_packets: 202415075
     [7]: rx_bytes: 16287423304
     [7]: rx_ucast_packets: 110369475
     [7]: rx_csum_offload_errors: 1
     [7]: tx_bytes: 35104168638
     [7]: tx_ucast_packets: 184905201
     [8]: rx_bytes: 12689721791
     [8]: rx_ucast_packets: 87616037
     [8]: rx_discards: 2638
     [8]: tx_bytes: 36133395431
     [8]: tx_ucast_packets: 196547264
     [9]: rx_bytes: 15007548011
     [9]: rx_ucast_packets: 98183525
     [9]: rx_csum_offload_errors: 1
     [9]: tx_bytes: 34871314517
     [9]: tx_ucast_packets: 188532637
     [9]: tx_mcast_packets: 12
     [10]: rx_bytes: 12112044826
     [10]: rx_ucast_packets: 84335465
     [10]: rx_discards: 2494
     [10]: tx_bytes: 36562151913
     [10]: tx_ucast_packets: 195658548
     [11]: rx_bytes: 12873153712
     [11]: rx_ucast_packets: 89305791
     [11]: rx_discards: 2990
     [11]: tx_bytes: 36348541675
     [11]: tx_ucast_packets: 194155226
     [12]: rx_bytes: 12768100958
     [12]: rx_ucast_packets: 89350917
     [12]: rx_discards: 2667
     [12]: tx_bytes: 35730240389
     [12]: tx_ucast_packets: 192254480
     [13]: rx_bytes: 14533227468
     [13]: rx_ucast_packets: 98139795
     [13]: tx_bytes: 35954232494
     [13]: tx_ucast_packets: 194573612
     [13]: tx_bcast_packets: 2
     [14]: rx_bytes: 13258647069
     [14]: rx_ucast_packets: 92856762
     [14]: rx_discards: 3509
     [14]: rx_csum_offload_errors: 1
     [14]: tx_bytes: 35663586641
     [14]: tx_ucast_packets: 189661305
     rx_bytes: 226125043936
     rx_ucast_packets: 1536428109
     rx_bcast_packets: 351
     rx_discards: 20126
     rx_filtered_packets: 8694
     rx_csum_offload_errors: 11
     tx_bytes: 548442367057
     tx_ucast_packets: 2915571846
     tx_mcast_packets: 12
     tx_bcast_packets: 2
     tx_64_byte_packets: 35417154
     tx_65_to_127_byte_packets: 2006984660
     tx_128_to_255_byte_packets: 373733514
     tx_256_to_511_byte_packets: 378121090
     tx_512_to_1023_byte_packets: 77643490
     tx_1024_to_1522_byte_packets: 43669214
     tx_pause_frames: 228
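The counters worth looking at here are the rx_discards, which appear only on some queues. A Python sketch that parses this output format and relates the discards to the received packets (it assumes the bnx2x-style "[N]: name: value" per-queue lines shown above; other drivers name their counters differently, and the function names are hypothetical):

```python
# Summarize rx_discards from `ethtool -S` output.
# Assumption: per-queue counters look like "[N]: name: value" and NIC-wide
# totals look like "name: value", as in the bnx2x output above.
import re
from collections import defaultdict

QUEUE_RE = re.compile(r"^\s*\[(\d+)\]:\s*(\w+):\s*(\d+)\s*$")
TOTAL_RE = re.compile(r"^\s*(\w+):\s*(\d+)\s*$")


def parse_ethtool_stats(text):
    """Return (per_queue, totals): {queue: {counter: value}}, {counter: value}."""
    per_queue = defaultdict(dict)
    totals = {}
    for line in text.splitlines():
        if m := QUEUE_RE.match(line):
            per_queue[int(m[1])][m[2]] = int(m[3])
        elif m := TOTAL_RE.match(line):
            totals[m[1]] = int(m[2])
    return dict(per_queue), totals


def discard_report(text):
    """Return ({queue: rx_discards} for queues with drops, overall drop ratio)."""
    per_queue, totals = parse_ethtool_stats(text)
    queues = {q: c["rx_discards"] for q, c in per_queue.items()
              if c.get("rx_discards")}
    ratio = totals.get("rx_discards", 0) / max(totals.get("rx_ucast_packets", 0), 1)
    return queues, ratio


if __name__ == "__main__":
    # A few lines copied from the output above.
    sample = """\
NIC statistics:
     [4]: rx_discards: 5828
     [8]: rx_discards: 2638
     rx_ucast_packets: 1536428109
     rx_discards: 20126
"""
    queues, ratio = discard_report(sample)
    print(queues, f"{ratio:.4%}")
```

For the full output above, the per-queue discards on queues 4, 8, 10, 11, 12 and 14 add up exactly to the NIC-wide 20126, which is about 0.0013% of the received unicast packets.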

Some information about SACK: When to turn off TCP SACK?

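The problem might be that you are getting too many interrupts on your network card. If bandwidth is not the problem, the frequency is:
Turn up the send/receive buffers on the network card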
ethtool -g eth0
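will show you the current settings (256 or 512 entries). You can probably raise these to 1024, 2048 or 3172; more probably does not make sense. This is just a ring buffer that only fills up if the server is not able to process incoming packets fast enough.
If the buffer starts to fill, flow control is an additional way to tell the router or switch to slow down:
Turn on flow control (in and out) on the server and on the switch/router port it is attached to.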
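The ring size is raised with ethtool -G (for example ethtool -G eth0 rx 2048), and it cannot exceed the "Pre-set maximums" that ethtool -g reports. A hedged Python sketch that reads ethtool -g output and builds that command, capping the request at the hardware maximum (the function names are hypothetical, and the sample output uses made-up numbers, not values from this NIC):

```python
# Build an `ethtool -G` command that raises the RX ring, capped at the
# hardware maximum reported by `ethtool -g`. The command is only returned,
# not executed.


def parse_ring_params(text):
    """Return {"max": {...}, "current": {...}} from `ethtool -g` output."""
    headers = {"Pre-set maximums:": "max", "Current hardware settings:": "current"}
    result = {"max": {}, "current": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in headers:
            section = headers[line]
            continue
        name, sep, value = line.partition(":")
        if section and sep and value.strip().isdigit():
            result[section][name.strip()] = int(value.strip())
    return result


def rx_ring_command(text, iface, wanted):
    """Return the ethtool command, or None if the RX ring is already large enough."""
    rings = parse_ring_params(text)
    target = min(wanted, rings["max"]["RX"])
    if target <= rings["current"]["RX"]:
        return None
    return f"ethtool -G {iface} rx {target}"


if __name__ == "__main__":
    # Hypothetical `ethtool -g eth0` output; the numbers are illustrative only.
    sample = """\
Ring parameters for eth0:
Pre-set maximums:
RX:             4078
RX Mini:        0
RX Jumbo:       0
TX:             4078
Current hardware settings:
RX:             512
RX Mini:        0
RX Jumbo:       0
TX:             512
"""
    print(rx_ring_command(sample, "eth0", 2048))
```

Returning the command instead of running it keeps the sketch side-effect free; the command itself still needs root to apply.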
ethtool -a eth0
will probably show:
Pause parameters for eth0:
Autonegotiate:  on
RX:             on
TX:             on
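The server side can be checked from a script by parsing that output. A minimal Python sketch (hypothetical function names; it only looks at the "RX" and "TX" lines and says nothing about the switch port, which still has to be checked separately):

```python
# Check whether pause-frame flow control is enabled in both directions,
# based on `ethtool -a` output like the one above.


def pause_settings(text):
    """Map each "name: on|off" line to a bool."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip() in ("on", "off"):
            settings[name.strip()] = value.strip() == "on"
    return settings


def flow_control_enabled(text):
    """True only if both RX and TX pause are on."""
    s = pause_settings(text)
    return s.get("RX", False) and s.get("TX", False)


if __name__ == "__main__":
    sample = """\
Pause parameters for eth0:
Autonegotiate:  on
RX:             on
TX:             on
"""
    print(pause_settings(sample))
    print("flow control rx+tx:", flow_control_enabled(sample))
```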
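Check /var/log/messages for the current setting of eth0. Look for something like:
eth0: Link is up at 1000 Mbps, full duplex, flow control tx and rx
If you don't see tx and rx, your network administrators have to adjust the values on the switch/router. On Cisco equipment, that means turning on receive/transmit flow control.
**Note:** Changing these values will bring your link down and back up for a very short time (less than 1 second).
If none of this helps, you can also lower the NIC speed to 100 Mbit (and do the same on the switch/router port):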
ethtool -s eth0 autoneg off && ethtool -s eth0 speed 100
But in your case, I would say: raise the receive buffers in the NIC ring buffer.

Source: https://serverfault.com/questions/357799
