Opening 1 million connections tops out at 469K

December 27, 2016

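I need to configure a server to handle more than one million open websocket connections (ideally two million).
I used the configuration from this blog post: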
sysctl -w fs.file-max=12000500
sysctl -w fs.nr_open=20000500
ulimit -n 20000500
sysctl -w net.ipv4.tcp_mem='10000000 10000000 10000000'
sysctl -w net.ipv4.tcp_rmem='1024 4096 16384'
sysctl -w net.ipv4.tcp_wmem='1024 4096 16384'
sysctl -w net.core.rmem_max=16384
sysctl -w net.core.wmem_max=16384
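However, my application stops accepting new connections after reaching 469219 connections. What am I missing? I really think something is missing in the operating system configuration. Our main application is written in Java (running on a Tomcat server), but I got very similar results with a NodeJS server.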
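One thing worth ruling out here, as an assumption not confirmed anywhere in this thread: `ulimit -n` only changes the limit for the current shell and the processes it starts, so a Tomcat or Node process launched by a service manager may still run with a lower limit. A minimal sketch of reading the soft limit a process actually has, from a file in the format of Linux `/proc/<pid>/limits`. The helper name `soft_nofile` is hypothetical.

```shell
# Hypothetical helper: print the soft "Max open files" limit from a
# limits file in the format of Linux /proc/<pid>/limits.
soft_nofile() {
    # Fields on that line: Max open files <soft> <hard> files
    awk '/^Max open files/ { print $4 }' "$1"
}

# Example: inspect the current shell; replace $$ with the server's PID.
if [ -r "/proc/$$/limits" ]; then
    soft_nofile "/proc/$$/limits"
fi
```

If the number printed for the server process is lower than the value passed to `ulimit -n`, the limit was not inherited by that process.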
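We use Ubuntu with 16GB RAM.
Edit: At peak, the system uses about 12GB of 14.7GB.
Update:
So in the end I got a workstation with 32GB. Adding RAM solved the problem. Currently, with an 18GB Java heap, I can handle 567K WS connections. For higher numbers I need more clients :-)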

Not necessarily an answer, but too big to put in a comment.
tcp_mem (since Linux 2.4)
     This  is  a vector of 3 integers: [low, pressure, high].  These bounds, measured in units of the system page size, are used by
     TCP to track its memory usage.  The defaults are calculated at boot time from the amount of available memory.  (TCP  can  only
     use  low memory for this, which is limited to around 900 megabytes on 32-bit systems.  64-bit systems do not suffer this
     limitation.)

     low       TCP doesn't regulate its memory allocation when the number of pages it has allocated globally is below this  number.

     pressure  When the amount of memory allocated by TCP exceeds this number of pages, TCP moderates its memory consumption.  This
               memory pressure state is exited once the number of pages allocated falls below the low mark.

     high      The maximum number of pages, globally, that TCP will allocate.  This value overrides any other limits imposed by the
               kernel.
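Note the following:
These bounds, measured in units of the system page size
Setting this value to 10000000 10000000 10000000 tells the kernel to use 39062 MiB of memory for TCP (assuming 4 KiB pages). That is almost three times what you have.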
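The 39062 MiB figure can be checked with a little arithmetic. A minimal sketch, assuming a 4096-byte page size (`getconf PAGESIZE` reports the real value on a given machine); the function name `pages_to_mib` is made up for illustration.

```shell
# Convert a tcp_mem value (in pages) to MiB, rounding down.
# $1 = number of pages, $2 = page size in bytes (assumed 4096 here).
pages_to_mib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

pages_to_mib 10000000 4096   # prints 39062
```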
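The second problem is that the three values you can set for TCP rmem and wmem define the minimum, default, and maximum. Given that your tcp_mem configuration states you never go into "memory saving" mode, I imagine you are actually allocating somewhere between 4-16k per socket.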
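To get a feel for what 4-16k per socket means at this scale, here is a rough upper-bound sketch. It assumes every socket holds one full buffer of the given size, which is pessimistic since the kernel allocates buffer memory on demand; the function name `buffers_mib` is hypothetical, and the connection count is the 469219 from the question.

```shell
# Rough upper bound, in MiB, on buffer memory for a number of sockets
# if each one held a full buffer of the given size (rounded down).
# $1 = connection count, $2 = buffer bytes per socket (one direction)
buffers_mib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

buffers_mib 469219 4096    # default size from tcp_rmem/tcp_wmem
buffers_mib 469219 16384   # maximum size from tcp_rmem/tcp_wmem
```

Under these assumptions that is roughly 1.8 GiB to 7.2 GiB per direction, a noticeable share of a 16GB machine before the Java heap is counted.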
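So, if I were the kernel and I saw such insane settings, I probably wouldn't behave all that predictably either.
Try reducing that value to something you can actually use and try again.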
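As a sketch of what a usable value might look like, the conversion can be run in the other direction, from a memory budget to a page count. The 8 GiB budget below is an arbitrary assumption for a 16GB machine, not a recommendation from this answer, and `mib_to_pages` is a hypothetical helper name.

```shell
# Convert a memory budget in MiB to a page count usable in tcp_mem.
# $1 = budget in MiB, $2 = page size in bytes (assumed 4096 here).
mib_to_pages() {
    echo $(( $1 * 1024 * 1024 / $2 ))
}

# Hypothetical budget: 8 GiB reserved for TCP.
mib_to_pages 8192 4096   # prints 2097152
```

The result would be the "high" mark; the "low" and "pressure" marks would be chosen somewhere below it.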
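Finally, I will point out that you are living in a dream world if you seriously believe that:
The kernel will support 2 million connections comfortably.
Node or Java will support 2 million connections comfortably.
Even in the best of circumstances (using an epoll set), 2 million entries in one epoll set is expensive. That is never going to happen with a worker or prefork model.
You need to spread this load out more evenly. You probably need at least another 10 nodes to get anything worth what a user would call a service.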

Source: https://serverfault.com/questions/822045
