Virtual-Machines

Is it dangerous to change the value of /proc/sys/net/ipv4/tcp_tw_reuse?

  • June 19, 2015

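We have a couple of production systems that were recently converted to virtual machines. Our application accesses a MySQL database frequently, and for each query it creates a connection, runs the query, and then closes that connection.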

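This is not the proper way to query (I know), but we have some constraints we can't seem to get around. Anyway, the problem is this: while the machines were physical hosts, the program ran fine. After converting them to virtual machines, we noticed intermittent connection problems to the database. At one point there were 24000+ socket connections in TIME_WAIT (on the physical hosts, the most I ever saw was 17000; not good, but it didn't cause problems).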
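As a rough way to reproduce the 24000+ figure and put it in perspective, the sketch below counts sockets per state from /proc/net/tcp (Linux, IPv4 only) and applies Little's law (sockets in TIME_WAIT = close rate × TIME_WAIT duration). It assumes Linux's 60-second TIME_WAIT (`TCP_TIMEWAIT_LEN` in the kernel source) and that the application side closes first, so the TIME_WAIT sockets sit on the client. The function names are hypothetical; `ss -tan state time-wait` gives the same count from the shell.

```python
import os
from collections import Counter

# Linux TCP state codes, as printed in hex in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text: str) -> Counter:
    """Count sockets per TCP state in text formatted like /proc/net/tcp."""
    counts: Counter = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # line 0 is the column header
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        counts[TCP_STATES.get(fields[3].upper(), fields[3])] += 1
    return counts


def closes_per_second(time_wait_sockets: int, time_wait_seconds: float = 60.0) -> float:
    """Little's law solved for the rate: sockets in TIME_WAIT / TIME_WAIT duration."""
    return time_wait_sockets / time_wait_seconds


def max_connections_per_second(local_ports: int, time_wait_seconds: float = 60.0) -> float:
    """Ceiling on new connections/s toward ONE server ip:port before ephemeral ports
    run out (ignores tcp_tw_reuse, which lets the kernel reuse TIME_WAIT ports)."""
    return local_ports / time_wait_seconds


if __name__ == "__main__" and os.path.exists("/proc/net/tcp"):
    with open("/proc/net/tcp") as f:  # IPv4 only; IPv6 sockets are in /proc/net/tcp6
        tw = count_states(f.read())["TIME_WAIT"]
    print(f"TIME_WAIT sockets: {tw}, about {closes_per_second(tw):.0f} closes/s")
```

Under those assumptions, 24000 sockets correspond to about 24000 / 60 = 400 closed connections per second, and 17000 to about 283 per second. If the local port range is 32768 to 60999 (28232 ports, a common default; check /proc/sys/net/ipv4/ip_local_port_range on your hosts), the ceiling toward a single MySQL ip:port is about 470 new connections per second, so 24000+ TIME_WAIT sockets would be consistent with running close to ephemeral-port exhaustion.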

I would like these connections to be reused so that we stop seeing connection problems, so:

Questions:

Is it OK to set tcp_tw_reuse to 1? What are the obvious dangers? Is there any reason I shouldn't do it?

Also, is there any other way to get the system (RHEL/CentOS) to keep so many connections from going into TIME_WAIT, or to get them reused?

Finally, what would changing tcp_tw_recycle do, and would it help me?

Thanks in advance!
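For reading or, as root, changing the two settings the question asks about, a minimal sketch that goes through /proc/sys directly. The helper names are hypothetical. From the shell, `sysctl -w net.ipv4.tcp_tw_reuse=1` does the same runtime change, and a line in /etc/sysctl.conf makes it persistent on RHEL/CentOS. tcp_tw_recycle was removed from the kernel in Linux 4.12, so on newer kernels reading it returns None here.

```python
import os
from typing import Optional


def sysctl_path(name: str, root: str = "/proc/sys") -> str:
    """Map a dotted sysctl name (e.g. 'net.ipv4.tcp_tw_reuse') to its file under /proc/sys.

    Assumes no path component itself contains a dot (VLAN interface names can).
    """
    return os.path.join(root, *name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the current value as a string, or None if this kernel has no such key."""
    try:
        with open(sysctl_path(name, root)) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None


def write_sysctl(name: str, value: str, root: str = "/proc/sys") -> None:
    """Change a value at runtime (needs root); like `sysctl -w`, it is lost on reboot."""
    with open(sysctl_path(name, root), "w") as f:
        f.write(value + "\n")


if __name__ == "__main__":
    for key in ("net.ipv4.tcp_tw_reuse", "net.ipv4.tcp_tw_recycle"):
        print(key, "=", read_sysctl(key))  # None means the key is absent on this kernel
```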

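You can safely reduce the timeout, but you may run into problems with improperly closed connections on networks with packet loss or jitter. I wouldn't start tuning at 1 second; start at 15 to 30 seconds and work your way down.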

Also, you really need to fix your application.
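Fixing the application usually means keeping connections open and reusing them instead of connecting and disconnecting around every query. Each reused connection avoids one TIME_WAIT socket. Below is a minimal, driver-agnostic pool sketch. `connect` is a hypothetical zero-argument factory that you would supply, for example a lambda wrapping your MySQL driver's connect call. Many MySQL drivers also ship their own pooling, which is usually preferable to hand-rolled code.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, Protocol, TypeVar


class Closable(Protocol):
    def close(self) -> None: ...


Conn = TypeVar("Conn", bound=Closable)


class ConnectionPool(Generic[Conn]):
    """Keep up to `size` idle connections open and hand them out again."""

    def __init__(self, connect: Callable[[], Conn], size: int = 5) -> None:
        self._connect = connect  # hypothetical factory supplied by the caller
        self._idle: "queue.LifoQueue[Conn]" = queue.LifoQueue(maxsize=size)

    @contextmanager
    def connection(self) -> Iterator[Conn]:
        try:
            conn = self._idle.get_nowait()  # reuse an idle connection when one exists
        except queue.Empty:
            conn = self._connect()  # otherwise open a new one
        try:
            yield conn
        except BaseException:
            conn.close()  # don't hand back a connection that may be in a bad state
            raise
        try:
            self._idle.put_nowait(conn)  # keep it open for the next query
        except queue.Full:
            conn.close()  # pool already holds `size` idle connections

    def close_all(self) -> None:
        """Close every idle connection, e.g. at shutdown."""
        while True:
            try:
                self._idle.get_nowait().close()
            except queue.Empty:
                return
```

`queue.LifoQueue` is thread-safe, so one pool can be shared between worker threads. A real pool also needs to detect connections the server has dropped (MySQL closes idle ones after `wait_timeout`), which this sketch omits.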

RFC 1185 has a good explanation in Section 3.2:

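When a TCP connection is closed, a delay of 2*MSL in TIME-WAIT state ties up the socket pair for 4 minutes (see Section 3.5 of [Postel81]). Applications built upon TCP that close one connection and open a new one (e.g., an FTP data transfer connection using Stream mode) must choose a new socket pair each time. This delay serves two different purposes: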

 (a)  Implement the full-duplex reliable close handshake of TCP. 

      The proper time to delay the final close step is not really 
      related to the MSL; it depends instead upon the RTO for the 
      FIN segments and therefore upon the RTT of the path.* 
      Although there is no formal upper-bound on RTT, common 
      network engineering practice makes an RTT greater than 1 
      minute very unlikely.  Thus, the 4 minute delay in TIME-WAIT 
      state works satisfactorily to provide a reliable full-duplex 
      TCP close.  Note again that this is independent of MSL 
      enforcement and network speed. 

      The TIME-WAIT state could cause an indirect performance 
      problem if an application needed to repeatedly close one 
      connection and open another at a very high frequency, since 
      the number of available TCP ports on a host is less than 
      2**16.  However, high network speeds are not the major 
      contributor to this problem; the RTT is the limiting factor 
      in how quickly connections can be opened and closed. 
      Therefore, this problem will be no worse at high transfer
      speeds.

 (b)  Allow old duplicate segments to expire.

      Suppose that a host keeps a cache of the last timestamp
      received from each remote host.  This can be used to reject
      old duplicate segments from earlier incarnations of the
      connection, if the timestamp clock can be guaranteed to have
      ticked at least once since the old connection was open.
      This requires that the TIME-WAIT delay plus the RTT together
      must be at least one tick of the sender's timestamp clock.

      Note that this is a variant on the mechanism proposed by
      Garlick, Rom, and Postel (see the appendix), which required
      each host to maintain connection records containing the
      highest sequence numbers on every connection.  Using
      timestamps instead, it is only necessary to keep one quantity
      per remote host, regardless of the number of simultaneous
      connections to that host.

*Note: It could be argued that the side that is sending a FIN knows what degree of reliability it needs, and therefore it should be able to determine the length of the TIME-WAIT delay for the FIN's recipient. This could be accomplished with an appropriate TCP option in FIN segments.
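Point (b) can be made concrete with a toy model. The model keeps only the most recent timestamp seen from each remote host and rejects any segment that carries an older one, using wraparound-safe comparison because TCP timestamps are 32-bit. This is an illustrative sketch with hypothetical names. It ignores the clock-tick condition, and it is not how Linux implements PAWS, which RFC 7323 defines per connection. Keying on the host address is also this scheme's weak spot. Several clients behind one NAT address have unrelated timestamp clocks. That is the widely reported reason Linux's since-removed tcp_tw_recycle, which relied on a similar per-peer check, broke such clients.

```python
from typing import Dict, Hashable

TS_MOD = 2 ** 32  # TCP timestamp values are 32-bit and wrap around


def ts_at_or_after(a: int, b: int) -> bool:
    """True if timestamp a equals or is later than b (modular, serial-number comparison)."""
    return (a - b) % TS_MOD < 2 ** 31


class PerHostTimestampCache:
    """One remembered quantity per remote host: the last timestamp received from it."""

    def __init__(self) -> None:
        self._last: Dict[Hashable, int] = {}

    def accept(self, host: Hashable, ts: int) -> bool:
        """Return False for a segment older than anything already seen from `host`."""
        last = self._last.get(host)
        if last is not None and not ts_at_or_after(ts, last):
            return False  # presumed old duplicate from an earlier incarnation
        self._last[host] = ts
        return True
```

The memory cost grows with the number of remote hosts, not the number of connections, which is the trade-off the quoted paragraph points out.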

Source: https://serverfault.com/questions/234534