
Intermittent NFS client performance problem (backlog queue wait?)

  • April 1, 2019

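I have a strange problem with one of my NFS clients (Ubuntu 16.04 LTS). I have been trying to debug it for the past few days, without success so far. After mounting the partition, everything works fine for a few days, with transfer speeds of 1 Gbps between client and server. After a few days the speed drops to less than 10 Mbps, even a simple directory listing takes several seconds, and I/O wait is at 100%.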

What I noticed is that the backlog wait is very high, especially for write operations:

root@srv:~# mountstats /mnt/data
Stats for 192.168.0.15:/mnt/data mounted on /mnt/data:
 NFS mount options: rw,vers=4.0,rsize=16384,wsize=16384,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.150,local_lock=none
 NFS server capabilities: caps=0xffdf,wtmult=512,dtsize=16384,bsize=0,namlen=255
 NFSv4 capability flags: bm0=0xfdffbfff,bm1=0xf9be3e,bm2=0x0,acl=0x3,pnfs=notconfigured
 NFS security flavor: 1  pseudoflavor: 0

NFS byte counts:
 applications read 8168679407142 bytes via read(2)
 applications wrote 4833000353435 bytes via write(2)
 applications read 0 bytes via O_DIRECT read(2)
 applications wrote 0 bytes via O_DIRECT write(2)
 client read 4218977852758 bytes via NFS READ
 client wrote 4832098253207 bytes via NFS WRITE

RPC statistics:
 561421762 RPC requests sent, 561421608 RPC replies received (1 XIDs not found)
 average backlog queue length: 0

READ:
       263822474 ops (46%)     0 retrans (0%)  0 major timeouts
       avg bytes sent per op: 184      avg bytes received per op: 16051
       backlog wait: 8.772689  RTT: 27.972131  total execute time: 36.752241 (milliseconds)
WRITE:
       295296111 ops (52%)     0 retrans (0%)  0 major timeouts
       avg bytes sent per op: 16567    avg bytes received per op: 132
       backlog wait: 62468603019.791718        RTT: 78.030143  total execute time: 62468603097.830574 (milliseconds)

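There are no errors and no warnings. I tried debugging with "echo 1 > /proc/sys/vm/block_dump" (which has worked wonders for me in the past), but this time nothing NFS-related showed up. Any idea how to debug this further and find out what is causing the extremely high backlog wait?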
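To keep an eye on this over time, the per-op backlog wait can be pulled out of the mountstats output automatically. The following is only a minimal sketch: `flag_backlog` is a hypothetical helper name, the 1000 ms default threshold is an arbitrary assumption, and the parsing assumes the exact output layout shown above (other nfs-utils versions may print these statistics differently).

```shell
# Hypothetical helper: reads mountstats-style text on stdin and prints
# "<OP> <backlog wait in ms>" for each op whose backlog wait exceeds the limit.
flag_backlog() {
    limit="${1:-1000}"   # threshold in milliseconds (assumed default)
    awk -v limit="$limit" '
        # A per-op section starts with a single-word line such as "READ:" or "WRITE:".
        NF == 1 && $1 ~ /^[A-Z_0-9]+:$/ { op = substr($1, 1, length($1) - 1); next }
        # Stats line: "backlog wait: <ms>  RTT: <ms>  total execute time: <ms> (milliseconds)"
        op != "" && $1 == "backlog" && $2 == "wait:" {
            if ($3 + 0 > limit + 0) print op, $3
        }
    '
}

# Usage on a live mount (not run here):
#   mountstats /mnt/data | flag_backlog 1000
```

Fed the output above, this would report only the WRITE op, since the READ backlog wait (about 8.8 ms) is below the threshold.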

In case anyone else runs into the same problem: the only workaround I could find was to force NFSv3 instead of NFSv4. The problem has since gone away.
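The post does not show the exact command used, so the following is only a sketch of one common way to force NFSv3 on a Linux client, using the `vers=3` mount option. `remount_as_nfs3` is a hypothetical helper name, and the server export and mount point in the usage comment are taken from the mountstats output in the question.

```shell
# Hypothetical helper: remounts an NFS export with protocol version 3 forced.
# Must be run as root; nothing is executed until the function is called.
remount_as_nfs3() {
    export_path="$1"   # e.g. 192.168.0.15:/mnt/data
    mountpoint="$2"    # e.g. /mnt/data

    # Unmount the existing (NFSv4) mount first; fails if the mount is busy.
    umount "$mountpoint" || return 1

    # vers=3 asks the client to use NFSv3 rather than negotiating NFSv4.
    mount -t nfs -o vers=3 "$export_path" "$mountpoint" || return 1

    # Show the resulting entry so the vers= value can be checked.
    grep -F -- " $mountpoint nfs" /proc/mounts
}

# Usage (as root, not run here):
#   remount_as_nfs3 192.168.0.15:/mnt/data /mnt/data
```

To make the change survive a reboot, the same `vers=3` option would go into the options field of the corresponding /etc/fstab entry.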

Quoted from: https://serverfault.com/questions/958398