Limit Linux background flush (dirty pages)
Background flushing on Linux happens when either too much written data is pending (adjustable via /proc/sys/vm/dirty_background_ratio) or a timeout for pending writes is reached (/proc/sys/vm/dirty_expire_centisecs). Unless another limit is being hit (/proc/sys/vm/dirty_ratio), more written data may be cached. Further writes will block.
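For orientation, the current values of these knobs and the amount of dirty data pending can be read straight out of /proc, for example:

# current writeback thresholds
cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_ratio
cat /proc/sys/vm/dirty_expire_centisecs
# dirty data currently pending (in kB)
grep -E '^(Dirty|Writeback):' /proc/meminfo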
In theory, this should create a background process that writes out dirty pages without disturbing other processes. In practice, it does disturb any process doing uncached reads or synchronous writes. Badly. That is because the background flush actually writes at 100% device speed, and any other device request issued at that time gets delayed (because all the queues and write caches along the way are filled up).
Is there a way to limit the number of requests per second that the flushing process performs, or to otherwise effectively prioritize the other device I/O?
After lots of benchmarking with sysbench, I have come to this conclusion:
To survive (performance-wise) a situation where
- an evil copy process floods dirty pages
- and a hardware write cache is present (possibly also without one)
- and synchronous reads or writes per second (IOPS) are critical
just dump all elevators, queues and dirty page caches. The correct place for dirty pages is in the RAM of that hardware write cache.
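As a sketch of what "dumping the elevators and queues" can look like in practice (sdX is a placeholder for the device in question, not taken from my setup; noop is the scheduler name on 2.6.32-era kernels):

# pick the simplest I/O scheduler and a tiny request queue
echo noop > /sys/block/sdX/queue/scheduler
echo 4 > /sys/block/sdX/queue/nr_requests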
Tune dirty_ratio (or the newer dirty_bytes) as low as possible, but keep an eye on sequential throughput. In my particular case, 15 MB was optimal (echo 15000000 > dirty_bytes). This is more of a hack than a solution, because gigabytes of RAM are now used for read caching only instead of for dirty cache. For dirty cache to work out well in this situation, the Linux kernel background flusher would need to average how fast the underlying device accepts requests and adjust background flushing accordingly. Not easy.
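The echo above assumes /proc/sys/vm as the working directory; the same value can be made persistent across reboots via sysctl, roughly like this:

# boot-time equivalent of: echo 15000000 > /proc/sys/vm/dirty_bytes
echo "vm.dirty_bytes = 15000000" >> /etc/sysctl.conf
sysctl -p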
Specifications and benchmarks for comparison:
Tested while dd'ing zeros to disk, sysbench showed a huge win: it boosted 10-thread fsync writes of 16 kB from 33 to 700 IOPS (idle limit: 1500 IOPS) and the single-thread case from 8 to 400 IOPS. Without load, IOPS were unaffected (~1500) and throughput was only slightly reduced (from 251 MB/s to 216 MB/s).
dd call:
dd if=/dev/zero of=dumpfile bs=1024 count=20485672
For sysbench, test_file.0 was prepared as a non-sparse file:
dd if=/dev/zero of=test_file.0 bs=1024 count=10485672
sysbench call for 10 threads:
sysbench --test=fileio --file-num=1 --num-threads=10 --file-total-size=10G --file-fsync-all=on --file-test-mode=rndwr --max-time=30 --file-block-size=16384 --max-requests=0 run
sysbench call for one thread:
sysbench --test=fileio --file-num=1 --num-threads=1 --file-total-size=10G --file-fsync-all=on --file-test-mode=rndwr --max-time=30 --file-block-size=16384 --max-requests=0 run
Smaller block sizes showed even more drastic numbers.
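The 4 kB runs below were presumably produced with the same single-thread call, only with the block size changed, i.e. something like:

sysbench --test=fileio --file-num=1 --num-threads=1 --file-total-size=10G --file-fsync-all=on --file-test-mode=rndwr --max-time=30 --file-block-size=4096 --max-requests=0 run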
--file-block-size=4096 with 1 GB dirty_bytes:
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
1 files, 10Gb each
10Gb total file size
Block size 4Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() after each write operation.
Using synchronous I/O mode
Doing random write test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  0 Read, 30 Write, 30 Other = 60 Total
Read 0b  Written 120Kb  Total transferred 120Kb  (3.939Kb/sec)
    0.98 Requests/sec executed

Test execution summary:
    total time:                          30.4642s
    total number of events:              30
    total time taken by event execution: 30.4639
    per-request statistics:
         min:                                  94.36ms
         avg:                                1015.46ms
         max:                                1591.95ms
         approx.  95 percentile:             1591.30ms

Threads fairness:
    events (avg/stddev):           30.0000/0.00
    execution time (avg/stddev):   30.4639/0.00
--file-block-size=4096 with 15 MB dirty_bytes:
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
1 files, 10Gb each
10Gb total file size
Block size 4Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() after each write operation.
Using synchronous I/O mode
Doing random write test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  0 Read, 13524 Write, 13524 Other = 27048 Total
Read 0b  Written 52.828Mb  Total transferred 52.828Mb  (1.7608Mb/sec)
    450.75 Requests/sec executed

Test execution summary:
    total time:                          30.0032s
    total number of events:              13524
    total time taken by event execution: 29.9921
    per-request statistics:
         min:                                   0.10ms
         avg:                                   2.22ms
         max:                                 145.75ms
         approx.  95 percentile:               12.35ms

Threads fairness:
    events (avg/stddev):           13524.0000/0.00
    execution time (avg/stddev):   29.9921/0.00
--file-block-size=4096 with 15 MB dirty_bytes on an idle system:
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
1 files, 10Gb each
10Gb total file size
Block size 4Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() after each write operation.
Using synchronous I/O mode
Doing random write test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  0 Read, 43801 Write, 43801 Other = 87602 Total
Read 0b  Written 171.1Mb  Total transferred 171.1Mb  (5.7032Mb/sec)
    1460.02 Requests/sec executed

Test execution summary:
    total time:                          30.0004s
    total number of events:              43801
    total time taken by event execution: 29.9662
    per-request statistics:
         min:                                   0.10ms
         avg:                                   0.68ms
         max:                                 275.50ms
         approx.  95 percentile:                3.28ms

Threads fairness:
    events (avg/stddev):           43801.0000/0.00
    execution time (avg/stddev):   29.9662/0.00
Test system:
Adaptec 5405Z (that is, 512 MB of protected write cache)
Intel Xeon L5520
6 GB RAM @ 1066 MHz
Motherboard Supermicro X8DTN (5520 chipset)
12 Seagate Barracuda 1 TB disks
- 10 of them in a Linux software RAID 10
Kernel 2.6.32
Filesystem xfs
Debian unstable
All in all, I am now confident that this configuration will perform well in idle, high-load and even full-load situations for database traffic that would otherwise have been starved by sequential traffic. Sequential throughput is higher than two gigabit links can deliver anyway, so reducing it a bit is no problem.