RAID 性能突然變慢
我們最近注意到我們的數據庫查詢執行時間比平時要長得多。經過一番調查,看起來我們的磁碟讀取速度非常慢。
過去,我們遇到過類似的問題,原因是 RAID 控制器在 BBU 上啟動重新學習週期並切換到直寫。這次似乎不是這種情況。
在幾天的時間裡,我跑
bonnie++
了幾次。結果如下:22-82 M/s 的讀取速度看起來非常糟糕。在原始設備上執行
dd
幾分鐘顯示讀取速度在 15.8 MB/s 到 225 MB/s 之間(請參閱下面的更新)。iotop
並不表示任何其他程序競爭 IO,所以我不確定為什麼讀取速度如此可變。RAID 卡是 MegaRAID SAS 9280,在 RAID10 中具有 12 個 SAS 驅動器(15k,300GB),具有 XFS 文件系統(在 RAID1 中配置的兩個 SSD 上的作業系統)。我沒有看到任何 SMART 警報,並且陣列似乎沒有降級。
我也執行過
xfs_check
,似乎沒有任何 XFS 一致性問題。這裡的下一個調查步驟應該是什麼?
伺服器規格
Ubuntu 12.04.5 LTS 128GB RAM Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz
輸出
xfs_repair -n
:Phase 1 - find and verify superblock... Phase 2 - using internal log - scan filesystem freespace and inode maps... - found root inode chunk Phase 3 - for each AG... - scan (but don't clear) agi unlinked lists... - process known inodes and perform inode discovery... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - process newly discovered inodes... Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - check for inodes claiming duplicate blocks... - agno = 1 - agno = 3 - agno = 2 - agno = 0 No modify flag set, skipping phase 5 Phase 6 - check inode connectivity... - traversing filesystem ... - traversal finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify link counts... No modify flag set, skipping filesystem flush and exiting.
輸出
megacli -AdpAllInfo -aAll
:Versions ================ Product Name : LSI MegaRAID SAS 9280-4i4e Serial No : SV24919344 FW Package Build: 12.12.0-0124 Mfg. Data ================ Mfg. Date : 12/06/12 Rework Date : 00/00/00 Revision No : 04B Battery FRU : N/A Image Versions in Flash: ================ FW Version : 2.130.363-1846 BIOS Version : 3.25.00_4.12.05.00_0x05180000 Preboot CLI Version: 04.04-020:#%00009 WebBIOS Version : 6.0-51-e_47-Rel NVDATA Version : 2.09.03-0039 Boot Block Version : 2.02.00.00-0000 BOOT Version : 09.250.01.219 Pending Images in Flash ================ None PCI Info ================ Controller Id : 0000 Vendor Id : 1000 Device Id : 0079 SubVendorId : 1000 SubDeviceId : 9282 Host Interface : PCIE ChipRevision : B4 Link Speed : 0 Number of Frontend Port: 0 Device Interface : PCIE Number of Backend Port: 8 Port : Address 0 5003048001c1e47f 1 0000000000000000 2 0000000000000000 3 0000000000000000 4 0000000000000000 5 0000000000000000 6 0000000000000000 7 0000000000000000 HW Configuration ================ SAS Address : 500605b005a6cbc0 BBU : Present Alarm : Present NVRAM : Present Serial Debugger : Present Memory : Present Flash : Present Memory Size : 512MB TPM : Absent On board Expander: Absent Upgrade Key : Absent Temperature sensor for ROC : Absent Temperature sensor for controller : Absent Settings ================ Current Time : 14:58:51 7/11, 2016 Predictive Fail Poll Interval : 300sec Interrupt Throttle Active Count : 16 Interrupt Throttle Completion : 50us Rebuild Rate : 30% PR Rate : 30% BGI Rate : 30% Check Consistency Rate : 30% Reconstruction Rate : 30% Cache Flush Interval : 4s Max Drives to Spinup at One Time : 4 Delay Among Spinup Groups : 2s Physical Drive Coercion Mode : Disabled Cluster Mode : Disabled Alarm : Enabled Auto Rebuild : Enabled Battery Warning : Enabled Ecc Bucket Size : 15 Ecc Bucket Leak Rate : 1440 Minutes Restore HotSpare on Insertion : Disabled Expose Enclosure Devices : Enabled Maintain PD Fail History : Enabled Host Request Reordering : Enabled Auto Detect BackPlane Enabled : SGPIO/i2c SEP Load Balance Mode : Auto Use FDE Only : No Security Key Assigned : No Security Key Failed : No Security Key Not Backedup : No Default LD PowerSave Policy : Controller Defined Maximum number of direct attached drives to spin up in 1 min : 120 Auto Enhanced Import : No Any Offline VD Cache Preserved : No Allow Boot with Preserved Cache : No Disable Online Controller Reset : No PFK in NVRAM : No Use disk activity for locate : No POST delay : 90 seconds BIOS Error Handling : Stop On Errors Current Boot Mode :Normal Capabilities ================ RAID Level Supported : RAID0, RAID1, RAID5, RAID6, RAID00, RAID10, RAID50, RAID60, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span Supported Drives : SAS, SATA Allowed Mixing: Mix in Enclosure Allowed Mix of SAS/SATA of HDD type in VD Allowed Status ================ ECC Bucket Count : 0 Limitations ================ Max Arms Per VD : 32 Max Spans Per VD : 8 Max Arrays : 128 Max Number of VDs : 64 Max Parallel Commands : 1008 Max SGE Count : 80 Max Data Transfer Size : 8192 sectors Max Strips PerIO : 42 Max LD per array : 16 Min Strip Size : 8 KB Max Strip Size : 1.0 MB Max Configurable CacheCade Size: 0 GB Current Size of CacheCade : 0 GB Current Size of FW Cache : 350 MB Device Present ================ Virtual Drives : 2 Degraded : 0 Offline : 0 Physical Devices : 16 Disks : 14 Critical Disks : 0 Failed Disks : 0 Supported Adapter Operations ================ Rebuild Rate : Yes CC Rate : Yes BGI Rate : Yes Reconstruct Rate : Yes Patrol Read Rate : Yes Alarm Control : Yes Cluster Support : No BBU : Yes Spanning : Yes Dedicated Hot Spare : Yes Revertible Hot Spares : Yes Foreign Config Import : Yes Self Diagnostic : Yes Allow Mixed Redundancy on Array : No Global Hot Spares : Yes Deny SCSI Passthrough : No Deny SMP Passthrough : No Deny STP Passthrough : No Support Security : No Snapshot Enabled : No Support the OCE without adding drives : Yes Support PFK : Yes Support PI : No Support Boot Time PFK Change : No Disable Online PFK Change : No PFK TrailTime Remaining : 0 days 0 hours Support Shield State : No Block SSD Write Disk Cache Change: No Supported VD Operations ================ Read Policy : Yes Write Policy : Yes IO Policy : Yes Access Policy : Yes Disk Cache Policy : Yes Reconstruction : Yes Deny Locate : No Deny CC : No Allow Ctrl Encryption: No Enable LDBBM : No Support Breakmirror : No Power Savings : No Supported PD Operations ================ Force Online : Yes Force Offline : Yes Force Rebuild : Yes Deny Force Failed : No Deny Force Good/Bad : No Deny Missing Replace : No Deny Clear : No Deny Locate : No Support Temperature : Yes NCQ : No Disable Copyback : No Enable JBOD : No Enable Copyback on SMART : No Enable Copyback to SSD on SMART Error : Yes Enable SSD Patrol Read : No PR Correct Unconfigured Areas : Yes Enable Spin Down of UnConfigured Drives : Yes Disable Spin Down of hot spares : No Spin Down time : 30 T10 Power State : No Error Counters ================ Memory Correctable Errors : 0 Memory Uncorrectable Errors : 0 Cluster Information ================ Cluster Permitted : No Cluster Active : No Default Settings ================ Phy Polarity : 0 Phy PolaritySplit : 0 Background Rate : 30 Strip Size : 256kB Flush Time : 4 seconds Write Policy : WB Read Policy : Adaptive Cache When BBU Bad : Disabled Cached IO : No SMART Mode : Mode 6 Alarm Disable : Yes Coercion Mode : None ZCR Config : Unknown Dirty LED Shows Drive Activity : No BIOS Continue on Error : 0 Spin Down Mode : None Allowed Device Type : SAS/SATA Mix Allow Mix in Enclosure : Yes Allow HDD SAS/SATA Mix in VD : Yes Allow SSD SAS/SATA Mix in VD : No Allow HDD/SSD Mix in VD : No Allow SATA in Cluster : No Max Chained Enclosures : 16 Disable Ctrl-R : Yes Enable Web BIOS : Yes Direct PD Mapping : No BIOS Enumerate VDs : Yes Restore Hot Spare on Insertion : No Expose Enclosure Devices : Yes Maintain PD Fail History : Yes Disable Puncturing : No Zero Based Enclosure Enumeration : No PreBoot CLI Enabled : Yes LED Show Drive Activity : Yes Cluster Disable : Yes SAS Disable : No Auto Detect BackPlane Enable : SGPIO/i2c SEP Use FDE Only : No Enable Led Header : No Delay during POST : 0 EnableCrashDump : No Disable Online Controller Reset : No EnableLDBBM : No Un-Certified Hard Disk Drives : Allow Treat Single span R1E as R10 : No Max LD per array : 16 Power Saving option : Don't Auto spin down Configured Drives Max power savings option is not allowed for LDs. Only T10 power conditions are to be used. Default spin down time in minutes: 30 Enable JBOD : No TTY Log In Flash : No Auto Enhanced Import : No BreakMirror RAID Support : No Disable Join Mirror : No Enable Shield State : No Time taken to detect CME : 60s
輸出
megacli -AdpBbuCmd -GetBbuSTatus -aAll
:BBU status for Adapter: 0 BatteryType: iBBU Voltage: 4068 mV Current: 0 mA Temperature: 30 C Battery State: Optimal BBU Firmware Status: Charging Status : Charging Voltage : OK Temperature : OK Learn Cycle Requested : No Learn Cycle Active : No Learn Cycle Status : OK Learn Cycle Timeout : No I2c Errors Detected : No Battery Pack Missing : No Battery Replacement required : No Remaining Capacity Low : No Periodic Learn Required : No Transparent Learn : No No space to cache offload : No Pack is about to fail & should be replaced : No Cache Offload premium feature required : No Module microcode update required : No GasGuageStatus: Fully Discharged : No Fully Charged : No Discharging : Yes Initialized : Yes Remaining Time Alarm : No Discharge Terminated : No Over Temperature : No Charging Terminated : No Over Charged : No Relative State of Charge: 88 % Charger System State: 49169 Charger System Ctrl: 0 Charging current: 512 mA Absolute state of charge: 87 % Max Error: 4 % Exit Code: 0x00
輸出
megacli -LDInfo -Lall -aAll
:Adapter 0 -- Virtual Drive Information: Virtual Drive: 0 (Target Id: 0) Name : RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0 Size : 111.281 GB Sector Size : 512 Mirror Data : 111.281 GB State : Optimal Strip Size : 256 KB Number Of Drives : 2 Span Depth : 1 Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Default Access Policy: Read/Write Current Access Policy: Read/Write Disk Cache Policy : Disk's Default Encryption Type : None Is VD Cached: No Virtual Drive: 1 (Target Id: 1) Name : RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0 Size : 1.633 TB Sector Size : 512 Mirror Data : 1.633 TB State : Optimal Strip Size : 256 KB Number Of Drives per span:2 Span Depth : 6 Default Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU Default Access Policy: Read/Write Current Access Policy: Read/Write Disk Cache Policy : Disk's Default Encryption Type : None Is VD Cached: No
**更新:**根據 Andrew 的建議,我跑
dd
了幾分鐘,看看我會在原始磁碟讀取上獲得什麼樣的吞吐量:dd if=/dev/sdb of=/dev/null bs=256k 19701+0 records in 19700+0 records out 5164236800 bytes (5.2 GB) copied, 202.553 s, 25.5 MB/s
其他執行的結果,吞吐量變化很大:
18706857984 bytes (19 GB) copied, 1181.51 s, 15.8 MB/s 20923023360 bytes (21 GB) copied, 388.137 s, 53.9 MB/s 21205876736 bytes (21 GB) copied, 55.5997 s, 381 MB/s 25391005696 bytes (25 GB) copied, 153.903 s, 165 MB/s
**更新 2:**輸出
megacli -PDlist -aall
:https ://gist.github.com/danpelota/3fca1e5f90a1f358c2d52a49bfb08ef0
正如 Michal在他的評論中指出的那樣,這個問題是一個“預故障”磁碟。來自 megaraid 控制器和 smartctl 的診斷中沒有危險信號
SMART Health Status:
,OK
但smartctl
在每個磁碟上執行顯示了巨大的非中等錯誤計數(我編寫了一個快速 bash 腳本來循環遍歷每個磁碟 ID)。以下是完整輸出中的相關位:# Ran this for each individual disk on the /dev/sdb array: smartctl -a -d megaraid,18 /dev/sdb Error counter log: Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 7950078 0 0 7950078 7950078 660.801 0 write: 0 0 0 0 0 363.247 0 verify: 12 0 0 12 12 0.002 0 Non-medium error count: 3253718
除此驅動器(磁碟 ID 18)外,其他所有驅動器均顯示非中等錯誤計數為 0。我確定了磁碟,將其更換為新磁碟,然後又恢復了 3gbps 的讀取速度。
顯示的錯誤日誌(如果可用)顯示在單獨的行中:
- 寫錯誤計數器
- 讀取錯誤計數器
- 驗證錯誤計數器(僅在非零時顯示)
- 非中等錯誤計數器(僅顯示一個數字)。這表示除寫入、讀取或驗證錯誤之外的可恢復事件的數量。
- 錯誤事件保存在“最後 n 個錯誤事件”日誌頁面中。保存的錯誤事件記錄數(即“n”)是特定於供應商的(例如,Hitachi 10K300 型號磁碟最多保存 23 條記錄)。每個錯誤事件記錄的內容都是 ASCII 格式和特定於供應商的。與每個錯誤事件記錄相關的參數程式碼表示錯誤事件發生的相對時間。較高的參數程式碼表示錯誤事件發生的時間較晚。如果設備不支持此日誌頁面,則輸出“不支持錯誤事件記錄”。如果支持此日誌頁面並且有錯誤事件記錄,則每個記錄都以“錯誤事件:”為前綴,其中是參數程式碼。