Linux
未知原因導致 iowait 高
我很難找出導致伺服器上 iowait 高的原因。
這是日誌
iostat -xm 5 5
Linux 2.6.32-358.6.1.el6.x86_64 (prod-1.localdomain) 09/28/2013 _x86_64_ (16 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 6.98 0.05 3.72 3.54 0.00 85.71 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sdb 0.08 120.88 30.27 1.72 0.96 0.48 92.20 0.34 10.67 3.79 12.13 sda 7.63 37.19 8.96 4.89 0.35 0.16 76.40 0.16 11.63 2.19 3.04 avg-cpu: %user %nice %system %iowait %steal %idle 5.41 0.00 6.20 37.65 0.00 50.74 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sdb 0.00 109.80 186.20 1.40 3.75 0.43 45.66 98.21 519.80 5.33 100.00 sda 33.20 3.40 18.00 2.00 0.37 0.02 40.32 0.07 3.41 3.17 6.34 avg-cpu: %user %nice %system %iowait %steal %idle 5.55 0.00 7.42 30.06 0.00 56.97 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sdb 0.00 0.00 196.00 0.00 3.91 0.00 40.85 100.41 506.01 5.10 100.00 sda 0.00 2.40 1.80 2.60 0.05 0.02 30.91 0.01 2.95 2.73 1.20 avg-cpu: %user %nice %system %iowait %steal %idle 5.71 0.00 7.04 31.76 0.00 55.49 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sdb 0.00 100.00 189.00 1.20 3.72 0.40 44.33 95.32 514.88 5.26 100.00 sda 33.20 4.20 19.20 5.20 0.39 0.04 35.80 0.02 1.01 0.79 1.92 avg-cpu: %user %nice %system %iowait %steal %idle 61.93 0.00 10.08 14.99 0.00 12.99 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sdb 0.00 0.40 185.40 1.40 3.76 0.01 41.31 83.22 431.16 5.28 98.62 sda 33.20 5.40 9.60 4.00 0.21 0.04 37.65 0.02 1.24 1.04 1.42
如您所見,所有指標都正常,除了 await 和 %util 非常高。所以我認為 /dev/sdb 可能有問題。
但
smartctl
報告沒有有用的資訊。smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.6.1.el6.x86_64] (local build) Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net === START OF INFORMATION SECTION === Model Family: Western Digital RE4 Serial ATA Device Model: WDC WD2003FYYS-02W0B1 Serial Number: WD-WMAY04093732 LU WWN Device Id: 5 0014ee 05877b196 Firmware Version: 01.01D02 User Capacity: 2,000,398,934,016 bytes [2.00 TB] Sector Size: 512 bytes logical/physical Device is: In smartctl database [for details use: -P show] ATA Version is: 8 ATA Standard is: Exact ATA specification draft version not indicated Local Time is: Sat Sep 28 09:05:30 2013 ICT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (29160) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 283) minutes. Conveyance self-test routine recommended polling time: ( 5) minutes. SCT capabilities: (0x303f) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0 3 Spin_Up_Time 0x0027 253 253 021 Pre-fail Always - 9100 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 42 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0 9 Power_On_Hours 0x0032 090 090 000 Old_age Always - 7373 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 40 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 31 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 10 194 Temperature_Celsius 0x0022 123 107 000 Old_age Always - 29 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 No self-tests have been logged. [To run self-tests, use: smartctl -t] SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay.
我被困在這裡,不知道下一步如何解決問題。
任何幫助將不勝感激!
更新:
@邁克爾漢普頓
我的自測日誌,沒有興趣資訊。
smartctl -l selftest /dev/sdb
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.6.1.el6.x86_64] (local build) Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net === START OF READ SMART DATA SECTION === SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended offline Completed without error 00% 7380
@Mife
我的
pidstat -d 1 30
結果。Linux 2.6.32-358.6.1.el6.x86_64 (cass-23_120.localdomain) 09/28/2013 _x86_64_ (16 CPU) 05:57:43 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:44 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:45 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:46 PM 1555 736.00 0.00 0.00 java 05:57:46 PM 16698 0.00 4.00 0.00 java 05:57:46 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:47 PM 552 0.00 68.00 0.00 jbd2/sda3-8 05:57:47 PM 1555 352.00 0.00 0.00 java 05:57:47 PM 16698 0.00 12.00 0.00 java 05:57:47 PM 18074 0.00 4.00 0.00 java 05:57:47 PM 19295 1564.00 0.00 0.00 java 05:57:47 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:48 PM 1554 3128.00 8.00 4.00 xinetd 05:57:48 PM 1570 840.00 0.00 0.00 gmond 05:57:48 PM 2183 0.00 4.00 0.00 java 05:57:48 PM 2394 64.00 0.00 0.00 rsync 05:57:48 PM 2395 324.00 0.00 0.00 ssh 05:57:48 PM 13280 28.00 0.00 0.00 downloadm_new.s 05:57:48 PM 19295 1724.00 0.00 0.00 java 05:57:48 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:49 PM 19295 1744.00 0.00 0.00 java 05:57:49 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:50 PM 1083 0.00 8.00 0.00 flush-8:0 05:57:50 PM 1086 0.00 8.00 0.00 java 05:57:50 PM 2183 0.00 12.00 0.00 java 05:57:50 PM 13280 388.00 0.00 0.00 downloadm_new.s 05:57:50 PM 18074 0.00 4.00 0.00 java 05:57:50 PM 19295 1728.00 0.00 0.00 java 05:57:50 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:51 PM 2183 0.00 4.00 0.00 java 05:57:51 PM 2400 8.00 0.00 0.00 sleep 05:57:51 PM 18074 0.00 4.00 0.00 java 05:57:51 PM 19295 1680.00 0.00 0.00 java 05:57:51 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:52 PM 552 0.00 28.00 0.00 jbd2/sda3-8 05:57:52 PM 1112 0.00 4.00 0.00 jbd2/sda4-8 05:57:52 PM 2183 0.00 8.00 0.00 java 05:57:52 PM 16698 0.00 4.00 0.00 java 05:57:52 PM 18074 0.00 4.00 0.00 java 05:57:52 PM 19295 1672.00 0.00 0.00 java 05:57:52 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:53 PM 1555 376.00 20.00 0.00 java 05:57:53 PM 1570 792.00 0.00 0.00 gmond 05:57:53 PM 19295 1568.00 8.00 0.00 java 05:57:53 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:54 PM 3734 844.00 188.00 0.00 java 05:57:54 PM 19295 1672.00 0.00 0.00 java 05:57:54 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:55 PM 1083 0.00 20.00 0.00 flush-8:0 05:57:55 PM 3734 2620.00 1156.00 0.00 java 05:57:55 PM 4327 0.00 8.00 0.00 java 05:57:55 PM 9677 0.00 8.00 0.00 java 05:57:55 PM 16613 0.00 8.00 0.00 java 05:57:55 PM 19295 1272.00 8.00 0.00 java 05:57:55 PM 19426 0.00 8.00 0.00 java 05:57:55 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:56 PM 3734 3592.00 1200.00 0.00 java 05:57:56 PM 19295 332.00 0.00 0.00 java 05:57:56 PM 19426 0.00 4.00 0.00 java 05:57:56 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:57 PM 552 0.00 36.00 0.00 jbd2/sda3-8 05:57:57 PM 2405 1068.00 32.00 0.00 java 05:57:57 PM 3734 2972.00 828.00 0.00 java 05:57:57 PM 5457 0.00 8.00 0.00 java 05:57:57 PM 9677 28424.00 144.00 20.00 java 05:57:57 PM 16698 0.00 8.00 0.00 java 05:57:57 PM 18074 0.00 4.00 0.00 java 05:57:57 PM 19295 0.00 4.00 0.00 java 05:57:57 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:58 PM 2183 0.00 4.00 0.00 java 05:57:58 PM 2222 0.00 4.00 0.00 pidstat 05:57:58 PM 2405 500.00 0.00 0.00 java 05:57:58 PM 3734 4016.00 720.00 0.00 java 05:57:58 PM 5457 0.00 8.00 0.00 java 05:57:58 PM 16698 0.00 4.00 0.00 java 05:57:58 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:57:59 PM 1112 0.00 8.00 0.00 jbd2/sda4-8 05:57:59 PM 3734 4572.00 372.00 0.00 java 05:57:59 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:00 PM 1083 0.00 32.00 0.00 flush-8:0 05:58:00 PM 2405 496.00 0.00 0.00 java 05:58:00 PM 3734 5412.00 4.00 0.00 java 05:58:00 PM 5457 0.00 16.00 0.00 java 05:58:00 PM 11681 0.00 8.00 0.00 java 05:58:00 PM 14824 0.00 8.00 0.00 java 05:58:00 PM 16698 0.00 12.00 0.00 java 05:58:00 PM 17694 0.00 8.00 0.00 java 05:58:00 PM 18074 0.00 12.00 0.00 java 05:58:00 PM 18129 0.00 8.00 0.00 java 05:58:00 PM 19542 0.00 8.00 0.00 java 05:58:00 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:01 PM 3734 3888.00 0.00 0.00 java 05:58:01 PM 3813 8.00 12.00 0.00 java 05:58:01 PM 13280 28.00 0.00 0.00 downloadm_new.s 05:58:01 PM 18074 0.00 8.00 0.00 java 05:58:01 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:02 PM 552 0.00 44.00 0.00 jbd2/sda3-8 05:58:02 PM 1129 0.00 16.00 0.00 jbd2/sdb1-8 05:58:02 PM 2405 256.00 0.00 0.00 java 05:58:02 PM 3734 1200.00 1128.00 0.00 java 05:58:02 PM 16698 0.00 4.00 0.00 java 05:58:02 PM 18074 0.00 8.00 0.00 java 05:58:02 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:03 PM 1570 1172.00 0.00 0.00 gmond 05:58:03 PM 2183 0.00 4.00 0.00 java 05:58:03 PM 2405 256.00 0.00 0.00 java 05:58:03 PM 2442 92.00 0.00 0.00 rsync 05:58:03 PM 2443 916.00 0.00 0.00 ssh 05:58:03 PM 3734 576.00 0.00 0.00 java 05:58:03 PM 9677 0.00 4.00 0.00 java 05:58:03 PM 13280 56.00 8996.00 0.00 downloadm_new.s 05:58:03 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:04 PM 2183 0.00 4.00 0.00 java 05:58:04 PM 2405 256.00 0.00 0.00 java 05:58:04 PM 2443 8.00 0.00 0.00 ssh 05:58:04 PM 3734 2032.00 16.00 0.00 java 05:58:04 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:05 PM 1083 0.00 4.00 0.00 flush-8:0 05:58:05 PM 2405 224.00 0.00 0.00 java 05:58:05 PM 2446 160.00 0.00 0.00 sleep 05:58:05 PM 3734 5344.00 648.00 0.00 java 05:58:05 PM 3813 0.00 8.00 0.00 java 05:58:05 PM 13280 1016.00 0.00 0.00 downloadm_new.s 05:58:05 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:06 PM 2405 16.00 0.00 0.00 java 05:58:06 PM 3734 6196.00 344.00 0.00 java 05:58:06 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:07 PM 552 0.00 8.00 0.00 jbd2/sda3-8 05:58:07 PM 2405 112.00 0.00 0.00 java 05:58:07 PM 3734 3532.00 0.00 0.00 java 05:58:07 PM 16698 0.00 4.00 0.00 java 05:58:07 PM 18074 0.00 4.00 0.00 java 05:58:07 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:08 PM 1570 1172.00 0.00 0.00 gmond 05:58:08 PM 2183 0.00 4.00 0.00 java 05:58:08 PM 2405 352.00 0.00 0.00 java 05:58:08 PM 3734 4588.00 0.00 0.00 java 05:58:08 PM 16698 0.00 8.00 0.00 java 05:58:08 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:09 PM 2222 0.00 4.00 0.00 pidstat 05:58:09 PM 2405 368.00 0.00 0.00 java 05:58:09 PM 3734 1720.00 0.00 0.00 java 05:58:09 PM 16698 0.00 4.00 0.00 java 05:58:09 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:10 PM 1083 0.00 8.00 0.00 flush-8:0 05:58:10 PM 2405 480.00 0.00 0.00 java 05:58:10 PM 3734 40.00 16.00 0.00 java 05:58:10 PM 17768 0.00 8.00 0.00 java 05:58:10 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:11 PM 2405 608.00 0.00 0.00 java 05:58:11 PM 3734 264.00 0.00 0.00 java 05:58:11 PM 19426 0.00 4.00 0.00 java 05:58:11 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:12 PM 1129 0.00 24.00 0.00 jbd2/sdb1-8 05:58:12 PM 2405 240.00 0.00 0.00 java 05:58:12 PM 18074 0.00 8.00 0.00 java 05:58:12 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command 05:58:13 PM 1570 1172.00 0.00 0.00 gmond 05:58:13 PM 2183 0.00 4.00 0.00 java 05:58:13 PM 2405 128.00 0.00 0.00 java 05:58:13 PM 18074 0.00 4.00 0.00 java Average: PID kB_rd/s kB_wr/s kB_ccwr/s Command Average: 552 0.00 6.13 0.00 jbd2/sda3-8 Average: 1083 0.00 2.40 0.00 flush-8:0 Average: 1086 0.00 0.27 0.00 java Average: 1112 0.00 0.40 0.00 jbd2/sda4-8 Average: 1129 0.00 1.33 0.00 jbd2/sdb1-8 Average: 1554 104.16 0.27 0.13 xinetd Average: 1570 171.43 0.00 0.00 gmond Average: 2183 0.00 1.60 0.00 java Average: 2222 0.00 0.27 0.00 pidstat Average: 2405 178.49 1.07 0.00 java Average: 2446 5.33 0.00 0.00 sleep Average: 3734 1778.49 220.45 0.00 java Average: 3813 0.27 0.67 0.00 java Average: 4327 0.00 0.27 0.00 java Average: 5457 0.00 1.07 0.00 java Average: 9677 946.52 5.19 0.67 java Average: 11681 0.00 0.27 0.00 java Average: 13280 50.48 299.57 0.00 downloadm_new.s Average: 14824 0.00 0.27 0.00 java Average: 16613 0.00 0.27 0.00 java Average: 16698 0.00 2.13 0.00 java Average: 17694 0.00 0.27 0.00 java Average: 17768 0.00 0.27 0.00 java Average: 18074 0.00 2.13 0.00 java Average: 18129 0.00 0.27 0.00 java Average: 19295 498.04 0.67 0.00 java Average: 19426 0.00 0.53 0.00 java Average: 19542 0.00 0.27 0.00 java
@kworr
這是我的掛載選項
/dev/sdb1
。% mount | grep sdb /dev/sdb1 on /backup type ext4 (rw,noatime,commit=100)
Update2 硬碟的預期 IOPS 數。
7,200 rpm SATA drives HDD ~75-100 IOPS[2] SATA 3 Gb/s 10,000 rpm SATA drives HDD ~125-150 IOPS[2] SATA 3 Gbit/s 10,000 rpm SAS drives HDD ~140 IOPS[2] SAS 15,000 rpm SAS drives HDD ~175-210 IOPS[2] SAS
這裡發生了很多事情,但是 pid 3734,一個 java 程序似乎是你的罪魁禍首。您應該了解它在做什麼,傳遞給它的參數是什麼,它的父 pid 是什麼,以及它的用途。
在 1 秒樣本的 30 秒時間段內,java 使用 1778.49 讀取 kb/秒,還有其他 java 程序,pids 9677 和 19295 分別使用 946.52 和 498.04 讀取 kb/秒。
我無法告訴您他們所做的是對還是錯,但是您的高 I/O 主要是由於這些 java 程序。