Redhat

Finding the root cause of 99.99% iowait with 0% SWAPIN

  • January 16, 2014

Users and DBAs are complaining that "Oracle is slow" on our OEL server. From the operating system side, the only thing I have found is some odd IOWAIT statistics reported by iotop.

Output of iotop:

Total DISK READ: 27.24 M/s | Total DISK WRITE: 2.32 M/s
 TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
10374 be/4 root      190.28 K/s    0.00 B/s  0.00 % 99.99 % clBackup -child 22862 -j ~jt 202777:7:1 -cn xxxxxx
12844 be/4 xxxxxx     0.00 B/s  303.15 K/s  0.00 % 99.99 % ora_dbw0_oaprod
14460 be/4 oracleuser   251.55 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
6795 be/4 oracleuser  1012.65 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
4336 be/4 oracleuser   812.70 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
17725 be/4 oracleuser   193.50 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
14456 be/4 oracleuser   109.65 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
12831 be/4 oracleuser    51.60 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
9756 be/4 oracleuser    83.85 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
24916 be/4 oracleuser  1128.75 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
19701 be/4 oracleuser   361.20 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
27920 be/4 oracleuser   432.15 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
16132 be/4 oracleuser    90.30 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
27967 be/4 oracleuser    64.50 K/s    0.00 B/s  0.00 % 97.87 % oracleoaprod (LOCAL=NO)
16615 be/4 oracleuser    64.50 K/s    0.00 B/s  0.00 % 97.17 % oracleoaprod (LOCAL=NO)
4465 be/4 oracleuser     7.46 M/s    0.00 B/s  0.00 % 97.15 % oracleoaprod (LOCAL=NO)
28044 be/4 oracleuser    14.51 M/s    0.00 B/s  0.00 % 97.02 % oracleoaprod (DESCRIPTION~(ADDRESS=(PROTOCOL=beq)))
32283 be/4 oracleuser    77.40 K/s    0.00 B/s  0.00 % 95.48 % oracleoaprod (LOCAL=NO)
12851 be/4 oracleuser    19.35 K/s  590.18 K/s  0.00 % 91.77 % ora_lgwr_oaprod
12846 be/4 oracleuser     0.00 B/s 1077.15 K/s  0.00 % 91.41 % ora_dbw1_oaprod
23153 be/4 oracleuser    96.75 K/s    0.00 B/s  0.00 % 72.37 % oracleoaprod (LOCAL=NO)
27710 be/4 oracleuser    19.35 K/s    0.00 B/s  0.00 % 41.50 % oracleoaprod (LOCAL=NO)
25775 be/4 oracleuser    51.60 K/s    0.00 B/s  0.00 % 30.11 % oracleoaprod (LOCAL=NO)
13323 be/4 oracleuser    19.35 K/s   51.60 K/s  0.00 % 21.98 % oracleoaprod (LOCAL=NO)
24345 be/4 oracleuser    12.90 K/s    0.00 B/s  0.00 % 19.34 % oracleoaprod (LOCAL=NO)
12853 be/4 oracleuser     0.00 B/s   38.70 K/s  0.00 % 11.72 % ora_ckpt_oaprod
7234 be/4 oracleuser     6.45 K/s    0.00 B/s  0.00 %  7.52 % oracleoaprod (LOCAL=NO)
17820 be/4 apps     0.00 B/s    9.68 K/s  0.00 %  0.00 % rwrun P_CONC_REQUEST_ID=8~2211170.out desformat=XML
20562 be/4 apps     0.00 B/s    3.23 K/s  0.00 %  0.00 % java -DCLIENT_PROCESSID=2~.GSMSvcComponentContainer
 5849 be/4 apps     3.23 K/s    0.00 B/s  0.00 %  0.00 % FNDLIBR
7232 be/4 apps     0.00 B/s    3.23 K/s  0.00 %  0.00 % RVCTP
   1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init [5]
   2 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
   3 be/7 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
   4 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/0]
   5 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]
   6 be/7 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]
   7 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/1]
    8 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/2]
   9 be/7 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/2]
  10 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/2]
  11 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/3]
  12 be/7 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/3]
  13 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/3]
  14 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/4]
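
With this many rows it can help to tally the listing rather than read it by eye. The following is a minimal Python sketch, not part of the original post, that parses rows in the layout shown above and counts how many threads sit at or above a chosen IO> threshold and whether any of them report swap-in. The names `parse_row` and `summarize`, the 90% threshold, and the assumption that the K/M/G suffixes are 1024-based are mine.

```python
import re

# One process row in the layout shown above:
# TID PRIO USER DISK-READ DISK-WRITE SWAPIN% IO% COMMAND
ROW = re.compile(
    r"^\s*(?P<tid>\d+)\s+(?P<prio>\S+)\s+(?P<user>\S+)\s+"
    r"(?P<read>[\d.]+)\s+(?P<read_unit>[BKMG])/s\s+"
    r"(?P<write>[\d.]+)\s+(?P<write_unit>[BKMG])/s\s+"
    r"(?P<swapin>[\d.]+)\s+%\s+(?P<io>[\d.]+)\s+%\s+(?P<command>.*)$"
)

# Assumption: unit suffixes are powers of 1024.
UNIT = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_row(line):
    """Parse one row into a dict, or return None for headers and blank lines."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "tid": int(m["tid"]),
        "user": m["user"],
        "read_bps": float(m["read"]) * UNIT[m["read_unit"]],
        "write_bps": float(m["write"]) * UNIT[m["write_unit"]],
        "swapin_pct": float(m["swapin"]),
        "io_pct": float(m["io"]),
        "command": m["command"].strip(),
    }


def summarize(lines, io_threshold=90.0):
    """Count rows at or above the IO> threshold and rows with any swap-in."""
    rows = [r for r in map(parse_row, lines) if r is not None]
    blocked = [r for r in rows if r["io_pct"] >= io_threshold]
    return {
        "rows": len(rows),
        "blocked": len(blocked),
        "swapping": sum(1 for r in rows if r["swapin_pct"] > 0),
        "blocked_read_bps": sum(r["read_bps"] for r in blocked),
    }


# Three rows copied from the listing above.
SAMPLE = [
    "14460 be/4 oracleuser   251.55 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)",
    "12851 be/4 oracleuser    19.35 K/s  590.18 K/s  0.00 % 91.77 % ora_lgwr_oaprod",
    "    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init [5]",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A result with many blocked rows and zero swapping rows matches what the listing shows: the threads are waiting on ordinary disk I/O, not on swap.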

Output of sar:

# sar 1 7
   Linux 2.6.18-371.3.1.0.1.el5    01/16/2014

   10:13:41 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
   10:13:42 AM       all     65.32      0.00      2.56     22.08      0.00     10.04
   10:13:43 AM       all     65.94      0.00      2.50     23.02      0.00      8.55
   10:13:44 AM       all     65.15      0.00      2.06     24.17      0.00      8.62
   10:13:45 AM       all     62.16      0.00      2.06     26.06      0.00      9.73
   10:13:46 AM       all     54.00      0.00      1.81     31.96      0.00     12.23
   10:13:47 AM       all     51.03      0.00      1.62     35.17      0.00     12.18
   10:13:48 AM       all     51.97      0.00      1.25     27.61      0.00     19.18
   Average:          all     59.37      0.00      1.98     27.15      0.00     11.50
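
The two tools measure different things, which may explain why iotop shows 99.99% while sar shows 22 to 35%. As I understand it, the IO> column in iotop is the share of time a single thread spent waiting on I/O, whereas %iowait in sar is the share of CPU time spent idle while a disk request was outstanding, averaged over all CPUs. The Python sketch below is an illustration only: it pulls the %iowait column out of the sar output above and reports its mean and peak. The function name `iowait_samples` is mine, and the parser assumes the 12-hour timestamp layout shown here.

```python
SAR_OUTPUT = """\
   10:13:41 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
   10:13:42 AM       all     65.32      0.00      2.56     22.08      0.00     10.04
   10:13:43 AM       all     65.94      0.00      2.50     23.02      0.00      8.55
   10:13:44 AM       all     65.15      0.00      2.06     24.17      0.00      8.62
   10:13:45 AM       all     62.16      0.00      2.06     26.06      0.00      9.73
   10:13:46 AM       all     54.00      0.00      1.81     31.96      0.00     12.23
   10:13:47 AM       all     51.03      0.00      1.62     35.17      0.00     12.18
   10:13:48 AM       all     51.97      0.00      1.25     27.61      0.00     19.18
   Average:          all     59.37      0.00      1.98     27.15      0.00     11.50
"""


def iowait_samples(sar_text):
    """Return %iowait for each per-interval 'all' row.

    The header row (third field 'CPU') and the Average row (only eight
    fields) are skipped.
    """
    values = []
    for line in sar_text.splitlines():
        fields = line.split()
        # Sample rows: time, AM/PM, 'all', then six percentage columns.
        if len(fields) == 9 and fields[2] == "all":
            values.append(float(fields[6]))
    return values


if __name__ == "__main__":
    samples = iowait_samples(SAR_OUTPUT)
    mean = sum(samples) / len(samples)
    print(f"samples={len(samples)} mean={mean:.2f} peak={max(samples):.2f}")
    # prints: samples=7 mean=27.15 peak=35.17
```

The mean of the seven samples agrees with the Average row that sar printed, and the peak shows that %iowait was still rising during the capture.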

All disks come from the NetApp except LogVol00:

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup01-LogVol00
                      97G   76G   17G  83% /
/dev/cciss/c0d0p1      99M   32M   63M  34% /boot
tmpfs                 127G  500M  126G   1% /dev/shm
/dev/mapper/mpath4p1  5.4T  3.2T  2.0T  62% /oracle/x1
/dev/mapper/mpath6p1  6.3T  4.3T  1.7T  72% /oracle/x2
/dev/mapper/mpath1p1  184G  188M  174G   1% /oracle/x1/db/apps_st/redo
/dev/mapper/mpath2p1  184G  188M  174G   1% /oracle/x1/db/apps_st/redo02

I think this is simply a lack of available IOPS. For large database servers, I always recommend SSD storage or a larger SAS array (preferably local storage).
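
One way to test the IOPS theory is to measure how many requests per second each device actually completes and compare that with what the storage is rated for. The Python sketch below is only an illustration under stated assumptions: it relies on the Linux `/proc/diskstats` layout (device name in the third column, reads completed in the fourth, writes completed in the eighth), and the names `parse_diskstats`, `iops` and `sample_iops` are hypothetical helpers of mine, not part of any tool mentioned above.

```python
import time


def parse_diskstats(text):
    """Map device name -> (reads completed, writes completed).

    Lines with fewer than 14 fields are skipped; on older kernels that
    includes partition rows, which carry a shorter set of counters.
    """
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        # f[2] = device name, f[3] = reads completed, f[7] = writes completed
        stats[f[2]] = (int(f[3]), int(f[7]))
    return stats


def iops(before, after, seconds):
    """Return {device: (read IOPS, write IOPS)} between two snapshots."""
    a, b = parse_diskstats(before), parse_diskstats(after)
    return {
        dev: ((b[dev][0] - a[dev][0]) / seconds,
              (b[dev][1] - a[dev][1]) / seconds)
        for dev in a
        if dev in b
    }


def sample_iops(interval=5.0, path="/proc/diskstats"):
    """Take two snapshots `interval` seconds apart (Linux only)."""
    with open(path) as fh:
        first = fh.read()
    time.sleep(interval)
    with open(path) as fh:
        second = fh.read()
    return iops(first, second, interval)


# Example call on the affected host (not executed here):
#     for dev, (r, w) in sorted(sample_iops().items()):
#         print(f"{dev}: {r:.0f} read/s, {w:.0f} write/s")
```

If the multipath devices sit at a steady ceiling while iowait stays high, that would support the lack-of-IOPS explanation; if they are well below what the array can deliver, the bottleneck is more likely elsewhere.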

Source: https://serverfault.com/questions/567717