How many interrupts is too many?

July 27, 2017

On an AWS x1.32xlarge instance (128 cores), we are seeing a lot of interrupts per second.

Here are the CPUs with the highest interrupts/sec:

Interrupts Top CPUs
CPU0: 140838.0
CPU1: 77867.0
CPU4: 66495.0
CPU6: 59941.0
CPU3: 39096.0
CPU2: 31532.0
CPU7: 30861.0
CPU5: 26042.0
CPU8: 4168.0
CPU12: 3026.0
CPU10: 2793.0

Here are the highest individual interrupt sources, in interrupts/sec per CPU:

Interrupts above 10k/s
HYP [Hypervisor callback interrupts] [CPU0] = 46902.0/sec
49 [xen-percpu-ipi resched0] [CPU0] = 43437.0/sec
RES [Rescheduling interrupts] [CPU0] = 41512.0/sec
HYP [Hypervisor callback interrupts] [CPU2] = 26638.0/sec
HYP [Hypervisor callback interrupts] [CPU8] = 22875.0/sec
HYP [Hypervisor callback interrupts] [CPU12] = 20813.0/sec
55 [xen-percpu-ipi resched1] [CPU2] = 20749.0/sec
RES [Rescheduling interrupts] [CPU2] = 19568.0/sec
73 [xen-percpu-ipi resched4] [CPU8] = 16400.0/sec
RES [Rescheduling interrupts] [CPU8] = 15677.0/sec
HYP [Hypervisor callback interrupts] [CPU6] = 14226.0/sec
85 [xen-percpu-ipi resched6] [CPU12] = 14060.0/sec
RES [Rescheduling interrupts] [CPU12] = 13271.0/sec
HYP [Hypervisor callback interrupts] [CPU14] = 12173.0/sec
HYP [Hypervisor callback interrupts] [CPU4] = 11887.0/sec
HYP [Hypervisor callback interrupts] [CPU10] = 10500.0/sec
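
Per-CPU rates like the ones above can be obtained by reading /proc/interrupts twice and dividing the difference by the sampling interval. The following is only a minimal sketch of that idea, not the tool used to produce the listings above. It assumes the usual Linux /proc/interrupts layout (a header row of CPU names followed by one row per interrupt source), and the function names parse_interrupts, rates and measure are made up for illustration.

```python
import time


def parse_interrupts(text):
    """Return {cpu_name: total interrupt count} from /proc/interrupts text."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()              # header row: CPU0 CPU1 ...
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()[:len(cpus)]
        # Rows such as ERR/MIS hold a single global counter rather than
        # one counter per CPU, so they are skipped here.
        if len(fields) != len(cpus) or not all(f.isdigit() for f in fields):
            continue
        for cpu, count in zip(cpus, fields):
            totals[cpu] += int(count)
    return totals


def rates(before, after, interval_s):
    """Interrupts per second for each CPU between two samples."""
    return {cpu: (after[cpu] - before.get(cpu, 0)) / interval_s
            for cpu in after}


def measure(interval_s=1.0, path="/proc/interrupts"):
    """Sample the file twice and return per-CPU rates, highest first."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_interrupts(f.read())
    per_cpu = rates(first, second, interval_s)
    return sorted(per_cpu.items(), key=lambda kv: kv[1], reverse=True)
```

Calling measure() on a Linux host returns a list of (CPU name, interrupts/sec) pairs, which makes it easy to see whether the load is concentrated on a few cores, as it is on CPU0 here.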

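This happens while the application running on that machine is under significant load. There is a lot of network traffic and there are many threads.

My question is: is 50K/150K interrupts/sec too many? How should we interpret this number? Is there a maximum number of interrupts/sec?

Update:

Here is a glimpse of the top output: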

Tasks: 825 total,   3 running, 822 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.6%us,  3.4%sy,  0.0%ni, 83.6%id,  0.0%wa,  0.0%hi,  2.3%si,  0.0%st
Mem:  2014742856k total, 40059184k used, 1974683672k free,   162036k buffers
Swap:        0k total,        0k used,        0k free,  3159112k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
32936 ec2-user  20   0 77.3g  11g  29m S 1759.7  0.6   1780:36 java
32118 ec2-user  20   0 64.2g  10g  26m S 1036.9  0.6  62:31.08 java
    3 root      20   0     0    0    0 R 70.4  0.0  14:54.84 ksoftirqd/0
   12 root      20   0     0    0    0 S 21.2  0.0   6:06.47 ksoftirqd/1
   16 root      20   0     0    0    0 S 15.2  0.0   4:33.28 ksoftirqd/2
   20 root      20   0     0    0    0 S 12.2  0.0   3:34.12 ksoftirqd/3
   28 root      20   0     0    0    0 S 11.9  0.0   3:24.96 ksoftirqd/5
   24 root      20   0     0    0    0 S 11.6  0.0   3:26.54 ksoftirqd/4
   32 root      20   0     0    0    0 S 10.2  0.0   3:23.56 ksoftirqd/6
   36 root      20   0     0    0    0 S 10.2  0.0   3:28.80 ksoftirqd/7

Update 2:

Most of the interrupts come from the network card queues, which makes it possible to spread the load across other cores: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-cpu-irq.html

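Without knowing what your application does and what load it generates, it is impossible to tell whether your system has "too many interrupts" going on.
You can use top to check the system CPU value (the %sy figure). If it is high, a significant part of the CPU load is being spent in kernel context. That, in turn, could be a sign of an interrupt storm.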
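
The same check can be scripted instead of read off top. The sketch below is one possible way to do it, assuming a Linux /proc/stat whose aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq and steal ticks in that order. The helper names parse_cpu_line and kernel_share are illustrative only. It reports what percentage of CPU time between two samples went to system, hardware-interrupt and softirq work.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate CPU tick counters from /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        # The aggregate line is labelled exactly "cpu"; per-core lines
        # are labelled cpu0, cpu1, ... and are ignored here.
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def kernel_share(before, after):
    """Percentage of elapsed CPU time spent in system, irq and softirq."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta[name] / total
            for name in ("system", "irq", "softirq")}
```

To use it, read /proc/stat, wait a second or so, read it again, and pass both parsed samples to kernel_share. The result corresponds roughly to the sy, hi and si columns in top; how high is "too high" still depends on the workload.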

Source: https://serverfault.com/questions/865532

