
Debugging out of memory with Ubuntu / AWS

  • December 24, 2019

I have a new server that frequently raises Out of memory errors.

This causes multiple processes to be killed by the oom-killer, and I don't know why.


It is on AWS (EC2) with 4GB RAM (t2.medium), running a simple Apache web server with MySQL and PHP on Ubuntu 16.04.2 LTS.

I added a 4GB swap partition using an EBS volume (GP2).

The amount of available memory seems fine:

free -h
             total        used        free      shared  buff/cache   available
Mem:           3.9G        201M         49M         13M        3.6G        3.6G
Swap:          4.0G         79M        3.9G

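This does not seem to change much during the day (the highest used value I have seen on random checks is 250M).

So far I have not changed the vm.overcommit_ settings, because I want to know what the problem is before doing so: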

vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50

The output of dmesg -T is:

[Mon Feb 13 13:09:00 2017] sessionclean invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
[Mon Feb 13 13:09:00 2017] sessionclean cpuset=/ mems_allowed=0
[Mon Feb 13 13:09:00 2017] CPU: 1 PID: 22353 Comm: sessionclean Not tainted 4.4.0-59-generic #80-Ubuntu
[Mon Feb 13 13:09:00 2017] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/09/2016
[Mon Feb 13 13:09:00 2017]  0000000000000286 0000000031d57236 ffff88005ce33af0 ffffffff813f7583
[Mon Feb 13 13:09:00 2017]  ffff88005ce33cc8 ffff880107ac0000 ffff88005ce33b60 ffffffff8120ad5e
[Mon Feb 13 13:09:00 2017]  ffffffff81cd2dc7 0000000000000000 ffffffff81e67760 0000000000000206
[Mon Feb 13 13:09:00 2017] Call Trace:
[Mon Feb 13 13:09:00 2017]  [<ffffffff813f7583>] dump_stack+0x63/0x90
[Mon Feb 13 13:09:00 2017]  [<ffffffff8120ad5e>] dump_header+0x5a/0x1c5
[Mon Feb 13 13:09:00 2017]  [<ffffffff81192722>] oom_kill_process+0x202/0x3c0
[Mon Feb 13 13:09:00 2017]  [<ffffffff811920ee>] ? oom_unkillable_task+0x9e/0xd0
[Mon Feb 13 13:09:00 2017]  [<ffffffff81192b49>] out_of_memory+0x219/0x460
[Mon Feb 13 13:09:00 2017]  [<ffffffff81198abd>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
[Mon Feb 13 13:09:00 2017]  [<ffffffff81198eb6>] __alloc_pages_nodemask+0x286/0x2a0
[Mon Feb 13 13:09:00 2017]  [<ffffffff81198f6b>] alloc_kmem_pages_node+0x4b/0xc0
[Mon Feb 13 13:09:00 2017]  [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
[Mon Feb 13 13:09:00 2017]  [<ffffffff811c1670>] ? handle_mm_fault+0xce0/0x1820
[Mon Feb 13 13:09:00 2017]  [<ffffffff810805a0>] _do_fork+0x80/0x360
[Mon Feb 13 13:09:00 2017]  [<ffffffff81080929>] SyS_clone+0x19/0x20
[Mon Feb 13 13:09:00 2017]  [<ffffffff818384f2>] entry_SYSCALL_64_fastpath+0x16/0x71
[Mon Feb 13 13:09:00 2017] Mem-Info:
[Mon Feb 13 13:09:00 2017] active_anon:17107 inactive_anon:20026 isolated_anon:0
                           active_file:424384 inactive_file:433967 isolated_file:0
                           unevictable:914 dirty:36 writeback:0 unstable:0
                           slab_reclaimable:71336 slab_unreclaimable:6907
                           mapped:12390 shmem:3580 pagetables:4224 bounce:0
                           free:25546 free_pcp:0 free_cma:0
[Mon Feb 13 13:09:00 2017] Node 0 DMA free:15872kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:8kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:0kB slab_reclaimable:20kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[Mon Feb 13 13:09:00 2017] lowmem_reserve[]: 0 3745 3934 3934 3934
[Mon Feb 13 13:09:00 2017] Node 0 DMA32 free:66512kB min:7528kB low:9408kB high:11292kB active_anon:63604kB inactive_anon:72976kB active_file:1691880kB inactive_file:1723756kB unevictable:3244kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3835152kB mlocked:3244kB dirty:124kB writeback:0kB mapped:48740kB shmem:13476kB slab_reclaimable:156484kB slab_unreclaimable:23920kB kernel_stack:2784kB pagetables:15124kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[Mon Feb 13 13:09:00 2017] lowmem_reserve[]: 0 0 189 189 189
[Mon Feb 13 13:09:00 2017] Node 0 Normal free:19800kB min:380kB low:472kB high:568kB active_anon:4824kB inactive_anon:7128kB active_file:5648kB inactive_file:12108kB unevictable:412kB isolated(anon):0kB isolated(file):0kB present:393216kB managed:193908kB mlocked:412kB dirty:20kB writeback:0kB mapped:812kB shmem:844kB slab_reclaimable:128840kB slab_unreclaimable:3708kB kernel_stack:816kB pagetables:1772kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[Mon Feb 13 13:09:00 2017] lowmem_reserve[]: 0 0 0 0 0
[Mon Feb 13 13:09:00 2017] Node 0 DMA: 4*4kB (MEH) 4*8kB (MEH) 1*16kB (H) 2*32kB (ME) 4*64kB (UME) 3*128kB (UME) 3*256kB (UME) 2*512kB (ME) 3*1024kB (MEH) 1*2048kB (E) 2*4096kB (M) = 15872kB
[Mon Feb 13 13:09:00 2017] Node 0 DMA32: 16314*4kB (UME) 172*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 66632kB
[Mon Feb 13 13:09:00 2017] Node 0 Normal: 4161*4kB (UE) 147*8kB (UME) 0*16kB 0*32kB 1*64kB (H) 1*128kB (H) 1*256kB (H) 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 19804kB
[Mon Feb 13 13:09:00 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Mon Feb 13 13:09:00 2017] 864746 total pagecache pages
[Mon Feb 13 13:09:00 2017] 2212 pages in swap cache
[Mon Feb 13 13:09:00 2017] Swap cache stats: add 62420, delete 60208, find 121128/130192
[Mon Feb 13 13:09:00 2017] Free swap  = 4103680kB
[Mon Feb 13 13:09:00 2017] Total swap = 4194300kB
[Mon Feb 13 13:09:00 2017] 1081245 pages RAM
[Mon Feb 13 13:09:00 2017] 0 pages HighMem/MovableOnly
[Mon Feb 13 13:09:00 2017] 70004 pages reserved
[Mon Feb 13 13:09:00 2017] 0 pages cma reserved
[Mon Feb 13 13:09:00 2017] 0 pages hwpoisoned
[Mon Feb 13 13:09:00 2017] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[Mon Feb 13 13:09:00 2017] [  379]     0   379    10183     1130      22       3      771             0 systemd-journal
[Mon Feb 13 13:09:00 2017] [  435]     0   435    25742       48      18       3        0             0 lvmetad
[Mon Feb 13 13:09:00 2017] [  478]     0   478    10704      531      23       3      247         -1000 systemd-udevd
[Mon Feb 13 13:09:00 2017] [  906]     0   906     4030      662      11       3       27             0 dhclient
[Mon Feb 13 13:09:00 2017] [ 1032]     0  1032     1306       30       8       3        0             0 iscsid
[Mon Feb 13 13:09:00 2017] [ 1033]     0  1033     1431      877       8       3        0           -17 iscsid
[Mon Feb 13 13:09:00 2017] [ 1035]     0  1035     1100      308       8       3        0             0 acpid
[Mon Feb 13 13:09:00 2017] [ 1037]   104  1037    64099      599      27       3      196             0 rsyslogd
[Mon Feb 13 13:09:00 2017] [ 1043]     0  1043    58710      315      17       4       41             0 lxcfs
[Mon Feb 13 13:09:00 2017] [ 1047]   107  1047    10724      334      24       3       25          -900 dbus-daemon
[Mon Feb 13 13:09:00 2017] [ 1055]     0  1055     6511      409      19       3       25             0 atd
[Mon Feb 13 13:09:00 2017] [ 1065]     0  1065    71269       64      41       3     2782             0 accounts-daemon
[Mon Feb 13 13:09:00 2017] [ 1069]     0  1069     7155      502      18       3       27             0 systemd-logind
[Mon Feb 13 13:09:00 2017] [ 1071]     0  1071    16380      545      36       3      153         -1000 sshd
[Mon Feb 13 13:09:00 2017] [ 1073]     0  1073     6932      502      19       3       38             0 cron
[Mon Feb 13 13:09:00 2017] [ 1101]     0  1101     3344       51      11       3        1             0 mdadm
[Mon Feb 13 13:09:00 2017] [ 1114]     0  1114    69272      148      39       3       45             0 polkitd
[Mon Feb 13 13:09:00 2017] [ 1202]     0  1202     4868      428      14       3       37             0 irqbalance
[Mon Feb 13 13:09:00 2017] [ 1221]   113  1221    27509      437      24       3       97             0 ntpd
[Mon Feb 13 13:09:00 2017] [ 1236]     0  1236     3619      402      11       3        0             0 agetty
[Mon Feb 13 13:09:00 2017] [ 1238]     0  1238     3665      361      12       3        0             0 agetty
[Mon Feb 13 13:09:00 2017] [ 1389]     0  1389   100324     4842     154       3     1167             0 apache2
[Mon Feb 13 13:09:00 2017] [ 7047]   114  7047    88165      925      64       3      437             0 opendkim
[Mon Feb 13 13:09:00 2017] [30354]     0 30354    16352      526      23       3       82             0 master
[Mon Feb 13 13:09:00 2017] [30356]   112 30356    16999      306      25       3      169             0 qmgr
[Mon Feb 13 13:09:00 2017] [30361]   112 30361    20306      889      30       3      174             0 tlsmgr
[Mon Feb 13 13:09:00 2017] [27212]     0 27212     6011      664      17       3       23             0 vsftpd
[Mon Feb 13 13:09:00 2017] [30271]     0 30271     4981      711      14       3       73             0 mysqld_safe
[Mon Feb 13 13:09:00 2017] [20955]     0 20955    23842     1213      51       3      229             0 sshd
[Mon Feb 13 13:09:00 2017] [20957]  1001 20957    11312      608      26       3      218             0 systemd
[Mon Feb 13 13:09:00 2017] [20959]  1001 20959    52158       95      36       4      391             0 (sd-pam)
[Mon Feb 13 13:09:00 2017] [21053]  1001 21053    23908      629      49       3      309             0 sshd
[Mon Feb 13 13:09:00 2017] [21054]  1001 21054     5382      862      15       3      434             0 bash
[Mon Feb 13 13:09:00 2017] [21565]     0 21565    23842     1305      51       3      136             0 sshd
[Mon Feb 13 13:09:00 2017] [21595]  1001 21595    23842      728      48       3      143             0 sshd
[Mon Feb 13 13:09:00 2017] [21596]  1001 21596     5378      820      15       3      452             0 bash
[Mon Feb 13 13:09:00 2017] [21844]     0 21844    23842     1424      50       3       20             0 sshd
[Mon Feb 13 13:09:00 2017] [21874]  1001 21874    23924      840      50       3      137             0 sshd
[Mon Feb 13 13:09:00 2017] [21875]  1001 21875     5378      724      16       3      524             0 bash
[Mon Feb 13 13:09:00 2017] [21891]     0 21891    13972      727      32       3      149             0 sudo
[Mon Feb 13 13:09:00 2017] [21892]     0 21892    12751      612      30       3      105             0 su
[Mon Feb 13 13:09:00 2017] [21893]     0 21893     5030      861      15       3       74             0 bash
[Mon Feb 13 13:09:00 2017] [22092]   116 22092   154426    22725     107       4      608             0 mysqld
[Mon Feb 13 13:09:00 2017] [22093]     0 22093     6203      277      18       3       10             0 logger
[Mon Feb 13 13:09:00 2017] [22166]   112 22166    16869     1066      25       3        0             0 pickup
[Mon Feb 13 13:09:00 2017] [22256]    33 22256   100903     4062     155       3     1028             0 apache2
[Mon Feb 13 13:09:00 2017] [22272]    33 22272   101113     6090     157       3      919             0 apache2
[Mon Feb 13 13:09:00 2017] [22273]    33 22273   101016     4352     156       3     1034             0 apache2
[Mon Feb 13 13:09:00 2017] [22285]    33 22285   100915     4301     155       3     1030             0 apache2
[Mon Feb 13 13:09:00 2017] [22290]    33 22290   100994     3732     153       3     1036             0 apache2
[Mon Feb 13 13:09:00 2017] [22291]    33 22291   101505     5021     157       3     1031             0 apache2
[Mon Feb 13 13:09:00 2017] [22292]    33 22292   100991     3457     153       3     1048             0 apache2
[Mon Feb 13 13:09:00 2017] [22313]    33 22313   100886     3437     153       3     1049             0 apache2
[Mon Feb 13 13:09:00 2017] [22314]    33 22314   100374     1309     144       3     1139             0 apache2
[Mon Feb 13 13:09:00 2017] [22315]    33 22315   100890     3193     151       3     1054             0 apache2
[Mon Feb 13 13:09:00 2017] [22316]    33 22316   101015     5788     160       3      986             0 apache2
[Mon Feb 13 13:09:00 2017] [22317]    33 22317   100380     1658     144       3     1127             0 apache2
[Mon Feb 13 13:09:00 2017] [22318]    33 22318   100901     3771     155       3     1046             0 apache2
[Mon Feb 13 13:09:00 2017] [22319]    33 22319   100374     1309     144       3     1139             0 apache2
[Mon Feb 13 13:09:00 2017] [22321]    33 22321   100900     3686     155       3     1045             0 apache2
[Mon Feb 13 13:09:00 2017] [22326]    33 22326   101024     4417     155       3     1039             0 apache2
[Mon Feb 13 13:09:00 2017] [22327]    33 22327   100378     1309     144       3     1139             0 apache2
[Mon Feb 13 13:09:00 2017] [22328]    33 22328   101533     6627     161       3      853             0 apache2
[Mon Feb 13 13:09:00 2017] [22329]     0 22329    12235      727      29       3       11             0 cron
[Mon Feb 13 13:09:00 2017] [22330]     0 22330     1127      195       8       3        0             0 sh
[Mon Feb 13 13:09:00 2017] [22331]     0 22331     1127      202       8       3        0             0 sessionclean
[Mon Feb 13 13:09:00 2017] [22332]     0 22332     1127       26       8       3        0             0 sessionclean
[Mon Feb 13 13:09:00 2017] [22334]     0 22334     4144      185       9       3        0             0 sort
[Mon Feb 13 13:09:00 2017] [22335]     0 22335     4144      186      10       3        0             0 sort
[Mon Feb 13 13:09:00 2017] [22336]     0 22336     1127       26       8       3        0             0 sessionclean
[Mon Feb 13 13:09:00 2017] [22342]     0 22342     1127      301       8       3        0             0 sessionclean
[Mon Feb 13 13:09:00 2017] [22353]     0 22353     1127       29       8       3        0             0 sessionclean
[Mon Feb 13 13:09:00 2017] Out of memory: Kill process 22092 (mysqld) score 11 or sacrifice child
[Mon Feb 13 13:09:00 2017] Killed process 22092 (mysqld) total-vm:617704kB, anon-rss:76544kB, file-rss:14356kB
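
One detail worth checking in the log above is that the failing request was order=2, which on 4 kB pages means four physically contiguous pages (16 kB). The following is a minimal sketch, not a diagnosis: it parses two of the per-zone free-list lines copied from the log and counts how many free blocks are at least that large. The function names are made up for this illustration, and the sketch ignores the migrate-type flags in parentheses and the zone watermarks, so it only shows what the free lists contain.

```python
import re

# Two per-zone free-list lines copied from the dmesg output above (timestamps removed).
BUDDY_LINES = [
    "Node 0 DMA32: 16314*4kB (UME) 172*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB "
    "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 66632kB",
    "Node 0 Normal: 4161*4kB (UE) 147*8kB (UME) 0*16kB 0*32kB 1*64kB (H) 1*128kB (H) "
    "1*256kB (H) 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 19804kB",
]

PAGE_KB = 4  # assumption: 4 kB pages, matching the smallest block size in the log


def parse_buddy_line(line):
    """Return (zone_name, {block_size_kb: free_count}) for one free-list line."""
    zone = re.search(r"Node \d+ (\w+):", line).group(1)
    blocks = {int(size): int(count)
              for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}
    return zone, blocks


def free_blocks_for_order(blocks, order):
    """Count free blocks big enough for 2**order contiguous pages."""
    needed_kb = PAGE_KB * 2 ** order
    return sum(count for size, count in blocks.items() if size >= needed_kb)


for line in BUDDY_LINES:
    zone, blocks = parse_buddy_line(line)
    print(zone, "free blocks usable for order=2:", free_blocks_for_order(blocks, 2))
```

In this log the DMA32 zone, which holds most of the RAM, has tens of megabytes free but only in 4 kB and 8 kB pieces. That suggests the free memory was fragmented at the moment of the kill rather than exhausted, although the sketch alone cannot say why reclaim or compaction did not help.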

And the contents of /proc/meminfo:

MemTotal:        4044964 kB
MemFree:           69100 kB
MemAvailable:    3665524 kB
Buffers:         1509532 kB
Cached:          1959440 kB
SwapCached:         6288 kB
Active:          2181068 kB
Inactive:        1524292 kB
Active(anon):     133556 kB
Inactive(anon):   119404 kB
Active(file):    2047512 kB
Inactive(file):  1404888 kB
Unevictable:        3656 kB
Mlocked:            3656 kB
SwapTotal:       4194300 kB
SwapFree:        4112600 kB
Dirty:               184 kB
Writeback:             0 kB
AnonPages:        235104 kB
Mapped:            59700 kB
Shmem:             14192 kB
Slab:             220372 kB
SReclaimable:     192240 kB
SUnreclaim:        28132 kB
KernelStack:        3936 kB
PageTables:        17116 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     6216780 kB
Committed_AS:    1189376 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:     14336 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      137216 kB
DirectMap2M:     4188160 kB
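
To make the numbers above easier to compare, here is a small sketch that parses /proc/meminfo-style text and reports how much of MemTotal is buffers plus page cache versus anonymous memory. The sample is a subset of the values listed above; parse_meminfo is a helper name invented for this example.

```python
def parse_meminfo(text):
    """Parse 'Key:   value kB' lines into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


# Subset of the /proc/meminfo values shown above.
SAMPLE = """\
MemTotal:        4044964 kB
MemFree:           69100 kB
MemAvailable:    3665524 kB
Buffers:         1509532 kB
Cached:          1959440 kB
AnonPages:        235104 kB
SReclaimable:     192240 kB
"""

info = parse_meminfo(SAMPLE)
cache_kb = info["Buffers"] + info["Cached"]
total_kb = info["MemTotal"]

print(f"buffers + page cache: {cache_kb} kB ({cache_kb / total_kb:.0%} of MemTotal)")
print(f"anonymous pages:      {info['AnonPages']} kB ({info['AnonPages'] / total_kb:.0%} of MemTotal)")
```

On a live Linux system the same helper can be fed the real file with parse_meminfo(open("/proc/meminfo").read()). For the values above, most of the RAM is file cache that the kernel can normally reclaim, which is consistent with the large available figure reported by free -h.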

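I am not very familiar with the oom-killer, so I wonder whether this is due to total_vm: that column adds up to 2937127, which multiplied by 4 kB pages is 11.2 GB.

I have 4 GB of physical RAM and 4 GB of swap space (8 GB in total), which multiplied by 1.5 (because of vm.overcommit_ratio 50) would give 12 GB.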
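
As a cross-check on the 1.5 multiplier, the sketch below assumes the rule described in the kernel's overcommit accounting documentation: vm.overcommit_ratio is a percentage of RAM that is added to swap, and the resulting CommitLimit is only enforced when vm.overcommit_memory is 2. The function name and the page-granular rounding are choices made for this illustration.

```python
PAGE_KB = 4  # assumption: 4 kB pages


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio, hugetlb_kb=0):
    """CommitLimit as swap + ratio% of RAM, computed in whole pages."""
    ram_pages = (mem_total_kb - hugetlb_kb) // PAGE_KB
    swap_pages = swap_total_kb // PAGE_KB
    return (ram_pages * overcommit_ratio // 100 + swap_pages) * PAGE_KB


# MemTotal, SwapTotal and Committed_AS from /proc/meminfo, ratio from the sysctl values.
limit_kb = commit_limit_kb(4044964, 4194300, 50)
committed_kb = 1189376

print("computed CommitLimit:", limit_kb, "kB")
print("Committed_AS:        ", committed_kb, "kB")
print("headroom:            ", limit_kb - committed_kb, "kB")
```

Under that assumption the limit is about 6 GB rather than 12 GB, and it agrees with the CommitLimit line in /proc/meminfo. With vm.overcommit_memory = 0 the kernel uses its heuristic mode, so this limit would not be what triggers the oom-killer here, and Committed_AS is well below it in any case.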


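By comparison, the old server never had any memory problems, and it ran more services.

It also had 4GB RAM and 4GB swap, with exactly the same vm.overcommit_ settings. The only real difference is that it was a physical server running CentOS (6.8).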


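Am I missing something? This server looks like it should have plenty of RAM available, and it is not even touching the swap space.

Just to provide an answer: in this case, this was caused by Ubuntu bug #1655842.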

Source: https://serverfault.com/questions/832302