Linux
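Mysterious memory leak. What is using ~10GB of memory on this system?
After about 18 hours of uptime, this system is using roughly 10GB of memory, which triggers the OOM-killer when our daily job runs: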
    # free -h
                 total       used       free     shared    buffers     cached
    Mem:           14G       9.4G       5.3G       400K        27M        59M
    -/+ buffers/cache:       9.3G       5.4G
    Swap:           0B         0B         0B

    # cat /proc/meminfo
    MemTotal:       15400928 kB
    MemFree:         5567028 kB
    Buffers:           28464 kB
    Cached:            60816 kB
    SwapCached:            0 kB
    Active:           321464 kB
    Inactive:          59156 kB
    Active(anon):     291464 kB
    Inactive(anon):      316 kB
    Active(file):      30000 kB
    Inactive(file):    58840 kB
    Unevictable:           0 kB
    Mlocked:               0 kB
    SwapTotal:             0 kB
    SwapFree:              0 kB
    Dirty:                40 kB
    Writeback:             0 kB
    AnonPages:        291380 kB
    Mapped:            14356 kB
    Shmem:               400 kB
    Slab:             364596 kB
    SReclaimable:      18856 kB
    SUnreclaim:       345740 kB
    KernelStack:        1832 kB
    PageTables:         3720 kB
    NFS_Unstable:          0 kB
    Bounce:                0 kB
    WritebackTmp:          0 kB
    CommitLimit:     7700464 kB
    Committed_AS:     313224 kB
    VmallocTotal:   34359738367 kB
    VmallocUsed:       35976 kB
    VmallocChunk:   34359678732 kB
    HardwareCorrupted:     0 kB
    AnonHugePages:    231424 kB
    HugePages_Total:       0
    HugePages_Free:        0
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       2048 kB
    DirectMap4k:     9598976 kB
    DirectMap2M:     6260736 kB
However, no process appears to be using much memory:
    # top -o %MEM -n 1
    top - 15:07:00 up 18:28,  1 user,  load average: 0.00, 0.01, 0.05
    Tasks: 155 total,   1 running, 154 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 23.7 us,  4.8 sy,  0.0 ni, 71.4 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
    KiB Mem:  15400928 total,  9838560 used,  5562368 free,    29764 buffers
    KiB Swap:        0 total,        0 used,        0 free.    62760 cached Mem

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     1333 root      20   0 5763204 274132   5352 S   0.0  1.8   7:00.19 java
     1466 newrelic  20   0  251484   4884   2056 S   0.0  0.0   0:56.41 nrsysmond
    16804 root      20   0  105636   4212   3224 S   0.0  0.0   0:00.00 sshd
    16876 root      20   0   21420   3908   1764 S   0.0  0.0   0:00.03 bash
    16858 ubuntu    20   0   21456   3828   1684 S   0.0  0.0   0:00.05 bash
      770 root      20   0   10216   2868    576 S   0.0  0.0   0:00.02 dhclient
        1 root      20   0   33700   2216    624 S   0.0  0.0   0:35.50 init
    16875 root      20   0   63664   2084   1612 S   0.0  0.0   0:00.00 sudo
    16857 ubuntu    20   0  105636   1860    880 S   0.0  0.0   0:00.01 sshd
    16920 root      20   0   23688   1528   1064 R   0.0  0.0   0:00.00 top
    16803 postfix   20   0   27400   1492   1216 S   0.0  0.0   0:00.00 pickup
      976 root      20   0   43444   1100    748 S   0.0  0.0   0:00.00 systemd-logind
      572 root      20   0   51480   1048    308 S   0.0  0.0   0:00.53 systemd-udevd
     1840 ntp       20   0   31448   1044    448 S   0.0  0.0   0:02.94 ntpd
      990 syslog    20   0  255836    924     76 S   0.0  0.0   0:00.13 rsyslogd
     1167 root      20   0   61372    828    148 S   0.0  0.0   0:00.00 sshd
      945 message+  20   0   39212    788    416 S   0.0  0.0   0:00.12 dbus-daemon
     1323 root      20   0   20692    676      0 S   0.0  0.0   0:40.92 wrapper
     1230 root      20   0   19320    588    244 S   0.0  0.0   0:04.57 irqbalance
     1538 root      20   0   25336    500    188 S   0.0  0.0   0:00.18 master
      567 root      20   0   19604    480     96 S   0.0  0.0   0:00.34 upstart-udev-br
     1175 root      20   0   23648    404    156 S   0.0  0.0   0:00.08 cron
     1005 root      20   0   15272    348     88 S   0.0  0.0   0:00.08 upstart-file-br
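For anyone who wants a single number instead of eyeballing the RES column, here is a minimal sketch that adds up the resident set size of every process by reading the VmRSS line from /proc/<pid>/status. The function name `total_rss_kb` and the `proc_root` parameter are illustrative, not part of any tool. Summing RSS counts shared pages once per process, so the result is an upper bound on userspace usage rather than an exact figure.

```python
import os


def total_rss_kb(proc_root="/proc"):
    """Sum VmRSS (in kB) over all numeric PID directories under proc_root."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                for line in fh:
                    # Kernel threads have no VmRSS line and contribute nothing.
                    if line.startswith("VmRSS:"):
                        total += int(line.split()[1])
                        break
        except OSError:
            continue  # the process exited while we were scanning
    return total
```

If `total_rss_kb()` returns a few hundred MB while `free` reports about 9.4G used, the missing memory is not held by any process.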
The temporary and shared-memory filesystems are essentially empty:
    # df -h
    Filesystem      Size  Used Avail Use% Mounted on
    udev            7.4G   12K  7.4G   1% /dev
    tmpfs           1.5G  384K  1.5G   1% /run
    /dev/xvda1      9.8G  6.7G  2.7G  72% /
    none            4.0K     0  4.0K   0% /sys/fs/cgroup
    none            5.0M     0  5.0M   0% /run/lock
    none            7.4G     0  7.4G   0% /run/shm
    none            100M     0  100M   0% /run/user
    /dev/xvda15     104M  4.7M   99M   5% /boot/efi
    /dev/xvdb        64G  1.1G   60G   2% /mnt
smem says the memory is being used by the kernel:

    # smem -tw
    Area                           Used      Cache   Noncache
    firmware/hardware                 0          0          0
    kernel image                      0          0          0
    kernel dynamic memory       9525544      92468    9433076
    userspace memory             311064      15648     295416
    free memory                 5564320    5564320          0
    ----------------------------------------------------------
                               15400928    5672436    9728492
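The "kernel dynamic memory" row is not measured directly. The numbers above are consistent with it being a remainder: total memory minus free memory minus userspace memory. The sketch below reproduces the figure on that assumption; `kernel_dynamic_kb` is a hypothetical helper, not an smem function.

```python
def kernel_dynamic_kb(total_kb, free_kb, userspace_kb):
    """Whatever is neither free nor mapped into userspace is charged to the kernel."""
    return total_kb - free_kb - userspace_kb


# Values (in kB) copied from the smem -tw output above.
total_kb = 15400928
free_kb = 5564320
userspace_kb = 311064
kernel_cache_kb = 92468

kernel_used = kernel_dynamic_kb(total_kb, free_kb, userspace_kb)
print(kernel_used)                    # 9525544, the "Used" column
print(kernel_used - kernel_cache_kb)  # 9433076, the "Noncache" column
```

Because the figure is a remainder, it says the memory is not free and not userspace, but it cannot say which kernel subsystem holds it.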
But slabtop is no help:

    # slabtop -o -s c
     Active / Total Objects (% used)    : 2915263 / 2937006 (99.3%)
     Active / Total Slabs (% used)      : 60745 / 60745 (100.0%)
     Active / Total Caches (% used)     : 68 / 103 (66.0%)
     Active / Total Size (% used)       : 356086.71K / 360884.30K (98.7%)
     Minimum / Average / Maximum Object : 0.01K / 0.12K / 14.00K

       OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
    2226784 2226784 100%    0.07K  39764       56    159056K Acpi-ParseExt
     273408  272598  99%    0.25K   8544       32     68352K kmalloc-256
       8568    8560  99%    4.00K   1071        8     34272K kmalloc-4096
      52320   52320 100%    0.50K   1635       32     26160K kmalloc-512
       1988    1975  99%    8.00K    497        4     15904K kmalloc-8192
      58044   53370  91%    0.19K   2764       21     11056K kmalloc-192
     150016  141356  94%    0.06K   2344       64      9376K kmalloc-64
       5016    3504  69%    0.96K    152       33      4864K ext4_inode_cache
       7280    6834  93%    0.57K    260       28      4160K inode_cache
      20265   20067  99%    0.19K    965       21      3860K dentry
       1760    1721  97%    2.00K    110       16      3520K kmalloc-2048
      19800   19800 100%    0.11K    550       36      2200K sysfs_dir_cache
       2112    1966  93%    1.00K     66       32      2112K kmalloc-1024
        305     260  85%    6.00K     61        5      1952K task_struct
      14616   14242  97%    0.09K    348       42      1392K kmalloc-96
       2125    2092  98%    0.63K     85       25      1360K proc_inode_cache
       2324    2324 100%    0.55K     83       28      1328K radix_tree_node
       9828    9828 100%    0.10K    252       39      1008K buffer_head
       1400    1400 100%    0.62K     56       25       896K sock_inode_cache
         54      39  72%   12.00K     27        2       864K nvidia_stack_cache
        975     975 100%    0.81K     25       39       800K task_xstate
        690     515  74%    1.06K     23       30       736K signal_cache
So far, the only way I have found to recover the memory is to reboot. Where is the 10GB hiding?
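One rough way to put a number on the gap is to subtract the counters that /proc/meminfo does attribute from MemTotal. The sketch below does this for the snapshot in the question. The set of fields in `ACCOUNTED` is an assumption about which counters do not overlap, and the function names are illustrative. Pages that a driver takes straight from the page allocator typically appear in none of these counters, so they end up in the remainder.

```python
# Counters assumed to be non-overlapping (Mapped and Shmem are already part of Cached).
ACCOUNTED = ("MemFree", "Buffers", "Cached", "AnonPages",
             "Slab", "KernelStack", "PageTables")


def parse_meminfo(text):
    """Parse /proc/meminfo-style lines into {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def unaccounted_kb(fields):
    """MemTotal minus everything the listed counters explain."""
    return fields["MemTotal"] - sum(fields[name] for name in ACCOUNTED)


# Relevant lines from the /proc/meminfo output in the question.
SAMPLE = """\
MemTotal:       15400928 kB
MemFree:         5567028 kB
Buffers:           28464 kB
Cached:            60816 kB
AnonPages:        291380 kB
Slab:             364596 kB
KernelStack:        1832 kB
PageTables:         3720 kB
"""

gap = unaccounted_kb(parse_meminfo(SAMPLE))
print(f"{gap} kB (~{gap / 1024**2:.1f} GiB) not attributed to any counter")
```

For this snapshot the remainder is 9083092 kB, so nearly all of the "used" memory is invisible to both the process list and the slab statistics. On a live system, pass the contents of /proc/meminfo to `parse_meminfo` and sample it periodically to see whether the remainder grows.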
smem helped me trace the problem to the kernel, and I believe the NVIDIA driver was the culprit. After upgrading to 367.35, things look fine. Reference:
I'm running a machine with 32GB of memory, and the notable difference is the DirectMap4k value:
    DirectMap4k:      493076 kB
    DirectMap2M:     7862272 kB
    DirectMap1G:    27262976 kB
Compared with yours:
    DirectMap4k:    11182080 kB
    DirectMap2M:     4677632 kB
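That may be a starting point. A Google search suggests this value can be affected by how the host allocates memory to a VPS... Are you running this machine as a virtual server?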
It may be that the host server does not have enough RAM and is distorting /proc/meminfo. I would also paste the output of smem -tw, since that may show whether the memory leak is in the kernel or in an application. For comparison, mine:

    # smem -tw
    Area                           Used      Cache   Noncache
    firmware/hardware                 0          0          0
    kernel image                      0          0          0
    kernel dynamic memory      11297432   10738716     558716
    userspace memory            6144832    1182184    4962648
    free memory                15470032   15470032          0
    ----------------------------------------------------------
                               32912296   27390932    5521364