Linux
Ext4 kernel crash
I get kernel panics like this:
    EXT4-fs error (device md2): ext4_ext_find_extent: bad header/extent in inode #97911179: invalid magic - magic 5f69, entries 28769, max 26988(0), depth 24939(0)
    ----------- [cut here ] --------- [please bite here ] ---------
    Kernel BUG at fs/ext4/extents.c:1973
    invalid opcode: 0000 [1] SMP
    last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
    CPU 6
    Modules linked in: iptable_filter ipt_REDIRECT ip_nat_ftp ip_conntrack_ftp iptable_nat ip_nat ip_tables xt_state ip_conntrack_netbios_ns ip_conntrack nfnetlink netconsole ipt_iprange xt_tcpudp autofs4 hwmon_vid coretemp cpufreq_ondemand acpi_cpufreq freq_table mperf x_tables be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi ext3 jbd dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac lp joydev sg shpchp parport_pc parport r8169 mii serio_raw tpm_tis tpm tpm_bios i2c_i801 i2c_core pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache raid10 raid456 xor raid0 sata_nv aacraid 3w_9xxx 3w_xxxx sata_sil sata_via ahci libata sd_mod scsi_mod raid1 ext4 jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd
    Pid: 9374, comm: httpd Not tainted 2.6.18-308.20.1.el5debug 0000001
    RIP: 0010:[<ffffffff8806ccda>] [<ffffffff8806ccda>] :ext4:ext4_ext_put_in_cache+0x21/0x6a
    RSP: 0018:ffff8101c2df7678 EFLAGS: 00010246
    RAX: 00000000fffffbf1 RBX: ffff810758115dc8 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff810758115958
    RBP: ffff810758115958 R08: 0000000000000002 R09: 0000000000000000
    R10: ffff8101c2df75a0 R11: 0000000000000100 R12: 0000000000000000
    R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
    FS: 00002ab948d31f70(0000) GS:ffff81081f4ba4c8(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 000000001de9e4e0 CR3: 000000014ae88000 CR4: 00000000000006a0
    Process httpd (pid: 9374, threadinfo ffff8101c2df6000, task ffff8101cdf74d80)
    Stack: 000181070000040f ffff810758115dc8 ffff8103f15d7ff4 ffff8107581157f0
     ffff810758115958 000000000000040f 0000000000000000 ffffffff8806f621
     ffff8101c2df76d8 ffff8101c2df7738 0000000000000000 ffff81034900c310
    Call Trace:
     [<ffffffff8806f621>] :ext4:ext4_ext_get_blocks+0x258/0x16f3
     [<ffffffff80013994>] poison_obj+0x26/0x2f
     [<ffffffff800331e2>] cache_free_debugcheck+0x20b/0x21a
     [<ffffffff8805b4ac>] :ext4:ext4_get_blocks+0x43/0x1d2
     [<ffffffff8805b4cf>] :ext4:ext4_get_blocks+0x66/0x1d2
     [<ffffffff8805c16a>] :ext4:ext4_get_block+0xa7/0xe6
     [<ffffffff8805c3be>] :ext4:ext4_block_truncate_page+0x215/0x4f1
     [<ffffffff8806e832>] :ext4:ext4_ext_truncate+0x65/0x909
     [<ffffffff8805b4f9>] :ext4:ext4_get_blocks+0x90/0x1d2
     [<ffffffff8805ccfc>] :ext4:ext4_truncate+0x91/0x53b
     [<ffffffff80041e5d>] pagevec_lookup+0x17/0x1e
     [<ffffffff8002d3cf>] truncate_inode_pages_range+0x1f3/0x2d5
     [<ffffffff8803b78b>] :jbd2:jbd2_journal_stop+0x1f1/0x201
     [<ffffffff8805f3c1>] :ext4:ext4_da_write_begin+0x1ea/0x25b
     [<ffffffff80010896>] generic_file_buffered_write+0x151/0x6c3
     [<ffffffff800174b1>] __generic_file_aio_write_nolock+0x36c/0x3b9
     [<ffffffff800482ab>] do_sock_read+0xcf/0x110
     [<ffffffff80022d49>] generic_file_aio_write+0x69/0xc5
     [<ffffffff88056c0a>] :ext4:ext4_file_write+0xcb/0x215
     [<ffffffff8001936b>] do_sync_write+0xc7/0x104
     [<ffffffff8000d418>] dnotify_parent+0x1f/0x7b
     [<ffffffff800efead>] do_readv_writev+0x26e/0x291
     [<ffffffff800a8192>] autoremove_wake_function+0x0/0x2e
     [<ffffffff80035b9f>] do_setitimer+0x62a/0x692
     [<ffffffff8002e6a5>] mntput_no_expire+0x19/0x8d
     [<ffffffff80049aa0>] sys_chdir+0x55/0x62
     [<ffffffff800178c6>] vfs_write+0xce/0x174
     [<ffffffff800181ba>] sys_write+0x45/0x6e
     [<ffffffff80060116>] system_call+0x7e/0x83
    Code: 0f 0b 68 3e 27 08 88 c2 b5 07 eb fe 48 8d 9f 08 05 00 00 48
    RIP [<ffffffff8806ccda>] :ext4:ext4_ext_put_in_cache+0x21/0x6a
     RSP <ffff8101c2df7678>
    <0>Kernel panic - not syncing: Fatal exception
    <0>Rebooting in 1 seconds..
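The first log line reports what the kernel found where it expected an extent header. The sketch below assumes the standard on-disk layout of `struct ext4_extent_header` (four little-endian 16-bit fields `eh_magic`, `eh_entries`, `eh_max`, `eh_depth`, with the magic expected to be 0xF30A) and packs the four reported values back into raw bytes. They decode to printable ASCII, which is at least consistent with ordinary data having been written over the inode's extent header rather than a single flipped bit; treat this as an illustration, not a diagnosis. The function name `decode_bad_header` is hypothetical.

```python
import struct

# Assumed on-disk layout of struct ext4_extent_header: four little-endian
# 16-bit fields (eh_magic, eh_entries, eh_max, eh_depth). The 32-bit
# eh_generation field that follows is not needed here.
EXT4_EXT_MAGIC = 0xF30A


def decode_bad_header(magic, entries, max_entries, depth):
    """Rebuild the 8 raw header bytes from the values printed in the log."""
    raw = struct.pack("<4H", magic, entries, max_entries, depth)
    return {
        "magic_ok": magic == EXT4_EXT_MAGIC,
        "raw_bytes": raw,
        "as_text": raw.decode("ascii", errors="replace"),
    }


# Values from: "invalid magic - magic 5f69, entries 28769, max 26988(0), depth 24939(0)"
result = decode_bad_header(0x5F69, 28769, 26988, 24939)
print(result["magic_ok"])
print(result["raw_bytes"])
```

The eight bytes come out as `b'i_aplika'`, i.e. text rather than header fields.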
My system is CentOS 5.8, 64-bit.
/dev/md2 /home ext4 rw,noatime,nodiratime,usrjquota=aquota.user,grpjquota=aquota.group,usrquota,grpquota,jqfmt=vfsv0 0 0
Kernel: 2.6.18-308.20.1.el5debug
    md2 : active raid1 sdc3[0] sdd3[1]
          2914280100 blocks super 1.0 [2/2] [UU]
          [>....................] resync = 0.2% (7252288/2914280100) finish=13468.3min speed=3595K/sec

    /dev/md2 2,7T 1,8T 908G 67% /home
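How can I fix this?
Even after I resync the array, unmount the filesystem, and check and repair all errors, it works fine for about a week; then small ext4 errors start to appear, and eventually the kernel starts panicking again.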
- Wait for the RAID resync to finish. Since it has already started, it is best to let it run to completion.
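- Reboot the system into single-user mode, unmount the `/home` partition (if it is mounted), and run `e2fsck -f /dev/md2` to make sure the filesystem is self-consistent.
- If it happens again after that, it most likely points to damaged hardware or a genuine kernel bug. If you can't find any evidence of the former, make sure you are up to date with `yum update` (… known issues). Edit: I would not roll back the kernel without convincing evidence of a known issue in the current one. If your kernel is logging continuous, progressive FS corruption, that very strongly suggests a hardware problem to me.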
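To put "wait for the resync to finish" into numbers, the resync line from `/proc/mdstat` in the question gives the blocks done, the total, and the current speed. A minimal sketch, assuming the done/total counts are in 1 KiB blocks, the speed is in KiB/s, and the speed stays constant; md computes its own `finish=` figure differently, so the two estimates differ slightly. The helper `resync_eta_minutes` is hypothetical.

```python
import re

# Resync line copied from /proc/mdstat in the question.
MDSTAT_LINE = ("[>....................] resync = 0.2% (7252288/2914280100) "
               "finish=13468.3min speed=3595K/sec")

# Assumption: done/total are 1 KiB blocks and speed is KiB/s.
PATTERN = re.compile(r"\((\d+)/(\d+)\)\s+finish=([\d.]+)min\s+speed=(\d+)K/sec")


def resync_eta_minutes(line):
    """Return (our_estimate_min, md_estimate_min), assuming constant speed."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("no resync progress found in line")
    done, total, md_finish, speed = m.groups()
    remaining_kib = int(total) - int(done)
    ours = remaining_kib / int(speed) / 60.0
    return ours, float(md_finish)


ours, md = resync_eta_minutes(MDSTAT_LINE)
print(f"about {ours:.0f} min ({ours / 60 / 24:.1f} days); md says {md} min")
```

At about 3.5 MiB/s, both estimates come to more than nine days before the array is fully in sync.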
Have you run `smartctl` checks on the sdc and sdd disks? You say "the disks are fine", but you didn't say how you know that.
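If the disks really are fine, I notice that you are using only one partition from each of `sdc` and `sdd` to provide the metadevice, so it is worth checking that the partition tables don't overlap. I have seen this cause problems when partitions overlap by just a few blocks, because the superblock at the bottom of one filesystem keeps stomping on the blocks at the very top of the other.

**Edit 2:** Thanks for the `smartctl` output. Unfortunately, the "health check passed" output is fairly meaningless, because neither disk has been tested ("No self-tests have been logged"). Try `smartctl -t long /dev/sdc`, and when it completes, do the same for `sdd`, then see what `smartctl` says.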
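The partition-overlap check suggested above comes down to sorting the partitions by start sector and making sure each one ends before the next begins. A minimal sketch, assuming you have already read the start and end sectors (both inclusive) of each partition from a tool such as `fdisk -lu`; the `Partition` type, the `find_overlaps` helper, and the sector numbers in the example are hypothetical and are not the poster's actual layout.

```python
from typing import List, NamedTuple, Tuple


class Partition(NamedTuple):
    name: str
    start: int  # first sector (inclusive)
    end: int    # last sector (inclusive)


def find_overlaps(parts: List[Partition]) -> List[Tuple[str, str]]:
    """Return pairs of partitions whose sector ranges overlap."""
    overlaps = []
    ordered = sorted(parts, key=lambda p: p.start)
    for i, a in enumerate(ordered):
        # Later partitions start at or after a.start; one overlaps a if it
        # starts on or before a's last sector. Once one starts after a.end,
        # all the following ones do too, so we can stop.
        for b in ordered[i + 1:]:
            if b.start > a.end:
                break
            overlaps.append((a.name, b.name))
    return overlaps


# Made-up layout where sdc3 starts a few hundred sectors before sdc2 ends.
disk = [
    Partition("sdc1", 63, 2096450),
    Partition("sdc2", 2096451, 6291519),
    Partition("sdc3", 6291000, 5860533167),
]
print(find_overlaps(disk))
```

For this made-up layout the only reported pair is `sdc2`/`sdc3`; an empty list means no two partitions share a sector.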
You are the victim of filesystem corruption. However, your `/etc/fstab` has a nasty configuration that prevents any kind of check on this filesystem. Please read `man fstab`, in particular this section about the sixth field:

> The sixth field (fs_passno). This field is used by the fsck(8) program to determine the order in which filesystem checks are done at reboot time. The root filesystem should be specified with a fs_passno of 1, and other filesystems should have a fs_passno of 2. Filesystems within a drive will be checked sequentially, but filesystems on different drives will be checked at the same time to utilize parallelism available in the hardware. If the sixth field is not present or zero, a value of zero is returned and fsck will assume that the filesystem does not need to be checked.

So you have told the system never to run `fsck` on this filesystem. Please check and repair your filesystem offline (`fsck`). It is already corrupted.
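Following the man page text quoted above, a filesystem whose sixth field is 0 or missing is skipped by fsck at boot. A small sketch that scans fstab-style lines and lists such entries. It only reads text; the set of filesystem types it ignores (swap, proc, tmpfs and the like) is an assumption about what is never fsck'd, and both the `unchecked_filesystems` helper and the `/dev/md1` root line in the sample are hypothetical.

```python
from typing import List

# Assumption: these types are never fsck'd, so passno 0 is expected for them.
NO_FSCK_TYPES = {"swap", "proc", "sysfs", "tmpfs", "devpts", "nfs", "cifs"}


def unchecked_filesystems(fstab_text: str) -> List[str]:
    """Return mount points whose fs_passno (sixth field) is 0 or missing."""
    result = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # not a usable entry
        mount_point, fs_type = fields[1], fields[2]
        if fs_type in NO_FSCK_TYPES:
            continue
        # Per fstab(5): a missing sixth field is treated as zero.
        passno = int(fields[5]) if len(fields) >= 6 else 0
        if passno == 0:
            result.append(mount_point)
    return result


# The /home line is the one from the question; the root line is made up.
sample = """\
/dev/md1 / ext4 defaults 1 1
/dev/md2 /home ext4 rw,noatime,nodiratime,usrjquota=aquota.user,grpjquota=aquota.group,usrquota,grpquota,jqfmt=vfsv0 0 0
proc /proc proc defaults 0 0
"""
print(unchecked_filesystems(sample))
```

Only `/home` is reported. Per the quoted man page, a non-root filesystem should have a fs_passno of 2 so that fsck checks it at boot.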