Linux

MySQL crashes. Bad hard disk or bad hardware?

  • March 15, 2017

I have seen high load and MySQL crashing twice within one week. Could the errors below be the cause? Any ideas?

   Jan  3 09:49:19 HOST kernel: [2272100.568769]          res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
   Jan  3 09:49:19 HOST kernel: [2272100.569023] ata2.00: status: { DRDY ERR }
   Jan  3 09:49:19 HOST kernel: [2272100.569089] ata2.00: error: { UNC }
   Jan  3 09:49:19 HOST kernel: [2272100.577394] ata2.00: configured for UDMA/133
   Jan  3 09:49:19 HOST kernel: [2272100.577418] ata2: EH complete
   Jan  3 09:49:26 HOST kernel: [2272107.699341] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
   Jan  3 09:49:26 HOST kernel: [2272107.699569] ata2.00: BMDMA stat 0x25
   Jan  3 09:49:26 HOST kernel: [2272107.699643] ata2.00: failed command: READ DMA EXT
   Jan  3 09:49:26 HOST kernel: [2272107.699713] ata2.00: cmd 25/00:38:78:7f:f1/00:00:35:00:00/e0 tag 0 dma 28672 in
   Jan  3 09:49:26 HOST kernel: [2272107.699715]          res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
   Jan  3 09:49:26 HOST kernel: [2272107.699966] ata2.00: status: { DRDY ERR }
   Jan  3 09:49:26 HOST kernel: [2272107.700030] ata2.00: error: { UNC }
   Jan  3 09:49:26 HOST kernel: [2272107.708509] ata2.00: configured for UDMA/133
   Jan  3 09:49:26 HOST kernel: [2272107.708534] ata2: EH complete
   Jan  3 09:49:33 HOST kernel: [2272114.833522] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
   Jan  3 09:49:33 HOST kernel: [2272114.833603] ata2.00: BMDMA stat 0x25
   Jan  3 09:49:33 HOST kernel: [2272114.833669] ata2.00: failed command: READ DMA EXT
   Jan  3 09:49:33 HOST kernel: [2272114.833737] ata2.00: cmd 25/00:38:78:7f:f1/00:00:35:00:00/e0 tag 0 dma 28672 in
   Jan  3 09:49:33 HOST kernel: [2272114.833739]          res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
   Jan  3 09:49:33 HOST kernel: [2272114.833992] ata2.00: status: { DRDY ERR }
   Jan  3 09:49:33 HOST kernel: [2272114.834056] ata2.00: error: { UNC }
   Jan  3 09:49:33 HOST kernel: [2272114.842578] ata2.00: configured for UDMA/133
   Jan  3 09:49:33 HOST kernel: [2272114.842604] ata2: EH complete
   Jan  3 09:49:40 HOST kernel: [2272121.959563] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
   Jan  3 09:49:40 HOST kernel: [2272121.959644] ata2.00: BMDMA stat 0x25
   Jan  3 09:49:40 HOST kernel: [2272121.959708] ata2.00: failed command: READ DMA EXT
   Jan  3 09:49:40 HOST kernel: [2272121.959778] ata2.00: cmd 25/00:38:78:7f:f1/00:00:35:00:00/e0 tag 0 dma 28672 in
   Jan  3 09:49:40 HOST kernel: [2272121.959780]          res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
   Jan  3 09:49:40 HOST kernel: [2272121.961337] ata2.00: status: { DRDY ERR }
   Jan  3 09:49:40 HOST kernel: [2272121.961400] ata2.00: error: { UNC }
   Jan  3 09:49:40 HOST kernel: [2272121.968673] ata2.00: configured for UDMA/133
   Jan  3 09:49:40 HOST kernel: [2272121.968701] sd 1:0:0:0: [sda] Unhandled sense code
   Jan  3 09:49:40 HOST kernel: [2272121.968706] sd 1:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
   Jan  3 09:49:40 HOST kernel: [2272121.968714] sd 1:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
   Jan  3 09:49:40 HOST kernel: [2272121.968723] Descriptor sense data with sense descriptors (in hex):
   Jan  3 09:49:40 HOST kernel: [2272121.968729]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
   Jan  3 09:49:40 HOST kernel: [2272121.968743]         35 f1 7f 78
   Jan  3 09:49:40 HOST kernel: [2272121.968749] sd 1:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
   Jan  3 09:49:40 HOST kernel: [2272121.968759] sd 1:0:0:0: [sda] CDB: Read(10): 28 00 35 f1 7f 78 00 00 38 00
   Jan  3 09:49:40 HOST kernel: [2272121.968778] ata2: EH complete
Jan  3 09:47:45 HOST kernel: [2272007.394223]  [<ffffffffa00c9638>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
Jan  3 09:47:45 HOST kernel: [2272007.394232]  [<ffffffffa00a0563>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
Jan  3 09:47:45 HOST kernel: [2272007.394241]  [<ffffffffa00a05dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
Jan  3 09:47:45 HOST kernel: [2272007.394253]  [<ffffffffa00d593c>] ? ext4_xattr_get+0x10c/0x2c0 [ext4]
Jan  3 09:47:45 HOST kernel: [2272007.394262]  [<ffffffffa00a08d0>] ext4_dirty_inode+0x40/0x60 [ext4]
Jan  3 09:47:45 HOST kernel: [2272007.394266]  [<ffffffff811c7b2b>] __mark_inode_dirty+0x3b/0x160
Jan  3 09:47:45 HOST kernel: [2272007.394270]  [<ffffffff811b792a>] file_update_time+0x10a/0x1a0
Jan  3 09:47:45 HOST kernel: [2272007.394274]  [<ffffffff8112ac6c>] __generic_file_write_iter+0x1fc/0x420
Jan  3 09:47:45 HOST kernel: [2272007.394278]  [<ffffffff81127571>] ? file_read_iter_actor+0x61/0x80
Jan  3 09:47:45 HOST kernel: [2272007.394282]  [<ffffffff8112af15>] __generic_file_aio_write+0x85/0xa0
Jan  3 09:47:45 HOST kernel: [2272007.394287]  [<ffffffff8112af9f>] generic_file_aio_write+0x6f/0xe0
Jan  3 09:47:45 HOST kernel: [2272007.394295]  [<ffffffffa009a331>] ext4_file_write+0x61/0x1e0 [ext4]
Jan  3 09:47:45 HOST kernel: [2272007.394299]  [<ffffffff8119c78a>] do_sync_write+0xfa/0x140
Jan  3 09:47:45 HOST kernel: [2272007.394303]  [<ffffffff81097d70>] ? autoremove_wake_function+0x0/0x40
Jan  3 09:47:45 HOST kernel: [2272007.394307]  [<ffffffff8119ca68>] vfs_write+0xb8/0x1a0
Jan  3 09:47:45 HOST kernel: [2272007.394311]  [<ffffffff8119d542>] sys_pwrite64+0x82/0xa0
Jan  3 09:47:45 HOST kernel: [2272007.394315]  [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
Jan  3 09:47:45 HOST kernel: [2272007.394319] INFO: task mysqld:1241 blocked for more than 120 seconds.
Jan  3 09:47:45 HOST kernel: [2272007.394389] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  3 09:47:45 HOST kernel: [2272007.394581] mysqld        D ffff88004dda2f40     0  1241   3454    0 0x00000000
Jan  3 09:47:45 HOST kernel: [2272007.394585]  ffff88007df63958 0000000000000082 0000000000000000 00000000ffffffff
Jan  3 09:47:45 HOST kernel: [2272007.394590]  ffff8800ffffffff 0000000000055c14 ffff88007df638e8 ffffffff8112806e
Jan  3 09:47:45 HOST kernel: [2272007.394594]  000000000001b900 ffff88004dda3508 ffff88007df63fd8 000000000001e9c0
Jan  3 09:47:45 HOST kernel: [2272007.394598] Call Trace:
Jan  3 09:47:45 HOST kernel: [2272007.394601]  [<ffffffff8112806e>] ? find_get_page+0x1e/0xa0
Jan  3 09:47:45 HOST kernel: [2272007.394608]  [<ffffffffa006d0bd>] do_get_write_access+0x29d/0x510 [jbd2]
Jan  3 09:47:45 HOST kernel: [2272007.394612]  [<ffffffff81097db0>] ? wake_bit_function+0x0/0x50
Jan  3 09:47:45 HOST kernel: [2272007.394618]  [<ffffffffa006d481>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
Jan  3 09:47:45 HOST kernel: [2272007.394629]  [<ffffffffa00c9638>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
Jan  3 09:47:45 HOST kernel: [2272007.394643]  [<ffffffffa00a0563>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
Jan  3 09:47:45 HOST kernel: [2272007.394653]  [<ffffffffa00a05dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
Jan  3 09:47:45 HOST kernel: [2272007.394664]  [<ffffffffa00d593c>] ? ext4_xattr_get+0x10c/0x2c0 [ext4]
Jan  3 09:47:45 HOST kernel: [2272007.394677]  [<ffffffffa00a08d0>] ext4_dirty_inode+0x40/0x60 [ext4]
Jan  3 09:47:45 HOST kernel: [2272007.394683]  [<ffffffff811c7b2b>] __mark_inode_dirty+0x3b/0x160
Jan  3 09:47:45 HOST kernel: [2272007.394690]  [<ffffffff811b792a>] file_update_time+0x10a/0x1a0
Jan  3 09:47:45 HOST kernel: [2272007.394697]  [<ffffffff8112ac6c>] __generic_file_write_iter+0x1fc/0x420
Jan  3 09:47:45 HOST kernel: [2272007.394704]  [<ffffffff81127571>] ? file_read_iter_actor+0x61/0x80
Jan  3 09:47:45 HOST kernel: [2272007.394712]  [<ffffffff8112af15>] __generic_file_aio_write+0x85/0xa0
Jan  3 09:47:45 HOST kernel: [2272007.394719]  [<ffffffff8112af9f>] generic_file_aio_write+0x6f/0xe0
Jan  3 09:47:45 HOST kernel: [2272007.394730]  [<ffffffffa009a331>] ext4_file_write+0x61/0x1e0 [ext4]
Jan  3 09:47:45 HOST kernel: [2272007.394738]  [<ffffffff8119c78a>] do_sync_write+0xfa/0x140
Jan  3 09:47:45 HOST kernel: [2272007.394744]  [<ffffffff81097d70>] ? autoremove_wake_function+0x0/0x40
Jan  3 09:47:45 HOST kernel: [2272007.394751]  [<ffffffff8119ca68>] vfs_write+0xb8/0x1a0
Jan  3 09:47:45 HOST kernel: [2272007.394757]  [<ffffffff8119d542>] sys_pwrite64+0x82/0xa0
Jan  3 09:47:45 HOST kernel: [2272007.394764]  [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
Jan  3 09:47:52 HOST kernel: [2272013.885915] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan  3 09:47:52 HOST kernel: [2272013.885998] ata2.00: BMDMA stat 0x25

Congratulations, you have a classic URE (unrecoverable read error). Your error message even says so explicitly:

   Jan  3 09:49:40 HOST kernel: [2272121.968749] sd 1:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed

Have your data center replace the defective disk.
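To see which sector the drive keeps failing on, the CDB line from the log above (`CDB: Read(10): 28 00 35 f1 7f 78 00 00 38 00`) can be decoded by hand. The following is a minimal sketch, not part of the original answer: the helper name `decode_read10` is made up for illustration, and the 512-byte logical sector size is an assumption (it is consistent with the `dma 28672 in` figure in the log, since 56 sectors of 512 bytes is 28672 bytes).

```python
def decode_read10(cdb_hex: str, sector_size: int = 512) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    sector_size is an assumption; check the drive's logical sector size.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:  # 0x28 is the READ(10) opcode
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return {"lba": lba, "blocks": blocks, "bytes": blocks * sector_size}


info = decode_read10("28 00 35 f1 7f 78 00 00 38 00")
print(hex(info["lba"]), info["blocks"], info["bytes"])
# prints: 0x35f17f78 56 28672
```

The same address, 35 f1 7f 78, appears in the sense data and in every failed READ DMA EXT command in the log, so the drive is repeatedly failing to read the same region rather than reporting scattered, unrelated errors.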

I see multiple "DRDY ERR" messages, which are only associated with hard disk drive failure. Have you run fsck -cc to find the bad sectors and mark them?

Note: make sure you boot into another operating system, because you really should not run fsck on a mounted partition. And backup, backup, backup!
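Before running fsck, it can help to confirm that nothing on the disk is still mounted. The following is a rough sketch, not something from the original answer: the function name `mounted_partitions` and the sample mount table are invented for illustration, and the check only matches device paths that literally start with the given prefix, so partitions mounted through a UUID, LVM, or device-mapper name would be missed.

```python
def mounted_partitions(mounts_text: str, disk: str) -> list:
    """Return (device, mount point) pairs from /proc/mounts-style text
    whose device path starts with `disk` (simple prefix match)."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith(disk):
            hits.append((fields[0], fields[1]))
    return hits


# Hypothetical mount table, for illustration only.
sample = """\
/dev/sda1 / ext4 rw,relatime 0 0
proc /proc proc rw 0 0
/dev/sdb1 /mnt/backup ext4 rw 0 0
"""
print(mounted_partitions(sample, "/dev/sda"))
# prints: [('/dev/sda1', '/')]
```

On a live system the text would come from reading /proc/mounts. An empty result for the disk suggests it is safe to proceed, subject to the prefix-match limitation above.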

Source: https://serverfault.com/questions/462680