Hard-Drive
DegradedArray event on /dev/md1
This morning I received the following message:
    This is an automatically generated mail message from mdadm
    running on

    A DegradedArray event had been detected on md device /dev/md1.

    Faithfully yours, etc.

    P.S. The /proc/mdstat file currently contains the following:

    Personalities : [raid1]
    md1 : active raid1 sdb3[2](F) sda3[1]
          1860516800 blocks [2/1] [_U]

    md0 : active raid1 sdb1[0] sda1[1]
          499904 blocks [2/2] [UU]

    unused devices: <none>
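Does this mean that one of the hard drives is no longer working? How can I fix this? Should I ask the data center to replace the drive? Can I try to re-add the lost device? If so, which command should I run, and is re-adding safe? I just don't want my server to go offline.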
    serv397:/var/log# cat /proc/mdstat
    Personalities : [raid1]
    md1 : active raid1 sdb3[2](F) sda3[1]
          1860516800 blocks [2/1] [_U]

    md0 : active raid1 sdb1[0] sda1[1]
          499904 blocks [2/2] [UU]

    unused devices: <none>

    serv397:/var/log# mdadm -D /dev/md1
    /dev/md1:
            Version : 0.90
      Creation Time : Sun Apr 29 22:51:51 2012
         Raid Level : raid1
         Array Size : 1860516800 (1774.33 GiB 1905.17 GB)
      Used Dev Size : 1860516800 (1774.33 GiB 1905.17 GB)
       Raid Devices : 2
      Total Devices : 2
    Preferred Minor : 1
        Persistence : Superblock is persistent

        Update Time : Sat Feb 23 09:26:39 2013
              State : clean, degraded
     Active Devices : 1
    Working Devices : 1
     Failed Devices : 1
      Spare Devices : 0

               UUID : ec02d5ce:8554d4ad:7792c71e:7dc17aa4
             Events : 0.11225668

        Number   Major   Minor   RaidDevice State
           0       0        0        0      removed
           1       8        3        1      active sync   /dev/sda3

           2       8       19        -      faulty spare   /dev/sdb3

kern.log

    Feb 23 09:00:58 triton1017 kernel: [24015352.812156] __ratelimit: 134 callbacks suppressed
    Feb 23 09:00:58 triton1017 kernel: [24015352.812165] mdadm: sending ioctl 1261 to a partition!
    Feb 23 09:00:58 triton1017 kernel: [24015352.812172] mdadm: sending ioctl 1261 to a partition!

mdam:

    [ 1.929981] mdadm: sending ioctl 1261 to a partition!
    [ 1.930211] mdadm: sending ioctl 800c0910 to a partition!
    [ 1.930241] mdadm: sending ioctl 800c0910 to a partition!
    [ 1.944515] md: md0 stopped.
    [ 1.945700] md: bind<sda1>
    [ 1.945944] md: bind<sdb1>
    [ 1.947709] raid1: raid set md0 active with 2 out of 2 mirrors
    [ 1.947784] md0: detected capacity change from 0 to 511901696
    [ 1.948516] md0: unknown partition table
    [ 1.984932] md: md1 stopped.
    [ 1.986131] md: bind<sda3>
    [ 1.986332] md: bind<sdb3>
    [ 1.987377] raid1: raid set md1 active with 2 out of 2 mirrors
    [ 1.987421] md1: detected capacity change from 0 to 1905169203200
    [ 1.988287] md1: unknown partition table
    [ 2.164118] kjournald starting. Commit interval 5 seconds
    [ 2.164130] EXT3-fs: mounted filesystem with ordered data mode.
    [ 3.181350] udev[346]: starting version 164
    [ 3.644863] input: PC Speaker as /devices/platform/pcspkr/input/input3
    [ 3.654062] Error: Driver 'pcspkr' is already registered, aborting...
    [ 3.663045] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0
    [ 3.810284] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
    [ 3.812865] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
    [ 3.860102] [drm] Initialized drm 1.1.0 20060810
    [ 3.884550] hda-intel: no codecs found!
    [ 3.884672] HDA Intel 0000:01:05.1: setting latency timer to 64
    [ 3.925197] [drm] radeon defaulting to userspace modesetting.
    [ 3.925973] pci 0000:01:05.0: setting latency timer to 64
    [ 3.926082] [drm] Initialized radeon 1.32.0 20080528 for 0000:01:05.0 on minor 0
    [ 4.123784] Adding 1998840k swap on /dev/sda2. Priority:-1 extents:1 across:1998840k
    [ 4.126482] Adding 1998840k swap on /dev/sdb2. Priority:-2 extents:1 across:1998840k
    [ 4.332550] EXT3 FS on md1, internal journal
    [ 5.247285] alloc irq_desc for 25 on node -1
    [ 5.247287] alloc kstat_irqs on node -1
    [ 5.247299] tg3 0000:02:00.0: irq 25 for MSI/MSI-X
    [ 5.275326] ADDRCONF(NETDEV_UP): eth0: link is not ready
I tried to re-add the device:
    sudo mdadm --re-add /dev/md1 /dev/sdb3
    mdadm: Cannot open /dev/sdb3: Device or resource busy

    sudo mdadm --remove /dev/md1 /dev/sdb3
    mdadm: hot removed /dev/sdb3 from /dev/md1

    sudo mdadm --add /dev/md1 /dev/sdb3
    mdadm: re-added /dev/sdb3

    /var/log# cat /proc/mdstat
    Personalities : [raid1]
    md1 : active raid1 sdb3[2] sda3[1]
          1860516800 blocks [2/1] [_U]
          [>....................]  recovery =  0.1% (2849024/1860516800) finish=455.9min speed=67898K/sec

    md0 : active raid1 sdb1[0] sda1[1]
          499904 blocks [2/2] [UU]

    unused devices: <none>
The resync did not solve the problem:
    triton1017:/var/log# cat /proc/mdstat
    Personalities : [raid1]
    md1 : active raid1 sdb3[2](S) sda3[1]
          1860516800 blocks [2/1] [_U]

    md0 : active raid1 sdb1[0] sda1[1]
          499904 blocks [2/2] [UU]

    unused devices: <none>

    triton1017:/var/log# mdadm -D /dev/md1
    /dev/md1:
            Version : 0.90
      Creation Time : Sun Apr 29 22:51:51 2012
         Raid Level : raid1
         Array Size : 1860516800 (1774.33 GiB 1905.17 GB)
      Used Dev Size : 1860516800 (1774.33 GiB 1905.17 GB)
       Raid Devices : 2
      Total Devices : 2
    Preferred Minor : 1
        Persistence : Superblock is persistent

        Update Time : Sat Feb 23 18:14:08 2013
              State : clean, degraded
     Active Devices : 1
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 1

               UUID : ec02d5ce:8554d4ad:7792c71e:7dc17aa4
             Events : 0.11245156

        Number   Major   Minor   RaidDevice State
           0       0        0        0      removed
           1       8        3        1      active sync   /dev/sda3

           2       8       19        -      spare   /dev/sdb3

The kern.log file shows the following:

    Feb 23 14:55:19 triton1017 kernel: [24036613.378608] ata1.00: error: { UNC }
    Feb 23 14:55:19 triton1017 kernel: [24036613.398590] ata1.00: configured for UDMA/133
    Feb 23 14:55:19 triton1017 kernel: [24036613.398627] ata1: EH complete
    Feb 23 14:55:21 triton1017 kernel: [24036616.262518] ata1.00: exception Emask 0x0 SAct 0x1dfbe SErr 0x0 action 0x0
    Feb 23 14:55:21 triton1017 kernel: [24036616.262525] ata1.00: irq_stat 0x40000008
    Feb 23 14:55:21 triton1017 kernel: [24036616.262531] ata1.00: failed command: READ FPDMA QUEUED
    Feb 23 14:55:21 triton1017 kernel: [24036616.262539] ata1.00: cmd 60/80:28:00:5a:b4/00:00:75:00:00/40 tag 5 ncq 65536 in
    Feb 23 14:55:21 triton1017 kernel: [24036616.262540] res 41/40:80:38:5a:b4/00:00:75:00:00/00 Emask 0x409 (media error) <F>
    Feb 23 14:57:16 triton1017 kernel: [24036730.503323] ata1.00: status: { DRDY ERR }
    Feb 23 14:57:16 triton1017 kernel: [24036730.503328] ata1.00: error: { UNC }
    Feb 23 14:57:16 triton1017 kernel: [24036730.523346] ata1.00: configured for UDMA/133
    Feb 23 14:57:16 triton1017 kernel: [24036730.523356] ata1: EH complete
    Feb 23 14:57:17 triton1017 kernel: [24036732.116026] INFO: task mysqld:6067 blocked for more than 120 seconds.
    Feb 23 14:57:17 triton1017 kernel: [24036732.116032] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Feb 23 14:57:17 triton1017 kernel: [24036732.116040] mysqld D 0000000000000002 0 6067 938 0x00000000
    Feb 23 14:57:17 triton1017 kernel: [24036732.116049] ffffffff814891f0 0000000000000086 0000000000000000 00000000ffffffff
    Feb 23 14:57:17 triton1017 kernel: [24036732.117353] ffff880016dcfc00 0000000000015780 000000000000f9e0 ffff8805c4c65fd8
    Feb 23 14:57:17 triton1017 kernel: [24036732.117367] 0000000000015780 0000000000015780 ffff880618825bd0 ffff880618825ec8
    Feb 23 14:57:17 triton1017 kernel: [24036732.117380] Call Trace:
    Feb 23 14:57:17 triton1017 kernel: [24036732.117391] [<ffffffff810168f3>] ? read_tsc+0xa/0x20
    Feb 23 14:57:17 triton1017 kernel: [24036732.117400] [<ffffffff8110e656>] ? sync_buffer+0x0/0x40
    Feb 23 14:57:17 triton1017 kernel: [24036732.117408] [<ffffffff812fbb4a>] ? io_schedule+0x73/0xb7
    Feb 23 14:57:17 triton1017 kernel: [24036732.117419] [<ffffffff8110e691>] ? sync_buffer+0x3b/0x40
    Feb 23 14:57:17 triton1017 kernel: [24036732.117426] [<ffffffff812fbf5a>] ? __wait_on_bit_lock+0x3f/0x84
    Feb 23 14:57:17 triton1017 kernel: [24036732.117433] [<ffffffff8110e656>] ? sync_buffer+0x0/0x40
    Feb 23 14:57:17 triton1017 kernel: [24036732.117441] [<ffffffff812fc00a>] ? out_of_line_wait_on_bit_lock+0x6b/0x77
    Feb 23 14:57:17 triton1017 kernel: [24036732.117451] [<ffffffff81065070>] ? wake_bit_function+0x0/0x23
    Feb 23 14:57:17 triton1017 kernel: [24036732.117459] [<ffffffff8110ea83>] ? sync_dirty_buffer+0x29/0x93
    Feb 23 14:57:17 triton1017 kernel: [24036732.117474] [<ffffffffa018ce04>] ? journal_dirty_data+0xd1/0x1b0 [jbd]
    Feb 23 14:57:17 triton1017 kernel: [24036732.117486] [<ffffffffa01a3f1f>] ? ext3_journal_dirty_data+0xf/0x34 [ext3]
    Feb 23 14:57:17 triton1017 kernel: [24036732.117499] [<ffffffffa01a23f9>] ? walk_page_buffers+0x65/0x8b [ext3]
    Feb 23 14:57:17 triton1017 kernel: [24036732.117510] [<ffffffffa01a3f44>] ? journal_dirty_data_fn+0x0/0x13 [ext3]
    Feb 23 14:57:17 triton1017 kernel: [24036732.117521] [<ffffffffa01a5a66>] ? ext3_ordered_write_end+0x73/0x10f [ext3]
    Feb 23 14:57:17 triton1017 kernel: [24036732.117532] [<ffffffffa01b0bbb>] ? ext3_xattr_get+0x1ef/0x271 [ext3]
    Feb 23 14:57:17 triton1017 kernel: [24036732.117542] [<ffffffff810b517e>] ? generic_file_buffered_write+0x18d/0x278
    Feb 23 14:57:17 triton1017 kernel: [24036732.117552] [<ffffffff810b561a>] ? __generic_file_aio_write+0x25f/0x293
    Feb 23 14:57:17 triton1017 kernel: [24036732.117560] [<ffffffff810b56a7>] ? generic_file_aio_write+0x59/0x9f
    Feb 23 14:57:17 triton1017 kernel: [24036732.117569] [<ffffffff810eef1a>] ? do_sync_write+0xce/0x113
    Feb 23 14:57:17 triton1017 kernel: [24036732.117577] [<ffffffff81103a85>] ? mntput_no_expire+0x23/0xee
    Feb 23 14:57:17 triton1017 kernel: [24036732.117584] [<ffffffff81065042>] ? autoremove_wake_function+0x0/0x2e
    Feb 23 14:57:17 triton1017 kernel: [24036732.117593] [<ffffffff812fce69>] ? _spin_lock_bh+0x9/0x25
    Feb 23 14:57:17 triton1017 kernel: [24036732.117600] [<ffffffff810ef86c>] ? vfs_write+0xa9/0x102
    Feb 23 14:57:17 triton1017 kernel: [24036732.117607] [<ffffffff810ef91c>] ? sys_pwrite64+0x57/0x77
    Feb 23 14:57:17 triton1017 kernel: [24036732.117615] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
    Feb 23 14:57:17 triton1017 kernel: [24036732.117622] INFO: task flush-9:1:1456 blocked for more than 120 seconds.
    Feb 23 14:57:17 triton1017 kernel: [24036732.117628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Feb 23 14:57:17 triton1017 kernel: [24036732.117636] flush-9:1 D 0000000000000000 0 1456 2 0x00000000
    Feb 23 14:57:17 triton1017 kernel: [24036732.117645] ffffffff814891f0 0000000000000046 0000000000000000 0000000000000001
    Feb 23 14:57:17 triton1017 kernel: [24036732.117659] 0000000000000086 ffffffff8104a45a 000000000000f9e0 ffff88061905dfd8
    Feb 23 14:57:17 triton1017 kernel: [24036732.117672] 0000000000015780 0000000000015780 ffff8806190646a0 ffff880619064998
    Feb 23 14:57:17 triton1017 kernel: [24036732.117683] Call Trace:
    Feb 23 14:57:17 triton1017 kernel: [24036732.117691] [<ffffffff8104a45a>] ? try_to_wake_up+0x289/0x29b
    Feb 23 14:57:17 triton1017 kernel: [24036732.117701] [<ffffffff8119255f>] ? radix_tree_tag_clear+0x93/0xf1
    Feb 23 14:57:17 triton1017 kernel: [24036732.117709] [<ffffffff8110e656>] ? sync_buffer+0x0/0x40
    Feb 23 14:57:17 triton1017 kernel: [24036732.117716] [<ffffffff812fbb4a>] ? io_schedule+0x73/0xb7
    Feb 23 14:57:17 triton1017 kernel: [24036732.117724] [<ffffffff8110e691>] ? sync_buffer+0x3b/0x40
    Feb 23 14:57:17 triton1017 kernel: [24036732.117731] [<ffffffff812fbf5a>] ? __wait_on_bit_lock+0x3f/0x84
    Feb 23 14:57:17 triton1017 kernel: [24036732.117738] [<ffffffff8110e656>] ? sync_buffer+0x0/0x40
    Feb 23 14:57:17 triton1017 kernel: [24036732.117745] [<ffffffff812fc00a>] ? out_of_line_wait_on_bit_lock+0x6b/0x77
    Feb 23 14:57:17 triton1017 kernel: [24036732.117753] [<ffffffff81065070>] ? wake_bit_function+0x0/0x23
    Feb 23 14:57:17 triton1017 kernel: [24036732.117762] [<ffffffff8110fa23>] ? __block_write_full_page+0x159/0x2ac
You can try re-adding the failed member to the mdadm array with the following command:
sudo mdadm --re-add /dev/md1 /dev/sdb3
If you get a "Device or resource busy" error, you can try the following instead:
sudo mdadm --remove /dev/md1 /dev/sdb3
sudo mdadm --add /dev/md1 /dev/sdb3
If you have tried these and still get an error, please post the error message so we can help.
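To see whether a re-add actually took, the member map at the end of each status line in /proc/mdstat is the quickest indicator: `[UU]` means both mirrors are in sync, while an underscore such as `[_U]` marks a missing member. A minimal sketch of that check follows; `degraded_arrays` is a hypothetical helper name, not something mdadm provides, and it only assumes the /proc/mdstat layout shown in the question.

```shell
# Hypothetical helper (not part of mdadm): reads /proc/mdstat-style text on
# stdin and prints the name of every array whose member map contains "_".
degraded_arrays() {
    awk '
        /^md[0-9]+ :/  { name = $1 }    # remember the array this block describes
        /\[U*_[U_]*\]/ { print name }   # member map with at least one missing slot
    '
}

# Usage on a live system:
#   degraded_arrays < /proc/mdstat
```

While a recovery is running the array is still reported as degraded, so an empty result is only expected after the resync has finished.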
The disk really is defective. Replace it, then resync once the new disk is in place.
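For this particular layout, one possible replacement sequence is sketched below. It rests on several assumptions that are not confirmed anywhere in the thread: that /dev/sdb is the disk being swapped and /dev/sda is healthy, that the disks use an MBR partition table that `sfdisk` can copy, and that the new disk shows up under the same device name. The function name `replacement_plan` is made up for illustration; it only prints the commands so they can be reviewed before anything is run as root.

```shell
# Prints (does not run) one possible replacement sequence for this layout.
# Assumptions: /dev/sdb is the disk being replaced, /dev/sda is healthy,
# and the new disk appears as /dev/sdb again after the swap.
replacement_plan() {
    good=/dev/sda
    bad=/dev/sdb
    cat <<EOF
mdadm /dev/md0 --fail ${bad}1 --remove ${bad}1
mdadm /dev/md1 --remove ${bad}3
swapoff ${bad}2
# ... the data center physically replaces the disk here ...
sfdisk -d ${good} | sfdisk ${bad}
mdadm /dev/md0 --add ${bad}1
mdadm /dev/md1 --add ${bad}3
mkswap ${bad}2
swapon ${bad}2
cat /proc/mdstat
EOF
}

replacement_plan
```

Copying the partition table does not copy boot code, so if the machine should be able to boot from the new disk, the boot loader probably has to be reinstalled on it as well.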