Ubuntu

Linux: Rebuilding software RAID 1 fails when adding a partition

  • January 23, 2015

Yesterday I ran into a software RAID problem and had to replace a disk. I removed its partitions from the arrays using

mdadm /dev/mdx -r /dev/sdbx

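After the data center replaced the failed drive, I copied the partition table from the healthy disk to the new disk (sdb is the device that was bad):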

sgdisk -R /dev/sdb /dev/sda 

Then I gave it a new ID by randomizing its disk and partition GUIDs:

sgdisk -G /dev/sdb

Then I added all the partitions back to their arrays using:

mdadm /dev/mdx -a /dev/sdbx
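
The re-add step has to be repeated once per array. A minimal sketch of that loop, assuming the pairing shown in the `/proc/mdstat` output in this thread (md0 with sdb1 up to md5 with sdb6); the function name `print_readd_commands` is made up for illustration, and it only prints the commands so they can be reviewed before anything is run:

```shell
#!/bin/sh
# Print one "mdadm ... -a" (add) command per array.
# Assumption: md0 <-> sdb1, md1 <-> sdb2, ..., md5 <-> sdb6,
# as seen in the /proc/mdstat output of this thread.
print_readd_commands() {
    i=0
    while [ "$i" -le 5 ]; do
        echo "mdadm /dev/md$i -a /dev/sdb$((i + 1))"
        i=$((i + 1))
    done
}

print_readd_commands
```

After checking the printed lines, they could be piped to a root shell, for example `print_readd_commands | sh`.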

This went fine for all partitions except one, whose resync aborted after a few hours at roughly 60%. This is the current RAID status:

cat /proc/mdstat 
Personalities : [raid1] 
md5 : active raid1 sda6[0] sdb6[2](S)
     2633910528 blocks super 1.2 [2/1] [U_]

md4 : active raid1 sda5[0] sdb5[2]
     16768896 blocks super 1.2 [2/2] [UU]

md3 : active raid1 sda4[0] sdb4[2]
     2096064 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sda3[0] sdb3[2]
     268304192 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[2]
     523968 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[2]
     8384448 blocks super 1.2 [2/2] [UU]

unused devices: <none>
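
As a small aside, degraded arrays can be picked out of output like the above mechanically. The sketch below is not from the original post; `degraded_arrays` is a hypothetical helper that relies only on the `[U_]` status notation visible here, where an underscore marks a missing mirror half:

```shell
#!/bin/sh
# List md arrays whose member status contains "_" (a missing or failed member).
# Reads /proc/mdstat by default; a file path can be passed for testing.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md5 : active raid1 ..."
        $1 ~ /^md[0-9]+$/ && $2 == ":" { name = $1 }
        # The status line ends in e.g. [UU] (healthy) or [U_] (degraded)
        $NF ~ /^\[[U_]*_[U_]*\]$/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

Calling `degraded_arrays` with no argument on the system above should report only md5; `mdadm --detail /dev/md5` then shows the state of each member in more detail.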

In syslog I can see messages like these:

Jan 23 14:24:04 rescue kernel: [11163.329021] ata1.00: exception Emask 0x0 SAct 0xf00000 SErr 0x0 action 0x0
Jan 23 14:24:04 rescue kernel: [11163.376449] ata1.00: configured for UDMA/133
Jan 23 14:24:04 rescue kernel: [11163.376475] sd 0:0:0:0: [sda] Unhandled sense code
Jan 23 14:24:04 rescue kernel: [11163.376477] sd 0:0:0:0: [sda]  
Jan 23 14:24:04 rescue kernel: [11163.376479] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 23 14:24:04 rescue kernel: [11163.376481] sd 0:0:0:0: [sda]  
Jan 23 14:24:04 rescue kernel: [11163.376483] Sense Key : Medium Error [current] [descriptor]
Jan 23 14:24:04 rescue kernel: [11163.376486] Descriptor sense data with sense descriptors (in hex):
Jan 23 14:24:04 rescue kernel: [11163.376487]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
Jan 23 14:24:04 rescue kernel: [11163.376495]         ce 1f 0d 58 
Jan 23 14:24:04 rescue kernel: [11163.376498] sd 0:0:0:0: [sda]  
Jan 23 14:24:04 rescue kernel: [11163.376501] Add. Sense: Unrecovered read error - auto reallocate failed
Jan 23 14:24:04 rescue kernel: [11163.376503] sd 0:0:0:0: [sda] CDB: 
Jan 23 14:24:04 rescue kernel: [11163.376504] Read(16): 88 00 00 00 00 00 ce 1f 0b 80 00 00 04 00 00 00
Jan 23 14:24:04 rescue kernel: [11163.376513] end_request: I/O error, dev sda, sector 3458141528

Jan 23 14:35:22 rescue kernel: [11840.396206] ata1.00: configured for UDMA/133
Jan 23 14:35:22 rescue kernel: [11840.396212] ata1.00: device reported invalid CHS sector 0
Jan 23 14:35:22 rescue kernel: [11840.396216] ata1.00: device reported invalid CHS sector 0
Jan 23 14:35:22 rescue kernel: [11840.396220] ata1.00: device reported invalid CHS sector 0
Jan 23 14:35:22 rescue kernel: [11840.396223] ata1.00: device reported invalid CHS sector 0
Jan 23 14:35:22 rescue kernel: [11840.396230] ata1: EH complete
Jan 23 14:35:52 rescue kernel: [11870.888343] ata1.00: exception Emask 0x0 SAct 0x40000007 SErr 0x0 action 0x6 frozen
Jan 23 14:35:52 rescue kernel: [11870.945207] ata1.00: cmd 60/00:08:80:c3:58/04:00:ce:00:00/40 tag 1 ncq 524288 in
Jan 23 14:35:52 rescue kernel: [11870.945207]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 23 14:35:52 rescue kernel: [11870.982487] ata1.00: cmd 60/80:10:00:c0:58/03:00:ce:00:00/40 tag 2 ncq 458752 in
Jan 23 14:35:52 rescue kernel: [11870.982487]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 23 14:35:53 rescue kernel: [11871.019291] ata1.00: cmd 60/00:f0:80:cb:58/04:00:ce:00:00/40 tag 30 ncq 524288 in
Jan 23 14:35:53 rescue kernel: [11871.019291]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 23 14:35:53 rescue kernel: [11871.055486] ata1: hard resetting link
Jan 23 14:35:53 rescue kernel: [11871.707811] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jan 23 14:35:53 rescue kernel: [11871.708270] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20131218/psargs-359)
Jan 23 14:35:53 rescue kernel: [11871.708279] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff88041d869a88), AE_NOT_FOUND (20131218/psparse-536)
Jan 23 14:35:53 rescue kernel: [11871.709174] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20131218/psargs-359)
Jan 23 14:35:53 rescue kernel: [11871.709182] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff88041d869a88), AE_NOT_FOUND (20131218/psparse-536)
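
One way to cross-check the log: the four bytes `ce 1f 0d 58` on the second line of the sense data dump are the failing LBA in hexadecimal, and they should match the sector in the `end_request` line. A minimal sketch of that conversion (the helper name `sense_bytes_to_lba` is made up for illustration):

```shell
#!/bin/sh
# Convert the four hex bytes printed in the sense data into a decimal LBA.
sense_bytes_to_lba() {
    # $1..$4 are the bytes as printed by the kernel, e.g. ce 1f 0d 58
    printf '%d\n' "0x$1$2$3$4"
}

sense_bytes_to_lba ce 1f 0d 58   # prints 3458141528
```

The result equals the sector reported in "end_request: I/O error, dev sda, sector 3458141528", so both messages point at the same unreadable spot on sda.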

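I am able to mount /dev/md5 and list its files, but I cannot add the new partition to the array.

Is there any way to fix this without losing the data on the partition?

If not, is it possible to format just that single partition and then add the new drive again? I should have an up-to-date backup of that partition, so that would not be a problem. If possible, I would just like to avoid wiping all partitions.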

SMART output:

/dev/sda:

smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.14.27] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST3000DM001-1CH166
Serial Number:    Z1F1XJHC
LU WWN Device Id: 5 000c50 04f3fc2c7
Firmware Version: CC24
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Jan 23 16:16:32 2015 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Error SMART Values Read failed: scsi error aborted command
Smartctl: SMART Read Values failed.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: UNKNOWN!
SMART Status, Attributes and Thresholds cannot be read.

SMART Error Log Version: 1
ATA Error Count: 107 (device log contains only the most recent five errors)
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 107 occurred at disk power-on lifetime: 13180 hours (549 days + 4 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 25 00 08 ff ff ff ef 00      15:56:49.931  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:56:48.680  READ DMA EXT
 ef 10 02 00 00 00 a0 00      15:56:48.644  SET FEATURES [Reserved for Serial ATA]
 27 00 00 00 00 00 e0 00      15:56:48.644  READ NATIVE MAX ADDRESS EXT
 ec 00 00 00 00 00 a0 00      15:56:48.644  IDENTIFY DEVICE

Error 106 occurred at disk power-on lifetime: 13180 hours (549 days + 4 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 25 00 08 ff ff ff ef 00      15:56:45.363  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:56:44.071  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:56:42.789  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:56:42.755  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:56:42.722  READ DMA EXT

Error 105 occurred at disk power-on lifetime: 13180 hours (549 days + 4 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 25 00 08 ff ff ff ef 00      15:56:15.716  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:56:12.832  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:56:11.540  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:56:10.290  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:56:09.448  READ DMA EXT

Error 104 occurred at disk power-on lifetime: 13180 hours (549 days + 4 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 25 00 08 ff ff ff ef 00      15:56:02.563  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:55:59.655  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:55:58.319  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:55:58.069  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:55:57.838  READ DMA EXT

Error 103 occurred at disk power-on lifetime: 13180 hours (549 days + 4 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 25 00 80 ff ff ff ef 00      15:55:51.995  READ DMA EXT
 25 00 08 ff ff ff ef 00      15:55:50.735  READ DMA EXT
 ef 10 02 00 00 00 a0 00      15:55:50.700  SET FEATURES [Reserved for Serial ATA]
 27 00 00 00 00 00 e0 00      15:55:50.700  READ NATIVE MAX ADDRESS EXT
 ec 00 00 00 00 00 a0 00      15:55:50.699  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4561         -
# 2  Extended offline    Completed without error       00%      2977         -
# 3  Extended offline    Completed without error       00%         5         -

Device does not support Selective Self Tests/Logging

/dev/sdb:

smartctl -a /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.14.27] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST33000650NS
Serial Number:    Z295TK0G
LU WWN Device Id: 5 000c50 04f891ded
Firmware Version: 0004
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Jan 23 16:15:30 2015 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                   was completed without error.
                   Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                   without error or no self-test has ever 
                   been run.
Total time to complete Offline 
data collection:        (  600) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                   Auto Offline data collection on/off support.
                   Suspend Offline collection upon new
                   command.
                   Offline surface scan supported.
                   Self-test supported.
                   Conveyance Self-test supported.
                   Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                   power-saving mode.
                   Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                   General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   1) minutes.
Extended self-test routine
recommended polling time:    ( 255) minutes.
Conveyance self-test routine
recommended polling time:    (   2) minutes.
SCT capabilities:          (0x10bd) SCT Status supported.
                   SCT Error Recovery Control supported.
                   SCT Feature Control supported.
                   SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 1 Raw_Read_Error_Rate     0x000f   078   053   044    Pre-fail  Always       -       70825960
 3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
 4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       11
 5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
 7 Seek_Error_Rate         0x000f   088   060   030    Pre-fail  Always       -       791126750
 9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       7155
10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       11
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   090   090   000    Old_age   Always       -       10
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       1
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   043   045    Old_age   Always   In_the_past 34 (5 173 37 27)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       8
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       11
194 Temperature_Celsius     0x0022   034   057   000    Old_age   Always       -       34 (0 24 0 0)
195 Hardware_ECC_Recovered  0x001a   018   007   000    Old_age   Always       -       70825960
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 18 (device log contains only the most recent five errors)
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 18 occurred at disk power-on lifetime: 5559 hours (231 days + 15 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 40 51 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 61 00 18 ff ff ff 4f 00  26d+03:52:28.560  WRITE FPDMA QUEUED
 60 00 00 ff ff ff 4f 00  26d+03:52:28.560  READ FPDMA QUEUED
 60 00 08 ff ff ff 4f 00  26d+03:52:28.559  READ FPDMA QUEUED
 60 00 08 ff ff ff 4f 00  26d+03:52:28.559  READ FPDMA QUEUED
 60 00 08 ff ff ff 4f 00  26d+03:52:28.559  READ FPDMA QUEUED

Error 17 occurred at disk power-on lifetime: 5559 hours (231 days + 15 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 60 00 08 ff ff ff 4f 00  26d+03:52:13.471  READ FPDMA QUEUED
 60 00 58 d0 57 44 43 00  26d+03:52:13.471  READ FPDMA QUEUED
 61 00 02 08 90 6d 49 00  26d+03:52:13.471  WRITE FPDMA QUEUED
 ea 00 00 00 00 00 a0 00  26d+03:52:13.470  FLUSH CACHE EXT
 60 00 00 e0 42 20 4e 00  26d+03:52:13.422  READ FPDMA QUEUED

Error 16 occurred at disk power-on lifetime: 5559 hours (231 days + 15 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 60 00 00 ff ff ff 4f 00  26d+03:51:56.176  READ FPDMA QUEUED
 60 00 08 ff ff ff 4f 00  26d+03:51:56.176  READ FPDMA QUEUED
 60 00 08 ff ff ff 4f 00  26d+03:51:56.175  READ FPDMA QUEUED
 60 00 00 e0 0d 20 4e 00  26d+03:51:56.116  READ FPDMA QUEUED
 60 00 00 e0 0c 20 4e 00  26d+03:51:56.114  READ FPDMA QUEUED

Error 15 occurred at disk power-on lifetime: 5559 hours (231 days + 15 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 60 00 08 50 59 cb 43 00  26d+03:51:24.077  READ FPDMA QUEUED
 60 00 08 ff ff ff 4f 00  26d+03:51:24.077  READ FPDMA QUEUED
 60 00 00 e0 c5 1c 4e 00  26d+03:51:24.076  READ FPDMA QUEUED
 ea 00 00 00 00 00 a0 00  26d+03:51:24.071  FLUSH CACHE EXT
 60 00 08 28 46 c1 43 00  26d+03:51:22.717  READ FPDMA QUEUED

Error 14 occurred at disk power-on lifetime: 5559 hours (231 days + 15 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 60 00 00 ff ff ff 4f 00  26d+03:51:02.317  READ FPDMA QUEUED
 61 00 08 ff ff ff 4f 00  26d+03:51:02.317  WRITE FPDMA QUEUED
 ea 00 00 00 00 00 a0 00  26d+03:51:02.316  FLUSH CACHE EXT
 60 00 08 ff ff ff 4f 00  26d+03:51:02.303  READ FPDMA QUEUED
 60 00 08 ff ff ff 4f 00  26d+03:51:02.300  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7071         -
# 2  Extended offline    Completed without error       00%      7060         -
# 3  Extended offline    Completed without error       00%      5600         -
# 4  Short offline       Completed without error       00%      2489         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing
Selective self-test flags (0x0):
 After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

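It seems to me that the problem depends heavily on sda. It is currently the only working half of the mirror, so if it cannot be read, there is no way to cleanly copy sda6 to sdb6 and resync the mirror.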

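I note that sda has run for more than 8,600 hours since it last passed a self-test (the last extended test is logged at 4561 hours, the latest errors at 13180 hours), so it is hardly surprising that hardware faults may have crept in. If you can still read the contents of /dev/md5, you have dodged a bullet: it means the unreadable blocks are not inside any files. Back up the contents of that partition, then replace sda as well, this time with a reasonably new disk. Once everything is stable, recreate the md5 device and restore from the backup.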
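
The backup step could look roughly like the sketch below. It is not from the original answer: the mount point `/mnt/md5`, the target `/backup/md5`, and the helpers `run` and `backup_md5` are all hypothetical, and the array is mounted read-only here on the assumption that extra writes to the failing sda are best avoided. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical paths; adjust them to the actual system.
MOUNTPOINT=${MOUNTPOINT:-/mnt/md5}
BACKUP_DIR=${BACKUP_DIR:-/backup/md5}

# Print each command instead of executing it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mount the degraded array read-only and copy its files elsewhere.
backup_md5() {
    run mkdir -p "$MOUNTPOINT" "$BACKUP_DIR"
    run mount -o ro /dev/md5 "$MOUNTPOINT"
    # -a archive mode, -H hard links, -A ACLs, -X extended attributes
    run rsync -aHAX "$MOUNTPOINT"/ "$BACKUP_DIR"/
    run umount "$MOUNTPOINT"
}
```

Running `backup_md5` shows the planned commands; `DRY_RUN=0 backup_md5` as root would execute them. The backup target should of course live on neither sda nor sdb.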

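Once you have this system back up, make sure you have a cron job that runs smartctl self-tests on both drives at least every month or two; otherwise this is exactly the kind of warning you get that things are going pear-shaped.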
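
A minimal sketch of such a job, assuming the two drives are still /dev/sda and /dev/sdb; the function name `start_long_tests` and the `SMARTCTL` override variable are made up for illustration, while `smartctl -t long` is the standard way to start an extended self-test:

```shell
#!/bin/sh
# Path or name of the smartctl binary; can be overridden for testing.
SMARTCTL=${SMARTCTL:-smartctl}

# Start an extended (long) SMART self-test on every device given.
start_long_tests() {
    for dev in "$@"; do
        "$SMARTCTL" -t long "$dev" ||
            echo "could not start self-test on $dev" >&2
    done
}
```

Saved as a script that ends with `start_long_tests /dev/sda /dev/sdb` (the script path is hypothetical), it could be scheduled monthly with a crontab line such as `0 3 1 * * /usr/local/sbin/smart-long-test.sh`. The results appear later in the self-test log section of `smartctl -a`.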

Source: https://serverfault.com/questions/661889