LVM

Removing a failed drive from an LVM volume group… and recovering partial data from an incomplete LV (missing PV)

  • September 14, 2017

I've been struggling with this problem for a while.

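I have a logical volume spanning 3 disks: 1.5TB, 2TB and 3TB. The 1.5TB drive is failing, with lots of I/O errors and bad sectors. I started a pvmove to move the existing extents off the failing drive onto the 3TB drive (which has enough free space). I managed to move 99% of the extents, but the last percent seems to be unreadable: the reads fail and pvmove exits.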

Here is the current state:

root@server:~# pvdisplay 
/dev/sdd: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301819904: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301901824: Input/output error
/dev/sdd: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
Couldn't find device with uuid hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK.
--- Physical volume ---
PV Name               /dev/sda # old, working drive
VG Name               lvm_group1
PV Size               1.82 TiB / not usable 1.09 MiB
Allocatable           yes (but full)
PE Size               4.00 MiB
Total PE              476932
Free PE               0
Allocated PE          476932
PV UUID               FEoDYU-Lhjf-FdI1-Ei5p-koue-PIma-TGvs9A

--- Physical volume ---
PV Name               /dev/sdd1  # old failing drive
VG Name               lvm_group1
PV Size               1.36 TiB / not usable 2.40 MiB
Allocatable           NO
PE Size               4.00 MiB
Total PE              357699
Free PE               357600
Allocated PE          99
PV UUID               hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK

--- Physical volume ---
PV Name               /dev/sdf # new drive
VG Name               lvm_group1
PV Size               2.73 TiB / not usable 4.46 MiB
Allocatable           yes 
PE Size               4.00 MiB
Total PE              715396
Free PE               357746
Allocated PE          357650
PV UUID               qs4BVK-PAPv-I1DG-x5wJ-dRNq-vhBE-wQeJL6
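The question hinges on the 3TB PV (/dev/sdf) having enough free extents to take what is left on /dev/sdd1. A minimal Python sketch of that check, assuming pvdisplay's default text layout as shown above; parse_pvdisplay and can_absorb are hypothetical helpers, not part of LVM. For scripting, pvs -o pv_name,pv_pe_count,pv_pe_alloc_count should give output that is easier to parse than pvdisplay.

```python
import re

# Field lines copied from the pvdisplay output above; I/O error lines and the
# fields this check does not need are left out.
PVDISPLAY = """\
--- Physical volume ---
PV Name               /dev/sda # old, working drive
PE Size               4.00 MiB
Total PE              476932
Free PE               0
Allocated PE          476932
--- Physical volume ---
PV Name               /dev/sdd1  # old failing drive
PE Size               4.00 MiB
Total PE              357699
Free PE               357600
Allocated PE          99
--- Physical volume ---
PV Name               /dev/sdf # new drive
PE Size               4.00 MiB
Total PE              715396
Free PE               357746
Allocated PE          357650
"""

FIELD = re.compile(r"^\s*(PV Name|PE Size|Total PE|Free PE|Allocated PE)\s{2,}(.+?)\s*$")


def parse_pvdisplay(text):
    """Map each PV name to a dict of the fields matched by FIELD."""
    pvs, current = {}, None
    for line in text.splitlines():
        m = FIELD.match(line)
        if not m:
            continue
        key, value = m.groups()
        if key == "PV Name":
            current = value.split()[0]  # drop trailing "# ..." comments
            pvs[current] = {}
        elif current is not None:
            pvs[current][key] = value
    return pvs


def can_absorb(pvs, source, target):
    """True if target has at least as many free PEs as source has allocated."""
    return int(pvs[target]["Free PE"]) >= int(pvs[source]["Allocated PE"])


pvs = parse_pvdisplay(PVDISPLAY)
for name, f in pvs.items():
    # Sanity check: every PE is either free or allocated.
    assert int(f["Total PE"]) == int(f["Free PE"]) + int(f["Allocated PE"]), name
print(can_absorb(pvs, "/dev/sdd1", "/dev/sdf"))  # True
```

Here /dev/sdf has 357746 free PEs against 99 allocated PEs on /dev/sdd1. The check only counts extents; it says nothing about whether the source extents are still readable, which is exactly what failed here.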

Here is what pvmove says:

root@server:~# pvmove /dev/sdd1:335950-336500 /dev/sdf --verbose
Finding volume group "lvm_group1"
Archiving volume group "lvm_group1" metadata (seqno 93).
Creating logical volume pvmove0
Moving 50 extents of logical volume lvm_group1/cryptex
Found volume group "lvm_group1"
activation/volume_list configuration setting not defined: Checking only host tags for lvm_group1/cryptex
Updating volume group metadata
Found volume group "lvm_group1"
Found volume group "lvm_group1"
Creating lvm_group1-pvmove0
Loading lvm_group1-pvmove0 table (253:2)
Loading lvm_group1-cryptex table (253:0)
Suspending lvm_group1-cryptex (253:0) with device flush
Suspending lvm_group1-pvmove0 (253:2) with device flush
Found volume group "lvm_group1"
activation/volume_list configuration setting not defined: Checking only host tags for lvm_group1/pvmove0
Resuming lvm_group1-pvmove0 (253:2)
Found volume group "lvm_group1"
Loading lvm_group1-pvmove0 table (253:2)
Suppressed lvm_group1-pvmove0 identical table reload.
Resuming lvm_group1-cryptex (253:0)
Creating volume group backup "/etc/lvm/backup/lvm_group1" (seqno 94).
Checking progress before waiting every 15 seconds
/dev/sdd1: Moved: 4.0%
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
No physical volume label read from /dev/sdd1
Physical volume /dev/sdd1 not found
ABORTING: Can't reread PV /dev/sdd1
ABORTING: Can't reread VG for /dev/sdd1
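The pvmove call above restricts the move to a physical-extent range on the source PV. A small Python sketch of the arithmetic, assuming the range is inclusive on both ends (parse_pe_range and extents_moved are hypothetical names): the range covers 551 PEs, yet pvmove reports "Moving 50 extents", which suggests only 50 PEs in that window were still allocated; at "Moved: 4.0%" roughly 2 of them had been copied before the failed read at offset 0 aborted the move.

```python
def parse_pe_range(spec):
    """Split an LVM 'PV:first-last' argument into (pv, first, last).

    The range is treated as inclusive of both ends.
    """
    pv, _, rng = spec.rpartition(":")
    first, _, last = rng.partition("-")
    return pv, int(first), int(last)


def extents_moved(percent, extent_count):
    """Convert pvmove's 'Moved: N%' progress into a number of extents."""
    return round(extent_count * percent / 100)


pv, first, last = parse_pe_range("/dev/sdd1:335950-336500")
span = last - first + 1
print(pv, span)                # /dev/sdd1 551
print(extents_moved(4.0, 50))  # 2
```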

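Only 99 extents are left on the failing drive. I can afford to lose that data; I just want to pull this drive and throw it away without losing the data on the other drives.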
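Put in bytes, the loss being accepted is small. A Python sketch of the conversion, using the PE Size of 4.00 MiB that pvdisplay reports (extents_to_bytes is a hypothetical helper): 99 extents × 4 MiB = 396 MiB.

```python
MIB = 1024 ** 2


def extents_to_bytes(extent_count, pe_size_mib=4):
    """Size of extent_count physical extents; pvdisplay reports PE Size 4.00 MiB."""
    return extent_count * pe_size_mib * MIB


lost = extents_to_bytes(99)
print(lost, lost // MIB)  # 415236096 396
```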

So I tried pvremove:

root@server:~# pvremove /dev/sdd1
/dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
No physical volume label read from /dev/sdd1
Physical Volume /dev/sdd1 not found

Then vgreduce:

root@server:~# vgreduce lvm_group1  --removemissing
/dev/sdd: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301819904: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301901824: Input/output error
/dev/sdd: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
Couldn't find device with uuid hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK.
WARNING: Partial LV cryptex needs to be repaired or removed. 
WARNING: Partial LV pvmove0 needs to be repaired or removed. 
There are still partial LVs in VG lvm_group1.
To remove them unconditionally use: vgreduce --removemissing --force.
Proceeding to remove empty missing PVs.
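The vgreduce warnings follow from a simple rule: an LV is partial when any of its segments sits on a PV that LVM cannot find. Per vgreduce(8), --removemissing on its own only drops missing PVs that no LV uses, while adding --force removes every partial LV completely, including its extents on the surviving disks. A Python model of that rule; the helper names and the LV-to-PV mapping are hypothetical, the mapping being chosen only to match the two warnings above.

```python
# PV UUIDs taken from the pvdisplay output above.
SDA = "FEoDYU-Lhjf-FdI1-Ei5p-koue-PIma-TGvs9A"
SDD1 = "hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK"  # the PV LVM can no longer find
SDF = "qs4BVK-PAPv-I1DG-x5wJ-dRNq-vhBE-wQeJL6"

# Assumed LV -> PVs-used mapping, consistent with the warnings above;
# the real segment layout is not shown in the question.
LV_PVS = {
    "cryptex": [SDA, SDD1, SDF],
    "pvmove0": [SDD1, SDF],
}


def partial_lvs(lv_pvs, present):
    """Names of LVs with at least one segment on a PV that is not present."""
    return sorted(name for name, pvs in lv_pvs.items()
                  if any(pv not in present for pv in pvs))


def removable_missing_pvs(lv_pvs, missing):
    """Missing PVs that no LV uses; only these are removed without --force."""
    used = {pv for pvs in lv_pvs.values() for pv in pvs}
    return sorted(pv for pv in missing if pv not in used)


print(partial_lvs(LV_PVS, {SDA, SDF}))        # ['cryptex', 'pvmove0']
print(removable_missing_pvs(LV_PVS, {SDD1}))  # []
```

With nothing removable, vgreduce can only report the partial LVs, which is presumably why the author ended up running vgreduce --removemissing --force and then restoring cryptex from an edited metadata backup.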

pvdisplay still shows the failing drive…

Any ideas?

In the end, I solved this by manually editing /etc/lvm/backup/lvm_group1.

Here are the steps, for anyone else who runs into this problem:

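  1. I physically removed the dead drive from the server.
  2. I ran vgreduce lvm_group1 --removemissing --force
  3. I removed the dead drive from the config file.
  4. In place of the unreadable extents on the dead drive, I added another stripe on a "good" drive.
  5. I ran vgcfgrestore -f edited_config_file.cfg lvm_group1
  6. Rebooted.
  7. Voilà! The drive is visible and can be mounted.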
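Steps 3 and 4 are the heart of the fix: in the metadata backup, each LV segment names a PV alias (pv0, pv1, ...) and a starting physical extent, so segments that pointed at the dead drive can be repointed at free extents on a surviving PV while the LV keeps its logical size. A minimal Python sketch of that rewrite on a toy layout; Segment, remap_missing, render and all extent numbers are hypothetical, and the author's actual edit is not shown. render only mimics the linear-segment syntax of LVM text metadata; it is an illustration, not a tool for editing real backup files.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    start_extent: int  # first logical extent of the segment within the LV
    extent_count: int  # length of the segment in extents
    pv: str            # PV alias used inside the metadata file, e.g. "pv1"
    pv_start: int      # first physical extent on that PV


def remap_missing(segments, missing_pv, new_pv, first_free_pe):
    """Point every segment on missing_pv at consecutive PEs on new_pv.

    Assumes new_pv has enough contiguous free PEs from first_free_pe onward.
    The logical layout is unchanged, so the LV keeps its size, but the
    remapped extents hold whatever new_pv contains, not the lost data.
    """
    remapped, next_pe = [], first_free_pe
    for seg in segments:
        if seg.pv == missing_pv:
            seg = replace(seg, pv=new_pv, pv_start=next_pe)
            next_pe += seg.extent_count
        remapped.append(seg)
    return remapped


def render(segments):
    """Emit segments as linear entries in LVM's text metadata syntax."""
    out = [f"segment_count = {len(segments)}"]
    for i, seg in enumerate(segments, start=1):
        out += [
            "",
            f"segment{i} {{",
            f"\tstart_extent = {seg.start_extent}",
            f"\textent_count = {seg.extent_count}",
            "",
            '\ttype = "striped"',
            "\tstripe_count = 1\t# linear",
            "",
            "\tstripes = [",
            f'\t\t"{seg.pv}", {seg.pv_start}',
            "\t]",
            "}",
        ]
    return "\n".join(out)


# Toy layout for illustration only: pv1 stands in for the dead drive.
layout = [
    Segment(0, 1000, "pv0", 0),
    Segment(1000, 99, "pv1", 250),
    Segment(1099, 500, "pv2", 0),
]
print(render(remap_missing(layout, "pv1", "pv2", 500)))
```

In this toy case the LV keeps all 1599 logical extents, and logical extents 1000-1098 now map to pv2 PEs 500-598. The data that lived there is still gone; only the LV's mapping is whole again, which is what lets vgcfgrestore accept it.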

All it took was 4 days of learning LVM to fix this…

So far, so good. No errors. Happy camper.

Source: https://serverfault.com/questions/665349