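LVM
Removing a failed drive from an LVM volume group… and recovering partial data from an incomplete LV (missing PV)
I have been fighting with this problem for a while.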
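I have a logical volume spanning 3 disks: 1.5TB, 2TB and 3TB. The 1.5TB drive is failing, with lots of I/O errors and dead sectors. I started a pvmove to move the existing extents from the failing drive to the 3TB drive (which has enough free space). I moved 99% of the extents, but the last percent seems to be unreadable. The read fails and pvmove exits.
Here is the current state: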
root@server:~# pvdisplay
/dev/sdd: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301819904: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301901824: Input/output error
/dev/sdd: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
Couldn't find device with uuid hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK.
  --- Physical volume ---
  PV Name               /dev/sda    # old, working drive
  VG Name               lvm_group1
  PV Size               1.82 TiB / not usable 1.09 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              476932
  Free PE               0
  Allocated PE          476932
  PV UUID               FEoDYU-Lhjf-FdI1-Ei5p-koue-PIma-TGvs9A

  --- Physical volume ---
  PV Name               /dev/sdd1   # old failing drive
  VG Name               lvm_group1
  PV Size               1.36 TiB / not usable 2.40 MiB
  Allocatable           NO
  PE Size               4.00 MiB
  Total PE              357699
  Free PE               357600
  Allocated PE          99
  PV UUID               hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK

  --- Physical volume ---
  PV Name               /dev/sdf    # new drive
  VG Name               lvm_group1
  PV Size               2.73 TiB / not usable 4.46 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              715396
  Free PE               357746
  Allocated PE          357650
  PV UUID               qs4BVK-PAPv-I1DG-x5wJ-dRNq-vhBE-wQeJL6
Here is what pvmove says:
root@server:~# pvmove /dev/sdd1:335950-336500 /dev/sdf --verbose
Finding volume group "lvm_group1"
Archiving volume group "lvm_group1" metadata (seqno 93).
Creating logical volume pvmove0
Moving 50 extents of logical volume lvm_group1/cryptex
Found volume group "lvm_group1"
activation/volume_list configuration setting not defined: Checking only host tags for lvm_group1/cryptex
Updating volume group metadata
Found volume group "lvm_group1"
Found volume group "lvm_group1"
Creating lvm_group1-pvmove0
Loading lvm_group1-pvmove0 table (253:2)
Loading lvm_group1-cryptex table (253:0)
Suspending lvm_group1-cryptex (253:0) with device flush
Suspending lvm_group1-pvmove0 (253:2) with device flush
Found volume group "lvm_group1"
activation/volume_list configuration setting not defined: Checking only host tags for lvm_group1/pvmove0
Resuming lvm_group1-pvmove0 (253:2)
Found volume group "lvm_group1"
Loading lvm_group1-pvmove0 table (253:2)
Suppressed lvm_group1-pvmove0 identical table reload.
Resuming lvm_group1-cryptex (253:0)
Creating volume group backup "/etc/lvm/backup/lvm_group1" (seqno 94).
Checking progress before waiting every 15 seconds
/dev/sdd1: Moved: 4.0%
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
No physical volume label read from /dev/sdd1
Physical volume /dev/sdd1 not found
ABORTING: Can't reread PV /dev/sdd1
ABORTING: Can't reread VG for /dev/sdd1
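Only 99 extents are left on the failing drive. I can afford to lose that data. I just want to pull this drive and throw it away without losing the data on the other drives.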
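To put a number on what is at stake, here is a small sketch using only the figures from the pvdisplay output above (4 MiB extents, 99 allocated extents on /dev/sdd1, 357746 free extents on /dev/sdf). The variable names are made up for illustration, and nothing here touches LVM.

```shell
#!/bin/sh
# Figures copied from the pvdisplay output above; variable names are illustrative.
pe_size_mib=4          # PE Size of the volume group, in MiB
stranded_pe=99         # Allocated PE still on /dev/sdd1
target_free_pe=357746  # Free PE on /dev/sdf

# Size of the data that could not be moved off the failing drive.
stranded_mib=$((stranded_pe * pe_size_mib))
echo "At risk on /dev/sdd1: ${stranded_mib} MiB"

# Check that the new drive could still hold those extents.
if [ "$target_free_pe" -ge "$stranded_pe" ]; then
    fits=yes
else
    fits=no
fi
echo "Room for them on /dev/sdf: ${fits}"
```

So the unreadable tail is 396 MiB out of a multi-terabyte volume, and the 3TB drive has plenty of free extents to stand in for it.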
So I tried pvremove:
root@server:~# pvremove /dev/sdd1
/dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
No physical volume label read from /dev/sdd1
Physical Volume /dev/sdd1 not found
Then vgreduce:
root@server:~# vgreduce lvm_group1 --removemissing
/dev/sdd: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301819904: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301901824: Input/output error
/dev/sdd: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
Couldn't find device with uuid hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK.
WARNING: Partial LV cryptex needs to be repaired or removed.
WARNING: Partial LV pvmove0 needs to be repaired or removed.
There are still partial LVs in VG lvm_group1.
To remove them unconditionally use: vgreduce --removemissing --force.
Proceeding to remove empty missing PVs.
pvdisplay still shows the failing drive…
Any ideas?
In the end I solved this by manually editing /etc/lvm/backup/lvm_group1. Here are the steps, for anyone else who runs into this problem:
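- I physically removed the dead drive from the server
- I ran vgreduce lvm_group1 --removemissing --force
- I removed the dead drive from the config
- I added another stripe on the "good" drive to take the place of the unreadable extents on the dead drive
- I ran vgcfgrestore -f edited_config_file.cfg lvm_group1
- Rebooted
- Voilà! The drive is visible and can be mounted.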
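For anyone trying to picture those steps, here is a rough dry-run sketch. It only prints the commands and one sample stanza in LVM's text metadata format, and it is one reading of the steps, not a record of what was actually typed. Several things in it are assumptions: the copy of the backup file taken before vgreduce (the vgreduce warning suggests --force drops the partial LVs from the live metadata), the segment number 3, the PV alias pv2, and the extent offsets 834582 and 357650. Read the real values out of your own metadata file.

```shell
#!/bin/sh
# Dry-run sketch: prints commands and a sample stanza, never calls LVM itself.
VG=lvm_group1
BACKUP=/etc/lvm/backup/$VG
EDITED=edited_config_file.cfg   # file name used in the post

# Print the command sequence described in the steps above.
plan() {
    # Assumption: keep a copy of the metadata before vgreduce rewrites it.
    echo "cp $BACKUP $EDITED"
    echo "vgreduce $VG --removemissing --force"
    echo "# edit $EDITED by hand: drop the dead PV, remap its extents"
    echo "vgcfgrestore -f $EDITED $VG"
    echo "reboot"
}

# Print a one-stripe (linear) segment stanza in LVM's text metadata format.
# Args: segment number, first LV extent, extent count, PV alias, first PV extent.
# The LV's segment_count has to match the number of stanzas in the file.
segment() {
    cat <<EOF
segment$1 {
	start_extent = $2
	extent_count = $3

	type = "striped"
	stripe_count = 1	# linear

	stripes = [
		"$4", $5
	]
}
EOF
}

plan
# Hypothetical values: 99 extents remapped onto free space of a surviving PV.
segment 3 834582 99 pv2 357650
```

The remapped extents hold whatever happened to be on the good drive at that offset, so the 99 extents' worth of data is gone either way. The point is only to make the LV complete again so the rest of it can be activated and mounted.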
It only took me 4 days of learning LVM to solve this…
So far it looks good. No errors. I'm a happy camper.