Filesystems

Filesystem now read-only after reaching 100% storage capacity: how do I reset it to read-write?

  • September 18, 2021

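Yesterday our server (Ubuntu 18.04) reached 100% of its storage capacity, and one of our filesystems was switched to read-only mode, see: /dev/md3 / ext4 ro,relatime,errors=remount-ro,data=ordered 0 0. I have already tried several solutions from other answers on serverfault, but none of them seems to fit my situation.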
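To see which options the kernel is actually applying, the mounts line quoted above can be read straight from /proc/mounts. The following is a minimal POSIX shell sketch; the function names mount_options, is_read_only and errors_behavior are hypothetical, and it assumes the usual /proc/mounts layout (device, mount point, type, options, ...).

```shell
# Sketch: print the options a mount point is currently mounted with.
# Reads /proc/mounts by default; an optional second argument points it at a
# saved copy instead. If a mount point appears more than once, the last
# (topmost) entry wins, which is the one currently in effect.
mount_options() {
    mp=$1
    mounts_file=${2:-/proc/mounts}
    # Field 2 is the mount point, field 4 the comma-separated options.
    awk -v mp="$mp" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' "$mounts_file"
}

# Succeeds (exit 0) if the mount point is currently mounted read-only.
is_read_only() {
    mount_options "$@" | tr ',' '\n' | grep -qx 'ro'
}

# Prints the errors= behaviour (continue, remount-ro or panic), if one is set.
errors_behavior() {
    mount_options "$@" | tr ',' '\n' | sed -n 's/^errors=//p'
}

# Example on the affected server:
#   is_read_only / && echo "/ is read-only (errors=$(errors_behavior /))"
```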

For example, I tried running sudo mount -o remount,rw /dev/md3 /, but it fails with the message: mount: /: cannot remount /dev/md3 read-write, is write-protected.

How can I fix this so that the filesystem is read-write again?

Thanks!

Update with debug information:

mdadm --detail /dev/md3
/dev/md3:
          Version : 0.90
    Creation Time : Fri Nov 10 10:07:34 2017
       Raid Level : raid1
       Array Size : 20478912 (19.53 GiB 20.97 GB)
    Used Dev Size : 20478912 (19.53 GiB 20.97 GB)
     Raid Devices : 2
    Total Devices : 2
  Preferred Minor : 3
      Persistence : Superblock is persistent

      Update Time : Sat Sep 18 09:15:35 2021
            State : clean
   Active Devices : 2
  Working Devices : 2
   Failed Devices : 0
    Spare Devices : 0

Consistency Policy : unknown

             UUID : 4b632ac4:ae1a7c2b:a4d2adc2:26fd5302
           Events : 0.861

   Number   Major   Minor   RaidDevice State
      0       8        3        0      active sync   /dev/sda3
      1       8       19        1      active sync   /dev/sdb3

And from dmesg:

dmesg | grep "md3"
[67448453.830094] EXT4-fs error (device md3): ext4_remount:4840: Abort forced by user

Running tune2fs:

tune2fs -l /dev/md3
tune2fs 1.44.1 (24-Mar-2018)
Filesystem volume name:   /
Last mounted on:          /
Filesystem UUID:          d1a985c4-8c5e-4034-93e0-629b8e65f161
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1281120
Block count:              5119728
Reserved block count:     255986
Free blocks:              445848
Free inodes:              1001361
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1022
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8160
Inode blocks per group:   510
Flex block group size:    16
Filesystem created:       Fri Nov 10 10:07:39 2017
Last mount time:          Tue Jul 30 17:51:41 2019
Last write time:          Thu Sep 16 20:06:05 2021
Mount count:              7
Maximum mount count:      -1
Last checked:             Fri Nov 10 10:07:39 2017
Check interval:           0 (<none>)
Lifetime writes:          4013 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       663035
Default directory hash:   half_md4
Directory Hash Seed:      ae316af1-086d-470f-af27-0c10ca25f3c8
Journal backup:           inode blocks
FS Error count:           8
First error time:         Thu Sep 16 20:06:04 2021
First error function:     ext4_lookup
First error line #:       1607
First error inode #:      930317
First error block #:      0
Last error time:          Sat Sep 18 09:15:35 2021
Last error function:      ext4_remount
Last error line #:        4840
Last error inode #:       685456
Last error block #:       0
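Most of this dump is not needed for the question; the fields that matter here are Filesystem state, FS Error count, Mount count and Maximum mount count. A small helper can pick out one field at a time. The function name tune2fs_get is hypothetical (it is not part of e2fsprogs), and the sketch assumes the "Name:   value" layout shown above.

```shell
# Sketch: print the value of one field from `tune2fs -l` output on stdin.
# Values may contain spaces ("clean with errors") or colons (timestamps),
# so only the first colon on a line is treated as the separator.
tune2fs_get() {
    awk -v key="$1" '
        index($0, ":") {
            name = substr($0, 1, index($0, ":") - 1)
            if (name == key) {
                value = substr($0, index($0, ":") + 1)
                sub(/^[ \t]+/, "", value)   # drop the alignment padding
                print value
                exit
            }
        }'
}

# Example on the affected server:
#   sudo tune2fs -l /dev/md3 | tune2fs_get 'Filesystem state'
#   sudo tune2fs -l /dev/md3 | tune2fs_get 'FS Error count'
```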

Update with the output of e2fsck -n /dev/md3:

e2fsck -n /dev/md3
e2fsck 1.44.1 (24-Mar-2018)
Warning: skipping journal recovery because doing a read-only filesystem check.
/ contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found.  Fix? no

Inode 101 was part of the orphaned inode list.  IGNORED.
Inode 117 was part of the orphaned inode list.  IGNORED.
Inode 292 was part of the orphaned inode list.  IGNORED.
Inode 460 was part of the orphaned inode list.  IGNORED.
Inode 465 was part of the orphaned inode list.  IGNORED.
Inode 471 was part of the orphaned inode list.  IGNORED.
Inode 487 was part of the orphaned inode list.  IGNORED.
Inode 529 was part of the orphaned inode list.  IGNORED.
Inode 562 was part of the orphaned inode list.  IGNORED.
Inode 564 was part of the orphaned inode list.  IGNORED.
Inode 707 was part of the orphaned inode list.  IGNORED.
Inode 723 was part of the orphaned inode list.  IGNORED.
Inode 918 was part of the orphaned inode list.  IGNORED.
...
Deleted inode 402614 has zero dtime.  Fix? no
...
Inode 783370, end of extent exceeds allowed value
   (logical block 1024, physical block 3068928, len 76)
Clear? no

Inode 783370, i_blocks is 8784, should be 8200.  Fix? no

Inode 783470, end of extent exceeds allowed value
   (logical block 2708, physical block 1322783, len 193)
Clear? no

Inode 783470, i_blocks is 23200, should be 21672.  Fix? no

Inode 1047956 was part of the orphaned inode list.  IGNORED.
Pass 2: Checking directory structure
Entry 'tmp' in /tmp/systemd-private-bb09aae54cab4e12844e5844d11ca5eb-certbot.service-VSBnVY (685456) has deleted/unused inode 685457.  Clear? no

Entry '1159_key-certbot.pem' in /etc/letsencrypt/keys (930317) has deleted/unused inode 920168.  Clear? no

Entry '1159_key-certbot.pem' in /etc/letsencrypt/keys (930317) has an incorrect filetype (was 1, should be 0).
Fix? no

Entry '1110_csr-certbot.pem' in /etc/letsencrypt/csr (930318) has deleted/unused inode 920176.  Clear? no

Entry '1110_csr-certbot.pem' in /etc/letsencrypt/csr (930318) has an incorrect filetype (was 1, should be 0).
Fix? no

Entry '1106_key-certbot.pem' in /etc/letsencrypt/keys (930317) has deleted/unused inode 920166.  Clear? no

Entry '1106_key-certbot.pem' in /etc/letsencrypt/keys (930317) has an incorrect filetype (was 1, should be 0).
Fix? no

Entry '1109_key-certbot.pem' in /etc/letsencrypt/keys (930317) has deleted/unused inode 920173.  Clear? no

Entry '1109_key-certbot.pem' in /etc/letsencrypt/keys (930317) has an incorrect filetype (was 1, should be 0).
Fix? no

Entry '1146_csr-certbot.pem' in /etc/letsencrypt/csr (930318) has deleted/unused inode 920172.  Clear? no

Entry '1146_csr-certbot.pem' in /etc/letsencrypt/csr (930318) has an incorrect filetype (was 1, should be 0).
Fix? no
...
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 685456 ref count is 3, should be 2.  Fix? no

Pass 5: Checking group summary information
Block bitmap differences:  -34565 -(53721--53734) -(59721--59761) -(59981--59983) -(61106--61184) -(61540--61544) -(70964--71007) -(71274--71313) -(84938--84989) -(85084--85107) -(85592--85599) -(116400--116408) -(116423--116436) -(128700--128703) -(128708--128721) -(138904--138914) -(165045--165150) -(169691--169713) -(169717--169742) -(464896--471464) -(471552--471989) -(472928--472947) -(499200--499612) -(501408--501434) -(503808--504070) -(513024--513301) -(513408--513491) -(589477--589480) -(711431--711441) -(747968--748030) -(838733--838740) -(838755--838758) -(838772--838783) -(838791--838800) -(838805--838816) -(838824--838835) -(848384--848972) -(875840--875880) -(1032187--1033031) -(1083840--1083878) -(1120110--1120132) -(1322783--1322975) -(1631196--1631251) -(1635150--1635169) -(1635360--1635391) -(1635571--1635575) -(1635848--1635855) -(1635996--1636001) -1648860 -1648880 -(1715533--1715536) -(1740800--1741311) -(1746432--1746573) -(1750528--1750729) -(1867776--1867880) -(1870717--1871294) -(1880576--1880791) -(1888256--1888258) -1888260 -(1888272--1888273) -(1888275--1888767) -(2226402--2226405) -(2235495--2235719) -(2266304--2266332) -(2301560--2301629) -(2528723--2528753) -(2589088--2589117) -(2597312--2597374) -(2597696--2597757) -(2614784--2615295) -(2619392--2619458) -(2619904--2620297) -2636181 -(2671360--2671491) -(2687328--2687350) -(3068928--3069003) -(3196998--3197002) -(3228728--3228738) -(3236697--3236703) -(3252961--3252970) -(3264276--3264277) -(3264287--3264298) -(3285164--3285170) -(3299518--3299524) -(3399680--3400062) -(3441024--3441129) -(3574080--3574142) -(3601664--3601795) -(3659648--3659724) -(3660672--3660755) -(3704233--3704234) -(3704237--3704242) -3707626 -3708898 -3709310 -3709356 -3709398 -3709984 -(3751694--3751696) -(3751707--3751711) -(3751767--3751768) -(3751774--3751775) -(3751800--3751814) -(3771264--3771343) -(3830025--3830040) -(3860480--3867203) -(3867616--3867644) -(3868160--3868618) -(3869696--3870139) 
-(4045457--4045483) -(4087936--4088023) -(4088032--4088055) -(4088320--4088780) -(4088960--4089064) -(4089088--4089126) -(4091136--4091324) -(4091392--4092119) -(4092928--4094514) -(4094976--4095854) -(4097088--4097120) -(4097536--4097816) -(4109312--4110157) -(4250368--4250378) -(4278497--4278513) -(4296960--4297014) -(4325486--4325616) -(4325632--4325707) -(4326688--4327074) -(4328826--4328961) -(4329202--4329314) -(4329600--4329666) -(4329764--4329804) -(4332027--4332178) -(4332406--4332476) -(4333568--4333942) -(4334372--4334454) -(4334564--4335227) -(4621153--4621176) -(4669781--4670170) -(4696470--4696548) -(4697074--4697429) -(4697662--4697711) -(4726778--4727894) -(5055921--5056185) -(5056648--5056667) -(5106412--5106620) -(5106668--5107034)
Fix? no

Free blocks count wrong for group #76 (3374, counted=3375).
Fix? no

Free blocks count wrong (445848, counted=445849).
Fix? no

Inode bitmap differences:  -101 -117 -292 -460 -465 -471 -487 -529 -562 -564 -707 -723 -918 -(1837--1838) -2041 -2714 -3593 -3654 -3659 -3894 -3976 -4336 -4425 -5193 -5244 -5252 -5930 -5951 -5967 -(7066--7069) -7431 -8492 -8651 -9298 -9583 -9592 -14261 -14270 -18093 -19214 -21301 -(27843--27844) -27847 -27849 -(27853--27856) -(27868--27869) -(27872--27873) -27875 -27879 -27883 -27885 -(27889--27890) -27892 -162842 -391708 -391741 -391759 -391763 -(391800--391802) -(391804--391805) -(391812--391814) -(391831--391833) -391870 -391873 -391878 -391900 -391902 -(391910--391911) -391915 -391919 -391927 -391956 -392493 -392719 -393759 -393795 -395132 -395134 -395161 -395165 -395221 -395234 -395267 -395289 -(395312--395313) -395315 -395325 -395336 -395387 -395630 -396550 -396589 -(396699--396700) -402594 -(402596--402598) -402601 -(402604--402606) -402608 -(402611--402614) -407918 -413872 -413874 -413881 -413885 -413897 -413900 -413908 -421042 -421202 -421226 -426391 -652905 -(652931--652935) -663035 -685457 -920162 -(920164--920176) -1047956
Fix? no

Directories count wrong for group #84 (17, counted=16).
Fix? no

Free inodes count wrong for group #96 (80, counted=82).
Fix? no

Free inodes count wrong for group #112 (486, counted=487).
Fix? no

Free inodes count wrong (1001361, counted=1001364).
Fix? no


/: ********** WARNING: Filesystem still has errors **********

/: 279759/1281120 files (0.7% non-contiguous), 4673880/5119728 blocks
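Because -n answers "no" to every question, nothing above was changed on disk. e2fsck also reports its outcome in the exit status, which e2fsck(8) documents as a sum of flags; after a -n run that finds problems like these, the "errors left uncorrected" flag (4) is what one would expect. A sketch of a decoder (the function name describe_e2fsck_status is hypothetical):

```shell
# Sketch: translate an e2fsck exit status into the conditions listed in
# e2fsck(8). The status is a sum of flags, so several may be set at once.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return
    fi
    for bit in 1 2 4 8 16 32 128; do
        if [ $(( status & bit )) -ne 0 ]; then
            case $bit in
                1)   echo "file system errors corrected" ;;
                2)   echo "errors corrected, system should be rebooted" ;;
                4)   echo "file system errors left uncorrected" ;;
                8)   echo "operational error" ;;
                16)  echo "usage or syntax error" ;;
                32)  echo "canceled by user request" ;;
                128) echo "shared library error" ;;
            esac
        fi
    done
}

# Example on the affected server (read-only check, changes nothing):
#   sudo e2fsck -n /dev/md3; describe_e2fsck_status $?
```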

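It was filesystem corruption, not the disk filling up, that switched the filesystem to read-only mode; the kernel did exactly what the errors=remount-ro mount option tells it to do.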
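The tune2fs -l figures in the question are consistent with this. Plugging in Block count 5119728, Free blocks 445848, Reserved block count 255986 and Block size 4096, a small calculation suggests that at the time of the dump the filesystem was about 91% used, with roughly 740 MiB still available to non-root users, so lack of space alone does not explain why it stays read-only. A minimal sketch (the function name fs_space_summary is hypothetical):

```shell
# Sketch: summarise space usage from tune2fs counters.
# Arguments: total blocks, free blocks, reserved blocks, block size in bytes.
fs_space_summary() {
    total=$1 free=$2 reserved=$3 bsize=$4
    used=$(( total - free ))
    # Percentage in tenths, rounded to nearest, using integer arithmetic only.
    pct10=$(( (used * 1000 + total / 2) / total ))
    # Reserved blocks are only usable by root (Reserved blocks uid 0).
    avail=$(( free > reserved ? free - reserved : 0 ))
    echo "used: $(( pct10 / 10 )).$(( pct10 % 10 ))%"
    echo "available to non-root: $(( avail * bsize / 1048576 )) MiB"
}

fs_space_summary 5119728 445848 255986 4096
```

With the values above this should print "used: 91.3%" followed by "available to non-root: 741 MiB".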

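Back up important data and configuration and copy them off the machine. Prepare a recovery plan in case something needed for booting turns out to be damaged. If possible, move important services to another machine. There will be some downtime.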
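A minimal sketch of the backup step, assuming tar with gzip support is available. The function name backup_paths and the destination paths in the comments are hypothetical; the archive must be written to a different, writable filesystem (the damaged root is read-only) or streamed to another host.

```shell
# Sketch: archive the given paths into a gzip-compressed tarball, then read
# the archive back to make sure it is complete and not truncated.
backup_paths() {
    dest=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: backup_paths ARCHIVE PATH..." >&2
        return 2
    fi
    # tar exits non-zero if any of the paths could not be read.
    tar -czf "$dest" "$@" || return 1
    # Listing the whole archive catches truncated or corrupt output.
    tar -tzf "$dest" > /dev/null
}

# Examples (hypothetical destinations):
#   sudo sh -c '. ./backup.sh; backup_paths /mnt/usb/md3-etc.tar.gz /etc /var/www'
#   sudo tar -czf - /etc | ssh backup@example.org 'cat > md3-etc.tar.gz'
```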

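I noticed that this system is rarely rebooted (only 7 mounts since 2017, the last one in 2019). So I suggest setting the maximum mount count to 1, so the filesystem is checked on every boot: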

tune2fs -c 1 /dev/md3
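Why this works can be sketched from the rule tune2fs(8) describes for -c: a maximum mount count of 0 or -1 means the mount count is disregarded, otherwise a check is due once the mount count reaches the maximum. This is a simplified model (the function name is hypothetical); the check interval and the "has errors" state can also force a check and are not modelled here. With this system's values (Mount count 7, Maximum mount count -1) no check is due; after tune2fs -c 1 it is.

```shell
# Sketch: is a check due because of the mount count alone?
# Arguments: current mount count, maximum mount count (from tune2fs -l).
mount_count_check_due() {
    count=$1 max=$2
    # 0 or -1 disables mount-count based checking.
    [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]
}

mount_count_check_due 7 -1 || echo "before tune2fs -c 1: no check due"
mount_count_check_due 7 1 && echo "after tune2fs -c 1: check due at next boot"
```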

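Then reboot. The init scripts should check and repair the filesystem during boot. However, the corruption may be severe enough to need manual interaction, so make sure someone is near the server and ready to help you. And if the corruption hit something important, you may run into strange problems afterwards.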
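After the reboot it is worth confirming that the repair took: tune2fs -l should report the state as "clean" rather than "clean with errors", and / should be mounted rw again (findmnt -no OPTIONS / shows the current options). A minimal sketch of the first check, assuming the tune2fs -l layout shown in the question; the function name is hypothetical.

```shell
# Sketch: succeed only if `tune2fs -l` output on stdin reports the
# filesystem state as exactly "clean" ("clean with errors" fails).
tune2fs_state_is_clean() {
    awk -F':' '
        $1 == "Filesystem state" {
            state = $2
            sub(/^[ \t]+/, "", state)   # strip alignment padding
            sub(/[ \t]+$/, "", state)
            found = 1
        }
        END { exit !(found && state == "clean") }'
}

# Example on the affected server:
#   sudo tune2fs -l /dev/md3 | tune2fs_state_is_clean && echo "state: clean"
#   findmnt -no OPTIONS /
```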

In the worst case, you will have to reinstall the system. But don't forget to set the maximum mount count to 1 again afterwards.

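Why did the filesystem get corrupted? It just happens. Blocks are held in memory, and corruption may have occurred there, for example because of cosmic rays. This is very rare, but it does happen. Disks are not perfect either and cannot detect every error: they have a non-zero bit error rate (look up the actual value in your device's datasheet), so there is a very low, but non-zero, probability that data is read back corrupted. If this hits a metadata block, the damage can accumulate (the filesystem driver, guided by wrong information, may make incorrect assumptions and corrupt the filesystem further), which is why it is important to check the filesystem from time to time.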
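To put "very low, but non-zero" into numbers: if errors are independent and the datasheet quotes a bit error rate BER, the probability of at least one unrecoverable read error while reading b bits is 1 - (1 - BER)^b, which is approximately 1 - exp(-b * BER). The sketch below assumes a BER of 1e-14 (one error per 10^14 bits, a figure often quoted for consumer hard disks; check your own datasheet) and uses the array size from the mdadm output (20478912 KiB = 20970405888 bytes). The function name is hypothetical.

```shell
# Sketch: probability of at least one unrecoverable read error when reading
# a number of bytes from a drive with the given bit error rate.
# Assumes independent errors: P = 1 - (1 - BER)^bits ~= 1 - exp(-bits * BER)
read_error_probability() {
    bytes=$1 ber=$2
    awk -v bytes="$bytes" -v ber="$ber" 'BEGIN {
        bits = bytes * 8
        printf "%.6f\n", 1 - exp(-bits * ber)
    }'
}

# One full read of the 20.97 GB array at an assumed BER of 1e-14:
read_error_probability 20970405888 1e-14
```

Under these assumptions a single full read of the array has roughly a 0.17% chance of hitting an unrecoverable error; over years of operation and terabytes of I/O (Lifetime writes: 4013 GB), the chance that something goes wrong at some point is correspondingly larger.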

Source: https://serverfault.com/questions/1077949