Why does XFS use the LVM cache chunk size for sunit/swidth instead of the RAID5 settings?
I have 4 disks available for testing on my VM: `sdb`, `sdc`, `sdd` and `sde`.
The first 3 disks are used for the RAID5 configuration, and the last disk is used as the LVM cache drive.
What I don't understand is this: when I create a 50 GB cache disk with a chunk size of 64 KiB, `xfs_info` gives me the following:

```
[vagrant@node-02 ~]$ xfs_info /data
meta-data=/dev/mapper/data-data  isize=512    agcount=32, agsize=16777072 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=536866304, imaxpct=5
         =                       sunit=16     swidth=32 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=262144, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
```
As we can see here, `sunit=16` and `swidth=32` look correct and match the RAID5 layout: these values are in 4096-byte filesystem blocks, so sunit = 16 × 4 KiB = 64 KiB (one stripe unit) and swidth = 32 × 4 KiB = 128 KiB (2 data disks × 64 KiB).
The output of `lsblk -t`:
```
[vagrant@node-02 ~]$ lsblk -t
NAME                         ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED    RQ-SIZE   RA WSAME
sda                                  0    512      0     512     512    1 deadline     128 4096    0B
├─sda1                               0    512      0     512     512    1 deadline     128 4096    0B
└─sda2                               0    512      0     512     512    1 deadline     128 4096    0B
  ├─centos-root                      0    512      0     512     512    1              128 4096    0B
  ├─centos-swap                      0    512      0     512     512    1              128 4096    0B
  └─centos-home                      0    512      0     512     512    1              128 4096    0B
sdb                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_0          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0  65536 131072     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_0         0    512      0     512     512    1              128 4096   32M
  └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
    └─data5-data5                    0  65536 131072     512     512    1              128 4096    0B
sdc                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_1          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0  65536 131072     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_1         0    512      0     512     512    1              128 4096   32M
  └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
    └─data5-data5                    0  65536 131072     512     512    1              128 4096    0B
sdd                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_2          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0  65536 131072     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_2         0    512      0     512     512    1              128 4096   32M
  └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
    └─data5-data5                    0  65536 131072     512     512    1              128 4096    0B
sde                                  0    512      0     512     512    1 deadline     128 4096   32M
sdf                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-cache_data5_cdata            0    512      0     512     512    1              128 4096   32M
│ └─data5-data5                      0  65536 131072     512     512    1              128 4096    0B
└─data5-cache_data5_cmeta            0    512      0     512     512    1              128 4096   32M
  └─data5-data5                      0  65536 131072     512     512    1              128 4096    0B
sdg                                  0    512      0     512     512    1 deadline     128 4096   32M
sdh                                  0    512      0     512     512    1 deadline     128 4096   32M
```
And `lvdisplay -m -a data` gives me the following:
```
[vagrant@node-02 ~]$ sudo lvdisplay -m -a data
  --- Logical volume ---
  LV Path                /dev/data/data
  LV Name                data
  VG Name                data
  LV UUID                MBG1p8-beQj-TNDd-Cyx4-QkyN-vdVk-dG6n6I
  LV Write Access        read/write
  LV Creation host, time node-02, 2019-09-03 13:22:08 +0000
  LV Cache pool name     cache_data
  LV Cache origin name   data_corig
  LV Status              available
  # open                 1
  LV Size                <2.00 TiB
  Cache used blocks      0.06%
  Cache metadata blocks  0.64%
  Cache dirty blocks     0.00%
  Cache read hits/misses 293 / 66
  Cache wrt hits/misses  59 / 41173
  Cache demotions        0
  Cache promotions       486
  Current LE             524284
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:9

  --- Segments ---
  Logical extents 0 to 524283:
    Type                cache
    Chunk size          64.00 KiB
    Metadata format     2
    Mode                writethrough
    Policy              smq

  --- Logical volume ---
  Internal LV Name       cache_data
  VG Name                data
  LV UUID                apACl6-DtfZ-TURM-vxjD-UhxF-tthY-uSYRGq
  LV Write Access        read/write
  LV Creation host, time node-02, 2019-09-03 13:22:16 +0000
  LV Pool metadata       cache_data_cmeta
  LV Pool data           cache_data_cdata
  LV Status              NOT available
  LV Size                50.00 GiB
  Current LE             12800
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

  --- Segments ---
  Logical extents 0 to 12799:
    Type                cache-pool
    Chunk size          64.00 KiB
    Metadata format     2
    Mode                writethrough
    Policy              smq

  --- Logical volume ---
  Internal LV Name       cache_data_cmeta
  VG Name                data
  LV UUID                hmkW6M-CKGO-CTUP-rR4v-KnWn-DbBZ-pJeEA2
  LV Write Access        read/write
  LV Creation host, time node-02, 2019-09-03 13:22:15 +0000
  LV Status              available
  # open                 1
  LV Size                1.00 GiB
  Current LE             256
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:11

  --- Segments ---
  Logical extents 0 to 255:
    Type                linear
    Physical volume     /dev/sdf
    Physical extents    0 to 255

  --- Logical volume ---
  Internal LV Name       cache_data_cdata
  VG Name                data
  LV UUID                9mHe8J-SRiY-l1gl-TO1h-2uCC-Hi10-UpeEVP
  LV Write Access        read/write
  LV Creation host, time node-02, 2019-09-03 13:22:16 +0000
  LV Status              available
  # open                 1
  LV Size                50.00 GiB
  Current LE             12800
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:10

  --- Segments ---
  Logical extents 0 to 12799:
    Type                linear
    Physical volume     /dev/sdf
    Physical extents    256 to 13055

  --- Logical volume ---
  Internal LV Name       data_corig
  VG Name                data
  LV UUID                QP8ppy-nv1v-0sii-tANA-6ZzK-EJkP-sLfrh4
  LV Write Access        read/write
  LV Creation host, time node-02, 2019-09-03 13:22:17 +0000
  LV origin of Cache LV  data
  LV Status              available
  # open                 1
  LV Size                <2.00 TiB
  Current LE             524284
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     768
  Block device           253:12

  --- Segments ---
  Logical extents 0 to 524283:
    Type                raid5
    Monitoring          monitored
    Raid Data LV 0
      Logical volume    data_corig_rimage_0
      Logical extents   0 to 262141
    Raid Data LV 1
      Logical volume    data_corig_rimage_1
      Logical extents   0 to 262141
    Raid Data LV 2
      Logical volume    data_corig_rimage_2
      Logical extents   0 to 262141
    Raid Metadata LV 0  data_corig_rmeta_0
    Raid Metadata LV 1  data_corig_rmeta_1
    Raid Metadata LV 2  data_corig_rmeta_2

[vagrant@node-02 ~]$
[vagrant@node-02 ~]$
  --- Segments ---
  Df7SLj
  LV Write Access        read/write
  LV Creation host, time node-02, 2019-09-03 13:22:08 +0000
  LV Status              available
  # open                 1
  LV Size                1023.99 GiB
  Current LE             262142
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:8

  --- Segments ---
  Logical extents 0 to 262141:
    Type                linear
    Physical volume     /dev/sdd
    Physical extents    1 to 262142

  --- Logical volume ---
  Internal LV Name       data_corig_rmeta_2
  VG Name                data
  LV UUID                xi9Ot3-aTnp-bA3z-YL0x-eVaB-87EP-JSM3eN
  LV Write Access        read/write
  LV Creation host, time node-02, 2019-09-03 13:22:08 +0000
  LV Status              available
  # open                 1
  LV Size                4.00 MiB
  Current LE             1
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:7

  --- Segments ---
  Logical extents 0 to 0:
    Type                linear
    Physical volume     /dev/sdd
    Physical extents    0 to 0
```
We can clearly see the 64 KiB chunk size in the segments.
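(As a side note, the chunk size can also be read directly with `lvs`; a quick sketch, assuming the `segtype` and `chunk_size` report fields available in this LVM version:)

```
# List every LV in the "data" VG, including internal ones, with its segment
# type and chunk size
sudo lvs -a -o lv_name,segtype,chunk_size data
```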
But when I create a 250 GB cache disk instead, LVM needs a chunk size of at least 288 KiB to accommodate that cache size, and when I then run `xfs_info`, the `sunit`/`swidth` values suddenly match the cache drive rather than the RAID5 layout. Output of `xfs_info`:
```
[vagrant@node-02 ~]$ xfs_info /data
meta-data=/dev/mapper/data-data  isize=512    agcount=32, agsize=16777152 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=536866816, imaxpct=5
         =                       sunit=72     swidth=72 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=262144, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
```
Suddenly we have a `sunit` and `swidth` of 72, which matches the 288 KiB chunk size of the cache drive (72 × 4 KiB blocks = 288 KiB). We can see that chunk size in `lvdisplay -m -a`:
```
[vagrant@node-02 ~]$ sudo lvdisplay -m -a data
  --- Logical volume ---
  LV Path                /dev/data/data
  LV Name                data
  VG Name                data
  LV UUID                XLHw3w-RkG9-UNh6-WZBM-HtjM-KcV6-6dOdnG
  LV Write Access        read/write
  LV Creation host, time node-2, 2019-09-03 13:36:32 +0000
  LV Cache pool name     cache_data
  LV Cache origin name   data_corig
  LV Status              available
  # open                 1
  LV Size                <2.00 TiB
  Cache used blocks      0.17%
  Cache metadata blocks  0.71%
  Cache dirty blocks     0.00%
  Cache read hits/misses 202 / 59
  Cache wrt hits/misses  8939 / 34110
  Cache demotions        0
  Cache promotions       1526
  Current LE             524284
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:9

  --- Segments ---
  Logical extents 0 to 524283:
    Type                cache
    Chunk size          288.00 KiB
    Metadata format     2
    Mode                writethrough
    Policy              smq

  --- Logical volume ---
  Internal LV Name       cache_data
  VG Name                data
  LV UUID                Ps7Z1P-y5Ae-ju80-SZjc-yB6S-YBtx-SWL9vO
  LV Write Access        read/write
  LV Creation host, time node-2, 2019-09-03 13:36:40 +0000
  LV Pool metadata       cache_data_cmeta
  LV Pool data           cache_data_cdata
  LV Status              NOT available
  LV Size                250.00 GiB
  Current LE             64000
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

  --- Segments ---
  Logical extents 0 to 63999:
    Type                cache-pool
    Chunk size          288.00 KiB
    Metadata format     2
    Mode                writethrough
    Policy              smq

  --- Logical volume ---
  Internal LV Name       cache_data_cmeta
  VG Name                data
  LV UUID                k4rVn9-lPJm-2Vvt-77jw-NP1K-PTOs-zFy2ph
  LV Write Access        read/write
  LV Creation host, time node-2, 2019-09-03 13:36:39 +0000
  LV Status              available
  # open                 1
  LV Size                1.00 GiB
  Current LE             256
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:11

  --- Segments ---
  Logical extents 0 to 255:
    Type                linear
    Physical volume     /dev/sdf
    Physical extents    0 to 255

  --- Logical volume ---
  Internal LV Name       cache_data_cdata
  VG Name                data
  LV UUID                dm571W-f9eX-aFMA-SrPC-PYdd-zs45-ypLksd
  LV Write Access        read/write
  LV Creation host, time node-2, 2019-09-03 13:36:39 +0000
  LV Status              available
  # open                 1
  LV Size                250.00 GiB
  Current LE             64000
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:10

  --- Logical volume ---
  Internal LV Name       data_corig
  VG Name                data
  LV UUID                hbYiRO-YnV8-gd1B-shQD-N3SR-xpTl-rOjX8V
  LV Write Access        read/write
  LV Creation host, time node-2, 2019-09-03 13:36:41 +0000
  LV origin of Cache LV  data
  LV Status              available
  # open                 1
  LV Size                <2.00 TiB
  Current LE             524284
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     768
  Block device           253:12

  --- Segments ---
  Logical extents 0 to 524283:
    Type                raid5
    Monitoring          monitored
    Raid Data LV 0
      Logical volume    data_corig_rimage_0
      Logical extents   0 to 262141
    Raid Data LV 1
      Logical volume    data_corig_rimage_1
      Logical extents   0 to 262141
    Raid Data LV 2
      Logical volume    data_corig_rimage_2
      Logical extents   0 to 262141
    Raid Metadata LV 0  data_corig_rmeta_0
    Raid Metadata LV 1  data_corig_rmeta_1
    Raid Metadata LV 2  data_corig_rmeta_2
```
and in the output of `lsblk -t`:
```
[vagrant@node-02 ~]$ lsblk -t
NAME                         ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED    RQ-SIZE   RA WSAME
sda                                  0    512      0     512     512    1 deadline     128 4096    0B
├─sda1                               0    512      0     512     512    1 deadline     128 4096    0B
└─sda2                               0    512      0     512     512    1 deadline     128 4096    0B
  ├─centos-root                      0    512      0     512     512    1              128 4096    0B
  ├─centos-swap                      0    512      0     512     512    1              128 4096    0B
  └─centos-home                      0    512      0     512     512    1              128 4096    0B
sdb                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_0          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_0         0    512      0     512     512    1              128 4096   32M
  └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
    └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
sdc                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_1          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_1         0    512      0     512     512    1              128 4096   32M
  └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
    └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
sdd                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_2          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_2         0    512      0     512     512    1              128 4096   32M
  └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
    └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
sde                                  0    512      0     512     512    1 deadline     128 4096   32M
sdf                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-cache_data5_cdata            0    512      0     512     512    1              128 4096   32M
│ └─data5-data5                      0 294912 294912     512     512    1              128 4096    0B
└─data5-cache_data5_cmeta            0    512      0     512     512    1              128 4096   32M
  └─data5-data5                      0 294912 294912     512     512    1              128 4096    0B
sdg                                  0    512      0     512     512    1 deadline     128 4096   32M
sdh                                  0    512      0     512     512    1 deadline     128 4096   32M
```
There are a couple of things here I do not understand. XFS obviously autodetects these settings, but why does it choose to use the cache drive's chunk size? As the first example shows, it is perfectly able to autodetect the RAID5 layout.
I know I can get the correct `sunit`/`swidth` values by passing the `su`/`sw` options to `mkfs.xfs`, but should I be doing that in this case? I have been googling for days and have looked through the XFS source code, but I cannot find any clue as to why XFS does this.
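To be explicit about what I mean, the override would look something like this (a hypothetical invocation, not something I have actually run; the values assume the RAID5 layout shown above, i.e. a 64 KiB stripe unit across 2 data disks):

```
# Hypothetical: pin the XFS geometry to the RAID5 layout instead of the
# topology advertised by the cached LV (64 KiB stripe unit, 2 data disks)
mkfs.xfs -d su=64k,sw=2 /dev/mapper/data-data
```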
So the questions that arise are:
- Why does XFS behave this way?
- Should I define `su`/`sw` manually when running `mkfs.xfs`?
- Does the chunk size of the cache drive affect the RAID5 setup, and should it be aligned to it somehow? (A sketch of how the chunk size could be set explicitly follows below.)
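For context on that last point: if the cache chunk size should indeed be a whole multiple of the full RAID5 stripe, it could be set explicitly when building the cache pool. A rough sketch, assuming the VG/LV/device names used above, with 512 KiB chosen purely as an illustrative multiple of the 128 KiB stripe width (LVM caps a cache pool at roughly 1,000,000 chunks, which is why it rounded a 250 GiB pool up to 288 KiB by default):

```
# Hypothetical: build the cache pool with an explicit chunk size that is a
# multiple of the RAID5 full stripe (2 data disks x 64 KiB = 128 KiB),
# then attach it to the origin LV.
lvcreate  --type cache-pool -L 250G --chunksize 512k -n cache_data data /dev/sde
lvconvert --type cache --cachepool data/cache_data data/data
```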
Determining the optimal allocation strategy is a complex problem, because it depends on how the various block layers interact with each other.
When determining the optimal allocation strategy, `mkfs.xfs` uses the information provided by `libblkid`. You can access the same information by issuing `lsblk -t`. It is very likely that `mkfs.xfs` aligned its allocation to 288K because `lvs` (`device-mapper`, really) simply passes that value up the stack. I have seen very similar behaviour with thin provisioning, where `mkfs.xfs` aligned the filesystem to the thin chunk size.

EDIT: so, this is the output of `lsblk -t`…
```
[vagrant@node-02 ~]$ lsblk -t
NAME                         ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED    RQ-SIZE   RA WSAME
sda                                  0    512      0     512     512    1 deadline     128 4096    0B
├─sda1                               0    512      0     512     512    1 deadline     128 4096    0B
└─sda2                               0    512      0     512     512    1 deadline     128 4096    0B
  ├─centos-root                      0    512      0     512     512    1              128 4096    0B
  ├─centos-swap                      0    512      0     512     512    1              128 4096    0B
  └─centos-home                      0    512      0     512     512    1              128 4096    0B
sdb                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_0          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_0         0    512      0     512     512    1              128 4096   32M
  └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
    └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
sdc                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_1          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_1         0    512      0     512     512    1              128 4096   32M
  └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
    └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
sdd                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_2          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_2         0    512      0     512     512    1              128 4096   32M
  └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
    └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
sde                                  0    512      0     512     512    1 deadline     128 4096   32M
sdf                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-cache_data5_cdata            0    512      0     512     512    1              128 4096   32M
│ └─data5-data5                      0 294912 294912     512     512    1              128 4096    0B
└─data5-cache_data5_cmeta            0    512      0     512     512    1              128 4096   32M
  └─data5-data5                      0 294912 294912     512     512    1              128 4096    0B
sdg                                  0    512      0     512     512    1 deadline     128 4096   32M
sdh                                  0    512      0     512     512    1 deadline     128 4096   32M
```
As you can see, the `data5-data5` device (the one the XFS filesystem is created on) reports a MIN-IO and OPT-IO of 294912 bytes (288K, your cache chunk size), while the underlying devices report the RAID array chunk size (64K). This means that `device-mapper` overrode the underlying I/O information with the current cache chunk size.
`mkfs.xfs` simply uses whatever `libblkid` reports, and that in turn depends on the specific cache device-mapper target in use.
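If you want to see exactly which numbers `libblkid` (and therefore `mkfs.xfs`) is consuming, they are the kernel's I/O topology hints. A quick way to inspect them is sketched below; the `dm-9` node is an assumption, substitute whatever device `lsblk` or `dmsetup ls` reports for `data5-data5`:

```
# Topology hints exported by the cached LV; with a 288 KiB cache chunk both
# values should read 294912 bytes
cat /sys/block/dm-9/queue/minimum_io_size
cat /sys/block/dm-9/queue/optimal_io_size

# The same I/O limits as libblkid (and thus mkfs.xfs) sees them
sudo blkid -i /dev/mapper/data5-data5
```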