CentOS

Why does XFS use the LVM cache chunk size instead of the RAID5 sunit/swidth settings?

  • September 3, 2019

I have 4 disks available for testing on my virtual machine: sdb, sdc, sdd and sde.

The first 3 disks are used for the RAID5 configuration, and the last disk is used as the LVM cache drive.
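
For context, a setup along these lines reproduces the layout shown below (the exact commands, flags and device assignment are my reconstruction, not taken from the post):

# Rough sketch of the test setup; commands and flags are assumptions.
pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
vgcreate data /dev/sdb /dev/sdc /dev/sdd /dev/sde

# RAID5 origin LV across the first three disks, 64KiB stripe size
lvcreate --type raid5 --stripes 2 --stripesize 64k -l 100%PVS -n data \
         data /dev/sdb /dev/sdc /dev/sdd

# 50GiB cache pool (64KiB chunks) on the last disk, attached to the origin LV
lvcreate --type cache-pool --chunksize 64k -L 50G -n cache_data data /dev/sde
lvconvert --type cache --cachepool data/cache_data data/data

mkfs.xfs /dev/data/data
mount /dev/data/data /data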

What I don't understand is the following:

When I create a 50GB cache disk with a chunk size of 64KiB, xfs_info gives me the following:

[vagrant@node-02 ~]$ xfs_info /data
meta-data=/dev/mapper/data-data isize=512    agcount=32, agsize=16777072 blks
        =                       sectsz=512   attr=2, projid32bit=1
        =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=536866304, imaxpct=5
        =                       sunit=16     swidth=32 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=262144, version=2
        =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

As we can see here, sunit=16 and swidth=32 appear to be correct and match the RAID5 layout.
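
Spelled out (my arithmetic, using the bsize=4096 from the output above):

# sunit/swidth are reported in filesystem blocks (bsize=4096):
#   sunit  = 16 blocks * 4096 B =  65536 B =  64 KiB  -> one RAID5 chunk
#   swidth = 32 blocks * 4096 B = 131072 B = 128 KiB  -> 2 data disks x 64 KiB
# This also matches the MIN-IO=65536 / OPT-IO=131072 reported for the
# data5-data5_corig device in the lsblk -t output below.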

The output of lsblk -t:

[vagrant@node-02 ~]$ lsblk -t
NAME                         ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED    RQ-SIZE   RA WSAME
sda                                  0    512      0     512     512    1 deadline     128 4096    0B
├─sda1                               0    512      0     512     512    1 deadline     128 4096    0B
└─sda2                               0    512      0     512     512    1 deadline     128 4096    0B
 ├─centos-root                      0    512      0     512     512    1              128 4096    0B
 ├─centos-swap                      0    512      0     512     512    1              128 4096    0B
 └─centos-home                      0    512      0     512     512    1              128 4096    0B
sdb                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_0          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0  65536 131072     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_0         0    512      0     512     512    1              128 4096   32M
 └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
   └─data5-data5                    0  65536 131072     512     512    1              128 4096    0B
sdc                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_1          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0  65536 131072     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_1         0    512      0     512     512    1              128 4096   32M
 └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
   └─data5-data5                    0  65536 131072     512     512    1              128 4096    0B
sdd                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_2          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0  65536 131072     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_2         0    512      0     512     512    1              128 4096   32M
 └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
   └─data5-data5                    0  65536 131072     512     512    1              128 4096    0B
sde                                  0    512      0     512     512    1 deadline     128 4096   32M
sdf                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-cache_data5_cdata            0    512      0     512     512    1              128 4096   32M
│ └─data5-data5                      0  65536 131072     512     512    1              128 4096    0B
└─data5-cache_data5_cmeta            0    512      0     512     512    1              128 4096   32M
 └─data5-data5                      0  65536 131072     512     512    1              128 4096    0B
sdg                                  0    512      0     512     512    1 deadline     128 4096   32M
sdh                                  0    512      0     512     512    1 deadline     128 4096   32M

lvdisplay -a -m data gives me the following:

[vagrant@node-02 ~]$ sudo lvdisplay -m -a data
 --- Logical volume ---
 LV Path                /dev/data/data
 LV Name                data
 VG Name                data
 LV UUID                MBG1p8-beQj-TNDd-Cyx4-QkyN-vdVk-dG6n6I
 LV Write Access        read/write
 LV Creation host, time node-02, 2019-09-03 13:22:08 +0000
 LV Cache pool name     cache_data
 LV Cache origin name   data_corig
 LV Status              available
 # open                 1
 LV Size                <2.00 TiB
 Cache used blocks      0.06%
 Cache metadata blocks  0.64%
 Cache dirty blocks     0.00%
 Cache read hits/misses 293 / 66
 Cache wrt hits/misses  59 / 41173
 Cache demotions        0
 Cache promotions       486
 Current LE             524284
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     8192
 Block device           253:9

 --- Segments ---
 Logical extents 0 to 524283:
   Type                cache
   Chunk size          64.00 KiB
   Metadata format     2
   Mode                writethrough
   Policy              smq


 --- Logical volume ---
 Internal LV Name       cache_data
 VG Name                data
 LV UUID                apACl6-DtfZ-TURM-vxjD-UhxF-tthY-uSYRGq
 LV Write Access        read/write
 LV Creation host, time node-02, 2019-09-03 13:22:16 +0000
 LV Pool metadata       cache_data_cmeta
 LV Pool data           cache_data_cdata
 LV Status              NOT available
 LV Size                50.00 GiB
 Current LE             12800
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto

 --- Segments ---
 Logical extents 0 to 12799:
   Type                cache-pool
   Chunk size          64.00 KiB
   Metadata format     2
   Mode                writethrough
   Policy              smq


 --- Logical volume ---
 Internal LV Name       cache_data_cmeta
 VG Name                data
 LV UUID                hmkW6M-CKGO-CTUP-rR4v-KnWn-DbBZ-pJeEA2
 LV Write Access        read/write
 LV Creation host, time node-02, 2019-09-03 13:22:15 +0000
 LV Status              available
 # open                 1
 LV Size                1.00 GiB
 Current LE             256
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     8192
 Block device           253:11

 --- Segments ---
 Logical extents 0 to 255:
   Type                linear
   Physical volume     /dev/sdf
   Physical extents    0 to 255


 --- Logical volume ---
 Internal LV Name       cache_data_cdata
 VG Name                data
 LV UUID                9mHe8J-SRiY-l1gl-TO1h-2uCC-Hi10-UpeEVP
 LV Write Access        read/write
 LV Creation host, time node-02, 2019-09-03 13:22:16 +0000
 LV Status              available
 # open                 1
 LV Size                50.00 GiB
 Current LE             12800
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     8192
 Block device           253:10

 --- Segments ---
 Logical extents 0 to 12799:
   Type                linear
   Physical volume     /dev/sdf
   Physical extents    256 to 13055

 --- Logical volume ---
 Internal LV Name       data_corig_rimage_2
 VG Name                data
 LV Write Access        read/write
 LV Creation host, time node-02, 2019-09-03 13:22:08 +0000
 LV Status              available
 # open                 1
 LV Size                1023.99 GiB
 Current LE             262142
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     8192
 Block device           253:8

 --- Segments ---
 Logical extents 0 to 262141:
   Type                linear
   Physical volume     /dev/sdd
   Physical extents    1 to 262142


 --- Logical volume ---
 Internal LV Name       data_corig_rmeta_2
 VG Name                data
 LV UUID                xi9Ot3-aTnp-bA3z-YL0x-eVaB-87EP-JSM3eN
 LV Write Access        read/write
 LV Creation host, time node-02, 2019-09-03 13:22:08 +0000
 LV Status              available
 # open                 1
 LV Size                4.00 MiB
 Current LE             1
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     8192
 Block device           253:7

 --- Segments ---
 Logical extents 0 to 0:
   Type                linear
   Physical volume     /dev/sdd
   Physical extents    0 to 0


 --- Logical volume ---
 Internal LV Name       data_corig
 VG Name                data
 LV UUID                QP8ppy-nv1v-0sii-tANA-6ZzK-EJkP-sLfrh4
 LV Write Access        read/write
 LV Creation host, time node-02, 2019-09-03 13:22:17 +0000
 LV origin of Cache LV  data
 LV Status              available
 # open                 1
 LV Size                <2.00 TiB
 Current LE             524284
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     768
 Block device           253:12

 --- Segments ---
 Logical extents 0 to 524283:
   Type                raid5
   Monitoring          monitored
   Raid Data LV 0
     Logical volume    data_corig_rimage_0
     Logical extents   0 to 262141
   Raid Data LV 1
     Logical volume    data_corig_rimage_1
     Logical extents   0 to 262141
   Raid Data LV 2
     Logical volume    data_corig_rimage_2
     Logical extents   0 to 262141
   Raid Metadata LV 0  data_corig_rmeta_0
   Raid Metadata LV 1  data_corig_rmeta_1
   Raid Metadata LV 2  data_corig_rmeta_2

Here we can clearly see the 64KiB chunk size in the segments.

But when I create a 250GB cache disk, LVM requires a chunk size of at least 288KiB to accommodate a cache disk of that size. And when I then run xfs_info, the sunit/swidth values suddenly match the cache drive instead of the RAID5 layout.
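
As far as I can tell, the 288KiB minimum comes from LVM capping the number of chunks in a cache pool; the back-of-the-envelope calculation below is mine and assumes the default cap of roughly 1,000,000 chunks and a 32KiB chunk-size granularity:

# Assumption: chunk count capped at ~1,000,000, chunk size rounded up to a
# multiple of 32KiB (dm-cache granularity).
#   250 GiB = 262144000 KiB ; 262144000 / 1000000 ~ 262.1 KiB -> round up -> 288 KiB
#    50 GiB =  52428800 KiB ;  52428800 / 1000000 ~  52.4 KiB -> round up ->  64 KiB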

The output of xfs_info:

[vagrant@node-02 ~]$ xfs_info /data
meta-data=/dev/mapper/data-data isize=512    agcount=32, agsize=16777152 blks
        =                       sectsz=512   attr=2, projid32bit=1
        =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=536866816, imaxpct=5
        =                       sunit=72     swidth=72 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=262144, version=2
        =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
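
Converting those values to bytes (my arithmetic again, bsize=4096):

# sunit = swidth = 72 blocks * 4096 B = 294912 B = 288 KiB
# i.e. exactly the cache chunk size, and exactly the MIN-IO/OPT-IO values
# that lsblk -t reports for data5-data5 further down.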

Suddenly we have a sunit and swidth of 72, which matches the 288KiB chunk size of the cache drive, as we can see with lvdisplay -m -a:

[vagrant@node-02 ~]$ sudo lvdisplay -m -a data
 --- Logical volume ---
 LV Path                /dev/data/data
 LV Name                data
 VG Name                data
 LV UUID                XLHw3w-RkG9-UNh6-WZBM-HtjM-KcV6-6dOdnG
 LV Write Access        read/write
 LV Creation host, time node-2, 2019-09-03 13:36:32 +0000
 LV Cache pool name     cache_data
 LV Cache origin name   data_corig
 LV Status              available
 # open                 1
 LV Size                <2.00 TiB
 Cache used blocks      0.17%
 Cache metadata blocks  0.71%
 Cache dirty blocks     0.00%
 Cache read hits/misses 202 / 59
 Cache wrt hits/misses  8939 / 34110
 Cache demotions        0
 Cache promotions       1526
 Current LE             524284
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     8192
 Block device           253:9

 --- Segments ---
 Logical extents 0 to 524283:
   Type                cache
   Chunk size          288.00 KiB
   Metadata format     2
   Mode                writethrough
   Policy              smq


 --- Logical volume ---
 Internal LV Name       cache_data
 VG Name                data
 LV UUID                Ps7Z1P-y5Ae-ju80-SZjc-yB6S-YBtx-SWL9vO
 LV Write Access        read/write
 LV Creation host, time node-2, 2019-09-03 13:36:40 +0000
 LV Pool metadata       cache_data_cmeta
 LV Pool data           cache_data_cdata
 LV Status              NOT available
 LV Size                250.00 GiB
 Current LE             64000
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto

 --- Segments ---
 Logical extents 0 to 63999:
   Type                cache-pool
   Chunk size          288.00 KiB
   Metadata format     2
   Mode                writethrough
   Policy              smq


 --- Logical volume ---
 Internal LV Name       cache_data_cmeta
 VG Name                data
 LV UUID                k4rVn9-lPJm-2Vvt-77jw-NP1K-PTOs-zFy2ph
 LV Write Access        read/write
 LV Creation host, time node-2, 2019-09-03 13:36:39 +0000
 LV Status              available
 # open                 1
 LV Size                1.00 GiB
 Current LE             256
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     8192
 Block device           253:11

 --- Segments ---
 Logical extents 0 to 255:
   Type                linear
   Physical volume     /dev/sdf
   Physical extents    0 to 255


 --- Logical volume ---
 Internal LV Name       cache_data_cdata
 VG Name                data
 LV UUID                dm571W-f9eX-aFMA-SrPC-PYdd-zs45-ypLksd
 LV Write Access        read/write
 LV Creation host, time node-2, 2019-09-03 13:36:39 +0000
 LV Status              available
 # open                 1
 LV Size                250.00 GiB
 Current LE             64000
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     8192
 Block device           253:10

 --- Logical volume ---
 Internal LV Name       data_corig
 VG Name                data
 LV UUID                hbYiRO-YnV8-gd1B-shQD-N3SR-xpTl-rOjX8V
 LV Write Access        read/write
 LV Creation host, time node-2, 2019-09-03 13:36:41 +0000
 LV origin of Cache LV  data
 LV Status              available
 # open                 1
 LV Size                <2.00 TiB
 Current LE             524284
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     768
 Block device           253:12

 --- Segments ---
 Logical extents 0 to 524283:
   Type                raid5
   Monitoring          monitored
   Raid Data LV 0
     Logical volume    data_corig_rimage_0
     Logical extents   0 to 262141
   Raid Data LV 1
     Logical volume    data_corig_rimage_1
     Logical extents   0 to 262141
   Raid Data LV 2
     Logical volume    data_corig_rimage_2
     Logical extents   0 to 262141
   Raid Metadata LV 0  data_corig_rmeta_0
   Raid Metadata LV 1  data_corig_rmeta_1
   Raid Metadata LV 2  data_corig_rmeta_2

And the output of lsblk -t:

[vagrant@node-02 ~]$ lsblk -t
NAME                         ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED    RQ-SIZE   RA WSAME
sda                                  0    512      0     512     512    1 deadline     128 4096    0B
├─sda1                               0    512      0     512     512    1 deadline     128 4096    0B
└─sda2                               0    512      0     512     512    1 deadline     128 4096    0B
 ├─centos-root                      0    512      0     512     512    1              128 4096    0B
 ├─centos-swap                      0    512      0     512     512    1              128 4096    0B
 └─centos-home                      0    512      0     512     512    1              128 4096    0B
sdb                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_0          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_0         0    512      0     512     512    1              128 4096   32M
 └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
sdc                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_1          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_1         0    512      0     512     512    1              128 4096   32M
 └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
sdd                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_2          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_2         0    512      0     512     512    1              128 4096   32M
 └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
sde                                  0    512      0     512     512    1 deadline     128 4096   32M
sdf                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-cache_data5_cdata            0    512      0     512     512    1              128 4096   32M
│ └─data5-data5                      0 294912 294912     512     512    1              128 4096    0B
└─data5-cache_data5_cmeta            0    512      0     512     512    1              128 4096   32M
 └─data5-data5                      0 294912 294912     512     512    1              128 4096    0B
sdg                                  0    512      0     512     512    1 deadline     128 4096   32M
sdh                                  0    512      0     512     512    1 deadline     128 4096   32M

This raises a couple of questions.

XFS apparently autodetects these settings, but why does XFS choose to use the chunk size of the cache drive? As we saw in the first example, it is perfectly able to autodetect the RAID5 layout.

I know I can pass the su/sw options to mkfs.xfs to get the correct sunit/swidth values, but should I do that in this case?

http://xfs.org/index.php/XFS_FAQ#Q:_How_to_calculate_the_correct_sunit.2Cswidth_values_for_optimal_performance
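
For reference, forcing the RAID5 geometry by hand would look something like this (a sketch only; whether it is the right thing to do is exactly what I am asking):

# Hypothetical: override the detected geometry with the RAID5 values
# (su = 64KiB chunk, sw = 2 data disks) instead of what libblkid reports.
mkfs.xfs -f -d su=64k,sw=2 /dev/mapper/data-data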

I have been googling for days and digging through the XFS source code, but I could not find any clue as to why XFS does this.

So the questions that arise are:

  • Why does XFS behave this way?
  • Should I manually define su/sw when running mkfs.xfs?
  • Does the chunk size of the cache drive affect the RAID5 setup, and should it be aligned somehow?

Determining the optimal allocation strategy is a complex problem, because it depends on how the various block layers interact with each other.

When determining the optimal allocation strategy, mkfs.xfs relies on the information provided by libblkid. You can access the same information by issuing lsblk -t. mkfs.xfs most likely uses 288K for allocation alignment because lvs (device-mapper, really) simply passes that value up the stack.
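
If you want to check what libblkid itself sees on the cached LV, something like the following should show the same numbers (the output line is my expectation, not captured from your system):

# I/O-topology information as probed by libblkid (the same source mkfs.xfs uses)
blkid -i /dev/mapper/data-data
# expected to print something along the lines of:
# /dev/mapper/data-data: MINIMUM_IO_SIZE="294912" OPTIMAL_IO_SIZE="294912" PHYSICAL_SECTOR_SIZE="512" LOGICAL_SECTOR_SIZE="512"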

I have seen very similar behavior with thin provisioning, where mkfs.xfs aligns the filesystem to the thin chunk size.

Edit: so, here is the output of lsblk -t…

[vagrant@node-02 ~]$ lsblk -t
NAME                         ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED    RQ-SIZE   RA WSAME
sda                                  0    512      0     512     512    1 deadline     128 4096    0B
├─sda1                               0    512      0     512     512    1 deadline     128 4096    0B
└─sda2                               0    512      0     512     512    1 deadline     128 4096    0B
 ├─centos-root                      0    512      0     512     512    1              128 4096    0B
 ├─centos-swap                      0    512      0     512     512    1              128 4096    0B
 └─centos-home                      0    512      0     512     512    1              128 4096    0B
sdb                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_0          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_0         0    512      0     512     512    1              128 4096   32M
 └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
sdc                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_1          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_1         0    512      0     512     512    1              128 4096   32M
 └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
sdd                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-data5_corig_rmeta_2          0    512      0     512     512    1              128 4096   32M
│ └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
│   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
└─data5-data5_corig_rimage_2         0    512      0     512     512    1              128 4096   32M
 └─data5-data5_corig                0  65536 131072     512     512    1              128  384    0B
   └─data5-data5                    0 294912 294912     512     512    1              128 4096    0B
sde                                  0    512      0     512     512    1 deadline     128 4096   32M
sdf                                  0    512      0     512     512    1 deadline     128 4096   32M
├─data5-cache_data5_cdata            0    512      0     512     512    1              128 4096   32M
│ └─data5-data5                      0 294912 294912     512     512    1              128 4096    0B
└─data5-cache_data5_cmeta            0    512      0     512     512    1              128 4096   32M
 └─data5-data5                      0 294912 294912     512     512    1              128 4096    0B
sdg                                  0    512      0     512     512    1 deadline     128 4096   32M
sdh                                  0    512      0     512     512    1 deadline     128 4096   32M

As you can see, the data5-data5 device (on which the XFS filesystem is created) reports a MIN-IO and OPT-IO of 294912 bytes (288K, your cache chunk size), while the underlying devices report the RAID array chunk size (64K). This means device-mapper overrides the underlying I/O information with the cache chunk size currently in use.
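
You can verify this directly in sysfs, where the cached LV and its RAID5 origin expose different I/O hints (the dm minor numbers below are taken from the lvdisplay output, 253:9 for the cached LV and 253:12 for the origin; adjust them for your system):

# I/O hints of the cached LV (dm-cache target) vs. its RAID5 origin
cat /sys/block/dm-9/queue/minimum_io_size    # 294912 -> 288KiB cache chunk
cat /sys/block/dm-9/queue/optimal_io_size    # 294912
cat /sys/block/dm-12/queue/minimum_io_size   # 65536  -> 64KiB RAID5 chunk
cat /sys/block/dm-12/queue/optimal_io_size   # 131072 -> full stripe (2 x 64KiB)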

mkfs.xfs simply uses what libblkid reports, which in turn depends on the specific cache device-mapper target in use.

Quoted from: https://serverfault.com/questions/981694