Infiniband

How to set an InfiniBand port from INIT to ACTIVE

  • February 2, 2018

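I have the following setup: 7 nodes, let's say they are called gauss1 through gauss7. I have a stable connection established between gauss1 and gauss6. Only gauss7 is causing trouble.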

# ibnodes
Ca  : 0x0002c90300f2eef0 ports 2 "gauss1 mlx4_0"
Ca  : 0x0002c90300f2ef20 ports 2 "gauss2 mlx4_0"
Ca  : 0x7cfe900300be5350 ports 1 "gauss3 mlx4_0"
Ca  : 0x7cfe900300be5170 ports 1 "gauss4 mlx4_0"
Ca  : 0x7cfe900300be51a0 ports 1 "gauss5 mlx4_0"
Ca  : 0x248a070300d8f5c0 ports 1 "gauss6 mlx4_0"
Ca  : 0xec0d9a03002baf50 ports 1 "gauss7 mlx4_0"

So all nodes seem to be registered at the switch. The port state on gauss1 through gauss6 is ACTIVE. Only on gauss7 is the port state INIT.

ibv_devinfo on gauss7 says:

hca_id: mlx4_0
   transport:          InfiniBand (0)
   fw_ver:             2.42.5000
   node_guid:          ec0d:9a03:002b:af50
   sys_image_guid:         ec0d:9a03:002b:af53
   vendor_id:          0x02c9
   vendor_part_id:         4099
   hw_ver:             0x0
   board_id:           MT_1100120019
   phys_port_cnt:          1
       port:   1
           state:          PORT_INIT (2)
           max_mtu:        4096 (5)
           active_mtu:     4096 (5)
           sm_lid:         3
           port_lid:       9
           port_lmc:       0x00
           link_layer:     InfiniBand
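
With seven nodes to compare, checking port state by eye gets tedious. Below is a minimal sketch that scans `ibv_devinfo` output for ports that are not in `PORT_ACTIVE`. The function names `port_states` and `inactive_ports` are hypothetical helpers, and the parsing assumes the `hca_id:` / `port:` / `state:` layout shown above; other versions of the tool may format things differently.

```python
import re

# Sample text copied from the ibv_devinfo output on gauss7 (trimmed).
SAMPLE = """\
hca_id: mlx4_0
   transport:          InfiniBand (0)
   phys_port_cnt:          1
       port:   1
           state:          PORT_INIT (2)
           sm_lid:         3
           port_lid:       9
"""

def port_states(devinfo_text):
    """Return (hca_id, port_number, state) tuples found in ibv_devinfo text."""
    results = []
    hca = None
    port = None
    for line in devinfo_text.splitlines():
        stripped = line.strip()
        m = re.match(r"hca_id:\s*(\S+)", stripped)
        if m:
            hca = m.group(1)
            port = None  # a new device starts; forget the previous port
            continue
        m = re.match(r"port:\s*(\d+)", stripped)
        if m:
            port = int(m.group(1))
            continue
        # re.match anchors at the start, so "phys_state:" is not picked up.
        m = re.match(r"state:\s*(\w+)", stripped)
        if m and port is not None:
            results.append((hca, port, m.group(1)))
    return results

def inactive_ports(devinfo_text):
    """Return only the ports whose state is not PORT_ACTIVE."""
    return [p for p in port_states(devinfo_text) if p[2] != "PORT_ACTIVE"]

if __name__ == "__main__":
    for hca, port, state in inactive_ports(SAMPLE):
        print(f"{hca} port {port} is in state {state}")
```

On a real node the text would come from running `ibv_devinfo` (for example via `subprocess.run(["ibv_devinfo"], capture_output=True, text=True).stdout`) rather than from the embedded sample.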

I also installed opensm on gauss7, and it reports that it is in STANDBY:

Feb 02 20:15:36 gauss7 opensm-launch[355306]: Using default GUID 0xec0d9a03002baf51
Feb 02 20:15:36 gauss7 OpenSM[355309]: Entering DISCOVERING state
Feb 02 20:15:36 gauss7 opensm-launch[355306]: Entering DISCOVERING state
Feb 02 20:15:36 gauss7 OpenSM[355309]: Entering STANDBY state
Feb 02 20:15:36 gauss7 opensm-launch[355306]: Entering STANDBY state
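
An OpenSM instance in STANDBY has usually found another subnet manager acting as master on the fabric; here that is presumably the one at `sm_lid: 3`. To follow the state transitions in a longer log, a small sketch like the following pulls out the most recent state. `last_sm_state` is a hypothetical helper, and it assumes the `Entering <STATE> state` wording seen in the log lines above.

```python
import re

# Log lines copied from the journal output on gauss7.
SAMPLE_LOG = """\
Feb 02 20:15:36 gauss7 opensm-launch[355306]: Using default GUID 0xec0d9a03002baf51
Feb 02 20:15:36 gauss7 OpenSM[355309]: Entering DISCOVERING state
Feb 02 20:15:36 gauss7 opensm-launch[355306]: Entering DISCOVERING state
Feb 02 20:15:36 gauss7 OpenSM[355309]: Entering STANDBY state
Feb 02 20:15:36 gauss7 opensm-launch[355306]: Entering STANDBY state
"""

def last_sm_state(log_text):
    """Return the state named in the last 'Entering ... state' line, or None."""
    state = None
    for line in log_text.splitlines():
        m = re.search(r"Entering (\w+) state", line)
        if m:
            state = m.group(1)  # later lines overwrite earlier ones
    return state

if __name__ == "__main__":
    print(last_sm_state(SAMPLE_LOG))
```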

My question: how do I set the port on gauss7 to ACTIVE so that all 7 nodes are connected?

Rebooting gauss7 solved the problem.

Source: https://serverfault.com/questions/895423