DHCP

Different DHCP behavior between a machine that uses NetworkManager and one that does not

March 4, 2021

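Can anyone shed light on the differences I've listed below, and perhaps explain why NetworkManager behaves differently? Please also let us know whether NetworkManager can be changed to behave more like the non-NetworkManager setup.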

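Both servers run CentOS 7.8 and use dhclient, but one of them is controlled by NetworkManager. Every few days both see the same switch/NIC down/up event, which we cannot control at the moment (for several reasons, including that we are remote).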

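Server #0, which uses NetworkManager, tries to request a DHCP lease immediately after the link goes down and comes back up. It gets no response from the DHCP server (a separate switch issue), so it cancels the DHCP transaction and changes the state to timeout. After that it does nothing until NetworkManager is restarted (which, obviously, can only be done from the console). See the full sequence below.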

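Server #1, which does not use NetworkManager, comes through these down/up outages without problems. It seems to simply keep its lease while the NIC is down, does not even renew when the NIC comes back up, and just keeps using its IP! Later, it renews via DHCP at the normal lease interval. See the full sequence below.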

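Please let me know whether I can make NetworkManager behave more like plain dhclient. Perhaps it can be configured to just keep the current lease after a down/up event and renew at the normal lease interval? Thanks!!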

Server #0:

-- Last regular DHCP renew:
Feb 26 09:31:21 server0 dhclient[4766]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x58eefe09)
Feb 26 09:31:21 server0 dhclient[4766]: DHCPACK from 10.20.20.131 (xid=0x58eefe09)
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5084] dhcp4 (enp96s0f0):   address 10.20.20.223
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5090] dhcp4 (enp96s0f0):   plen 22 (255.255.252.0)
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5090] dhcp4 (enp96s0f0):   gateway 10.20.20.1
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5090] dhcp4 (enp96s0f0):   lease time 18000
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5090] dhcp4 (enp96s0f0):   nameserver '10.20.20.49'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5091] dhcp4 (enp96s0f0):   nameserver '10.20.20.48'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5091] dhcp4 (enp96s0f0):   domain name 'dom.com'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5091] dhcp4 (enp96s0f0): state changed bound -> bound
Feb 26 09:31:21 server0 dhclient[4766]: bound to 10.20.20.223 -- renewal in 8129 seconds.
Feb 26 09:31:21 server0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 09:31:21 server0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 09:31:21 server0 nm-dispatcher: req:1 'dhcp4-change' [enp96s0f0]: new request (4 scripts)
Feb 26 09:31:21 server0 nm-dispatcher: req:1 'dhcp4-change' [enp96s0f0]: start running ordered scripts...
-- Random switch outage:
Feb 26 10:49:10 SERVER0 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Down
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info>  [1614354556.8263] device (enp96s0f0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info>  [1614354556.8467] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 4766
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info>  [1614354556.8468] dhcp4 (enp96s0f0): state changed bound -> done
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info>  [1614354556.8679] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:49:16 SERVER0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 10:49:16 SERVER0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:1 'down' [enp96s0f0]: new request (4 scripts)
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:1 'down' [enp96s0f0]: start running ordered scripts...
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:2 'connectivity-change': new request (4 scripts)
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:2 'connectivity-change': start running ordered scripts...
Feb 26 10:58:46 SERVER0 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: None
-- Machine is not accessible
-- NetworkManager tries to recover and request DHCP:
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6768] device (enp96s0f0): carrier: link connected
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6783] device (enp96s0f0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6823] policy: auto-activating connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6835] device (enp96s0f0): Activation: starting connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6837] device (enp96s0f0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6844] manager: NetworkManager state is now CONNECTING
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6848] device (enp96s0f0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.7360] device (enp96s0f0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.7369] dhcp4 (enp96s0f0): activation: beginning transaction (timeout in 45 seconds)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.7435] dhcp4 (enp96s0f0): dhclient started with pid 44653
Feb 26 10:58:46 SERVER0 dhclient[44653]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x161525b4)
Feb 26 10:58:54 SERVER0 dhclient[44653]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x161525b4)
Feb 26 10:59:13 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 3 (xid=0x2f70b1a3)
Feb 26 10:59:16 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 6 (xid=0x2f70b1a3)
Feb 26 10:59:22 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 9 (xid=0x2f70b1a3)
Feb 26 10:59:31 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 14 (xid=0x2f70b1a3)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <warn>  [1614355171.8451] dhcp4 (enp96s0f0): request timed out
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8451] dhcp4 (enp96s0f0): state changed unknown -> timeout
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8540] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 44653
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8541] dhcp4 (enp96s0f0): state changed timeout -> done
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8545] device (enp96s0f0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8553] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <warn>  [1614355171.8559] device (enp96s0f0): Activation: failed for connection 'enp96s0f0'
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8563] device (enp96s0f0): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8606] policy: auto-activating connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8615] device (enp96s0f0): Activation: starting connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8617] device (enp96s0f0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
-- NetworkManager tries to recover and request DHCP again following a different process:
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8624] manager: NetworkManager state is now CONNECTING
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8628] device (enp96s0f0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.9420] device (enp96s0f0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.9429] dhcp4 (enp96s0f0): activation: beginning transaction (timeout in 45 seconds)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.9489] dhcp4 (enp96s0f0): dhclient started with pid 44712
Feb 26 10:59:32 SERVER0 dhclient[44712]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x5bd6c866)
Feb 26 10:59:36 SERVER0 dhclient[44712]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x5bd6c866)
Feb 26 10:59:44 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 5 (xid=0x3ffbeab4)
Feb 26 10:59:49 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 5 (xid=0x3ffbeab4)
Feb 26 10:59:54 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 7 (xid=0x3ffbeab4)
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5823] device (enp96s0f0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5846] device (enp96s0f0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5850] device (enp96s0f0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5869] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5982] manager: NetworkManager state is now CONNECTED_SITE
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5988] policy: set 'enp96s0f0' (enp96s0f0) as default for IPv6 routing and DNS
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5992] device (enp96s0f0): Activation: successful, device activated.
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.6003] manager: NetworkManager state is now CONNECTED_GLOBAL
Feb 26 10:59:59 SERVER0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 10:59:59 SERVER0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:1 'up' [enp96s0f0]: new request (4 scripts)
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:1 'up' [enp96s0f0]: start running ordered scripts...
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:2 'connectivity-change': new request (4 scripts)
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:2 'connectivity-change': start running ordered scripts...
Feb 26 11:00:01 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 14 (xid=0x3ffbeab4)
Feb 26 11:00:15 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 21 (xid=0x3ffbeab4)
-- The DHCP request times out, NetworkManager cancels the transaction and then does nothing further
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <warn>  [1614355216.8456] dhcp4 (enp96s0f0): request timed out
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info>  [1614355216.8463] dhcp4 (enp96s0f0): state changed unknown -> timeout
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info>  [1614355216.8649] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 44712
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info>  [1614355216.8650] dhcp4 (enp96s0f0): state changed timeout -> done

Server #1:

-- Last regular DHCP renew:
Feb 26 10:34:00 server1 dhclient[5252]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x71bfdb34)
Feb 26 10:34:00 server1 dhclient[5252]: DHCPACK from 10.20.20.131 (xid=0x71bfdb34)
Feb 26 10:34:02 server1 dhclient[5252]: bound to 10.20.20.224 -- renewal in 8195 seconds.
-- Random switch outage:
Feb 26 10:49:10 server1 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Down
Feb 26 10:58:46 server1 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: None
-- Machine is accessible during this time!
-- Next regular DHCP renew:
Feb 26 12:50:37 server1 dhclient[5252]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x71bfdb34)
Feb 26 12:50:37 server1 dhclient[5252]: DHCPACK from 10.20.20.131 (xid=0x71bfdb34)
Feb 26 12:50:39 server1 dhclient[5252]: bound to 10.20.20.224 -- renewal in 8611 seconds.

In NetworkManager, a device has an overall logical state. That is what you see in nmcli device.

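When the device is connected (activated), it may fail to get an address from DHCP (or a DHCP timeout may happen later on). Depending on ipv4.dhcp-timeout (which you can set to infinity), DHCP is considered failed after some time. When that happens, the device may go down entirely; this depends on the setting ipv4.may-fail. With ipv4.may-fail=no, a DHCP failure is fatal for the activation and the device goes down. Otherwise, the overall state still counts as good as long as you have an IPv6 address. In that case, DHCP should be retried indefinitely while the device stays up (activated).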
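The decision described above can be written down as a small toy model. This is a simplification for reasoning about the logs, not NetworkManager's code; the function names are hypothetical, the 45-second default is taken from the "timeout in 45 seconds" lines in the server #0 log, and treating "may-fail=yes but no IPv6 address" as a failed activation is an assumption read from "as long as you have an IPv6 address".

```python
# Toy model of the behavior described in the answer; NOT NetworkManager's code.

DEFAULT_DHCP_TIMEOUT_S = 45  # as logged on server #0: "timeout in 45 seconds"


def dhcp_considered_failed(elapsed_s, timeout_s=DEFAULT_DHCP_TIMEOUT_S):
    """timeout_s=None models ipv4.dhcp-timeout set to infinity."""
    return timeout_s is not None and elapsed_s >= timeout_s


def after_dhcp_failure(may_fail: bool, has_ipv6_address: bool) -> str:
    """What happens to an activated device once DHCPv4 is considered failed."""
    if not may_fail:
        # ipv4.may-fail=no: the DHCP failure is fatal for the activation.
        return "down"
    if has_ipv6_address:
        # ipv4.may-fail=yes and IPv6 works: device stays activated and
        # DHCPv4 is supposed to keep being retried.
        return "up, retrying DHCP"
    # Assumption: with no usable address at all, the activation fails.
    return "down"


if __name__ == "__main__":
    for may_fail in (False, True):
        for has_v6 in (False, True):
            print(f"may_fail={may_fail} ipv6={has_v6} -> "
                  f"{after_dhcp_failure(may_fail, has_v6)}")
```

Applied to the second attempt in the server #0 log: the device reaches "activated" at 10:59:59 and NetworkManager sets enp96s0f0 as default for IPv6 routing and DNS, while DHCPv4 only times out at 11:00:16. That puts it in the "up, retrying DHCP" branch, which is the case where, per the answer, DHCP should keep being retried.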

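On the other hand, if the device goes down because of a failure, it becomes eligible to autoconnect again (at least if you set connection.autoconnect=yes). This autoconnect cycle repeats up to connection.autoconnect-retries times; then autoconnect is blocked for 5 minutes, after which the cycle starts again.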
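To make the retry cycle concrete, here is a toy simulation of it as described above, not NetworkManager's implementation. Assumptions: every attempt fails, each attempt takes attempt_s seconds (45 s matches the DHCP timeout in the server #0 log), and the 5-minute block is counted from the last failed retry. The retries argument stands for connection.autoconnect-retries, where 0 means retry forever; the function name is hypothetical.

```python
# Toy simulation of the autoconnect cycle described in the answer.
from typing import List

BLOCK_S = 300  # autoconnect is blocked for 5 minutes after retries run out


def autoconnect_attempt_starts(retries: int, attempt_s: float,
                               horizon_s: float) -> List[float]:
    """Start times (in seconds) of autoconnect attempts before horizon_s.

    retries models connection.autoconnect-retries (0 = retry forever).
    Assumes every attempt fails after attempt_s seconds.
    """
    if attempt_s <= 0:
        raise ValueError("attempt_s must be positive")
    starts: List[float] = []
    t = 0.0
    tries_in_cycle = 0
    while t < horizon_s:
        starts.append(t)
        t += attempt_s              # this attempt fails after attempt_s
        tries_in_cycle += 1
        if retries > 0 and tries_in_cycle >= retries:
            t += BLOCK_S            # retries exhausted: blocked for 5 minutes
            tries_in_cycle = 0
    return starts


if __name__ == "__main__":
    print(autoconnect_attempt_starts(4, 45, 600))
```

With retries=4 and 45-second attempts, the model tries at 0, 45, 90 and 135 s, then stays quiet until 480 s. That kind of gap is one reason a device can look idle for several minutes even though it will try again.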

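That is how it is supposed to work, but I am not sure all of this works as described on CentOS 7.8. You said, "then it does nothing until NetworkManager is restarted". Are you sure? Did you wait long enough? After a DHCP failure it may back off for a while, and the logs you pasted end right after that point.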

When debugging NetworkManager, debug logs are much more useful: set level=TRACE in the [logging] section of NetworkManager.conf.
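A minimal sketch that renders such a [logging] drop-in so it can be reviewed before installing. Assumptions: the file goes under /etc/NetworkManager/conf.d/ (the file name in the comment is hypothetical), domains=ALL is an optional addition beyond the level=TRACE suggested above, and NetworkManager is restarted afterwards so it picks up the change.

```python
# Render a NetworkManager.conf drop-in that enables TRACE logging.
import configparser
import io
import sys


def render_logging_dropin(level: str = "TRACE", domains: str = "ALL") -> str:
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names exactly as written
    cfg["logging"] = {"level": level, "domains": domains}
    buf = io.StringIO()
    # key=value without spaces, matching the usual NetworkManager.conf style
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    # Printed for review; install e.g. as
    # /etc/NetworkManager/conf.d/90-trace-logging.conf (hypothetical name).
    sys.stdout.write(render_logging_dropin())
```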

Maybe ipv4.may-fail=no would help? Then at least the device goes down, and the autoconnect cycle starts again.
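A minimal sketch that builds the nmcli command for the ipv4.may-fail=no suggestion and, by default, only prints it for review. Assumptions: the connection profile is named enp96s0f0, as in the "auto-activating connection 'enp96s0f0'" log lines; connection.autoconnect=yes is included because the retry cycle depends on it; the helper names are hypothetical.

```python
# Build (and optionally run) "nmcli connection modify" for the suggested settings.
import shlex
import subprocess
from typing import Dict, List


def nmcli_modify(connection: str, settings: Dict[str, str]) -> List[str]:
    """Return the argv for: nmcli connection modify <connection> <prop> <value> ..."""
    argv = ["nmcli", "connection", "modify", connection]
    for prop, value in settings.items():
        argv += [prop, value]
    return argv


def run_or_print(argv: List[str], dry_run: bool = True) -> None:
    if dry_run:
        print(shlex.join(argv))               # review before applying
    else:
        subprocess.run(argv, check=True)      # needs root privileges


SETTINGS = {
    "ipv4.may-fail": "no",            # DHCP failure takes the device down...
    "connection.autoconnect": "yes",  # ...so autoconnect can retry it
}

if __name__ == "__main__":
    run_or_print(nmcli_modify("enp96s0f0", SETTINGS))
```

A modified profile normally takes effect the next time the connection is activated (for example with nmcli connection up enp96s0f0). On a remote machine whose only link is this NIC, reactivating it may drop your session.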


By the way, if you want NetworkManager to keep the device up while the cable is unplugged (which is what you seem to like about dhclient), take a look at man NetworkManager.conf.
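One NetworkManager.conf option that appears to match this is ignore-carrier; the sketch below renders a drop-in that lists the interface from the logs. Assumptions: on this NetworkManager version the option lives in the [main] section and accepts an interface-name: device spec, and, as the man page describes it, it lets an already active connection (static or DHCP) stay up when carrier is lost. Verify both points against man NetworkManager.conf on the CentOS 7.8 host before deploying; the file name in the comment is hypothetical.

```python
# Render a NetworkManager.conf drop-in that sets ignore-carrier for one NIC.
import configparser
import io


def render_ignore_carrier_dropin(ifname: str) -> str:
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep "ignore-carrier" exactly as written
    # Assumption: [main] ignore-carrier with an interface-name: device spec.
    cfg["main"] = {"ignore-carrier": f"interface-name:{ifname}"}
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    # Review, then install e.g. as
    # /etc/NetworkManager/conf.d/90-ignore-carrier.conf (hypothetical name).
    print(render_ignore_carrier_dropin("enp96s0f0"), end="")
```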

Source: https://serverfault.com/questions/1055942