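Different DHCP behavior between a machine with NetworkManager and one without
Can anyone shed light on the differences I list below, and perhaps explain why NetworkManager behaves differently? Please advise whether NetworkManager can be changed to behave more like the non-NetworkManager scenario.
Both CentOS 7.8 servers use dhclient, but one of them is controlled by NetworkManager. Both see the same switch/NIC down/up events every few days (we cannot control this at the moment, for a number of reasons, and we are remote).
Server #0, with NetworkManager, tries to request DHCP immediately after the down/up. It gets no response from DHCP (another switch problem), then cancels the DHCP transaction and changes state to timeout. After that it does nothing unless NetworkManager is restarted (which obviously can only be done at the console). See the whole sequence below.
Server #1, without NetworkManager, recovers fine from these down/up outages. It seems to simply keep its lease throughout the NIC outage; it does not even renew when the NIC comes back up, it just keeps using its IP! Later it is able to renew DHCP at the normal lease-timeout interval. See the whole sequence below.
Please let me know whether I can change NetworkManager to behave more like plain dhclient. Perhaps it can be configured to simply keep the current lease after a down/up and renew at the normal lease-timeout interval? Thanks!!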
Server #0:
-- Last regular DHCP renew:
Feb 26 09:31:21 server0 dhclient[4766]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x58eefe09)
Feb 26 09:31:21 server0 dhclient[4766]: DHCPACK from 10.20.20.131 (xid=0x58eefe09)
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5084] dhcp4 (enp96s0f0): address 10.20.20.223
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5090] dhcp4 (enp96s0f0): plen 22 (255.255.252.0)
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5090] dhcp4 (enp96s0f0): gateway 10.20.20.1
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5090] dhcp4 (enp96s0f0): lease time 18000
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5090] dhcp4 (enp96s0f0): nameserver '10.20.20.49'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5091] dhcp4 (enp96s0f0): nameserver '10.20.20.48'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5091] dhcp4 (enp96s0f0): domain name 'dom.com'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5091] dhcp4 (enp96s0f0): state changed bound -> bound
Feb 26 09:31:21 server0 dhclient[4766]: bound to 10.20.20.223 -- renewal in 8129 seconds.
Feb 26 09:31:21 server0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 09:31:21 server0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 09:31:21 server0 nm-dispatcher: req:1 'dhcp4-change' [enp96s0f0]: new request (4 scripts)
Feb 26 09:31:21 server0 nm-dispatcher: req:1 'dhcp4-change' [enp96s0f0]: start running ordered scripts...

-- Random switch outage:
Feb 26 10:49:10 SERVER0 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Down
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info> [1614354556.8263] device (enp96s0f0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info> [1614354556.8467] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 4766
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info> [1614354556.8468] dhcp4 (enp96s0f0): state changed bound -> done
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info> [1614354556.8679] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:49:16 SERVER0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 10:49:16 SERVER0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:1 'down' [enp96s0f0]: new request (4 scripts)
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:1 'down' [enp96s0f0]: start running ordered scripts...
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:2 'connectivity-change': new request (4 scripts)
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:2 'connectivity-change': start running ordered scripts...
Feb 26 10:58:46 SERVER0 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: None

-- Machine is not accessible
-- NetworkManager tries to recover and request DHCP:
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6768] device (enp96s0f0): carrier: link connected
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6783] device (enp96s0f0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6823] policy: auto-activating connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6835] device (enp96s0f0): Activation: starting connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6837] device (enp96s0f0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6844] manager: NetworkManager state is now CONNECTING
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6848] device (enp96s0f0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.7360] device (enp96s0f0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.7369] dhcp4 (enp96s0f0): activation: beginning transaction (timeout in 45 seconds)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.7435] dhcp4 (enp96s0f0): dhclient started with pid 44653
Feb 26 10:58:46 SERVER0 dhclient[44653]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x161525b4)
Feb 26 10:58:54 SERVER0 dhclient[44653]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x161525b4)
Feb 26 10:59:13 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 3 (xid=0x2f70b1a3)
Feb 26 10:59:16 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 6 (xid=0x2f70b1a3)
Feb 26 10:59:22 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 9 (xid=0x2f70b1a3)
Feb 26 10:59:31 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 14 (xid=0x2f70b1a3)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <warn> [1614355171.8451] dhcp4 (enp96s0f0): request timed out
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8451] dhcp4 (enp96s0f0): state changed unknown -> timeout
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8540] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 44653
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8541] dhcp4 (enp96s0f0): state changed timeout -> done
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8545] device (enp96s0f0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8553] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <warn> [1614355171.8559] device (enp96s0f0): Activation: failed for connection 'enp96s0f0'
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8563] device (enp96s0f0): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8606] policy: auto-activating connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8615] device (enp96s0f0): Activation: starting connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8617] device (enp96s0f0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')

-- NetworkManager tries to recover and request DHCP again following a different process:
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8624] manager: NetworkManager state is now CONNECTING
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8628] device (enp96s0f0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.9420] device (enp96s0f0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.9429] dhcp4 (enp96s0f0): activation: beginning transaction (timeout in 45 seconds)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.9489] dhcp4 (enp96s0f0): dhclient started with pid 44712
Feb 26 10:59:32 SERVER0 dhclient[44712]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x5bd6c866)
Feb 26 10:59:36 SERVER0 dhclient[44712]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x5bd6c866)
Feb 26 10:59:44 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 5 (xid=0x3ffbeab4)
Feb 26 10:59:49 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 5 (xid=0x3ffbeab4)
Feb 26 10:59:54 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 7 (xid=0x3ffbeab4)
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5823] device (enp96s0f0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5846] device (enp96s0f0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5850] device (enp96s0f0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5869] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5982] manager: NetworkManager state is now CONNECTED_SITE
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5988] policy: set 'enp96s0f0' (enp96s0f0) as default for IPv6 routing and DNS
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5992] device (enp96s0f0): Activation: successful, device activated.
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.6003] manager: NetworkManager state is now CONNECTED_GLOBAL
Feb 26 10:59:59 SERVER0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 10:59:59 SERVER0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:1 'up' [enp96s0f0]: new request (4 scripts)
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:1 'up' [enp96s0f0]: start running ordered scripts...
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:2 'connectivity-change': new request (4 scripts)
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:2 'connectivity-change': start running ordered scripts...
Feb 26 11:00:01 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 14 (xid=0x3ffbeab4)
Feb 26 11:00:15 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 21 (xid=0x3ffbeab4)

-- NetworkManager cancels and times out and does nothing anymore
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <warn> [1614355216.8456] dhcp4 (enp96s0f0): request timed out
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info> [1614355216.8463] dhcp4 (enp96s0f0): state changed unknown -> timeout
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info> [1614355216.8649] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 44712
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info> [1614355216.8650] dhcp4 (enp96s0f0): state changed timeout -> done
Server #1:
-- Last regular DHCP renew:
Feb 26 10:34:00 server1 dhclient[5252]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x71bfdb34)
Feb 26 10:34:00 server1 dhclient[5252]: DHCPACK from 10.20.20.131 (xid=0x71bfdb34)
Feb 26 10:34:02 server1 dhclient[5252]: bound to 10.20.20.224 -- renewal in 8195 seconds.

-- Random switch outage:
Feb 26 10:49:10 server1 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Down
Feb 26 10:58:46 server1 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: None

-- Machine is accessible during this time!
-- Next regular DHCP renew:
Feb 26 12:50:37 server1 dhclient[5252]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x71bfdb34)
Feb 26 12:50:37 server1 dhclient[5252]: DHCPACK from 10.20.20.131 (xid=0x71bfdb34)
Feb 26 12:50:39 server1 dhclient[5252]: bound to 10.20.20.224 -- renewal in 8611 seconds.
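In NetworkManager, a device has an overall, logical state. That is what you see in nmcli device. If the device is connected (activated), it can still fail to get an address from DHCP (or a DHCP timeout may happen later). Depending on ipv4.dhcp-timeout (which you can set to infinity), DHCP is considered failed after a while. When that happens, the device may go down altogether; that depends on the setting ipv4.may-fail. If ipv4.may-fail=no, a DHCP failure is fatal to the activation and the device goes down. Otherwise, as long as you have, for example, an IPv6 address, the overall state is still considered good. In that case DHCP should be retried indefinitely while the device stays up/activated.

On the other hand, if the device goes down because of a failure, it becomes eligible to autoconnect again (at least if you configured it with connection.autoconnect=yes). This autoconnect cycle is repeated up to connection.autoconnect-retries times, then autoconnect is blocked for 5 minutes before it starts again.

That is how it should be. With CentOS 7.8, however, I am not sure all of this works as I described. You say, "After that it does nothing unless NetworkManager is restarted". Are you sure? Did you wait long enough? After a DHCP failure it might back off for a bit. The logs you pasted end right after that point.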
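
The settings named above are properties of the connection profile and can be changed with nmcli connection modify. Below is a minimal sketch that only builds and prints the commands, so they can be reviewed before being run on the console. It assumes the profile is called enp96s0f0 (taken from the "auto-activating connection" log line); the helper name and the chosen values are illustrative, not something the answer prescribes. The nm-settings manual documents 2147483647 as the "infinity" value for ipv4.dhcp-timeout and 0 as "retry forever" for connection.autoconnect-retries; check both against the NetworkManager version shipped with CentOS 7.8.

```python
import shlex

# Profile name as it appears in the pasted log; adjust for other hosts.
CONNECTION = "enp96s0f0"

# Option A (assumed values): never declare DHCP failed, keep retrying.
KEEP_TRYING_DHCP = {
    "ipv4.dhcp-timeout": "2147483647",  # MAXINT32, documented as infinity
}

# Option B (assumed values): treat DHCP failure as fatal so the device
# goes down and the autoconnect cycle starts over.
FAIL_AND_RECONNECT = {
    "ipv4.may-fail": "no",
    "connection.autoconnect": "yes",
    "connection.autoconnect-retries": "0",  # 0 means retry forever
}


def build_modify_command(connection, settings):
    """Return the argv list for one `nmcli connection modify` call."""
    argv = ["nmcli", "connection", "modify", connection]
    for prop, value in settings.items():
        argv += [prop, value]
    return argv


if __name__ == "__main__":
    # Print only; nothing is executed or changed on the system.
    for settings in (KEEP_TRYING_DHCP, FAIL_AND_RECONNECT):
        print(shlex.join(build_modify_command(CONNECTION, settings)))
```

The two options are alternatives rather than a set: with an infinite DHCP timeout the failure path that ipv4.may-fail controls is never reached. A modified profile normally takes effect the next time the connection is activated.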
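When debugging NetworkManager, debug logs are more useful. Configure logging with level=TRACE in NetworkManager.conf.

Maybe ipv4.may-fail=no will help? Then at least the device goes down and the autoconnect cycle starts again.

By the way, if you want NetworkManager to leave the device up when the cable is unplugged (as you seem to like with dhclient), then see man NetworkManager.conf.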
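
For the debug-logging suggestion above, the level=TRACE key belongs in the [logging] section of NetworkManager.conf. The sketch below renders such a section as text so it can be reviewed and then saved as a drop-in file. The domains=ALL line and the function name are my additions, not part of the answer, and the usual drop-in directory (/etc/NetworkManager/conf.d/) is assumed rather than confirmed for this system.

```python
import configparser
import io


def render_logging_dropin(level="TRACE", domains="ALL"):
    """Render a [logging] section for NetworkManager.conf as a string."""
    parser = configparser.ConfigParser()
    parser["logging"] = {"level": level, "domains": domains}
    buf = io.StringIO()
    # NetworkManager.conf uses key=value without spaces around '='.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    # Review the text, then save it as a file ending in .conf under the
    # conf.d directory and restart NetworkManager for it to take effect.
    print(render_logging_dropin(), end="")
```

TRACE output is very verbose, so it is worth removing the drop-in again once the outage has been captured.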