ISC DHCP fails to synchronize leases between peers

  • April 6, 2018

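I am running ISC DHCP version 4.1.1 on Debian GNU/Linux on two servers. I have tried several versions of ISC DHCP to get around the problem described below, but the behavior stays the same.

My failover configuration between the two servers, which sit on different subnets, is: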

#-----------------------------------------------
# Primary Server
#-----------------------------------------------

authoritative;
default-lease-time 900;
max-lease-time 1800;         
option domain-name "foo.com";
option domain-name-servers 10.12.0.254;

failover peer "foo" {
   primary;
   address 10.12.0.254;
   port 647;
   peer address 10.10.10.12;
   peer port 647;
   max-response-delay 30;
   max-unacked-updates 10;
   load balance max seconds 3;
   mclt 1800;  
   split 128;
}

subnet 10.12.0.0 netmask 255.255.0.0 {
   pool {
       failover peer "foo";
       range 10.12.10.0 10.12.112.0;
       range 10.12.112.12 10.12.255.254;
       deny dynamic bootp clients;
   }
   option routers 10.12.0.254;
   option subnet-mask 255.255.0.0;
   option broadcast-address 10.12.255.255;
}

#-----------------------------------------------
# Secondary Server
#-----------------------------------------------

authoritative;
default-lease-time 900;
max-lease-time 1800;
option domain-name "foo.com";
option domain-name-servers 10.12.0.254;

failover peer "foo" {
       secondary;
       address 10.10.10.12;
       port 647;
       peer address 10.12.0.254;
       peer port 647;
       max-response-delay 30;
       max-unacked-updates 10;
       load balance max seconds 3;
}

subnet 10.12.0.0 netmask 255.255.0.0 {
       pool {
               failover peer "foo";
               range 10.12.10.0 10.12.112.0;
               range 10.12.112.12 10.12.255.254;
               deny dynamic bootp clients;
       }
   option routers 10.12.0.254;
   option subnet-mask 255.255.0.0;
   option broadcast-address 10.12.255.255;
}

subnet 10.10.10.0 netmask 255.255.255.240 {
}

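IP helper (also known as UDP helper) and DHCP relay are enabled on the router that connects the primary server's network to the secondary server's network, and I can ping and ssh from one server to the other and back.

When I start the dhcpd service on both servers, they fail to balance leases.

Here are log samples from both servers:

Primary server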

Sep 19 10:31:11 primary dhcpd: failover peer foo: I move from recover to startup
Sep 19 10:31:11 primary dhcpd: failover peer foo: I move from startup to recover
Sep 19 10:31:11 primary dhcpd: Sent update request all message to foo
Sep 19 10:31:20 primary dhcpd: peer foo: disconnected
Sep 19 10:31:22 primary dhcpd: failover peer foo: peer moves from recover-done to recover-done
Sep 19 10:31:22 primary dhcpd: failover peer foo: peer moves from recover-done to recover-done
Sep 19 10:31:45 primary dhcpd: DHCPINFORM from 10.12.181.177 via eth1
Sep 19 10:31:45 primary dhcpd: DHCPACK to 10.12.181.177 (00:17:42:c0:e3:ce) via eth1
Sep 19 10:32:45 primary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c (PC1) via eth1: not responding (recovering)
Sep 19 10:32:46 primary dhcpd: DHCPINFORM from 10.12.181.177 via eth1
Sep 19 10:32:46 primary dhcpd: DHCPACK to 10.12.181.177 (00:17:42:c0:e3:ce) via eth1
Sep 19 10:32:49 primary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c (PC1) via eth1: not responding (recovering)
Sep 19 10:32:57 primary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c (PC1) via eth1: not responding (recovering)
Sep 19 10:33:13 primary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 (PC2) via eth1: not responding (recovering)
Sep 19 10:33:13 primary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c (PC1) via eth1: not responding (recovering)
Sep 19 10:33:17 primary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 (PC2) via eth1: not responding (recovering)
Sep 19 10:33:25 primary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 (PC2) via eth1: not responding (recovering)
Sep 19 10:33:41 primary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 (PC2) via eth1: not responding (recovering)

Secondary server

Sep 19 10:31:11 secondary dhcpd: Update request all from foo: sending update
Sep 19 10:31:23 secondary dhcpd: Wrote 22 leases to leases file.
Sep 19 10:31:23 secondary dhcpd: failover peer foo: I move from recover-done to startup
Sep 19 10:31:23 secondary dhcpd: failover peer foo: I move from startup to recover-done
Sep 19 10:31:45 secondary dhcpd: DHCPINFORM from 10.12.181.177 via 10.12.0.1
Sep 19 10:31:45 secondary dhcpd: DHCPACK to 10.12.181.177 (00:17:42:c0:e3:ce) via eth0
Sep 19 10:32:45 secondary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c via 10.12.0.1: not responding (recover done)
Sep 19 10:32:46 secondary dhcpd: DHCPINFORM from 10.12.181.177 via 10.12.0.1
Sep 19 10:32:46 secondary dhcpd: DHCPACK to 10.12.181.177 (00:17:42:c0:e3:ce) via eth0
Sep 19 10:32:49 secondary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c via 10.12.0.1: not responding (recover done)
Sep 19 10:32:57 secondary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c via 10.12.0.1: not responding (recover done)
Sep 19 10:33:13 secondary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 via 10.12.0.1: not responding (recover done)
Sep 19 10:33:13 secondary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c via 10.12.0.1: not responding (recover done)
Sep 19 10:33:17 secondary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 via 10.12.0.1: not responding (recover done)
Sep 19 10:33:25 secondary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 via 10.12.0.1: not responding (recover done)
Sep 19 10:33:41 secondary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 via 10.12.0.1: not responding (recover done)
Sep 19 10:34:46 secondary dhcpd: DHCPDISCOVER from 00:1a:4b:45:3a:2f via 10.12.0.1: peer holds all free leases
Sep 19 10:34:51 secondary dhcpd: DHCPDISCOVER from 00:1a:4b:45:3a:2f via 10.12.0.1: peer holds all free leases
Sep 19 10:34:59 secondary dhcpd: DHCPDISCOVER from 00:1a:4b:45:3a:2f via 10.12.0.1: peer holds all free leases
Sep 19 10:35:16 secondary dhcpd: DHCPDISCOVER from 00:1a:4b:45:3a:2f via 10.12.0.1: peer holds all free leases
Sep 19 10:38:28 secondary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c via 10.12.0.1: not responding (recover done)
Sep 19 10:38:32 secondary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c via 10.12.0.1: not responding (recover done)

I do not see any load-balancing log lines, so I do not think lease balancing is taking place...

Sent update request all message to foo
Update request all from foo: sending update

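The balancing process seems to be stuck at the two lines above.

If I shut down the dhcpd daemon on one server, the peer does not seem to take over, even though it detects that the other peer is down.

How can I solve this?

Thanks in advance (and sorry for my poor English) :-)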

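The message "not responding (recovering)" indicates that the server is not answering because it is recovering from a failover (or from initial startup). It may still be populating the lease database with all the free leases in the pool, which can take a while if the pool is large.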
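To see which failover state each server is in, it can help to reduce the log to the state transitions and the refused requests. The following is a minimal sketch, assuming a syslog-style log file in the format shown in the question. The function names failover_transitions and refused_discovers are hypothetical helpers, not part of ISC DHCP, and the log path in the usage comment is an assumption that varies by system.

```shell
# Print only the failover state transitions for one peer.
# $1 = failover peer name (e.g. foo), $2 = log file to read
failover_transitions() {
    peer=$1
    logfile=$2
    grep -E "dhcpd: failover peer ${peer}: (I move|peer moves) from " "$logfile"
}

# Count DHCPDISCOVER requests the server declined to answer because of
# its failover state ("not responding (recovering)" and similar).
# $1 = log file to read
refused_discovers() {
    grep -c 'DHCPDISCOVER.*: not responding (' "$1"
}

# Example usage (the log path is an assumption, adjust for your system):
#   failover_transitions foo /var/log/syslog
#   refused_discovers /var/log/syslog
```

If the last transition on a server leaves it in recover or recover-done and the refused count keeps growing, that server is still not serving leases.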

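Try a smaller pool to verify that your failover works at all, then scale it back up. Your ranges are very large, which may be why it appears to hang on the update.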
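For a rough sense of scale, the two ranges from the question's pool can be counted with plain shell arithmetic. This is only an illustrative sketch; ip_to_int and range_size are hypothetical helper names introduced here, not part of ISC DHCP.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Number of addresses in an inclusive range: $1 = first, $2 = last.
range_size() {
    echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

# The two range statements from the pool in the question.
first=$(range_size 10.12.10.0 10.12.112.0)
second=$(range_size 10.12.112.12 10.12.255.254)
echo "pool size: $(( first + second )) addresses"
# prints: pool size: 62964 addresses
```

A test pool of a few dozen addresses would make it easier to tell whether the peers ever finish synchronizing.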

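I have run into this problem before. In my case, a firewall was blocking port 647/tcp on both servers. I ran the following commands on each server, and that resolved it.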

firewall-cmd --add-port=647/tcp --permanent
firewall-cmd --reload

Then restart the dhcpd service.
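Before or after changing firewall rules, it may be worth confirming that the failover port is reachable from the other peer. This is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available; check_tcp_port is a hypothetical helper name introduced here.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds,
# non-zero otherwise (refused, filtered, or timed out).
# $1 = host, $2 = port
check_tcp_port() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example usage, with the addresses from the question's configuration:
#   on the primary:    check_tcp_port 10.10.10.12 647 && echo reachable
#   on the secondary:  check_tcp_port 10.12.0.254 647 && echo reachable
```

The check only succeeds while dhcpd is running on the peer, because the failover port must be listening for the connection to be accepted.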

Quoted from: https://serverfault.com/questions/313008