CoreDNS fails with a loop: how do I supply a proper resolvConf to kubelet?
This is where the investigation started: CoreDNS could not stay up for more than a few seconds, failing with the following error:
```
$ kubectl get pods --all-namespaces
NAMESPACE       NAME                                          READY   STATUS             RESTARTS      AGE
ingress-nginx   ingress-nginx-controller-8xcl9                1/1     Running            0             11h
ingress-nginx   ingress-nginx-controller-hwhvk                1/1     Running            0             11h
ingress-nginx   ingress-nginx-controller-xqdqx                1/1     Running            2 (10h ago)   11h
kube-system     calico-kube-controllers-684bcfdc59-cr7hr      1/1     Running            0             11h
kube-system     calico-node-62p58                             1/1     Running            2 (10h ago)   11h
kube-system     calico-node-btvdh                             1/1     Running            0             11h
kube-system     calico-node-q5bkr                             1/1     Running            0             11h
kube-system     coredns-8474476ff8-dnt6b                      0/1     CrashLoopBackOff   1 (3s ago)    5s
kube-system     coredns-8474476ff8-ftcbx                      0/1     Error              1 (2s ago)    5s
kube-system     dns-autoscaler-5ffdc7f89d-4tshm               1/1     Running            2 (10h ago)   11h
kube-system     kube-apiserver-hyzio                          1/1     Running            4 (10h ago)   11h
kube-system     kube-controller-manager-hyzio                 1/1     Running            4 (10h ago)   11h
kube-system     kube-proxy-2d8ls                              1/1     Running            0             11h
kube-system     kube-proxy-c6c4l                              1/1     Running            4 (10h ago)   11h
kube-system     kube-proxy-nzqdd                              1/1     Running            0             11h
kube-system     kube-scheduler-hyzio                          1/1     Running            5 (10h ago)   11h
kube-system     kubernetes-dashboard-548847967d-66dwz         1/1     Running            0             11h
kube-system     kubernetes-metrics-scraper-6d49f96c97-r6dz2   1/1     Running            0             11h
kube-system     nginx-proxy-dyzio                             1/1     Running            0             11h
kube-system     nginx-proxy-zyzio                             1/1     Running            0             11h
kube-system     nodelocaldns-g9wxh                            1/1     Running            0             11h
kube-system     nodelocaldns-j2qc9                            1/1     Running            4 (10h ago)   11h
kube-system     nodelocaldns-vk84j                            1/1     Running            0             11h
kube-system     registry-j5prk                                1/1     Running            0             11h
kube-system     registry-proxy-5wbhq                          1/1     Running            0             11h
kube-system     registry-proxy-77lqd                          1/1     Running            0             11h
kube-system     registry-proxy-s45p4                          1/1     Running            2 (10h ago)   11h
```
`kubectl describe` on that pod didn't add much to the picture:

```
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  67s                default-scheduler  Successfully assigned kube-system/coredns-8474476ff8-dnt6b to zyzio
  Normal   Pulled     25s (x4 over 68s)  kubelet            Container image "k8s.gcr.io/coredns/coredns:v1.8.0" already present on machine
  Normal   Created    25s (x4 over 68s)  kubelet            Created container coredns
  Normal   Started    25s (x4 over 68s)  kubelet            Started container coredns
  Warning  BackOff    6s (x11 over 66s)  kubelet            Back-off restarting failed container
```
But looking at the logs did:
```
$ kubectl logs coredns-8474476ff8-dnt6b -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 5b233a0166923d642fdbca0794b712ab
CoreDNS-1.8.0
linux/amd64, go1.15.3, 054c9ae
[FATAL] plugin/loop: Loop (127.0.0.1:49048 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 2906344495550081187.9117452939332601176."
```
How nice of them to link the troubleshooting docs! I started browsing that page and found that my `/etc/resolv.conf` did indeed contain the problematic local IP `nameserver 127.0.0.53`, while the real DNS IPs were in `/run/systemd/resolve/resolv.conf` (both files are sketched below). The question now is how to do what the troubleshooting docs describe:

> Add the following to your kubelet config yaml: `resolvConf: <path-to-your-real-resolv-conf-file>` (or via command line flag `--resolv-conf` deprecated in 1.10). Your "real" resolv.conf is the one that contains the actual IPs of your upstream servers, and no local/loopback address. This flag tells kubelet to pass an alternate resolv.conf to Pods. For systems using systemd-resolved, /run/systemd/resolve/resolv.conf is typically the location of the "real" resolv.conf, although this can be different depending on your distribution.
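To make the problem concrete, the two files on my node looked roughly like this (the upstream IPs are placeholders, not my actual ones):

```
$ cat /etc/resolv.conf
nameserver 127.0.0.53        # local stub resolver, the source of the loop

$ cat /run/systemd/resolve/resolv.conf
nameserver 192.168.0.1       # placeholder: first upstream DNS server
nameserver 192.168.0.2       # placeholder: second upstream DNS server
```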
So, the questions are:

- how do I find, or where do I create, the mentioned kubelet config yaml,
- at what level should I specify the `resolvConf` value, and
- can it accept multiple values? I have two nameservers defined. Should they be given as separate entries or as an array?
`/etc/resolv.conf` is located on each of your nodes. You can edit it by SSH-ing into the node. Then you have to restart `kubelet` for the changes to take effect:

```
sudo systemctl restart kubelet
```

(If that doesn't work, reboot your node with `sudo reboot`.) A sketch of the whole flow follows below.
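Putting that together, one common approach on hosts running systemd-resolved is to repoint the `/etc/resolv.conf` symlink at the real file (the node name here is just an example):

```
# SSH into the affected node
ssh user@zyzio

# Point /etc/resolv.conf at the resolv.conf that lists the real
# upstream servers instead of the 127.0.0.53 stub resolver
sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf

# Restart kubelet so newly created pods get the corrected file
sudo systemctl restart kubelet

# Verify that kubelet came back up cleanly
sudo systemctl status kubelet
```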
The `/home/kubernetes/kubelet-config.yaml` file (also located on each of your nodes) contains kubelet's configuration. You can create a new resolv.conf file and point to it with the `resolvConf` field:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
...
kind: KubeletConfiguration
...
resolvConf: <location of the file>
```
Important: the new configuration will only apply to pods created after the update. It is strongly recommended to drain your node before changing the configuration.
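For example, draining and then restoring a node (using `zyzio` from the listing above; flag names can vary slightly between kubectl versions) could look like:

```
# Evict workloads before touching the kubelet configuration
kubectl drain zyzio --ignore-daemonsets --delete-emptydir-data

# ... update kubelet-config.yaml and restart kubelet on the node ...

# Make the node schedulable again
kubectl uncordon zyzio
```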
> Can it accept multiple values? I have two nameservers defined. Should they be given as separate entries or as an array?

The kubelet configuration documentation states that `resolvConf` is of type string, so it likely accepts only a single value.
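That single value is just the path to a file; the two nameservers themselves go inside that file, one `nameserver` line each, as in this sketch (the IPs are placeholders):

```
# Contents of the file that resolvConf points to
nameserver 192.168.0.1    # placeholder: first upstream DNS server
nameserver 192.168.0.2    # placeholder: second upstream DNS server
```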