Kubernetes

CoreDNS fails because of a loop: how do I provide kubelet with a proper resolvConf?

  • November 2, 2021

This is where the investigation started: CoreDNS would not stay up for more than a few seconds, failing like this:

$ kubectl get pods --all-namespaces
NAMESPACE       NAME                                          READY   STATUS             RESTARTS      AGE
ingress-nginx   ingress-nginx-controller-8xcl9                1/1     Running            0             11h
ingress-nginx   ingress-nginx-controller-hwhvk                1/1     Running            0             11h
ingress-nginx   ingress-nginx-controller-xqdqx                1/1     Running            2 (10h ago)   11h
kube-system     calico-kube-controllers-684bcfdc59-cr7hr      1/1     Running            0             11h
kube-system     calico-node-62p58                             1/1     Running            2 (10h ago)   11h
kube-system     calico-node-btvdh                             1/1     Running            0             11h
kube-system     calico-node-q5bkr                             1/1     Running            0             11h
kube-system     coredns-8474476ff8-dnt6b                      0/1     CrashLoopBackOff   1 (3s ago)    5s
kube-system     coredns-8474476ff8-ftcbx                      0/1     Error              1 (2s ago)    5s
kube-system     dns-autoscaler-5ffdc7f89d-4tshm               1/1     Running            2 (10h ago)   11h
kube-system     kube-apiserver-hyzio                          1/1     Running            4 (10h ago)   11h
kube-system     kube-controller-manager-hyzio                 1/1     Running            4 (10h ago)   11h
kube-system     kube-proxy-2d8ls                              1/1     Running            0             11h
kube-system     kube-proxy-c6c4l                              1/1     Running            4 (10h ago)   11h
kube-system     kube-proxy-nzqdd                              1/1     Running            0             11h
kube-system     kube-scheduler-hyzio                          1/1     Running            5 (10h ago)   11h
kube-system     kubernetes-dashboard-548847967d-66dwz         1/1     Running            0             11h
kube-system     kubernetes-metrics-scraper-6d49f96c97-r6dz2   1/1     Running            0             11h
kube-system     nginx-proxy-dyzio                             1/1     Running            0             11h
kube-system     nginx-proxy-zyzio                             1/1     Running            0             11h
kube-system     nodelocaldns-g9wxh                            1/1     Running            0             11h
kube-system     nodelocaldns-j2qc9                            1/1     Running            4 (10h ago)   11h
kube-system     nodelocaldns-vk84j                            1/1     Running            0             11h
kube-system     registry-j5prk                                1/1     Running            0             11h
kube-system     registry-proxy-5wbhq                          1/1     Running            0             11h
kube-system     registry-proxy-77lqd                          1/1     Running            0             11h
kube-system     registry-proxy-s45p4                          1/1     Running            2 (10h ago)   11h

Running kubectl describe on that pod did not add much to the picture:

Events:
 Type     Reason     Age                From               Message
 ----     ------     ----               ----               -------
 Normal   Scheduled  67s                default-scheduler  Successfully assigned kube-system/coredns-8474476ff8-dnt6b to zyzio
 Normal   Pulled     25s (x4 over 68s)  kubelet            Container image "k8s.gcr.io/coredns/coredns:v1.8.0" already present on machine
 Normal   Created    25s (x4 over 68s)  kubelet            Created container coredns
 Normal   Started    25s (x4 over 68s)  kubelet            Started container coredns
 Warning  BackOff    6s (x11 over 66s)  kubelet            Back-off restarting failed container

But looking at the logs did:

$ kubectl logs coredns-8474476ff8-dnt6b -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 5b233a0166923d642fdbca0794b712ab
CoreDNS-1.8.0
linux/amd64, go1.15.3, 054c9ae
[FATAL] plugin/loop: Loop (127.0.0.1:49048 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 2906344495550081187.9117452939332601176."

It is great that the error links to the troubleshooting documentation! I started browsing that page and found that my /etc/resolv.conf indeed contains the problematic local IP, nameserver 127.0.0.53.
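This is easy to confirm directly on a node. The output below is illustrative of a typical systemd-resolved setup (the upstream IP is a placeholder): the glibc resolv.conf points at the local stub, which is exactly what CoreDNS detects as a loop, while the file managed by systemd-resolved holds the real upstream server.

$ cat /etc/resolv.conf
nameserver 127.0.0.53

$ cat /run/systemd/resolve/resolv.conf
nameserver 192.0.2.1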

That is also where the real DNS IP turned up: in /run/systemd/resolve/resolv.conf. But the question now is how to do what the troubleshooting documentation describes:

Add the following to your kubelet config yaml: resolvConf: <path-to-your-real-resolv-conf-file> (or via the command line flag --resolv-conf, deprecated in 1.10). Your "real" resolv.conf is the one that contains the actual IPs of your upstream servers, and no local/loopback address. This flag tells kubelet to pass an alternate resolv.conf to Pods. For systems using systemd-resolved, /run/systemd/resolve/resolv.conf is typically the location of the "real" resolv.conf, although this can be different depending on your distribution.

So, the questions are:

  • how to find, or where to create, the mentioned kubelet config yaml,
  • at what level I should specify the resolvConf value, and
  • can it accept multiple values? I have two nameservers defined. Should they be given as separate entries or as an array?

/etc/resolv.conf is located on each of your nodes. You can edit it by SSH-ing into the node.

Then you have to restart kubelet for the changes to take effect:

sudo systemctl restart kubelet

(If that does not work, restart your node with sudo reboot.)
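Once kubelet is back up, any CoreDNS pods already in CrashLoopBackOff were created with the old resolv.conf, so they need to be recreated to pick up the change. A minimal sketch, assuming the default k8s-app=kube-dns label that kubeadm- and kubespray-based clusters put on the CoreDNS pods:

sudo systemctl status kubelet                          # confirm kubelet restarted cleanly
kubectl -n kube-system delete pod -l k8s-app=kube-dns  # force fresh CoreDNS pods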


The file /home/kubernetes/kubelet-config.yaml (also located on each of your nodes) contains the configuration of kubelet. You can create a new resolv.conf file and point to it with the resolvConf field:

apiVersion: kubelet.config.k8s.io/v1beta1
...
kind: KubeletConfiguration
...
resolvConf: <location of the file>
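For a systemd-resolved host, a minimal sketch of the relevant fields might look like the following; the remaining fields depend on how your cluster was provisioned, and kubelet must be started with its --config flag pointing at this file:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf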

Important: the new configuration will only apply to pods created after the update. It is strongly recommended to drain your node before changing the configuration.
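A typical drain-and-update sequence, reusing the node name zyzio from the pod listing above, could look like this:

kubectl drain zyzio --ignore-daemonsets
# (edit the kubelet config on the node, then restart kubelet there)
kubectl uncordon zyzio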


Can it accept multiple values? I have two nameservers defined. Should they be given as separate entries or as an array?

The kubelet configuration file reference states that resolvConf is of type string, so it most likely accepts only a single value.
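So both nameservers belong in the single file that resolvConf points at; resolv.conf itself takes one nameserver per line. A sketch with placeholder IPs:

nameserver 192.0.2.1
nameserver 192.0.2.2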

Source: https://serverfault.com/questions/1081862