Kubernetes

CoreDNS deployment cannot find nodes even after the Fargate profile update

  • August 16, 2021

I'm having trouble installing a Fargate profile and the CoreDNS add-on. I use Terraform for some parts and kubectl for others. The Fargate profiles are created through Terraform:

fargate_profiles = {
 kube-system-profile = {
   name = "kube-system-profile"
   selectors = [
     {
       namespace = "kube-system"
       labels = {
         name = "kube-system"
         k8s-app = "kube-dns"
       }
     }
   ]
   tags = {
     Cost = "DaCost"
     Environment = "dev"
     Name = "coredns-fargate-profile"
   }
 },
 swiftalk-dev-profile = {
   name = "dev-profile"
   selectors = [
     {
       namespace = "dev"
       labels = {
         name = "dev"
       }
     }
   ]
   tags = {
     Cost = "DaCost"
     Environment = "dev"
     Name = "dev-profile"
   }
 },
}

Then I install the CoreDNS add-on, again using Terraform:

resource "aws_eks_addon" "core_dns" {
 addon_name        = "coredns"
 addon_version     = "v1.8.3-eksbuild.1"
 cluster_name      = "${var.eks_cluster_name}-dev"
 resolve_conflicts = "OVERWRITE"
 tags              = { "eks_addon" = "coredns", name = "kube-system" }
 depends_on        = [kubernetes_namespace.dev]
}

I patched the coredns deployment for Fargate by removing its compute-type annotation:

kubectl patch deployment coredns \
 --namespace kube-system \
 --type=json \
 -p='[{"op": "remove", "path": "/spec/template/metadata/annotations/eks.amazonaws.com~1compute-type"}]'

and then restarted it:

kubectl rollout restart -n kube-system deployment/coredns

However, the coredns pods are still stuck in the Pending state:

kubectl get pods -n kube-system
NAME                      READY   STATUS    RESTARTS   AGE
coredns-5766d4545-g6nxn   0/1     Pending   0          46m
coredns-5766d4545-xng48   0/1     Pending   0          46m
coredns-b744fccf4-hb726   0/1     Pending   0          77m

The CloudWatch logs show that the scheduler is looking for nodes to place the pods on, rather than using Fargate:

I0723 10:24:38.059960       1 factory.go:319] "Unable to schedule pod; no nodes are registered to the cluster; waiting" pod="kube-system/coredns-b744fccf4-hb726"
I0723 10:24:38.060078       1 factory.go:319] "Unable to schedule pod; no nodes are registered to the cluster; waiting" pod="kube-system/coredns-5766d4545-xng48"

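In my case, the problem was that after enabling the cluster's public access endpoint, I had restricted it to the public CIDRs of our VPN. That meant I had to either add the pod CIDRs as well or enable the private access endpoint, as described in this document: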

https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html

It works now, without needing to patch the coredns deployment.
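
For reference, here is a rough Terraform sketch of the second option (keeping the public endpoint restricted to the VPN while also enabling the private endpoint). It assumes the cluster is managed directly with the `aws_eks_cluster` resource; if the cluster is created through a module, the equivalent inputs will likely have different names. `var.eks_cluster_name` is the variable from the question. The resource label `this`, the variables `cluster_role_arn`, `private_subnet_ids` and `vpn_public_cidrs`, and the CIDR value are placeholders made up for illustration.

```hcl
# Sketch only: the three variables below are hypothetical placeholders,
# not names taken from the original configuration.
variable "cluster_role_arn" {
  type = string # IAM role assumed by the EKS control plane
}

variable "private_subnet_ids" {
  type = list(string) # subnets the cluster (and Fargate pods) run in
}

variable "vpn_public_cidrs" {
  type    = list(string)
  default = ["203.0.113.0/24"] # documentation-range example, replace with the real VPN CIDRs
}

resource "aws_eks_cluster" "this" {
  name     = "${var.eks_cluster_name}-dev"
  role_arn = var.cluster_role_arn

  vpc_config {
    subnet_ids = var.private_subnet_ids

    # Public endpoint stays on, but only reachable from the VPN.
    endpoint_public_access = true
    public_access_cidrs    = var.vpn_public_cidrs

    # Private endpoint lets traffic from inside the VPC reach the API
    # server without passing through the public CIDR allow list.
    endpoint_private_access = true
  }
}
```

With the private endpoint enabled, requests to the API server that originate inside the VPC should no longer depend on the public CIDR allow list, which matches the behaviour described in the linked AWS page.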

Source: https://serverfault.com/questions/1070424