Kubernetes
Offline Kubernetes installation fails when using containerd as the CRI
For certain reasons, I had to build a bare-metal Kubernetes cluster with no Internet connection. Since dockershim has been deprecated, I decided to use containerd as the CRI, but the offline installation with kubeadm failed with a timeout when running `kubeadm init`:

```
Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'
```
Running `journalctl -u kubelet -f`, I could see many error logs:

```
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.473188    9299 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://133.117.20.57:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/rhel8?timeout=10s": dial tcp 133.117.20.57:6443: connect: connection refused
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.533555    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:25 rhel8 kubelet[9299]: I1124 16:25:25.588986    9299 kubelet_node_status.go:71] "Attempting to register node" node="rhel8"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.589379    9299 kubelet_node_status.go:93] "Unable to register node with API server" err="Post \"https://133.117.20.57:6443/api/v1/nodes\": dial tcp 133.117.20.57:6443: connect: connection refused" node="rhel8"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.634625    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.735613    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.835815    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.936552    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.036989    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.137464    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.238594    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.338704    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.394465    9299 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"rhel8.16ba6aab63e58bd8", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"rhel8", UID:"rhel8", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"rhel8"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc05f9812b2b227d8, ext:5706873656, loc:(*time.Location)(0x55a228f25680)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc05f9812b2b227d8, ext:5706873656, loc:(*time.Location)(0x55a228f25680)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://133.117.20.57:6443/api/v1/namespaces/default/events": dial tcp 133.117.20.57:6443: connect: connection refused'(may retry after sleeping)
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.143503    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.244526    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.302890    9299 remote_runtime.go:116] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.2\": failed to pull image \"k8s.gcr.io/pause:3.2\": failed to pull and unpack image \"k8s.gcr.io/pause:3.2\": failed to resolve reference \"k8s.gcr.io/pause:3.2\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.2\": dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:39732->[::1]:53: read: connection refused"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.302949    9299 kuberuntime_sandbox.go:70] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.2\": failed to pull image \"k8s.gcr.io/pause:3.2\": failed to pull and unpack image \"k8s.gcr.io/pause:3.2\": failed to resolve reference \"k8s.gcr.io/pause:3.2\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.2\": dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:39732->[::1]:53: read: connection refused" pod="kube-system/kube-scheduler-rhel8"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.302989    9299 kuberuntime_manager.go:815] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.2\": failed to pull image \"k8s.gcr.io/pause:3.2\": failed to pull and unpack image \"k8s.gcr.io/pause:3.2\": failed to resolve reference \"k8s.gcr.io/pause:3.2\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.2\": dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:39732->[::1]:53: read: connection refused" pod="kube-system/kube-scheduler-rhel8"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.303080    9299 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-scheduler-rhel8_kube-system(e5616b23d0312e4995fcb768f04aabbb)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-rhel8_kube-system(e5616b23d0312e4995fcb768f04aabbb)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"k8s.gcr.io/pause:3.2\\\": failed to pull image \\\"k8s.gcr.io/pause:3.2\\\": failed to pull and unpack image \\\"k8s.gcr.io/pause:3.2\\\": failed to resolve reference \\\"k8s.gcr.io/pause:3.2\\\": failed to do request: Head \\\"https://k8s.gcr.io/v2/pause/manifests/3.2\\\": dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:39732->[::1]:53: read: connection refused\"" pod="kube-system/kube-scheduler-rhel8" podUID=e5616b23d0312e4995fcb768f04aabbb
```
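The log is noisy: most lines are downstream symptoms ("Error getting node", connection refused), while the actual root cause is the single "failed to pull image" entry for the pause sandbox image. A quick way to surface it from a captured log is to filter for pull failures. A minimal sketch, assuming the log was captured to a file (the file name and sample content here are illustrative, e.g. from `journalctl -u kubelet --no-pager > kubelet.log`):

```shell
# Create a tiny illustrative capture; on a real node you would instead run:
#   journalctl -u kubelet --no-pager > kubelet.log
cat > kubelet.log <<'EOF'
E1124 16:25:25.533555 9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
E1124 16:25:27.302890 9299 remote_runtime.go:116] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.2\": failed to pull image \"k8s.gcr.io/pause:3.2\""
E1124 16:25:27.303080 9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
EOF

# Skip the repetitive symptoms and keep only the pull failures,
# which name the image the runtime is actually trying to fetch.
grep 'failed to pull image' kubelet.log
```

The surviving line points directly at `k8s.gcr.io/pause:3.2`, which is what led to the diagnosis below.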
When I performed the same installation with an Internet connection, it succeeded. And when Docker was used instead of containerd, the installation completed successfully even without an Internet connection.
This is caused by containerd's `sandbox_image` setting, which by default points to k8s.gcr.io, so containerd tries to pull the pause image from there even when there is no Internet connection. The setting is specified around line 57 of the `/etc/containerd/config.toml` file:

```
[plugins."io.containerd.grpc.v1.cri"]
  <snip>
  sandbox_image = "k8s.gcr.io/pause:3.2"
```
My current Kubernetes cluster is v1.22.1, so it uses pause:3.5 rather than pause:3.2. By changing the setting to an image that already exists on the node (in this case k8s.gcr.io/pause:3.5), I was able to build my Kubernetes cluster successfully without an Internet connection.
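The fix above can be sketched as follows. For safety the edit is shown against a local copy of the config snippet; on a real node you would edit `/etc/containerd/config.toml` in place and then restart containerd. The exact image tag (`pause:3.5` here) depends on your Kubernetes version:

```shell
# Local copy of the relevant config.toml fragment (illustrative).
cat > config.toml <<'EOF'
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "k8s.gcr.io/pause:3.2"
EOF

# Point sandbox_image at a pause image that is already present on the node.
sed -i 's|sandbox_image = ".*"|sandbox_image = "k8s.gcr.io/pause:3.5"|' config.toml
grep sandbox_image config.toml

# On a real node, apply and retry (commands shown for reference only):
#   systemctl restart containerd
#   kubeadm reset -f && kubeadm init ...
```

Alternatively, if you can stage files from a connected machine, you could keep the default tag and pre-load the pause image instead, e.g. export it with `ctr images export` on the connected machine and load it on the offline node with `ctr -n k8s.io images import` (note the `k8s.io` namespace, which is where the CRI plugin looks for images).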