Kubernetes
Metrics-server in CrashLoopBackOff on a fresh rke install
Over the past day I have installed at least 10 times; the install itself completes fine every time, but metrics-server ends up in CrashLoopBackOff.
As I understand it, the section below is missing from the pod's YAML and needs to be added to the deployment.
I am new to Kubernetes and have 2 questions:
1. I am installing a Rancher cluster using rke, so why is the following setting missing from the Pod to start metrics-server?

   command:
     /metrics-server
     --cert-dir=/tmp
     --secure-port=4443
     --kubelet-preferred-address-types=InternalIP
     --kubelet-insecure-tls

2. What is the best way to add these lines? I am very new, so I need some guidance.
Cluster information:

Kubernetes version:
[rke@rke19-master1 ~]$ kubectl get nodes
NAME           STATUS   ROLES               AGE   VERSION
192.168.0.56   Ready    controlplane,etcd   17m   v1.19.10
192.168.0.57   Ready    controlplane,etcd   17m   v1.19.10
192.168.0.58   Ready    controlplane,etcd   17m   v1.19.10
192.168.0.59   Ready    worker              17m   v1.19.10
192.168.0.60   Ready    worker              17m   v1.19.10
[rke@rke19-master1 ~]$
[rke@rke19-master1 ~]$ kubectl get pods metrics-server-5b6d79d4f4-ggl57 -n kube-system -o yaml
apiVersion: v1 kind: Pod metadata: annotations: cni.projectcalico.org/podIP: 10.42.4.3/32 cni.projectcalico.org/podIPs: 10.42.4.3/32 creationTimestamp: "2021-08-16T23:00:42Z" generateName: metrics-server-5b6d79d4f4- labels: k8s-app: metrics-server pod-template-hash: 5b6d79d4f4 managedFields: - apiVersion: v1 fieldsType: FieldsV1 fieldsV1: f:metadata: f:generateName: {} f:labels: .: {} f:k8s-app: {} f:pod-template-hash: {} f:ownerReferences: .: {} k:{"uid":"fb15b257-4a9d-478b-b461-8b61c165e3db"}: .: {} f:apiVersion: {} f:blockOwnerDeletion: {} f:controller: {} f:kind: {} f:name: {} f:uid: {} f:spec: f:affinity: .: {} f:nodeAffinity: .: {} f:requiredDuringSchedulingIgnoredDuringExecution: .: {} f:nodeSelectorTerms: {} f:containers: k:{"name":"metrics-server"}: .: {} f:args: {} f:image: {} f:imagePullPolicy: {} f:livenessProbe: .: {} f:failureThreshold: {} f:httpGet: .: {} f:path: {} f:port: {} f:scheme: {} f:periodSeconds: {} f:successThreshold: {} f:timeoutSeconds: {} f:name: {} f:ports: .: {} k:{"containerPort":4443,"protocol":"TCP"}: .: {} f:containerPort: {} f:name: {} f:protocol: {} f:readinessProbe: .: {} f:failureThreshold: {} f:httpGet: .: {} f:path: {} f:port: {} f:scheme: {} f:periodSeconds: {} f:successThreshold: {} f:timeoutSeconds: {} f:resources: {} f:securityContext: .: {} f:readOnlyRootFilesystem: {} f:runAsNonRoot: {} f:runAsUser: {} f:terminationMessagePath: {} f:terminationMessagePolicy: {} f:volumeMounts: .: {} k:{"mountPath":"/tmp"}: .: {} f:mountPath: {} f:name: {} f:dnsPolicy: {} f:enableServiceLinks: {} f:priorityClassName: {} f:restartPolicy: {} f:schedulerName: {} f:securityContext: {} f:serviceAccount: {} f:serviceAccountName: {} f:terminationGracePeriodSeconds: {} f:tolerations: {} f:volumes: .: {} k:{"name":"tmp-dir"}: .: {} f:emptyDir: {} f:name: {} manager: kube-controller-manager operation: Update time: "2021-08-16T23:00:42Z" - apiVersion: v1 fieldsType: FieldsV1 fieldsV1: f:metadata: f:annotations: .: {} f:cni.projectcalico.org/podIP: {} f:cni.projectcalico.org/podIPs: {} manager: calico operation: Update time: "2021-08-16T23:00:47Z" - apiVersion: v1 fieldsType: FieldsV1 fieldsV1: f:status: f:conditions: k:{"type":"ContainersReady"}: .: {} f:lastProbeTime: {} f:lastTransitionTime: {} f:message: {} f:reason: {} f:status: {} f:type: {} k:{"type":"Initialized"}: .: {} f:lastProbeTime: {} f:lastTransitionTime: {} f:status: {} f:type: {} k:{"type":"Ready"}: .: {} f:lastProbeTime: {} f:lastTransitionTime: {} f:message: {} f:reason: {} f:status: {} f:type: {} f:containerStatuses: {} f:hostIP: {} f:phase: {} f:podIP: {} f:podIPs: .: {} k:{"ip":"10.42.4.3"}: .: {} f:ip: {} f:startTime: {} manager: kubelet operation: Update time: "2021-08-16T23:00:54Z" name: metrics-server-5b6d79d4f4-ggl57 namespace: kube-system ownerReferences: - apiVersion: apps/v1 blockOwnerDeletion: true controller: true kind: ReplicaSet name: metrics-server-5b6d79d4f4 uid: fb15b257-4a9d-478b-b461-8b61c165e3db resourceVersion: "5775" selfLink: /api/v1/namespaces/kube-system/pods/metrics-server-5b6d79d4f4-ggl57 uid: af8d4e07-aa3f-4efe-8169-feb37cfd97df spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: beta.kubernetes.io/os operator: NotIn values: - windows - key: node-role.kubernetes.io/worker operator: Exists containers: - args: - --cert-dir=/tmp - --secure-port=4443 - --kubelet-insecure-tls - --kubelet-preferred-address-types=InternalIP - --logtostderr image: 192.168.0.35:5000/rancher/metrics-server:v0.3.6 imagePullPolicy: 
IfNotPresent livenessProbe: failureThreshold: 3 httpGet: path: /livez port: https scheme: HTTPS periodSeconds: 10 successThreshold: 1 timeoutSeconds: 1 name: metrics-server ports: - containerPort: 4443 name: https protocol: TCP readinessProbe: failureThreshold: 3 httpGet: path: /readyz port: https scheme: HTTPS periodSeconds: 10 successThreshold: 1 timeoutSeconds: 1 resources: {} securityContext: readOnlyRootFilesystem: true runAsNonRoot: true runAsUser: 1000 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /tmp name: tmp-dir - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: metrics-server-token-78b6h readOnly: true dnsPolicy: ClusterFirst enableServiceLinks: true nodeName: 192.168.0.59 preemptionPolicy: PreemptLowerPriority priority: 2000000000 priorityClassName: system-cluster-critical restartPolicy: Always schedulerName: default-scheduler securityContext: {} serviceAccount: metrics-server serviceAccountName: metrics-server terminationGracePeriodSeconds: 30 tolerations: - effect: NoExecute operator: Exists - effect: NoSchedule operator: Exists volumes: - emptyDir: {} name: tmp-dir - name: metrics-server-token-78b6h secret: defaultMode: 420 secretName: metrics-server-token-78b6h status: conditions: - lastProbeTime: null lastTransitionTime: "2021-08-16T23:00:43Z" status: "True" type: Initialized - lastProbeTime: null lastTransitionTime: "2021-08-16T23:00:43Z" message: 'containers with unready status: [metrics-server]' reason: ContainersNotReady status: "False" type: Ready - lastProbeTime: null lastTransitionTime: "2021-08-16T23:00:43Z" message: 'containers with unready status: [metrics-server]' reason: ContainersNotReady status: "False" type: ContainersReady - lastProbeTime: null lastTransitionTime: "2021-08-16T23:00:43Z" status: "True" type: PodScheduled containerStatuses: - containerID: docker://344c587a7edd3abed035c12bfc16b9dbd0da3f26ba9101aa246bf4793648d380 image: 192.168.0.35:5000/rancher/metrics-server:v0.3.6 imageID: docker-pullable://192.168.0.35:5000/rancher/metrics-server@sha256:c9c4e95068b51d6b33a9dccc61875df07dc650abbf4ac1a19d58b4628f89288b lastState: terminated: containerID: docker://e28b6812965786cd2f520a20dd2adf6cbe9c6a720de905ce16992ed0f4cd7c9e exitCode: 2 finishedAt: "2021-08-16T23:21:47Z" reason: Error startedAt: "2021-08-16T23:21:18Z" name: metrics-server ready: false restartCount: 12 started: true state: running: startedAt: "2021-08-16T23:26:52Z" hostIP: 192.168.0.59 phase: Running podIP: 10.42.4.3 podIPs: - ip: 10.42.4.3 qosClass: BestEffort startTime: "2021-08-16T23:00:43Z" [rke@rke19-master1 ~]$ kubectl describe pods metrics-server-5b6d79d4f4-ggl57 -n kube-system Name: metrics-server-5b6d79d4f4-ggl57 Namespace: kube-system Priority: 2000000000 Priority Class Name: system-cluster-critical Node: 192.168.0.59/192.168.0.59 Start Time: Tue, 17 Aug 2021 00:00:43 +0100 Labels: k8s-app=metrics-server pod-template-hash=5b6d79d4f4 Annotations: cni.projectcalico.org/podIP: 10.42.4.3/32 cni.projectcalico.org/podIPs: 10.42.4.3/32 Status: Running IP: 10.42.4.3 IPs: IP: 10.42.4.3 Controlled By: ReplicaSet/metrics-server-5b6d79d4f4 Containers: metrics-server: Container ID: docker://74ea122709aefc07b89dcbd3514e86fdff9874627b87413571d1624a55c32baa Image: 192.168.0.35:5000/rancher/metrics-server:v0.3.6 Image ID: docker-pullable://192.168.0.35:5000/rancher/metrics-server@sha256:c9c4e95068b51d6b33a9dccc61875df07dc650abbf4ac1a19d58b4628f89288b Port: 4443/TCP Host Port: 0/TCP Args: --cert-dir=/tmp --secure-port=4443 
--kubelet-insecure-tls --kubelet-preferred-address-types=InternalIP --logtostderr State: Waiting Reason: CrashLoopBackOff Last State: Terminated Reason: Error Exit Code: 2 Started: Tue, 17 Aug 2021 00:27:18 +0100 Finished: Tue, 17 Aug 2021 00:27:47 +0100 Ready: False Restart Count: 13 Liveness: http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3 Readiness: http-get https://:https/readyz delay=0s timeout=1s period=10s #success=1 #failure=3 Environment: <none> Mounts: /tmp from tmp-dir (rw) /var/run/secrets/kubernetes.io/serviceaccount from metrics-server-token-78b6h (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: tmp-dir: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> metrics-server-token-78b6h: Type: Secret (a volume populated by a Secret) SecretName: metrics-server-token-78b6h Optional: false QoS Class: BestEffort Node-Selectors: <none> Tolerations: :NoExecuteop=Exists :NoScheduleop=Exists Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 28m default-scheduler Successfully assigned kube-system/metrics-server-5b6d79d4f4-ggl57 to 192.168.0.59 Normal Pulling 28m kubelet Pulling image "192.168.0.35:5000/rancher/metrics-server:v0.3.6" Normal Pulled 28m kubelet Successfully pulled image "192.168.0.35:5000/rancher/metrics-server:v0.3.6" in 4.687484656s Warning Unhealthy 28m kubelet Readiness probe failed: Get "https://10.42.4.3:4443/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers) Warning Unhealthy 28m kubelet Liveness probe failed: Get "https://10.42.4.3:4443/livez": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) Warning Unhealthy 27m kubelet Readiness probe failed: Get "https://10.42.4.3:4443/readyz": dial tcp 10.42.4.3:4443: connect: connection refused Warning Unhealthy 27m (x5 over 28m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 404 Warning Unhealthy 27m (x5 over 28m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 404 Normal Killing 27m (x2 over 27m) kubelet Container metrics-server failed liveness probe, will be restarted Normal Created 27m (x3 over 28m) kubelet Created container metrics-server Normal Started 27m (x3 over 28m) kubelet Started container metrics-server Normal Pulled 8m14s (x10 over 27m) kubelet Container image "192.168.0.35:5000/rancher/metrics-server:v0.3.6" already present on machine Warning BackOff 3m15s (x97 over 25m) kubelet Back-off restarting failed container [rke@rke19-master1 ~]$ [rke@rke19-master1 ~]$ ^C [rke@rke19-master1 ~]$ kubectl logs metrics-server-5b6d79d4f4-ggl57 -n kube-system I0816 23:27:20.011598 1 secure_serving.go:116] Serving securely on [::]:4443 [rke@rke19-master1 ~]$
The "Failed with statuscode: 404" message indicates that you are querying a path that does not exist. We can see you are pulling a v0.3.6 tag of the metrics-server image; although it comes from Rancher, we can assume they stick to upstream versioning.
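To double-check which tag is actually running, something like the following should work (using the pod name from your output; the jsonpath index assumes a single container, which matches the spec above):

   kubectl -n kube-system get pod metrics-server-5b6d79d4f4-ggl57 \
     -o jsonpath='{.spec.containers[0].image}'

This should print 192.168.0.35:5000/rancher/metrics-server:v0.3.6, confirming the version the probes are hitting.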
Checking the upstream changelog, we can see that /livez and /readyz were only introduced in v0.4.0, see: https://github.com/kubernetes-sigs/metrics-server/releases/tag/v0.4.0. I would suggest you try querying the /healthz URL, which was removed in v0.4.0 (and therefore still exists in v0.3.6). Alternatively, change your httpGet probes to tcpSocket probes, or try upgrading metrics-server to the latest version.
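As a rough sketch of the probe change (assuming the Deployment is named metrics-server in kube-system, as the ReplicaSet name suggests), a JSON patch could point both probes at /healthz:

   kubectl -n kube-system patch deployment metrics-server --type=json -p='[
     {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/httpGet/path", "value": "/healthz"},
     {"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/httpGet/path", "value": "/healthz"}
   ]'

The tcpSocket variant would instead replace each httpGet block in the container spec with something like:

   livenessProbe:
     tcpSocket:
       port: https
   readinessProbe:
     tcpSocket:
       port: https

Note that since rke deploys metrics-server as an addon, it may re-apply its original manifest on the next `rke up`, so a manual patch may not persist.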