Kubernetes

Metrics-server in CrashLoopBackOff on a fresh install with rke

  • November 2, 2021

I have installed it at least 10 times over the past day. Every install completes fine, but metrics-server ends up in CrashLoopBackOff.

As far as I understand, the pod's YAML is missing the following section, which needs to be added to the deployment.

I am new to Kubernetes and have two questions:

  1. I am installing a Rancher cluster with rke, so why would the Pod be missing the following settings needed to start metrics-server?

Command: /metrics-server
  --cert-dir=/tmp
  --secure-port=4443
  --kubelet-preferred-address-types=InternalIP
  --kubelet-insecure-tls

  2. What is the best way to add these lines? I am very new to this, so I need some guidance (see the sketch below for the shape this takes in a Deployment).
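For context, these flags live under the container's args in the metrics-server Deployment. A rough sketch of that shape, with the name, namespace, label, and image taken from the pod output below (not the actual manifest from this cluster):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server        # name/namespace assumed from the pod output below
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - name: metrics-server
        image: 192.168.0.35:5000/rancher/metrics-server:v0.3.6
        args:                  # the flags the poster expects are container args
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP
        - --kubelet-insecure-tls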

Cluster information:

Kubernetes version:
[rke@rke19-master1 ~]$ kubectl get nodes
NAME           STATUS   ROLES               AGE   VERSION
192.168.0.56   Ready    controlplane,etcd   17m   v1.19.10
192.168.0.57   Ready    controlplane,etcd   17m   v1.19.10
192.168.0.58   Ready    controlplane,etcd   17m   v1.19.10
192.168.0.59   Ready    worker              17m   v1.19.10
192.168.0.60   Ready    worker              17m   v1.19.10
[rke@rke19-master1 ~]$

[rke@rke19-master1 ~]$ kubectl get pod metrics-server-5b6d79d4f4-ggl57 -n kube-system -o yaml

apiVersion: v1
kind: Pod
metadata:
 annotations:
   cni.projectcalico.org/podIP: 10.42.4.3/32
   cni.projectcalico.org/podIPs: 10.42.4.3/32
 creationTimestamp: "2021-08-16T23:00:42Z"
 generateName: metrics-server-5b6d79d4f4-
 labels:
   k8s-app: metrics-server
   pod-template-hash: 5b6d79d4f4
 managedFields:
 - apiVersion: v1
   fieldsType: FieldsV1
   fieldsV1:
     f:metadata:
       f:generateName: {}
       f:labels:
         .: {}
         f:k8s-app: {}
         f:pod-template-hash: {}
       f:ownerReferences:
         .: {}
         k:{"uid":"fb15b257-4a9d-478b-b461-8b61c165e3db"}:
           .: {}
           f:apiVersion: {}
           f:blockOwnerDeletion: {}
           f:controller: {}
           f:kind: {}
           f:name: {}
           f:uid: {}
     f:spec:
       f:affinity:
         .: {}
         f:nodeAffinity:
           .: {}
           f:requiredDuringSchedulingIgnoredDuringExecution:
             .: {}
             f:nodeSelectorTerms: {}
       f:containers:
         k:{"name":"metrics-server"}:
           .: {}
           f:args: {}
           f:image: {}
           f:imagePullPolicy: {}
           f:livenessProbe:
             .: {}
             f:failureThreshold: {}
             f:httpGet:
               .: {}
               f:path: {}
               f:port: {}
               f:scheme: {}
             f:periodSeconds: {}
             f:successThreshold: {}
             f:timeoutSeconds: {}
           f:name: {}
           f:ports:
             .: {}
             k:{"containerPort":4443,"protocol":"TCP"}:
               .: {}
               f:containerPort: {}
               f:name: {}
               f:protocol: {}
           f:readinessProbe:
             .: {}
             f:failureThreshold: {}
             f:httpGet:
               .: {}
               f:path: {}
               f:port: {}
               f:scheme: {}
             f:periodSeconds: {}
             f:successThreshold: {}
             f:timeoutSeconds: {}
           f:resources: {}
           f:securityContext:
             .: {}
             f:readOnlyRootFilesystem: {}
             f:runAsNonRoot: {}
             f:runAsUser: {}
           f:terminationMessagePath: {}
           f:terminationMessagePolicy: {}
           f:volumeMounts:
             .: {}
             k:{"mountPath":"/tmp"}:
               .: {}
               f:mountPath: {}
               f:name: {}
       f:dnsPolicy: {}
       f:enableServiceLinks: {}
       f:priorityClassName: {}
       f:restartPolicy: {}
       f:schedulerName: {}
       f:securityContext: {}
       f:serviceAccount: {}
       f:serviceAccountName: {}
       f:terminationGracePeriodSeconds: {}
       f:tolerations: {}
       f:volumes:
         .: {}
         k:{"name":"tmp-dir"}:
           .: {}
           f:emptyDir: {}
           f:name: {}
   manager: kube-controller-manager
   operation: Update
   time: "2021-08-16T23:00:42Z"
 - apiVersion: v1
   fieldsType: FieldsV1
   fieldsV1:
     f:metadata:
       f:annotations:
         .: {}
         f:cni.projectcalico.org/podIP: {}
         f:cni.projectcalico.org/podIPs: {}
   manager: calico
   operation: Update
   time: "2021-08-16T23:00:47Z"
 - apiVersion: v1
   fieldsType: FieldsV1
   fieldsV1:
     f:status:
       f:conditions:
         k:{"type":"ContainersReady"}:
           .: {}
           f:lastProbeTime: {}
           f:lastTransitionTime: {}
           f:message: {}
           f:reason: {}
           f:status: {}
           f:type: {}
         k:{"type":"Initialized"}:
           .: {}
           f:lastProbeTime: {}
           f:lastTransitionTime: {}
           f:status: {}
           f:type: {}
         k:{"type":"Ready"}:
           .: {}
           f:lastProbeTime: {}
           f:lastTransitionTime: {}
           f:message: {}
           f:reason: {}
           f:status: {}
           f:type: {}
       f:containerStatuses: {}
       f:hostIP: {}
       f:phase: {}
       f:podIP: {}
       f:podIPs:
         .: {}
         k:{"ip":"10.42.4.3"}:
           .: {}
           f:ip: {}
       f:startTime: {}
   manager: kubelet
   operation: Update
   time: "2021-08-16T23:00:54Z"
 name: metrics-server-5b6d79d4f4-ggl57
 namespace: kube-system
 ownerReferences:
 - apiVersion: apps/v1
   blockOwnerDeletion: true
   controller: true
   kind: ReplicaSet
   name: metrics-server-5b6d79d4f4
   uid: fb15b257-4a9d-478b-b461-8b61c165e3db
 resourceVersion: "5775"
 selfLink: /api/v1/namespaces/kube-system/pods/metrics-server-5b6d79d4f4-ggl57
 uid: af8d4e07-aa3f-4efe-8169-feb37cfd97df
spec:
 affinity:
   nodeAffinity:
     requiredDuringSchedulingIgnoredDuringExecution:
       nodeSelectorTerms:
       - matchExpressions:
         - key: beta.kubernetes.io/os
           operator: NotIn
           values:
           - windows
         - key: node-role.kubernetes.io/worker
           operator: Exists
 containers:
 - args:
   - --cert-dir=/tmp
   - --secure-port=4443
   - --kubelet-insecure-tls
   - --kubelet-preferred-address-types=InternalIP
   - --logtostderr
   image: 192.168.0.35:5000/rancher/metrics-server:v0.3.6
   imagePullPolicy: IfNotPresent
   livenessProbe:
     failureThreshold: 3
     httpGet:
       path: /livez
       port: https
       scheme: HTTPS
     periodSeconds: 10
     successThreshold: 1
     timeoutSeconds: 1
   name: metrics-server
   ports:
   - containerPort: 4443
     name: https
     protocol: TCP
   readinessProbe:
     failureThreshold: 3
     httpGet:
       path: /readyz
       port: https
       scheme: HTTPS
     periodSeconds: 10
     successThreshold: 1
     timeoutSeconds: 1
   resources: {}
   securityContext:
     readOnlyRootFilesystem: true
     runAsNonRoot: true
     runAsUser: 1000
   terminationMessagePath: /dev/termination-log
   terminationMessagePolicy: File
   volumeMounts:
   - mountPath: /tmp
     name: tmp-dir
   - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
     name: metrics-server-token-78b6h
     readOnly: true
 dnsPolicy: ClusterFirst
 enableServiceLinks: true
 nodeName: 192.168.0.59
 preemptionPolicy: PreemptLowerPriority
 priority: 2000000000
 priorityClassName: system-cluster-critical
 restartPolicy: Always
 schedulerName: default-scheduler
 securityContext: {}
 serviceAccount: metrics-server
 serviceAccountName: metrics-server
 terminationGracePeriodSeconds: 30
 tolerations:
 - effect: NoExecute
   operator: Exists
 - effect: NoSchedule
   operator: Exists
 volumes:
 - emptyDir: {}
   name: tmp-dir
 - name: metrics-server-token-78b6h
   secret:
     defaultMode: 420
     secretName: metrics-server-token-78b6h
status:
 conditions:
 - lastProbeTime: null
   lastTransitionTime: "2021-08-16T23:00:43Z"
   status: "True"
   type: Initialized
 - lastProbeTime: null
   lastTransitionTime: "2021-08-16T23:00:43Z"
   message: 'containers with unready status: [metrics-server]'
   reason: ContainersNotReady
   status: "False"
   type: Ready
 - lastProbeTime: null
   lastTransitionTime: "2021-08-16T23:00:43Z"
   message: 'containers with unready status: [metrics-server]'
   reason: ContainersNotReady
   status: "False"
   type: ContainersReady
 - lastProbeTime: null
   lastTransitionTime: "2021-08-16T23:00:43Z"
   status: "True"
   type: PodScheduled
 containerStatuses:
 - containerID: docker://344c587a7edd3abed035c12bfc16b9dbd0da3f26ba9101aa246bf4793648d380
   image: 192.168.0.35:5000/rancher/metrics-server:v0.3.6
   imageID: docker-pullable://192.168.0.35:5000/rancher/metrics-server@sha256:c9c4e95068b51d6b33a9dccc61875df07dc650abbf4ac1a19d58b4628f89288b
   lastState:
     terminated:
       containerID: docker://e28b6812965786cd2f520a20dd2adf6cbe9c6a720de905ce16992ed0f4cd7c9e
       exitCode: 2
       finishedAt: "2021-08-16T23:21:47Z"
       reason: Error
       startedAt: "2021-08-16T23:21:18Z"
   name: metrics-server
   ready: false
   restartCount: 12
   started: true
   state:
     running:
       startedAt: "2021-08-16T23:26:52Z"
 hostIP: 192.168.0.59
 phase: Running
 podIP: 10.42.4.3
 podIPs:
 - ip: 10.42.4.3
 qosClass: BestEffort
 startTime: "2021-08-16T23:00:43Z"


[rke@rke19-master1 ~]$ kubectl describe  pods metrics-server-5b6d79d4f4-ggl57 -n kube-system
Name:                 metrics-server-5b6d79d4f4-ggl57
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 192.168.0.59/192.168.0.59
Start Time:           Tue, 17 Aug 2021 00:00:43 +0100
Labels:               k8s-app=metrics-server
                     pod-template-hash=5b6d79d4f4
Annotations:          cni.projectcalico.org/podIP: 10.42.4.3/32
                     cni.projectcalico.org/podIPs: 10.42.4.3/32
Status:               Running
IP:                   10.42.4.3
IPs:
 IP:           10.42.4.3
Controlled By:  ReplicaSet/metrics-server-5b6d79d4f4
Containers:
 metrics-server:
   Container ID:  docker://74ea122709aefc07b89dcbd3514e86fdff9874627b87413571d1624a55c32baa
   Image:         192.168.0.35:5000/rancher/metrics-server:v0.3.6
   Image ID:      docker-pullable://192.168.0.35:5000/rancher/metrics-server@sha256:c9c4e95068b51d6b33a9dccc61875df07dc650abbf4ac1a19d58b4628f89288b
   Port:          4443/TCP
   Host Port:     0/TCP
   Args:
     --cert-dir=/tmp
     --secure-port=4443
     --kubelet-insecure-tls
     --kubelet-preferred-address-types=InternalIP
     --logtostderr
   State:          Waiting
     Reason:       CrashLoopBackOff
   Last State:     Terminated
     Reason:       Error
     Exit Code:    2
     Started:      Tue, 17 Aug 2021 00:27:18 +0100
     Finished:     Tue, 17 Aug 2021 00:27:47 +0100
   Ready:          False
   Restart Count:  13
   Liveness:       http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
   Readiness:      http-get https://:https/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
   Environment:    <none>
   Mounts:
     /tmp from tmp-dir (rw)
     /var/run/secrets/kubernetes.io/serviceaccount from metrics-server-token-78b6h (ro)
Conditions:
 Type              Status
 Initialized       True
 Ready             False
 ContainersReady   False
 PodScheduled      True
Volumes:
 tmp-dir:
   Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
   Medium:
   SizeLimit:  <unset>
 metrics-server-token-78b6h:
   Type:        Secret (a volume populated by a Secret)
   SecretName:  metrics-server-token-78b6h
   Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
 Tolerations:     :NoExecute op=Exists
                  :NoSchedule op=Exists
Events:
 Type     Reason     Age                   From               Message
 ----     ------     ----                  ----               -------
 Normal   Scheduled  28m                   default-scheduler  Successfully assigned kube-system/metrics-server-5b6d79d4f4-ggl57 to 192.168.0.59
 Normal   Pulling    28m                   kubelet            Pulling image "192.168.0.35:5000/rancher/metrics-server:v0.3.6"
 Normal   Pulled     28m                   kubelet            Successfully pulled image "192.168.0.35:5000/rancher/metrics-server:v0.3.6" in 4.687484656s
 Warning  Unhealthy  28m                   kubelet            Readiness probe failed: Get "https://10.42.4.3:4443/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
 Warning  Unhealthy  28m                   kubelet            Liveness probe failed: Get "https://10.42.4.3:4443/livez": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
 Warning  Unhealthy  27m                   kubelet            Readiness probe failed: Get "https://10.42.4.3:4443/readyz": dial tcp 10.42.4.3:4443: connect: connection refused
 Warning  Unhealthy  27m (x5 over 28m)     kubelet            Readiness probe failed: HTTP probe failed with statuscode: 404
 Warning  Unhealthy  27m (x5 over 28m)     kubelet            Liveness probe failed: HTTP probe failed with statuscode: 404
 Normal   Killing    27m (x2 over 27m)     kubelet            Container metrics-server failed liveness probe, will be restarted
 Normal   Created    27m (x3 over 28m)     kubelet            Created container metrics-server
 Normal   Started    27m (x3 over 28m)     kubelet            Started container metrics-server
 Normal   Pulled     8m14s (x10 over 27m)  kubelet            Container image "192.168.0.35:5000/rancher/metrics-server:v0.3.6" already present on machine
 Warning  BackOff    3m15s (x97 over 25m)  kubelet            Back-off restarting failed container
[rke@rke19-master1 ~]$



[rke@rke19-master1 ~]$ ^C
[rke@rke19-master1 ~]$ kubectl logs  metrics-server-5b6d79d4f4-ggl57 -n kube-system
I0816 23:27:20.011598       1 secure_serving.go:116] Serving securely on [::]:4443
[rke@rke19-master1 ~]$

The "Failed with statuscode: 404" message indicates that you are querying a path that does not exist.

We can see that you are pulling a v0.3.6 tag of the metrics-server image. Although it comes from Rancher, we can assume they stick to upstream versioning.

Checking the upstream changelog, we can see that /livez and /readyz were introduced in v0.4.0; see https://github.com/kubernetes-sigs/metrics-server/releases/tag/v0.4.0
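To confirm this against the running pod, you can query both paths directly from a node that can reach the pod network (the pod IP and port come from the describe output above; -k skips verification of the self-signed serving certificate):

# On a v0.3.x metrics-server, /livez and /readyz should return 404,
# while /healthz (the pre-v0.4.0 endpoint) should return "ok"
curl -k https://10.42.4.3:4443/livez
curl -k https://10.42.4.3:4443/healthz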

I would suggest you try querying /healthz, the URL that was removed in v0.4.0. Alternatively, change your httpGet probes to tcpSocket probes, or try upgrading metrics-server to the latest release. A sketch of the first option follows.
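As a minimal sketch of the probe change, assuming the Deployment in kube-system is named metrics-server (its ReplicaSet metrics-server-5b6d79d4f4 suggests it is), the probe paths can be switched to /healthz with a JSON patch:

# Point both probes at /healthz, which exists on v0.3.x
kubectl -n kube-system patch deployment metrics-server --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/httpGet/path",  "value": "/healthz"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/httpGet/path", "value": "/healthz"}
]'

Note that rke manages metrics-server as a system addon, so a manual patch may be overwritten the next time the cluster is reconciled; if so, the probe or version change would need to be made through the rke cluster configuration instead.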

Quoted from: https://serverfault.com/questions/1074758