Monitoring

kubectl top node does not work. Looks like a heapster problem

  • April 24, 2019

I have a new k8s cluster on GKE.

Whenever I run kubectl top node gke-data-custom-vm-6-25-0cbae9b9-hrkc, I get

Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)

At the same time, I do have this service:

> kubectl -n kube-system get services
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default-http-backend   NodePort    10.11.241.20    <none>        80:32688/TCP    59d
heapster               ClusterIP   10.11.245.182   <none>        80/TCP          59d
kube-dns               ClusterIP   10.11.240.10    <none>        53/UDP,53/TCP   59d
metrics-server         ClusterIP   10.11.249.26    <none>        443/TCP         59d

and a heapster pod is running (I can see that it has been restarted many times):

kubectl -n kube-system get pods
NAME                                               READY     STATUS    RESTARTS   AGE
event-exporter-v0.2.3-85644fcdf-kwd6g              2/2       Running   0          16d
fluentd-gcp-scaler-8b674f786-dbrcr                 1/1       Running   0          16d
fluentd-gcp-v3.2.0-2fqgl                           2/2       Running   0          17d
fluentd-gcp-v3.2.0-47586                           2/2       Running   0          17d
fluentd-gcp-v3.2.0-552xm                           2/2       Running   0          16d
heapster-v1.6.0-beta.1-fdc7fd478-8s998             3/3       Running   73         16d

But I can see some errors in the logs of the heapster-nanny container:

> kubectl logs -n kube-system --tail 10 -f po/heapster-v1.6.0-beta.1-fdc7fd478-8s998 -c heapster-nanny
ERROR: logging before flag.Parse: E0418 23:30:10.075539       1 nanny_lib.go:95] Error while querying apiserver for resources: Get https://10.11.240.1:443/api/v1/namespaces/kube-system/pods/heapster-v1.6.0-beta.1-fdc7fd478-8s998: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:10.971230       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:11.972337       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:12.973637       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:13.975024       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:14.976582       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:16.063760       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:27.065693       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: net/http: TLS handshake timeout
ERROR: logging before flag.Parse: E0418 23:30:30.077159       1 nanny_lib.go:95] Error while querying apiserver for resources: Get https://10.11.240.1:443/api/v1/namespaces/kube-system/pods/heapster-v1.6.0-beta.1-fdc7fd478-8s998: net/http: TLS handshake timeout
ERROR: logging before flag.Parse: E0418 23:30:59.778560       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: i/o timeout

and in the heapster container:

I0423 07:02:10.765134       1 heapster.go:113] Starting heapster on port 8082
W0423 07:16:27.975467       1 manager.go:152] Failed to get all responses in time (got 2/3)
W0423 07:16:43.064110       1 manager.go:107] Failed to get kubelet_summary:10.128.0.49:10255 response in time
W0423 07:20:36.875359       1 manager.go:152] Failed to get all responses in time (got 2/3)
W0423 07:20:44.383790       1 manager.go:107] Failed to get kubelet_summary:10.128.0.49:10255 response in time
W0423 07:22:29.683060       1 manager.go:152] Failed to get all responses in time (got 2/3)
W0423 07:22:40.278962       1 manager.go:107] Failed to get kubelet_summary:10.128.0.49:10255 response in time
W0423 07:31:27.072711       1 manager.go:152] Failed to get all responses in time (got 2/3)
W0423 07:31:54.580031       1 manager.go:107] Failed to get kubelet_summary:10.128.0.49:10255 response in time

How can I fix this?

Should I provide any other information?

Heapster deprecation

Heapster is a deprecated project, and you may run into problems when running it on recent Kubernetes versions.

See the Heapster deprecation timeline:

| Kubernetes Release  | Action              | Policy/Support                                                                   |
|---------------------|---------------------|----------------------------------------------------------------------------------|
| Kubernetes 1.11     | Initial Deprecation | No new features or sinks are added.  Bugfixes may be made.                       |
| Kubernetes 1.12     | Setup Removal       | The option to install Heapster via the Kubernetes setup script is removed.       |
| Kubernetes 1.13     | Removal             | No new bugfixes will be made.  Move to kubernetes-retired organization.          |

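Starting with Kubernetes v1.10, kubectl top relies on metrics-server by default.

CHANGELOG-1.10.md:

  • Support metrics API in kubectl top commands. (#56206, @brancz)

This PR implements support for the kubectl top commands to use the metrics-server as an aggregated API, instead of requesting the metrics from heapster directly. If the metrics.k8s.io API is not served by the apiserver, then this still falls back to the previous behavior.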
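To see which path kubectl top would take on a given cluster, one option is to ask the apiserver directly whether it serves the metrics.k8s.io API. The following is a minimal sketch, not an official diagnostic: the function names are made up for illustration, and it assumes the metrics API is exposed as version v1beta1, which you should confirm on your cluster (for example with kubectl get apiservices).

```shell
# Succeeds if the apiserver answers a raw request against the metrics API.
# Assumption: the API group version is v1beta1.
# Note: any failure (API not registered, apiserver unreachable, kubectl not
# installed) is treated the same way, as "not served".
metrics_api_served() {
  kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes >/dev/null 2>&1
}

# Prints which backend a kubectl v1.10+ client would be expected to use,
# following the fallback behavior described in the changelog entry.
describe_top_source() {
  if metrics_api_served; then
    echo "metrics-server"
  else
    echo "heapster"
  fi
}
```

Calling describe_top_source after sourcing these functions prints metrics-server when the request succeeds and heapster otherwise. If it prints heapster even though a metrics-server service exists, the aggregated API registration is the next thing to inspect.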


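What you should do:

Since Heapster is deprecated and you already have metrics-server deployed, the best option is to use kubectl v1.10 or later, because it fetches metrics from metrics-server.

However, keep the kubectl version skew policy in mind:

kubectl is supported within one minor version (older or newer) of kube-apiserver

Check your kube-apiserver version before choosing your kubectl version.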
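The skew rule quoted above can be checked mechanically once you know both versions. The sketch below is illustrative only: minor_of and within_skew are hypothetical helper names, the version strings in the usage note are made-up examples rather than values from the question, and the check assumes both versions share the same major version.

```shell
# Extracts the minor version number from a string such as "v1.12.7-gke.10".
# The body runs in a subshell so the temporary variable does not leak.
minor_of() (
  v="${1#v}"            # drop a leading "v" if present
  v="${v#*.}"           # drop the major version and the first dot
  v="${v%%.*}"          # keep everything up to the next dot
  echo "${v%%[!0-9]*}"  # strip a trailing suffix such as "+"
)

# Succeeds when the client (first argument) is within one minor version,
# older or newer, of the server (second argument).
# Assumption: both versions have the same major version.
within_skew() (
  c="$(minor_of "$1")"
  s="$(minor_of "$2")"
  d=$(( c - s ))
  [ "${d#-}" -le 1 ]    # ${d#-} removes the sign, giving the absolute value
)
```

For example, within_skew v1.13.5 v1.12.7-gke.10 succeeds, while within_skew v1.9.0 v1.12.7-gke.10 fails. Running kubectl version prints both the client and the server version, which gives you the two inputs.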

Source: https://serverfault.com/questions/964167