Kubernetes node metrics endpoint returns 401
I have a GKE cluster which, to keep things simple, runs only Prometheus, monitoring each member node. I recently upgraded the API server to 1.6 (which introduces RBAC) without any issues. I then added a new node running a 1.6 kubelet. Prometheus cannot access the metrics API of this new node.
So I added a `ClusterRole`, a `ClusterRoleBinding` and a `ServiceAccount` in the namespace, and configured the deployment to use the new `ServiceAccount`. Then I deleted the pod for good measure:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
secrets:
- name: prometheus-token-xxxxx
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: prometheus-prometheus
    component: server
    release: prometheus
  name: prometheus-server
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus-prometheus
        component: server
        release: prometheus
    spec:
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: prometheus
      serviceAccountName: prometheus
      ...
```
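(As a side note, a quick way to confirm those objects were actually created, assuming everything lives in the `default` namespace:)

```sh
# List the service account, cluster role and binding created above
kubectl get serviceaccount prometheus -n default
kubectl get clusterrole prometheus
kubectl describe clusterrolebinding prometheus
```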
But nothing changed. The metrics endpoint still returns `HTTP/1.1 401 Unauthorized`, and when I modify the deployment to include another container with bash + curl installed and issue the request manually, I get:

```
# curl -vsSk -H "Authorization: Bearer $(</var/run/secrets/kubernetes.io/serviceaccount/token)" https://$NODE_IP:10250/metrics
*   Trying $NODE_IP...
* Connected to $NODE_IP ($NODE_IP) port 10250 (#0)
* found XXX certificates in /etc/ssl/certs/ca-certificates.crt
* found XXX certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*   server certificate verification SKIPPED
*   server certificate status verification SKIPPED
*   common name: node-running-kubelet-1-6@000000000 (does not match '$NODE_IP')
*   server certificate expiration date OK
*   server certificate activation date OK
*   certificate public key: RSA
*   certificate version: #3
*   subject: CN=node-running-kubelet-1-6@000000000
*   start date: Fri, 07 Apr 2017 22:00:00 GMT
*   expire date: Sat, 07 Apr 2018 22:00:00 GMT
*   issuer: CN=node-running-kubelet-1-6@000000000
*   compression: NULL
* ALPN, server accepted to use http/1.1
> GET /metrics HTTP/1.1
> Host: $NODE_IP:10250
> User-Agent: curl/7.47.0
> Accept: */*
> Authorization: Bearer **censored**
>
< HTTP/1.1 401 Unauthorized
< Date: Mon, 10 Apr 2017 20:04:20 GMT
< Content-Length: 12
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host $NODE_IP left intact
```
- Why does that token not allow me to access this resource?
- How do I inspect the access granted to a ServiceAccount?
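For the second question, this is a sketch of the kind of check I have in mind, assuming a kubectl recent enough to support the `auth` subcommand and `--as` impersonation:

```sh
# Ask the API server whether the service account may perform a given action
kubectl auth can-i get nodes --as=system:serviceaccount:default:prometheus

# Non-resource URLs can be checked by passing the path directly
kubectl auth can-i get /metrics --as=system:serviceaccount:default:prometheus
```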
Per the discussion on @JorritSalverda's ticket: https://github.com/prometheus/prometheus/issues/2606#issuecomment-294869099
Since GKE does not let you obtain client certificates that would allow you to authenticate against the kubelet, the best solution for users on GKE seems to be to use the Kubernetes API server to proxy requests to the nodes.
To do so (quoting @JorritSalverda):
"For my Prometheus server running in GKE, I now have it running with the following relabelling:
```yaml
relabel_configs:
- action: labelmap
  regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
  replacement: kubernetes.default.svc.cluster.local:443
- target_label: __scheme__
  replacement: https
- source_labels: [__meta_kubernetes_node_name]
  regex: (.+)
  target_label: __metrics_path__
  replacement: /api/v1/nodes/${1}/proxy/metrics
```
and the following ClusterRole bound to the service account used by Prometheus:
```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
```
Because GKE clusters still have an ABAC fallback in case RBAC fails, I'm not 100% sure yet whether this covers all required permissions."
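As a quick sanity check (not part of the quoted comment, just a sketch assuming the standard in-cluster token path and a placeholder node name `NODE_NAME`), the same proxy path can be queried manually from inside the Prometheus pod:

```sh
# Read the service account token mounted into the pod
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)

# Fetch node metrics via the API server proxy instead of the kubelet directly
curl -sS \
  --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $TOKEN" \
  https://kubernetes.default.svc/api/v1/nodes/NODE_NAME/proxy/metrics
```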
I ran into the same problem and created ticket https://github.com/prometheus/prometheus/issues/2606 for it, and during the discussion updated the example configuration via PR https://github.com/prometheus/prometheus/pull/2641.
You can see the updated relabelling for the kubernetes-nodes job at https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L76-L84
Copied here for reference:
```yaml
relabel_configs:
- action: labelmap
  regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
  replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
  regex: (.+)
  target_label: __metrics_path__
  replacement: /api/v1/nodes/${1}/proxy/metrics
```
For RBAC itself, you need to run Prometheus with a service account you create yourself:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
```
Make sure to pass that service account into the pod with the following pod spec:
```yaml
spec:
  serviceAccount: prometheus
```
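To double-check that the running pod actually picked up that account (just a sketch; the label selector is borrowed from the deployment shown in the question):

```sh
# Print the service account the Prometheus pod is running with
kubectl get pod -l app=prometheus-prometheus \
  -o jsonpath='{.items[0].spec.serviceAccountName}'
```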
The Kubernetes manifest for setting up the appropriate RBAC role and binding, giving the prometheus service account access to the required API endpoints, is at https://github.com/prometheus/prometheus/blob/master/documentation/examples/rbac-setup.yml
Copied here for reference:
```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
```
Replace the namespace in all manifests with the one you run Prometheus in, and then apply the manifests with an account that has cluster-admin rights.
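On GKE you typically have to grant your own user cluster-admin first before you are allowed to create ClusterRoles. A sketch, assuming the manifests above are saved as rbac-setup.yml and gcloud is configured for the cluster's project (the binding name is arbitrary):

```sh
# Bind your own GCP identity to cluster-admin so you may create RBAC objects
kubectl create clusterrolebinding my-cluster-admin-binding \
  --clusterrole=cluster-admin \
  --user="$(gcloud config get-value account)"

# Apply the RBAC manifests
kubectl apply -f rbac-setup.yml
```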
I have not tested this in a cluster without the ABAC fallback, so the RBAC role may still be missing something essential.