Kubernetes node metrics endpoint returns 401
I have a GKE cluster which, to keep things simple, runs only Prometheus, monitoring each member node. I recently upgraded the API server to 1.6 (which introduces RBAC) without any issues. I then added a new node running a 1.6 kubelet. Prometheus cannot access the metrics API of this new node.
So I added a `ClusterRole`, a `ClusterRoleBinding` and a `ServiceAccount` in the namespace, and configured the deployment to use the new `ServiceAccount`. Then I deleted the pod for good measure:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
secrets:
- name: prometheus-token-xxxxx
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: prometheus-prometheus
    component: server
    release: prometheus
  name: prometheus-server
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus-prometheus
        component: server
        release: prometheus
    spec:
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: prometheus
      serviceAccountName: prometheus
      ...
```
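(As a side note, a quick way to confirm those objects were actually created, assuming everything lives in the `default` namespace:)

```sh
# List the service account, cluster role and binding created above
kubectl get serviceaccount prometheus -n default
kubectl get clusterrole prometheus
kubectl describe clusterrolebinding prometheus
```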
But nothing changed. The metrics endpoint still returns `HTTP/1.1 401 Unauthorized`, and when I modify the deployment to include another container with bash + curl installed and issue the request manually, I get:

```
# curl -vsSk -H "Authorization: Bearer $(</var/run/secrets/kubernetes.io/serviceaccount/token)" https://$NODE_IP:10250/metrics
*   Trying $NODE_IP...
* Connected to $NODE_IP ($NODE_IP) port 10250 (#0)
* found XXX certificates in /etc/ssl/certs/ca-certificates.crt
* found XXX certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*   server certificate verification SKIPPED
*   server certificate status verification SKIPPED
*   common name: node-running-kubelet-1-6@000000000 (does not match '$NODE_IP')
*   server certificate expiration date OK
*   server certificate activation date OK
*   certificate public key: RSA
*   certificate version: #3
*   subject: CN=node-running-kubelet-1-6@000000000
*   start date: Fri, 07 Apr 2017 22:00:00 GMT
*   expire date: Sat, 07 Apr 2018 22:00:00 GMT
*   issuer: CN=node-running-kubelet-1-6@000000000
*   compression: NULL
* ALPN, server accepted to use http/1.1
> GET /metrics HTTP/1.1
> Host: $NODE_IP:10250
> User-Agent: curl/7.47.0
> Accept: */*
> Authorization: Bearer **censored**
>
< HTTP/1.1 401 Unauthorized
< Date: Mon, 10 Apr 2017 20:04:20 GMT
< Content-Length: 12
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host $NODE_IP left intact
```
- Why does that token not allow me to access this resource?
- How do I inspect the access granted to a ServiceAccount?
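For the second question, this is a sketch of the kind of check I have in mind, assuming a kubectl recent enough to support the `auth` subcommand and `--as` impersonation:

```sh
# Ask the API server whether the service account may perform a given action
kubectl auth can-i get nodes --as=system:serviceaccount:default:prometheus

# Non-resource URLs can be checked by passing the path directly
kubectl auth can-i get /metrics --as=system:serviceaccount:default:prometheus
```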
Per the discussion on @JorritSalverda's ticket: https://github.com/prometheus/prometheus/issues/2606#issuecomment-294869099
Since GKE does not let you obtain client certificates that would allow you to authenticate against the kubelet, the best solution for users on GKE seems to be to use the Kubernetes API server to proxy requests to the nodes.
To do so (quoting @JorritSalverda):
"For my Prometheus server running in GKE, I now have it running with the following relabelling:
```yaml
relabel_configs:
- action: labelmap
  regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
  replacement: kubernetes.default.svc.cluster.local:443
- target_label: __scheme__
  replacement: https
- source_labels: [__meta_kubernetes_node_name]
  regex: (.+)
  target_label: __metrics_path__
  replacement: /api/v1/nodes/${1}/proxy/metrics
```
and the following ClusterRole bound to the service account used by Prometheus:
```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
```
Because GKE clusters still have an ABAC fallback in case RBAC fails, I'm not 100% sure yet whether this covers all required permissions."
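As a quick sanity check (not part of the quoted comment, just a sketch assuming the standard in-cluster token path and a placeholder node name `NODE_NAME`), the same proxy path can be queried manually from inside the Prometheus pod:

```sh
# Read the service account token mounted into the pod
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)

# Fetch node metrics via the API server proxy instead of the kubelet directly
curl -sS \
  --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $TOKEN" \
  https://kubernetes.default.svc/api/v1/nodes/NODE_NAME/proxy/metrics
```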
I ran into the same problem and created ticket https://github.com/prometheus/prometheus/issues/2606 for it, and during the discussion updated the example configuration via PR https://github.com/prometheus/prometheus/pull/2641.
You can see the updated relabelling for the kubernetes-nodes job at https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L76-L84
Copied here for reference:
```yaml
relabel_configs:
- action: labelmap
  regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
  replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
  regex: (.+)
  target_label: __metrics_path__
  replacement: /api/v1/nodes/${1}/proxy/metrics
```
For RBAC itself, you need to run Prometheus with a service account you create yourself:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
```
Make sure to pass that service account into the pod with the following pod spec:
```yaml
spec:
  serviceAccount: prometheus
```
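To double-check that the running pod actually picked up that account (just a sketch; the label selector is borrowed from the deployment shown in the question):

```sh
# Print the service account the Prometheus pod is running with
kubectl get pod -l app=prometheus-prometheus \
  -o jsonpath='{.items[0].spec.serviceAccountName}'
```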
The Kubernetes manifest for setting up the appropriate RBAC role and binding, giving the prometheus service account access to the required API endpoints, is at https://github.com/prometheus/prometheus/blob/master/documentation/examples/rbac-setup.yml
Copied here for reference:
```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
```
Replace the namespace in all manifests with the one you run Prometheus in, and then apply the manifests with an account that has cluster-admin rights.
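On GKE you typically have to grant your own user cluster-admin first before you are allowed to create ClusterRoles. A sketch, assuming the manifests above are saved as rbac-setup.yml and gcloud is configured for the cluster's project (the binding name is arbitrary):

```sh
# Bind your own GCP identity to cluster-admin so you may create RBAC objects
kubectl create clusterrolebinding my-cluster-admin-binding \
  --clusterrole=cluster-admin \
  --user="$(gcloud config get-value account)"

# Apply the RBAC manifests
kubectl apply -f rbac-setup.yml
```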
I have not tested this in a cluster without the ABAC fallback, so the RBAC role may still be missing something essential.