Kubernetes

kube-proxy not working for service cluster IPs

  • March 1, 2022

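I installed a k8s 1.23.3 cluster on four Raspberry Pis running Raspberry Pi OS 11 (bullseye) arm64, mostly by following this guide.

The gist of it is that the control plane was created with this command: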

kubeadm init --token={some_token} --kubernetes-version=v1.23.3 --pod-network-cidr=10.1.0.0/16 --service-cidr=10.11.0.0/16 --control-plane-endpoint=10.0.4.16 --node-name=rpi-1-1

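I then created my own kube-verify namespace, deployed echo-server into it, and created a service for it.

However, **I cannot reach the service's cluster IP from any of the nodes. *Why?*** Requests simply time out, while requests to the pod's cluster IP work fine.

I suspect my kube-proxy is not working as it should. Below is what I have investigated so far.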

$ kubectl get services -n kube-verify -o=wide

NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE   SELECTOR
echo-server   ClusterIP   10.11.213.180   <none>        8080/TCP   24h   app=echo-server
$ kubectl get pods -n kube-system -o=wide

NAME                              READY   STATUS    RESTARTS      AGE   IP          NODE      NOMINATED NODE   READINESS GATES
coredns-64897985d-47gpr           1/1     Running   1 (69m ago)   41h   10.1.0.5    rpi-1-1   <none>           <none>
coredns-64897985d-nf55w           1/1     Running   1 (69m ago)   41h   10.1.0.4    rpi-1-1   <none>           <none>
etcd-rpi-1-1                      1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
kube-apiserver-rpi-1-1            1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
kube-controller-manager-rpi-1-1   1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
kube-flannel-ds-5467m             1/1     Running   1 (69m ago)   28h   10.0.4.17   rpi-1-2   <none>           <none>
kube-flannel-ds-7wpvz             1/1     Running   1 (69m ago)   28h   10.0.4.18   rpi-1-3   <none>           <none>
kube-flannel-ds-9chxk             1/1     Running   1 (69m ago)   28h   10.0.4.19   rpi-1-4   <none>           <none>
kube-flannel-ds-x5rvx             1/1     Running   1 (69m ago)   29h   10.0.4.16   rpi-1-1   <none>           <none>
kube-proxy-8bbjn                  1/1     Running   1 (69m ago)   28h   10.0.4.17   rpi-1-2   <none>           <none>
kube-proxy-dw45d                  1/1     Running   1 (69m ago)   28h   10.0.4.18   rpi-1-3   <none>           <none>
kube-proxy-gkkxq                  1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
kube-proxy-ntl5w                  1/1     Running   1 (69m ago)   28h   10.0.4.19   rpi-1-4   <none>           <none>
kube-scheduler-rpi-1-1            1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
$ kubectl logs kube-proxy-gkkxq -n kube-system

I0220 13:52:02.281289       1 node.go:163] Successfully retrieved node IP: 10.0.4.16
I0220 13:52:02.281535       1 server_others.go:138] "Detected node IP" address="10.0.4.16"
I0220 13:52:02.281610       1 server_others.go:561] "Unknown proxy mode, assuming iptables proxy" proxyMode=""
I0220 13:52:02.604880       1 server_others.go:206] "Using iptables Proxier"
I0220 13:52:02.604966       1 server_others.go:213] "kube-proxy running in dual-stack mode" ipFamily=IPv4
I0220 13:52:02.605026       1 server_others.go:214] "Creating dualStackProxier for iptables"
I0220 13:52:02.605151       1 server_others.go:491] "Detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6"
I0220 13:52:02.606905       1 server.go:656] "Version info" version="v1.23.3"
W0220 13:52:02.614777       1 sysinfo.go:203] Nodes topology is not available, providing CPU topology
I0220 13:52:02.619535       1 conntrack.go:52] "Setting nf_conntrack_max" nf_conntrack_max=131072
I0220 13:52:02.620869       1 conntrack.go:100] "Set sysctl" entry="net/netfilter/nf_conntrack_tcp_timeout_close_wait" value=3600
I0220 13:52:02.660947       1 config.go:317] "Starting service config controller"
I0220 13:52:02.661015       1 shared_informer.go:240] Waiting for caches to sync for service config
I0220 13:52:02.662669       1 config.go:226] "Starting endpoint slice config controller"
I0220 13:52:02.662726       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0220 13:52:02.762734       1 shared_informer.go:247] Caches are synced for service config 
I0220 13:52:02.762834       1 shared_informer.go:247] Caches are synced for endpoint slice config

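What stands out to me here is "Nodes topology is not available", so I dug further into the kube-proxy configuration, but nothing there looked wrong to me.

If there is indeed a problem with node topology in my cluster, please point me to some resources on how to troubleshoot it, because I could not find anything meaningful based on this message.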

$ kubectl describe configmap kube-proxy -n kube-system

Name:         kube-proxy
Namespace:    kube-system
Labels:       app=kube-proxy
Annotations:  kubeadm.kubernetes.io/component-config.hash: sha256:edce433d45f2ed3a58ee400690184ad033594e8275fdbf52e9c8c852caa7124d

Data
====
config.conf:
----
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
bindAddressHardFail: false
clientConnection:
 acceptContentTypes: ""
 burst: 0
 contentType: ""
 kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
 qps: 0
clusterCIDR: 10.1.0.0/16
configSyncPeriod: 0s
conntrack:
 maxPerCore: null
 min: null
 tcpCloseWaitTimeout: null
 tcpEstablishedTimeout: null
detectLocalMode: ""
enableProfiling: false
healthzBindAddress: ""
hostnameOverride: ""
iptables:
 masqueradeAll: false
 masqueradeBit: null
 minSyncPeriod: 0s
 syncPeriod: 0s
ipvs:
 excludeCIDRs: null
 minSyncPeriod: 0s
 scheduler: ""
 strictARP: false
 syncPeriod: 0s
 tcpFinTimeout: 0s
 tcpTimeout: 0s
 udpTimeout: 0s
kind: KubeProxyConfiguration
metricsBindAddress: ""
mode: ""
nodePortAddresses: null
oomScoreAdj: null
portRange: ""
showHiddenMetricsForVersion: ""
udpIdleTimeout: 0s
winkernel:
 enableDSR: false
 networkName: ""
 sourceVip: ""
kubeconfig.conf:
----
apiVersion: v1
kind: Config
clusters:
- cluster:
   certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
   server: https://10.0.4.16:6443
 name: default
contexts:
- context:
   cluster: default
   namespace: default
   user: default
 name: default
current-context: default
users:
- name: default
 user:
   tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token

BinaryData
====

Events:  <none>
$ kubectl -n kube-system exec kube-proxy-gkkxq cat /var/lib/kube-proxy/kubeconfig.conf

kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
apiVersion: v1
kind: Config
clusters:
- cluster:
   certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
   server: https://10.0.4.16:6443
 name: default
contexts:
- context:
   cluster: default
   namespace: default
   user: default
 name: default
current-context: default
users:
- name: default
 user:
   tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token

As the logs above confirm, the default mode value is iptables.

I have also enabled IP forwarding on all nodes.

$ sudo sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

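Flannel can be installed by applying the manifest from its repository.

Flannel can be added to any existing Kubernetes cluster, though it is simplest to add flannel before any pods using the pod network have been started. For Kubernetes v1.17+:

kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

As you can see in this yaml file, the default network subnet is set to 10.244.0.0/16: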

 net-conf.json: |
   {
     "Network": "10.244.0.0/16",
     "Backend": {
       "Type": "vxlan"
     }
   }

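kubeadm init is the command that initializes a cluster. It needs to be given a subnet for the cluster network, and that subnet must be the same as the subnet configured in your CNI. Among its options, the relevant one here is:

--pod-network-cidr string: Specify range of IP addresses for the pod network. If set, the control plane will automatically allocate CIDRs for every node.

You initialized the cluster with --pod-network-cidr=10.1.0.0/16, so the cluster's pod subnet differs from the "10.244.0.0/16" subnet in the flannel manifest's yaml file, and that is why it doesn't work.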
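To make the mismatch concrete, here is a minimal sketch in plain bash that tests whether one IPv4 CIDR lies inside another. The helper names ip_to_int and cidr_within are hypothetical (they are not part of kubeadm or flannel), and the per-node range 10.1.0.0/24 is an assumption based on the pod IPs shown in the question.

```shell
#!/bin/bash

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if CIDR $1 lies entirely inside CIDR $2, 1 otherwise.
cidr_within() {
  local inner_ip=${1%/*} inner_len=${1#*/}
  local outer_ip=${2%/*} outer_len=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - outer_len)) & 0xFFFFFFFF ))
  # The inner range cannot be larger than the outer one.
  (( inner_len >= outer_len )) || return 1
  # Both addresses must share the outer network prefix.
  (( ($(ip_to_int "$inner_ip") & mask) == ($(ip_to_int "$outer_ip") & mask) ))
}

# Assumed per-node pod CIDR, checked against the kubeadm value and
# against flannel's default Network.
node_cidr=10.1.0.0/24
for network in 10.1.0.0/16 10.244.0.0/16; do
  if cidr_within "$node_cidr" "$network"; then
    echo "$node_cidr is inside $network"
  else
    echo "$node_cidr is NOT inside $network"
  fi
done
```

On a live cluster, the real values could be read with kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}' and kubectl -n kube-system get configmap kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}', assuming flannel was deployed into kube-system as the pod listing in the question suggests.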

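There are two options to fix it:

First: change the subnet in the flannel configuration yaml to the one that was applied when the cluster was initialized, in this case --pod-network-cidr=10.1.0.0/16 (see the script below).

Second: if the cluster is for testing purposes and was only just initialized, destroy it and start again with the same subnet as in the flannel configuration yaml, "Network": "10.244.0.0/16".

To modify kube-flannel.yml automatically, you can use the following script, which is based on the yq and jq commands: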

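#!/bin/bash
# Usage: <script> <input kube-flannel.yml> <output file>
# Note: --yaml-output is a flag of the jq-wrapper flavour of yq.

input=$1
output=$2

echo "Converting $input to $output"

# Extract net-conf.json from the kube-flannel-cfg ConfigMap, set Network, and re-encode it as a string.
netconf=$( yq '. | select(.kind == "ConfigMap") | select(.metadata.name == "kube-flannel-cfg") | .data."net-conf.json"' "$input" | jq 'fromjson | .Network="10.1.0.0/16"' | yq -R '.' )
# Rebuild the ConfigMap with the modified net-conf.json.
kube_flannel_cfg=$( yq --yaml-output '. | select(.kind == "ConfigMap") | select(.metadata.name == "kube-flannel-cfg") | .data."net-conf.json"='"$netconf" "$input" )
# Collect every other document in the manifest unchanged.
everything_else=$( yq --yaml-output '. | select(.kind != "ConfigMap") | select(.metadata.name != "kube-flannel-cfg")' "$input" )
# Results are appended, so the output file should not already exist.
echo "$kube_flannel_cfg" >> "$output"
echo '---' >> "$output"
echo "$everything_else" >> "$output"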
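If yq and jq are not available, a plain text substitution may be enough. The sketch below is a simpler alternative, not the approach used by the script above. It assumes the manifest contains the exact text "Network": "10.244.0.0/16" as in the excerpt quoted earlier, and the function name rewrite_flannel_cidr is hypothetical.

```shell
#!/bin/bash

# Copy a flannel manifest, replacing the default pod network with a new CIDR.
# Assumption: the manifest contains the literal text "Network": "10.244.0.0/16".
rewrite_flannel_cidr() {
  local input=$1 output=$2 new_cidr=$3
  # '#' is used as the sed delimiter because the CIDR itself contains '/'.
  sed "s#\"Network\": \"10\.244\.0\.0/16\"#\"Network\": \"${new_cidr}\"#" "$input" > "$output"
}
```

For example, rewrite_flannel_cidr kube-flannel.yml kube-flannel-custom.yml 10.1.0.0/16 would write a copy that can then be applied with kubectl apply -f kube-flannel-custom.yml. It is worth checking the result with grep Network kube-flannel-custom.yml before applying it, since the substitution silently does nothing if the text does not match.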

Source: https://serverfault.com/questions/1094255