kube-proxy not working for service cluster IPs
I installed a k8s 1.23.3 cluster on four Raspberry Pis running Raspberry Pi OS 11 (bullseye) arm64, mostly by following this guide.
The gist of it is that the control plane was created with this command:

```shell
kubeadm init --token={some_token} --kubernetes-version=v1.23.3 --pod-network-cidr=10.1.0.0/16 --service-cidr=10.11.0.0/16 --control-plane-endpoint=10.0.4.16 --node-name=rpi-1-1
```
I then created my own `kube-verify` namespace, deployed an echo-server into it, and created a service for it. However, **I cannot reach the service's cluster IP from any node. *Why?*** The requests simply time out, while requests to a pod's cluster IP work fine.
I suspect my `kube-proxy` is not working correctly. Here is what I have investigated so far.

```shell
$ kubectl get services -n kube-verify -o=wide
NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE   SELECTOR
echo-server   ClusterIP   10.11.213.180   <none>        8080/TCP   24h   app=echo-server
```
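As a quick sanity check (not part of the original investigation), the addresses should line up with the CIDRs passed to `kubeadm init`: the service's ClusterIP must fall inside `--service-cidr=10.11.0.0/16`, and pod IPs inside `--pod-network-cidr=10.1.0.0/16`. A minimal sketch, using a hypothetical helper that only handles /16 masks:

```shell
#!/bin/bash
# Check whether an IPv4 address falls inside a /16 network.
# Hypothetical helper; only /16 masks are handled, which is enough
# for the CIDRs used in this cluster.
in_cidr_16() {
  addr_prefix=$(echo "$1" | cut -d. -f1-2)
  net_prefix=$(echo "$2" | cut -d/ -f1 | cut -d. -f1-2)
  [ "$addr_prefix" = "$net_prefix" ]
}

# The echo-server ClusterIP should sit inside the service CIDR,
# and the coredns pod IP inside the pod network CIDR.
in_cidr_16 10.11.213.180 10.11.0.0/16 && echo "service IP is in the service CIDR"
in_cidr_16 10.1.0.5      10.1.0.0/16  && echo "pod IP is in the pod CIDR"
```

Both checks pass here, so the timeouts are not caused by a service landing outside its CIDR.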
```shell
$ kubectl get pods -n kube-system -o=wide
NAME                              READY   STATUS    RESTARTS      AGE   IP          NODE      NOMINATED NODE   READINESS GATES
coredns-64897985d-47gpr           1/1     Running   1 (69m ago)   41h   10.1.0.5    rpi-1-1   <none>           <none>
coredns-64897985d-nf55w           1/1     Running   1 (69m ago)   41h   10.1.0.4    rpi-1-1   <none>           <none>
etcd-rpi-1-1                      1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
kube-apiserver-rpi-1-1            1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
kube-controller-manager-rpi-1-1   1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
kube-flannel-ds-5467m             1/1     Running   1 (69m ago)   28h   10.0.4.17   rpi-1-2   <none>           <none>
kube-flannel-ds-7wpvz             1/1     Running   1 (69m ago)   28h   10.0.4.18   rpi-1-3   <none>           <none>
kube-flannel-ds-9chxk             1/1     Running   1 (69m ago)   28h   10.0.4.19   rpi-1-4   <none>           <none>
kube-flannel-ds-x5rvx             1/1     Running   1 (69m ago)   29h   10.0.4.16   rpi-1-1   <none>           <none>
kube-proxy-8bbjn                  1/1     Running   1 (69m ago)   28h   10.0.4.17   rpi-1-2   <none>           <none>
kube-proxy-dw45d                  1/1     Running   1 (69m ago)   28h   10.0.4.18   rpi-1-3   <none>           <none>
kube-proxy-gkkxq                  1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
kube-proxy-ntl5w                  1/1     Running   1 (69m ago)   28h   10.0.4.19   rpi-1-4   <none>           <none>
kube-scheduler-rpi-1-1            1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
```
```shell
$ kubectl logs kube-proxy-gkkxq -n kube-system
I0220 13:52:02.281289       1 node.go:163] Successfully retrieved node IP: 10.0.4.16
I0220 13:52:02.281535       1 server_others.go:138] "Detected node IP" address="10.0.4.16"
I0220 13:52:02.281610       1 server_others.go:561] "Unknown proxy mode, assuming iptables proxy" proxyMode=""
I0220 13:52:02.604880       1 server_others.go:206] "Using iptables Proxier"
I0220 13:52:02.604966       1 server_others.go:213] "kube-proxy running in dual-stack mode" ipFamily=IPv4
I0220 13:52:02.605026       1 server_others.go:214] "Creating dualStackProxier for iptables"
I0220 13:52:02.605151       1 server_others.go:491] "Detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6"
I0220 13:52:02.606905       1 server.go:656] "Version info" version="v1.23.3"
W0220 13:52:02.614777       1 sysinfo.go:203] Nodes topology is not available, providing CPU topology
I0220 13:52:02.619535       1 conntrack.go:52] "Setting nf_conntrack_max" nf_conntrack_max=131072
I0220 13:52:02.620869       1 conntrack.go:100] "Set sysctl" entry="net/netfilter/nf_conntrack_tcp_timeout_close_wait" value=3600
I0220 13:52:02.660947       1 config.go:317] "Starting service config controller"
I0220 13:52:02.661015       1 shared_informer.go:240] Waiting for caches to sync for service config
I0220 13:52:02.662669       1 config.go:226] "Starting endpoint slice config controller"
I0220 13:52:02.662726       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0220 13:52:02.762734       1 shared_informer.go:247] Caches are synced for service config
I0220 13:52:02.762834       1 shared_informer.go:247] Caches are synced for endpoint slice config
```
What I notice here is `Nodes topology is not available`, so I dug into the kube-proxy configuration some more, but nothing stood out to me. If there really is a problem with the node topology in my cluster, please point me to some resources on how to troubleshoot it, because I couldn't find anything meaningful based on this error message.
```shell
$ kubectl describe configmap kube-proxy -n kube-system
Name:         kube-proxy
Namespace:    kube-system
Labels:       app=kube-proxy
Annotations:  kubeadm.kubernetes.io/component-config.hash: sha256:edce433d45f2ed3a58ee400690184ad033594e8275fdbf52e9c8c852caa7124d

Data
====
config.conf:
----
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
bindAddressHardFail: false
clientConnection:
  acceptContentTypes: ""
  burst: 0
  contentType: ""
  kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
  qps: 0
clusterCIDR: 10.1.0.0/16
configSyncPeriod: 0s
conntrack:
  maxPerCore: null
  min: null
  tcpCloseWaitTimeout: null
  tcpEstablishedTimeout: null
detectLocalMode: ""
enableProfiling: false
healthzBindAddress: ""
hostnameOverride: ""
iptables:
  masqueradeAll: false
  masqueradeBit: null
  minSyncPeriod: 0s
  syncPeriod: 0s
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: ""
  strictARP: false
  syncPeriod: 0s
  tcpFinTimeout: 0s
  tcpTimeout: 0s
  udpTimeout: 0s
kind: KubeProxyConfiguration
metricsBindAddress: ""
mode: ""
nodePortAddresses: null
oomScoreAdj: null
portRange: ""
showHiddenMetricsForVersion: ""
udpIdleTimeout: 0s
winkernel:
  enableDSR: false
  networkName: ""
  sourceVip: ""
kubeconfig.conf:
----
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server: https://10.0.4.16:6443
  name: default
contexts:
- context:
    cluster: default
    namespace: default
    user: default
  name: default
current-context: default
users:
- name: default
  user:
    tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token

BinaryData
====

Events:  <none>
```
```shell
$ kubectl -n kube-system exec kube-proxy-gkkxq cat /var/lib/kube-proxy/kubeconfig.conf
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server: https://10.0.4.16:6443
  name: default
contexts:
- context:
    cluster: default
    namespace: default
    user: default
  name: default
current-context: default
users:
- name: default
  user:
    tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
```
As the logs above confirm, the `mode` defaults to `iptables`.
I have also enabled IP forwarding on all nodes.

```shell
$ sudo sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
```
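For completeness, `net.ipv4.ip_forward` set this way only survives a reboot if it is persisted. A sketch, assuming a hypothetical drop-in filename:

```
# /etc/sysctl.d/99-kubernetes.conf (hypothetical filename)
net.ipv4.ip_forward = 1
```

After writing the file, `sudo sysctl --system` reloads all sysctl configuration.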
Flannel can be installed by applying the manifest from the repository.

Flannel can be added to any existing Kubernetes cluster, though it's simplest to add `flannel` before any pods using the pod network have been started. For Kubernetes v1.17+:

```shell
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
```
As you can see in this `yaml` file, the default network subnet is set to `10.244.0.0/16`:

```yaml
net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }
```
`kubeadm init` is the command that initializes a cluster. It needs a subnet specified for the cluster network, and that subnet needs to be the same as the one in the CNI configuration. You can check more options:

```
--pod-network-cidr string
    Specify range of IP addresses for the pod network. If set, the control plane
    will automatically allocate CIDRs for every node.
```
You started the cluster with `--pod-network-cidr=10.1.0.0/16`, so your cluster's subnet is different from the `"10.244.0.0/16"` in the flannel manifest's yaml file, and that is why it doesn't work. There are two options to fix it:
First, change the subnet in flannel's configuration yaml to the same one the cluster was initialized with, in this case `10.1.0.0/16` from `--pod-network-cidr=10.1.0.0/16` (see the script below); or
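For a one-off fix, a plain `sed` substitution also works, since the default subnet string appears in the manifest's `net-conf.json`; a cruder sketch than the yq/jq script, shown here on the relevant line (in practice you would run the same substitution over the whole `kube-flannel.yml` before applying it):

```shell
#!/bin/bash
# Rewrite flannel's default pod network to the cluster's --pod-network-cidr.
# Demonstrated on the single line of net-conf.json that carries the subnet.
patched=$(echo '"Network": "10.244.0.0/16",' | sed 's|10\.244\.0\.0/16|10.1.0.0/16|')
echo "$patched"
```

This is less robust than the script below, which edits only the `kube-flannel-cfg` ConfigMap, but it is usually enough for the stock manifest.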
Second, if the cluster is for testing purposes and was only just initialized, destroy the cluster and re-initialize it with the same subnet as in flannel's configuration yaml, `"Network": "10.244.0.0/16"`.
To modify `kube-flannel.yml` automatically, you can use the following script based on the `yq` and `jq` commands:

```shell
#!/bin/bash

input=$1
output=$2

echo "Converting $input to $output"

netconf=$(
  yq '. | select(.kind == "ConfigMap") | select(.metadata.name == "kube-flannel-cfg") | .data."net-conf.json"' "$input" \
    | jq 'fromjson | .Network="10.1.0.0/16"' \
    | yq -R '.'
)
kube_flannel_cfg=$(
  yq --yaml-output '. | select(.kind == "ConfigMap") | select(.metadata.name == "kube-flannel-cfg") | .data."net-conf.json"='"$netconf" "$input"
)
everything_else=$(
  yq --yaml-output '. | select(.kind != "ConfigMap") | select(.metadata.name != "kube-flannel-cfg")' "$input"
)

echo "$kube_flannel_cfg" >> "$output"
echo '---' >> "$output"
echo "$everything_else" >> "$output"
```
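The heart of the script is the `jq` step: the ConfigMap stores `net-conf.json` as a string, so `fromjson` parses it into an object and the assignment replaces its `Network` field. In isolation (assuming `jq` is installed) it behaves like this:

```shell
#!/bin/bash
# net-conf.json arrives from yq as a JSON string; fromjson parses it,
# then the assignment swaps the Network CIDR.
netconf='"{ \"Network\": \"10.244.0.0/16\", \"Backend\": { \"Type\": \"vxlan\" } }"'
echo "$netconf" | jq -c 'fromjson | .Network = "10.1.0.0/16"'
```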