"EtcdException: Could not get the list of servers" when running the calico rkt container on CoreOS
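I have two CoreOS stable v1122.2.0 machines, each running etcd2 configured with TLS.
I created the certificates using https://github.com/coreos/etcd/tree/master/hack/tls-setup.
Now I'm trying to configure calico-node to run with rkt on my CoreOS master node.
I have the following in my cloud-config: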
```yaml
write_files:
  - path: "/etc/kubernetes/cni/net.d/10-calico.conf"
    content: |
      {
        "name": "calico",
        "type": "flannel",
        "delegate": {
          "type": "calico",
          "etcd_endpoints": "https://10.79.218.2:2379,https://10.79.218.3:2379",
          "log_level": "none",
          "log_level_stderr": "info",
          "hostname": "10.79.218.2",
          "policy": {
            "type": "k8s",
            "k8s_api_root": "http://127.0.0.1:8080/api/v1/"
          }
        }
      }
  - path: "/etc/kubernetes/manifests/policy-controller.yaml"
    content: |
      apiVersion: v1
      kind: Pod
      metadata:
        name: calico-policy-controller
        namespace: calico-system
      spec:
        hostNetwork: true
        containers:
          # The Calico policy controller.
          - name: k8s-policy-controller
            image: calico/kube-policy-controller:v0.2.0
            env:
              - name: ETCD_ENDPOINTS
                value: "https://10.79.218.2:2379,https://10.79.218.3:2379"
              - name: K8S_API
                value: "http://127.0.0.1:8080"
              - name: LEADER_ELECTION
                value: "true"
          # Leader election container used by the policy controller.
          - name: leader-elector
            image: quay.io/calico/leader-elector:v0.1.0
            imagePullPolicy: IfNotPresent
            args:
              - "--election=calico-policy-election"
              - "--election-namespace=calico-system"
              - "--http=127.0.0.1:4040"
...
units:
  - name: calico-node.service
    enable: true
    command: start
    content: |
      [Unit]
      Description=Calico per-host agent
      Requires=network-online.target
      After=network-online.target

      [Service]
      Slice=machine.slice
      Environment=CALICO_DISABLE_FILE_LOGGING=true
      Environment=HOSTNAME=10.79.218.2
      Environment=IP=10.79.218.2
      Environment=FELIX_FELIXHOSTNAME=10.79.218.2
      Environment=CALICO_NETWORKING=false
      Environment=NO_DEFAULT_POOLS=true
      Environment=ETCD_ENDPOINTS=https://10.79.218.2:2379,https://10.79.218.3:2379
      ExecStart=/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci \
        --volume=modules,kind=host,source=/lib/modules,readOnly=false \
        --mount=volume=modules,target=/lib/modules \
        --trust-keys-from-https quay.io/calico/node:v0.19.0
      KillMode=mixed
      Restart=always
      TimeoutStartSec=0

      [Install]
      WantedBy=multi-user.target
```
Please ignore the indentation; I don't think I copied/pasted it correctly :)
When I try to start the calico-node service, I get the following error:
```text
Sep 14 05:45:17 localhost systemd[1]: Started Calico per-host agent.
Sep 14 05:45:17 localhost rkt[1644]: image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
Sep 14 05:45:18 localhost rkt[1644]: image: using image from local store for image name quay.io/calico/node:v0.19.0
Sep 14 05:45:25 localhost rkt[1644]: Traceback (most recent call last):
Sep 14 05:45:25 localhost rkt[1644]:   File "startup.py", line 292, in <module>
Sep 14 05:45:25 localhost rkt[1644]:     client = IPAMClient()
Sep 14 05:45:25 localhost rkt[1644]:   File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 228, in __init__
Sep 14 05:45:25 localhost rkt[1644]:     "%s" % (ETCD_CA_CERT_FILE_ENV, etcd_ca))
Sep 14 05:45:25 localhost rkt[1644]: pycalico.datastore_errors.DataStoreError: Invalid ETCD_CA_CERT_FILE. Certificate Authority cert is required and m
Sep 14 05:45:25 localhost rkt[1644]: Calico node failed to start
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Main process exited, code=exited, status=1/FAILURE
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Unit entered failed state.
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Failed with result 'exit-code'.
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Service hold-off time over, scheduling restart.
Sep 14 05:45:25 localhost systemd[1]: Stopped Calico per-host agent.
Sep 14 05:45:25 localhost systemd[1]: Started Calico per-host agent.
Sep 14 05:45:25 localhost rkt[1714]: image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
Sep 14 05:45:26 localhost rkt[1714]: image: using image from local store for image name quay.io/calico/node:v0.19.0
Sep 14 05:45:28 localhost rkt[1714]: Traceback (most recent call last):
Sep 14 05:45:28 localhost rkt[1714]:   File "startup.py", line 292, in <module>
Sep 14 05:45:28 localhost rkt[1714]:     client = IPAMClient()
Sep 14 05:45:28 localhost rkt[1714]:   File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 228, in __init__
Sep 14 05:45:28 localhost rkt[1714]:     "%s" % (ETCD_CA_CERT_FILE_ENV, etcd_ca))
Sep 14 05:45:28 localhost rkt[1714]: pycalico.datastore_errors.DataStoreError: Invalid ETCD_CA_CERT_FILE. Certificate Authority cert is required and m
```
So I'm getting `Invalid ETCD_CA_CERT_FILE.` I never actually told calico which keys to use, so I suspect I'm missing some configuration. I have the following etcd-related keys in /etc/ssl/etcd:
```text
8 -rw-------. 1 etcd etcd 1050 Sep 14 05:45 ca.pem
8 -rw-------. 1 etcd etcd  289 Sep 14 05:45 etcd1-key.pem
8 -rw-------. 1 etcd etcd 1058 Sep 14 05:45 etcd1.pem
8 -rw-------. 1 etcd etcd  227 Sep 12 03:49 server1-key.pem
8 -rw-------. 1 etcd etcd  822 Sep 12 03:49 server1.pem
```
I tried adding `Environment=ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem` to the calico-node systemd unit, but got exactly the same result. Any ideas?
Update
So I tried running calico manually instead of through systemd. I also exported all the environment variables calico needs:
```shell
export CALICO_DISABLE_FILE_LOGGING=true
export HOSTNAME=10.79.218.2
export IP=10.79.218.2
export FELIX_FELIXHOSTNAME=10.79.218.2
export CALICO_NETWORKING=false
export NO_DEFAULT_POOLS=true
export ETCD_ENDPOINTS=https://10.79.218.2:2379,https://10.79.218.3:2379
export ETCD_AUTHORITY=10.79.218.2:2379
export ETCD_SCHEME=https
export ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem
export ETCD_CERT_FILE=/etc/ssl/etcd/etcd1.pem
export ETCD_KEY_FILE=/etc/ssl/etcd/etcd1-key.pem
```
When I try to run the calico container with the following command:

```shell
/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci \
  --volume=modules,kind=host,source=/lib/modules,readOnly=false \
  --mount=volume=modules,target=/lib/modules \
  --trust-keys-from-https quay.io/calico/node:v0.19.0
```

I get:
```text
image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
  File "startup.py", line 292, in <module>
    client = IPAMClient()
  File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 221, in __init__
    ETCD_CERT_FILE_ENV, etcd_cert))
pycalico.datastore_errors.DataStoreError: Cannot read ETCD_KEY_FILE and/or ETCD_CERT_FILE. Both must be readable file paths. Values provided: ETCD_KEY_FILE=/etc/ssl/etcd/etcd1-key.pem, ETCD_CERT_FILE=/etc/ssl/etcd/etcd1.pem
```
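I changed the permissions on the certificate files to 666, but that didn't fix it. I also know the certificates are valid, because etcd's TLS works fine. So what am I missing?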
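Changing permissions on the host can't help if the path doesn't exist where pycalico runs. As far as I understand rkt's fly stage1, the app runs chrooted into the image's own filesystem, so `/etc/ssl/etcd` on the host is invisible inside the container unless it is mounted. A quick way to confirm is to run a check like the one below inside the container. This is a minimal Python 3 sketch that mirrors the rule stated in the error message ("Both must be readable file paths"). It is not pycalico's actual code, and `unreadable_tls_files` is a hypothetical name.

```python
import os

# Environment variables named in the pycalico errors above.
TLS_ENV_VARS = ("ETCD_CA_CERT_FILE", "ETCD_CERT_FILE", "ETCD_KEY_FILE")


def unreadable_tls_files(env=None):
    """Return {var: path} for each TLS variable whose path is not a readable file.

    Hypothetical helper: it only mirrors the rule stated in the error
    message; it is not pycalico's implementation.
    """
    env = os.environ if env is None else env
    problems = {}
    for var in TLS_ENV_VARS:
        path = env.get(var)
        if path is None:
            continue  # variable not set, nothing to check
        # A path that exists on the host but was never mounted into the
        # container fails isfile() when this runs inside the container.
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            problems[var] = path
    return problems


if __name__ == "__main__":
    for var, path in sorted(unreadable_tls_files().items()):
        print("%s=%s is not a readable file here" % (var, path))
```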
Update 2
It turns out I hadn't mounted the certificate directory into the calico container.
So now I'm running the calico container with:
```shell
/usr/bin/rkt run --volume etcd-ssl,kind=host,source=/etc/ssl/etcd/,readOnly=true \
  --inherit-env --stage1-from-dir=stage1-fly.aci \
  --volume=modules,kind=host,source=/lib/modules,readOnly=false \
  --mount=volume=modules,target=/lib/modules \
  --trust-keys-from-https quay.io/calico/node:v0.19.0 \
  --mount volume=etcd-ssl,target=/etc/ssl/etcd
```
and I get the following output:

```text
image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
  File "startup.py", line 292, in <module>
    client = IPAMClient()
  File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 246, in __init__
    allow_reconnect=True)
  File "/usr/lib/python2.7/site-packages/etcd/client.py", line 204, in __init__
    set(self.machines))
  File "/usr/lib/python2.7/site-packages/etcd/client.py", line 299, in machines
    return self.machines
  File "/usr/lib/python2.7/site-packages/etcd/client.py", line 301, in machines
    raise etcd.EtcdException("Could not get the list of servers, "
etcd.EtcdException: Could not get the list of servers, maybe you provided the wrong host(s) to connect to?
Calico node failed to start
```
I'm a bit closer, but still no solution.
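The fix in Update 2 depends on pairing a host `--volume` with a `--mount` of the same name. If you template the unit file, a small helper can keep each pair together so neither half gets dropped. This is a hypothetical sketch: `rkt_run_args` and its dict argument are made up for illustration, and it only emits the flags already used in the commands above.

```python
import shlex


def rkt_run_args(image, host_dirs, stage1="stage1-fly.aci"):
    """Build an `rkt run` argument list with one --volume/--mount pair per host dir.

    host_dirs maps a volume name to (host_path, container_path, read_only).
    Hypothetical helper; it reuses only flags shown in this question.
    """
    args = ["/usr/bin/rkt", "run", "--inherit-env",
            "--stage1-from-dir=%s" % stage1]
    for name, (source, target, read_only) in sorted(host_dirs.items()):
        # Volume backed by a directory on the host.
        args.append("--volume=%s,kind=host,source=%s,readOnly=%s"
                    % (name, source, "true" if read_only else "false"))
        # Mount that volume at the path the ETCD_*_FILE variables point to.
        args.append("--mount=volume=%s,target=%s" % (name, target))
    args += ["--trust-keys-from-https", image]
    return args


if __name__ == "__main__":
    dirs = {
        "etcd-ssl": ("/etc/ssl/etcd", "/etc/ssl/etcd", True),
        "modules": ("/lib/modules", "/lib/modules", False),
    }
    args = rkt_run_args("quay.io/calico/node:v0.19.0", dirs)
    print(" ".join(shlex.quote(a) for a in args))
```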
Update 3
I tried pointing ETCD_ENDPOINTS at only the etcd server on this CoreOS machine by running

```shell
export ETCD_ENDPOINTS=https://10.79.218.2:2379
```

and now when I run the calico rkt image I get:

```text
image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
  File "startup.py", line 295, in <module>
    main()
  File "startup.py", line 251, in main
    warn_if_hostname_conflict(ip)
  File "startup.py", line 192, in warn_if_hostname_conflict
    current_ipv4, _ = client.get_host_bgp_ips(hostname)
  File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 132, in wrapped
    "running?" % (fn.__name__, e.message))
pycalico.datastore_errors.DataStoreError: get_host_bgp_ips: Error accessing etcd (Connection to etcd failed due to SSLError(CertificateError("hostname '10.79.218.2' doesn't match u'etcd'",),)). Is etcd running?
Calico node failed to start
```
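I ran into this problem too. I eventually found the root cause by reading the etcd connection logic and the code of the libraries it uses, with some pointers from the Calico team in their Slack channel.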
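The problem is that current versions of Calico (at least up to 0.22.0) use the Python etcd client, which does not support IP SANs (Subject Alternative Names) in TLS certificates. As a result, certificates that identify the etcd servers by IP address fail hostname verification, even though they were issued for exactly those servers.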
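To see why an IP-only certificate fails, compare two matchers. The first looks only at `DNS` subjectAltName entries and falls back to the commonName when there are none. The second also honors `IP Address` entries. This is a simplified illustration, not the actual urllib3 or `ssl.match_hostname` code, and it ignores wildcards. The certificate dict follows the shape returned by Python's `SSLSocket.getpeercert()`. The sample values are made up to resemble the Update 3 error (CN `etcd`, an IP SAN for `10.79.218.2`). With no DNS SANs, the first matcher can only compare against `etcd`, which reproduces the "doesn't match u'etcd'" failure.

```python
import ipaddress

# Made-up certificate in the shape returned by SSLSocket.getpeercert().
SAMPLE_CERT = {
    "subject": ((("commonName", "etcd"),),),
    "subjectAltName": (("IP Address", "10.79.218.2"),),
}


def _common_names(cert):
    # subject is a tuple of RDNs, each a tuple of (key, value) pairs.
    return [value for rdn in cert.get("subject", ())
            for key, value in rdn if key == "commonName"]


def match_dns_only(hostname, cert):
    """Old-style check: DNS SANs, else commonName; IP SANs are ignored."""
    names = [v for k, v in cert.get("subjectAltName", ()) if k == "DNS"]
    if not names:
        names = _common_names(cert)
    if hostname.lower() in (n.lower() for n in names):
        return
    raise ValueError("hostname %r doesn't match any of %r" % (hostname, names))


def match_with_ip_sans(hostname, cert):
    """IP literals are compared with 'IP Address' SANs; names use the old rule."""
    try:
        ip = ipaddress.ip_address(hostname)
    except ValueError:
        return match_dns_only(hostname, cert)  # not an IP literal
    for key, value in cert.get("subjectAltName", ()):
        if key == "IP Address" and ipaddress.ip_address(value.strip()) == ip:
            return
    raise ValueError("IP %s is not among the certificate's IP SANs" % hostname)


if __name__ == "__main__":
    for check in (match_dns_only, match_with_ip_sans):
        try:
            check("10.79.218.2", SAMPLE_CERT)
            print(check.__name__, "-> match")
        except ValueError as exc:
            print(check.__name__, "->", exc)
```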
This is described in this GitHub issue.
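To fix this, you can wait for a chain of releases: a new urllib3 release with the fix, then a python-etcd release that picks it up, then a Calico release that updates to the new etcd client. Alternatively, you can regenerate the certificates with FQDNs instead of IP addresses in the SAN fields. That means making sure your servers are reachable by those names, either through DNS or a correctly configured `/etc/hosts`. The OpenSSL configuration used to generate the certificates should contain something like:

```ini
[alt_names]
DNS.1 = $ENV::FQDN
```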
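Before regenerating the certificates, it's worth checking that every endpoint you plan to use is a name rather than an IP literal, and that each name resolves on the host through DNS or `/etc/hosts`. Below is a minimal sketch under those assumptions. `check_endpoints` is a hypothetical helper, and the resolver is a parameter so the check can be tested without real DNS.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def check_endpoints(endpoints, resolve=socket.gethostbyname):
    """Return human-readable problems for a comma-separated ETCD_ENDPOINTS value."""
    problems = []
    for raw in endpoints.split(","):
        host = urlsplit(raw.strip()).hostname
        if host is None:
            problems.append("%r has no host" % raw)
            continue
        try:
            ipaddress.ip_address(host)
        except ValueError:
            pass  # a hostname, which is what we want
        else:
            problems.append("%s is an IP literal; use an FQDN listed in the "
                            "certificate's DNS SANs" % host)
            continue
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            problems.append("%s does not resolve (check DNS or /etc/hosts)" % host)
    return problems


if __name__ == "__main__":
    for problem in check_endpoints("https://10.79.218.2:2379,https://10.79.218.3:2379"):
        print(problem)
```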
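The guide you linked for generating the certificates uses CFSSL, so I'd suggest reading the CFSSL documentation on using hostnames instead of IP addresses. I believe it may be as simple as changing the JSON config along these lines: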
```json
"hosts": [
  "example.com",
  "www.example.com"
],
```
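If you script certificate generation, you could produce the CSR passed to CFSSL like this, with only hostnames in `hosts`. This is a sketch. The hostnames are placeholders, and the `CN` of `etcd` matches the name in the Update 3 error. The `key` settings are assumptions; align them with the CSR files in the tls-setup directory you used.

```python
import ipaddress
import json


def _is_ip(text):
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def cfssl_csr(common_name, hostnames, algo="ecdsa", size=384):
    """Return a CFSSL CSR JSON document whose "hosts" holds DNS names only.

    The key algo/size defaults are assumptions; match your existing CSR.
    """
    for name in hostnames:
        # IP SANs are what the python-etcd/urllib3 stack could not verify.
        if _is_ip(name):
            raise ValueError("use a hostname, not an IP: %s" % name)
    return json.dumps({
        "CN": common_name,
        "hosts": list(hostnames),
        "key": {"algo": algo, "size": size},
    }, indent=2)


if __name__ == "__main__":
    print(cfssl_csr("etcd", ["etcd1.example.com", "etcd2.example.com"]))
```

Feed the resulting JSON to CFSSL the same way the tls-setup scripts already do. Remember that the etcd servers' own `--cert-file`/`--key-file` must be switched to the regenerated pair as well.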