How do I start etcd in docker from systemd?

  • August 19, 2021

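I want to start etcd (single node) in docker from systemd, but something seems to go wrong: it gets terminated about 30 seconds after startup.

It looks like the service starts with status "activating" but is terminated after about 30 seconds without ever reaching status "active". Is some signal missing between the docker container and systemd?

Update (see the bottom of the post): the systemd service status becomes failed (Result: timeout) when I remove the Restart=on-failure directive.

When I check the status of the etcd service after boot, I get this result: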

$ sudo systemctl status etcd
● etcd.service - etcd
  Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
  Active: activating (auto-restart) (Result: exit-code) since Wed 2021-08-18 20:13:30 UTC; 4s ago
 Process: 2971 ExecStart=/usr/bin/docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data --name etcd my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 /usr/local/bin/etcd --data-dir=/etcd-data --name etcd0 --advertise-client-urls http://10.0.0.11:2379 --listen-client-urls http://0.0.0.0:2379 --initial-advertise-peer-urls http://10.0.0.11:2380 --listen-peer-urls http://0.0.0.0:2380 --initial-cluster etcd0=http://10.0.0.11:2380 (code=exited, status=125)
Main PID: 2971 (code=exited, status=125)

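I am running this on an Amazon Linux 2 machine, with a user-data script that runs at launch. I have verified that docker.service and docker_ecr_login.service ran successfully.

Shortly after the machine has booted, I can see that etcd is running: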

sudo systemctl status etcd
● etcd.service - etcd
  Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
  Active: activating (start) since Wed 2021-08-18 20:30:07 UTC; 1min 20s ago
Main PID: 1573 (docker)
   Tasks: 9
  Memory: 24.3M
  CGroup: /system.slice/etcd.service
          └─1573 /usr/bin/docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data --name etcd my-aws-account.dkr.ecr.eu-north-1.amazonaws.com...

Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.690Z","logger":"raft","caller":"...rm 2"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.691Z","caller":"etcdserver/serve..."3.5"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"membership/clust..."3.5"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"etcdserver/server.go:2...
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"api/capability.g..."3.5"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"etcdserver/serve..."3.5"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.693Z","caller":"embed/serve.go:9...ests"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.695Z","caller":"etcdmain/main.go...emon"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.695Z","caller":"etcdmain/main.go...emon"}
Aug 18 20:30:17 ip-10-0-0-11.eu-north-1.compute.internal docker[1573]: {"level":"info","ts":"2021-08-18T20:30:17.702Z","caller":"embed/serve.go:1...2379"}
Hint: Some lines were ellipsized, use -l to show in full.

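I get the same behavior whether etcd listens on the node IP (10.0.0.11) or on 127.0.0.1.

I can run etcd locally, started from the command line (where it does not terminate after 30 seconds), using: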

sudo docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data --name etcd-local \
my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 \
/usr/local/bin/etcd --data-dir=/etcd-data \
--name etcd0 \
--advertise-client-urls http://127.0.0.1:2379 \
--listen-client-urls http://0.0.0.0:2379 \
--initial-advertise-peer-urls http://127.0.0.1:2380 \
--listen-peer-urls http://0.0.0.0:2380 \
--initial-cluster etcd0=http://127.0.0.1:2380

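The arguments to etcd are similar to those in "Running a single node etcd" from the etcd 3.5 documentation.

This is the relevant part of the startup script that starts etcd: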

sudo docker volume create --name etcd-data

cat <<EOF | sudo tee /etc/systemd/system/etcd.service
[Unit]
Description=etcd
After=docker_ecr_login.service

[Service]
Type=notify
ExecStart=/usr/bin/docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data \
--name etcd my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 \
/usr/local/bin/etcd --data-dir=/etcd-data \
--name etcd0 \
--advertise-client-urls http://10.0.0.11:2379 \
--listen-client-urls http://0.0.0.0:2379 \
--initial-advertise-peer-urls http://10.0.0.11:2380 \
--listen-peer-urls http://0.0.0.0:2380 \
--initial-cluster etcd0=http://10.0.0.11:2380
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable etcd
sudo systemctl start etcd

When I list all containers on the machine, I can see that it has been running:

sudo docker ps -a
CONTAINER ID   IMAGE                                                       COMMAND                  CREATED          STATUS                      PORTS                          NAMES
a744aed0beb1   my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0   "/usr/local/bin/etcd…"   25 minutes ago   Exited (0) 24 minutes ago                          etcd

But I suspect that it cannot be restarted because the container name already exists.
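That suspicion fits the status=125 shown in the first status output. Docker documents 125 as the exit status used when docker run itself fails, which would include a container name that is already taken, while 126 and 127 refer to the command inside the container. A minimal sketch of that mapping follows; the function name explain_docker_run_status is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper (not part of docker): explain the exit status of `docker run`.
explain_docker_run_status() {
  case "$1" in
    0)   echo "container command exited successfully" ;;
    125) echo "docker run itself failed (for example a container name conflict)" ;;
    126) echo "container command could not be invoked" ;;
    127) echo "container command was not found" ;;
    *)   echo "exit status of the container command" ;;
  esac
}

# The status reported by systemd for the failed restart attempts.
explain_docker_run_status 125
```

If this is what happens here, removing the old container before each start (or running with --rm) should let the restart get past the name conflict.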

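**Why does the etcd container terminate after about 30 seconds when started from systemd?** It looks like it starts up successfully, but systemd only ever shows it in "activating" status, never in "active" status, and it appears to be terminated after about 30 seconds. Is some signal from the etcd docker container to systemd missing? If so, how can I get that signal right?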


Update:

After removing the Restart=on-failure directive from the service unit file, I now get status: failed (Result: timeout)

$ sudo systemctl status etcd
● etcd.service - etcd
  Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
  Active: failed (Result: timeout) since Wed 2021-08-18 21:35:54 UTC; 5min ago
 Process: 1567 ExecStart=/usr/bin/docker run -p 2380:2380 -p 2379:2379 --volume=etcd-data:/etcd-data --name etcd my-aws-account.dkr.ecr.eu-north-1.amazonaws.com/etcd:v3.5.0 /usr/local/bin/etcd --data-dir=/etcd-data --name etcd0 --advertise-client-urls http://127.0.0.1:2379 --listen-client-urls http://0.0.0.0:2379 --initial-advertise-peer-urls http://127.0.0.1:2380 --listen-peer-urls http://0.0.0.0:2380 --initial-cluster etcd0=http://127.0.0.1:2380 (code=exited, status=0/SUCCESS)
Main PID: 1567 (code=exited, status=0/SUCCESS)

Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.332Z","caller":"osutil/interrupt...ated"}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.333Z","caller":"embed/etcd.go:36...379"]}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: WARNING: 2021/08/18 21:35:54 [core] grpc: addrConn.createTransport failed ...ing...
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.335Z","caller":"etcdserver/serve...6a6c"}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.337Z","caller":"embed/etcd.go:56...2380"}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.338Z","caller":"embed/etcd.go:56...2380"}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal docker[1567]: {"level":"info","ts":"2021-08-18T21:35:54.339Z","caller":"embed/etcd.go:36...379"]}
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal systemd[1]: Failed to start etcd.
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal systemd[1]: Unit etcd.service entered failed state.
Aug 18 21:35:54 ip-10-0-0-11.eu-north-1.compute.internal systemd[1]: etcd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

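UPDATE: Posting test data and consolidating updates based on the comments received. Contrary to what I originally thought, the systemd integration does not require docker -d. In my experience, the Type= setting pointed out by Michael appears to matter more than offloading the service's daemon state to docker. As I originally explained, the OP's issue at first glance looked like a side effect of not backgrounding the container. After further testing, the backgrounding appears to be irrelevant.

Please note that I cannot test or directly troubleshoot the Amazon AWS image used in the OP. A comparative example of etcd under systemd is shown here to help with configuration on systems similar to mine. System details: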

  • Ubuntu 20.04 LTS
  • Docker 20.10.7
  • etcd 3.5.0

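System configuration

I ended up with the following systemd service file. Note the Type=simple, since Michael suggested in his reply that this point (and, evidently, my own understanding of this piece of the puzzle) needed clarifying. You can read more about systemd service types here: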

https://www.freedesktop.org/software/systemd/man/systemd.service.html

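The type matters. More to the point, my original understanding of simple as a type was short-sightedly focused on the lack of communication back to systemd, which led me to overlook how the type setting changes the way systemd treats the response from the application being called (in this case docker).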
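To make the difference between the types concrete, here is a rough summary of when systemd treats a unit as started for each Type= value, paraphrased from the systemd.service man page linked above. The function name describe_service_type is made up for illustration. With Type=notify, as in the OP's unit, systemd waits for a READY=1 notification; if nothing in the docker run chain sends one, a start timeout like the one in the OP would be the expected outcome, though I have not verified this on the OP's image.

```shell
# Hypothetical helper: summarize when systemd considers a unit "started"
# for a given Type= value (paraphrased from systemd.service(5)).
describe_service_type() {
  case "$1" in
    simple)  echo "started as soon as the main process has been forked" ;;
    exec)    echo "started once the main binary has been executed (newer systemd only)" ;;
    forking) echo "started once the initial process exits after forking a daemon" ;;
    oneshot) echo "started once the main process has exited" ;;
    notify)  echo "started only after the service sends READY=1 via sd_notify" ;;
    dbus)    echo "started once the configured bus name is acquired" ;;
    *)       echo "unknown or unlisted type" ;;
  esac
}

describe_service_type notify
describe_service_type simple
```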

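Omitting Type= or setting it explicitly to simple results in the same behavior, since simple is the default when ExecStart= is set. The following configuration worked reliably in my tests, with or without -d in the docker run command: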

[Unit]
Description=Docker container-etcd.service
Documentation=man:docker
Requires=docker.service
Wants=network.target
After=network-online.target

[Service]
ExecStartPre=-/usr/bin/docker stop etcd
ExecStartPre=-/usr/bin/docker rm etcd
ExecStart=/usr/bin/docker run --rm -d -p 2379:2379 -p 2380:2380 --volume=/home/user/etcd-data:/etcd-data --name etcd quay.io/coreos/etcd:v3.5.0 /usr/local/bin/etcd --data-dir=/etcd-data --name etcd --initial-advertise-peer-urls http://10.4.4.132:2380 --listen-peer-urls http://0.0.0.0:2380 --advertise-client-urls http://10.4.4.132:2379 --listen-client-urls http://0.0.0.0:2379 --initial-cluster etcd=http://10.4.4.132:2380
ExecStop=/usr/bin/docker stop -t 10 etcd
# ExecRestart= is not a systemd directive (systemd ignores it); "systemctl restart" runs ExecStop and then ExecStart.
KillMode=none
RemainAfterExit=1
Restart=on-failure
Type=simple

[Install]
WantedBy=multi-user.target default.target

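Notes

  • RemainAfterExit was added because, without it, systemd considers the service to have exited right after starting. Leaving this boolean out creates a seemingly wrong situation in which docker ps shows the container running while systemctl status container-etcd shows the unit as exited and inactive.
  • The systemd unit file is not quite idiomatic. %n is normally used in the Exec lines to refer to the unit name (as in …docker restart %n); I did not want to introduce further confusion while trying to address the OP's issue. Not to mention that I use etcd as the docker container name rather than container-etcd, the unit's service name.
  • ExecStart was collapsed into a single-line command. The standard \ continuation syntax did not work for me, and neither did quoting the etcd command passed into the container. My tests yesterday seemed to run fine, but today the configuration behaved differently. So I redid the tests and the configuration to find what was most stable for me.
  • Obviously, if you are going to use docker rm at any point, you must, or at least very strongly should, keep the data outside the container, as in the OP, here using --volume. Personally I use full-path locations, all stored under /srv, and bind mount them into the container. That way I have a single folder to back up, and whether the container exists or not is irrelevant.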
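Since the configuration also worked for me without -d, a foreground variant can be sketched in which systemd tracks the docker client process directly, so RemainAfterExit and KillMode=none are left out. This is only a sketch under assumptions: it reuses the image, paths and addresses from my example, writes the unit to a scratch directory rather than installing it, and I have not tested it on the OP's Amazon Linux 2 image. UNIT_DIR is a made-up variable for this example.

```shell
#!/bin/sh
# Sketch only: write a foreground-style unit file to a scratch directory for review.
# UNIT_DIR is a hypothetical variable; set it to /etc/systemd/system (as root) to install.
unit_dir="${UNIT_DIR:-$(mktemp -d)}"
unit_file="$unit_dir/etcd.service"

# Quoted heredoc delimiter: the unit text is written out literally.
cat > "$unit_file" <<'EOF'
[Unit]
Description=etcd in Docker (foreground sketch)
Requires=docker.service
Wants=network-online.target
After=docker.service network-online.target

[Service]
Type=simple
# Remove any stale container so the name "etcd" is free; the "-" prefix ignores failure.
ExecStartPre=-/usr/bin/docker rm -f etcd
ExecStart=/usr/bin/docker run --rm -p 2379:2379 -p 2380:2380 --volume=/home/user/etcd-data:/etcd-data --name etcd quay.io/coreos/etcd:v3.5.0 /usr/local/bin/etcd --data-dir=/etcd-data --name etcd --initial-advertise-peer-urls http://10.4.4.132:2380 --listen-peer-urls http://0.0.0.0:2380 --advertise-client-urls http://10.4.4.132:2379 --listen-client-urls http://0.0.0.0:2379 --initial-cluster etcd=http://10.4.4.132:2380
ExecStop=/usr/bin/docker stop -t 10 etcd
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $unit_file"
```

After copying the file into place, the usual systemctl daemon-reload followed by systemctl restart etcd would apply it.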

Verification

After updating the systemd service file, running daemon-reload and so on, I exec'd into the container and ran a test command against etcd:

  • docker exec -it etcd sh
  • etcdctl --endpoints=http://10.4.4.132:2379 member list

Result

9a552f9b95628384, started, etcd, http://10.4.4.132:2380, http://10.4.4.132:2379, false
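For a scripted check, the member list line above can be split on ", " and the second field compared against started. A small sketch using the sample line from my result; the variable names are made up, and in practice the line would come from running etcdctl member list inside the container.

```shell
# Sample line copied from the result above.
member_line='9a552f9b95628384, started, etcd, http://10.4.4.132:2380, http://10.4.4.132:2379, false'

# Fields are separated by ", ": ID, status, name, peer URLs, client URLs, learner flag.
member_status=$(printf '%s\n' "$member_line" | awk -F', ' '{print $2}')
member_name=$(printf '%s\n' "$member_line" | awk -F', ' '{print $3}')

if [ "$member_status" = "started" ]; then
  echo "member $member_name is started"
else
  echo "member $member_name is not started (status: $member_status)" >&2
fi
```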

Source: https://serverfault.com/questions/1074981