Kubernetes

RabbitMQ Helm chart installation in a Kubernetes cluster fails to distribute the Erlang cookie to the nodes

  • April 20, 2022

I am trying to install a RabbitMQ cluster in an EKS cluster using the Bitnami Helm chart (https://github.com/bitnami/charts/tree/master/bitnami/rabbitmq). When I run the Helm install, I get the following error in the first pod that is created:

rabbitmq 13:41:15.99
rabbitmq 13:41:15.99 Welcome to the Bitnami rabbitmq container
rabbitmq 13:41:15.99 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-rabbitmq
rabbitmq 13:41:15.99 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-rabbitmq/issues
rabbitmq 13:41:15.99
rabbitmq 13:41:15.99 INFO  ==> ** Starting RabbitMQ setup **
rabbitmq 13:41:16.01 INFO  ==> Validating settings in RABBITMQ_* env vars..
rabbitmq 13:41:16.03 INFO  ==> Initializing RabbitMQ...
rabbitmq 13:41:16.03 DEBUG ==> Creating environment file...
rabbitmq 13:41:16.03 DEBUG ==> Creating enabled_plugins file...
rabbitmq 13:41:16.04 DEBUG ==> Creating Erlang cookie...
rabbitmq 13:41:16.04 DEBUG ==> Ensuring expected directories/files exist...
rabbitmq 13:41:16.05 INFO  ==> Starting RabbitMQ in background...
Waiting for erlang distribution on node 'rabbit@rabbitmq-0.rabbitmq-headless.tdc.svc.cluster.local' while OS process '51' is running
2022-04-19 13:41:19.198340+00:00 [info] <0.222.0> Feature flags: list of feature flags found:
2022-04-19 13:41:19.212884+00:00 [info] <0.222.0> Feature flags:   [ ] implicit_default_bindings
2022-04-19 13:41:19.212941+00:00 [info] <0.222.0> Feature flags:   [ ] maintenance_mode_status
2022-04-19 13:41:19.212965+00:00 [info] <0.222.0> Feature flags:   [ ] quorum_queue
2022-04-19 13:41:19.212985+00:00 [info] <0.222.0> Feature flags:   [ ] stream_queue
2022-04-19 13:41:19.213077+00:00 [info] <0.222.0> Feature flags:   [ ] user_limits
2022-04-19 13:41:19.213104+00:00 [info] <0.222.0> Feature flags:   [ ] virtual_host_metadata
2022-04-19 13:41:19.213124+00:00 [info] <0.222.0> Feature flags: feature flag states written to disk: yes
2022-04-19 13:41:19.637051+00:00 [noti] <0.44.0> Application syslog exited with reason: stopped
2022-04-19 13:41:19.637148+00:00 [noti] <0.222.0> Logging: switching to configured handler(s); following messages may not be visible in this log output
2022-04-19 13:41:19.656264+00:00 [noti] <0.222.0> Logging: configured log handlers are now ACTIVE
2022-04-19 13:41:19.904087+00:00 [info] <0.222.0> ra: starting system quorum_queues
2022-04-19 13:41:19.904200+00:00 [info] <0.222.0> starting Ra system: quorum_queues in directory: /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0/quorum/rabbit@rabbitmq-0
2022-04-19 13:41:19.995094+00:00 [info] <0.263.0> ra: meta data store initialised for system quorum_queues. 0 record(s) recovered
2022-04-19 13:41:20.013384+00:00 [noti] <0.268.0> WAL: ra_log_wal init, open tbls: ra_log_open_mem_tables, closed tbls: ra_log_closed_mem_tables
2022-04-19 13:41:20.022921+00:00 [info] <0.222.0> ra: starting system coordination
2022-04-19 13:41:20.022987+00:00 [info] <0.222.0> starting Ra system: coordination in directory: /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0/coordination/rabbit@rabbitmq-0
2022-04-19 13:41:20.026371+00:00 [info] <0.276.0> ra: meta data store initialised for system coordination. 0 record(s) recovered
2022-04-19 13:41:20.026628+00:00 [noti] <0.281.0> WAL: ra_coordination_log_wal init, open tbls: ra_coordination_log_open_mem_tables, closed tbls: ra_coordination_log_closed_mem_tables
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>  Starting RabbitMQ 3.9.8 on Erlang 24.1.2 [jit]
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>  Copyright (c) 2007-2021 VMware, Inc. or its affiliates.
2022-04-19 13:41:20.032159+00:00 [info] <0.222.0>  Licensed under the MPL 2.0. Website: https://rabbitmq.com

 ##  ##      RabbitMQ 3.9.8
 ##  ##
 ##########  Copyright (c) 2007-2021 VMware, Inc. or its affiliates.
 ######  ##
 ##########  Licensed under the MPL 2.0. Website: https://rabbitmq.com

 Erlang:      24.1.2 [jit]
 TLS Library: OpenSSL - OpenSSL 1.1.1d  10 Sep 2019

 Doc guides:  https://rabbitmq.com/documentation.html
 Support:     https://rabbitmq.com/contact.html
 Tutorials:   https://rabbitmq.com/getstarted.html
 Monitoring:  https://rabbitmq.com/monitoring.html

 Logs: /opt/bitnami/rabbitmq/var/log/rabbitmq/rabbit@rabbitmq-0_upgrade.log
       <stdout>

 Config file(s): /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.conf

 Starting broker...2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  node           : rabbit@rabbitmq-0
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  home dir       : /opt/bitnami/rabbitmq/.rabbitmq
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  config file(s) : /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.conf
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  cookie hash    : d3Nfp8t690Ln1h811Tuxzw==
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  log(s)         : /opt/bitnami/rabbitmq/var/log/rabbitmq/rabbit@rabbitmq-0_upgrade.log
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>                 : <stdout>
2022-04-19 13:41:20.033907+00:00 [info] <0.222.0>  database dir   : /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0
2022-04-19 13:41:20.307590+00:00 [info] <0.222.0> Feature flags: list of feature flags found:
2022-04-19 13:41:20.307654+00:00 [info] <0.222.0> Feature flags:   [ ] drop_unroutable_metric
2022-04-19 13:41:20.307681+00:00 [info] <0.222.0> Feature flags:   [ ] empty_basic_get_metric
2022-04-19 13:41:20.307705+00:00 [info] <0.222.0> Feature flags:   [ ] implicit_default_bindings
2022-04-19 13:41:20.307792+00:00 [info] <0.222.0> Feature flags:   [ ] maintenance_mode_status
2022-04-19 13:41:20.307818+00:00 [info] <0.222.0> Feature flags:   [ ] quorum_queue
2022-04-19 13:41:20.307838+00:00 [info] <0.222.0> Feature flags:   [ ] stream_queue
2022-04-19 13:41:20.307908+00:00 [info] <0.222.0> Feature flags:   [ ] user_limits
2022-04-19 13:41:20.307947+00:00 [info] <0.222.0> Feature flags:   [ ] virtual_host_metadata
2022-04-19 13:41:20.307968+00:00 [info] <0.222.0> Feature flags: feature flag states written to disk: yes
Error: operation wait on node rabbit@rabbitmq-0.rabbitmq-headless.tdc.svc.cluster.local timed out. Timeout value used: 5000
2022-04-19 13:41:23.299211+00:00 [info] <0.222.0> Running boot step pre_boot defined by app rabbit
2022-04-19 13:41:23.299295+00:00 [info] <0.222.0> Running boot step rabbit_global_counters defined by app rabbit
2022-04-19 13:41:23.299545+00:00 [info] <0.222.0> Running boot step rabbit_osiris_metrics defined by app rabbit
2022-04-19 13:41:23.299746+00:00 [info] <0.222.0> Running boot step rabbit_core_metrics defined by app rabbit
2022-04-19 13:41:23.300299+00:00 [info] <0.222.0> Running boot step rabbit_alarm defined by app rabbit
2022-04-19 13:41:23.304497+00:00 [info] <0.297.0> Memory high watermark set to 12695 MiB (13312088473 bytes) of 31738 MiB (33280221184 bytes) total
2022-04-19 13:41:23.308954+00:00 [info] <0.299.0> Enabling free disk space monitoring
2022-04-19 13:41:23.309007+00:00 [info] <0.299.0> Disk free limit set to 50MB
2022-04-19 13:41:23.312489+00:00 [info] <0.222.0> Running boot step code_server_cache defined by app rabbit
2022-04-19 13:41:23.312650+00:00 [info] <0.222.0> Running boot step file_handle_cache defined by app rabbit
2022-04-19 13:41:23.312958+00:00 [info] <0.302.0> Limiting to approx 65439 file handles (58893 sockets)
2022-04-19 13:41:23.313163+00:00 [info] <0.303.0> FHC read buffering: OFF
2022-04-19 13:41:23.313217+00:00 [info] <0.303.0> FHC write buffering: ON
2022-04-19 13:41:23.313829+00:00 [info] <0.222.0> Running boot step worker_pool defined by app rabbit
2022-04-19 13:41:23.313932+00:00 [info] <0.283.0> Will use 4 processes for default worker pool
2022-04-19 13:41:23.313982+00:00 [info] <0.283.0> Starting worker pool 'worker_pool' with 4 processes in it
2022-04-19 13:41:23.314583+00:00 [info] <0.222.0> Running boot step database defined by app rabbit
2022-04-19 13:41:23.314894+00:00 [info] <0.222.0> Node database directory at /bitnami/rabbitmq/mnesia/rabbit@rabbitmq-0 is empty. Assuming we need to join an existing cluster or initialise from scratch...
2022-04-19 13:41:23.314963+00:00 [info] <0.222.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2022-04-19 13:41:23.315110+00:00 [info] <0.222.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2022-04-19 13:41:23.316998+00:00 [noti] <0.44.0> Application mnesia exited with reason: stopped

BOOT FAILED
===========
Exception during startup:

2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> BOOT FAILED
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> ===========
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> Exception during startup:
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0> error:{badmatch,{error,enoent}}
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery_k8s:make_request/0, line 121
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery_k8s:list_nodes/0, line 41
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery_k8s:lock/1, line 76
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_peer_discovery:lock/0, line 190
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_mnesia:init_with_lock/3, line 104
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_mnesia:init/0, line 76
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_boot_steps:-run_step/2-lc$^0/1-0-/2, line 41
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>     rabbit_boot_steps:run_step/2, line 46
2022-04-19 13:41:23.317269+00:00 [erro] <0.222.0>
error:{badmatch,{error,enoent}}

   rabbit_peer_discovery_k8s:make_request/0, line 121
   rabbit_peer_discovery_k8s:list_nodes/0, line 41
   rabbit_peer_discovery_k8s:lock/1, line 76
   rabbit_peer_discovery:lock/0, line 190
   rabbit_mnesia:init_with_lock/3, line 104
   rabbit_mnesia:init/0, line 76
   rabbit_boot_steps:-run_step/2-lc$^0/1-0-/2, line 41
   rabbit_boot_steps:run_step/2, line 46

2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>   crasher:
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     initial call: application_master:init/4
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     pid: <0.221.0>
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     registered_name: []
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     exception exit: {{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>       in function  application_master:init/4 (application_master.erl, line 142)
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     ancestors: [<0.220.0>]
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     message_queue_len: 1
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     messages: [{'EXIT',<0.222.0>,normal}]
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     links: [<0.220.0>,<0.44.0>]
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     dictionary: []
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     trap_exit: true
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     status: running
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     heap_size: 2586
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     stack_size: 29
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>     reductions: 186
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>   neighbours:
2022-04-19 13:41:24.318598+00:00 [erro] <0.221.0>
2022-04-19 13:41:24.319087+00:00 [noti] <0.44.0> Application rabbit exited with reason: {{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{badmatch,{error,enoent}},{rabbit,start,[normal,[]]}}})

Crash dump is being written to: /opt/bitnami/rabbitmq/var/log/rabbitmq/erl_crash.dump...done
Waiting for erlang distribution on node 'rabbit@rabbitmq-0.rabbitmq-headless.tdc.svc.cluster.local' while OS process '51' is running
Error:
process_not_running
Waiting for erlang distribution on node 'rabbit@rabbitmq-0.rabbitmq-headless.tdc.svc.cluster.local' while OS process '51' is running
Error:
process_not_running

It looks like the Erlang cookie is not being distributed correctly, but after reading several articles I have not reached any conclusion.

If you have any information that might help, I would be grateful if you could share it.

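Edit 1: I exec'd into the first and only pod that was created (out of the three replicas that should be created) and ran rabbitmq-diagnostics erlang_cookie_sources to find out where the Erlang cookie file is stored (/opt/bitnami/rabbitmq/.rabbitmq/.erlang.cookie). I then checked whether it matches the cookie I set in the chart's values.yaml, and it is exactly the same, so I conclude that distributing the cookie is not the problem. However, I still have the same issue. Looking at the logs again, I can see that some processes are not running, and I don't know whether that is where the problem lies.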
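Another way to compare cookies without opening the cookie file in every pod is the "cookie hash" line each node prints at boot (d3Nfp8t690Ln1h811Tuxzw== in the log above). The sketch below assumes that this value is the Base64 encoding of the MD5 digest of the cookie string, which is how I understand RabbitMQ 3.9 computes it; verify this against your version before relying on it. The function name cookie_hash is hypothetical, and the input is assumed to be the auth.erlangCookie value from the Bitnami chart's values.yaml.

```python
import base64
import hashlib


def cookie_hash(cookie: str) -> str:
    """Return Base64(MD5(cookie)).

    Assumption: this matches the 'cookie hash' line RabbitMQ logs at boot.
    Nodes that share a cookie should all log the same value.
    """
    digest = hashlib.md5(cookie.encode("utf-8")).digest()  # 16 raw bytes
    return base64.b64encode(digest).decode("ascii")          # 24-char string
```

For example, if every pod logs the same cookie hash and it equals cookie_hash(<value of auth.erlangCookie>), the cookie is being distributed correctly and the boot failure has another cause.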

The problem was the service account token, which was not being mounted into the pod. I changed the Helm chart's values.yaml as follows:

serviceAccount:
 ## @param serviceAccount.create Enable creation of ServiceAccount for RabbitMQ pods
 ##
 create: true
 ## @param serviceAccount.name Name of the created serviceAccount
 ## If not set and create is true, a name is generated using the rabbitmq.fullname template
 ##
 #name: ""
 ## @param serviceAccount.automountServiceAccountToken Auto-mount the service account token in the pod
 ##
 automountServiceAccountToken: true
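This fix is consistent with the stack trace: the failure is {badmatch,{error,enoent}} in rabbit_peer_discovery_k8s:make_request/0, and enoent is the POSIX "no such file or directory" error. The Kubernetes peer discovery backend needs the service account credentials to query the API server, and those files are missing when the token is not auto-mounted. The sketch below checks for them. It assumes the standard mount directory /var/run/secrets/kubernetes.io/serviceaccount and that the backend reads token, ca.crt and namespace from it by default. I believe these paths can be overridden with the cluster_formation.k8s.* settings, but treat that as an assumption. The function names are hypothetical.

```python
from pathlib import Path
from typing import List

# Standard directory where Kubernetes mounts the service account credentials.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"
# Files the k8s peer discovery backend is assumed to read by default.
REQUIRED_FILES = ("token", "ca.crt", "namespace")


def missing_service_account_files(sa_dir: str = SA_DIR) -> List[str]:
    """Return the expected service account files that are absent under sa_dir."""
    base = Path(sa_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


def read_token(sa_dir: str = SA_DIR) -> str:
    """Read the token file.

    Raises FileNotFoundError (errno ENOENT) when the token is not mounted,
    the Python counterpart of {error,enoent} in the RabbitMQ boot log.
    """
    return (Path(sa_dir) / "token").read_text().strip()
```

With automountServiceAccountToken set to false, missing_service_account_files() would be expected to return all three names inside the pod; after setting it to true, it should return an empty list.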

Source: https://serverfault.com/questions/1098961