Monitoring

Monitoring MongoDB 3.2 with Stackdriver on Google Compute Engine fails silently

  • September 13, 2019

As of August 28, 2016, I am having trouble monitoring MongoDB 3.2 with Stackdriver.

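There is nothing about mongo in /var/log/syslog, but if I make a configuration mistake in the .conf file, the agent complains about it, so I know it is loading the file correctly...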

So there are no errors, but mongo is never mentioned in /var/log/syslog, and https://app.google.stackdriver.com/services/mongodb claims that I do not have the agent installed.

gke-fatih-standard-fb894cbb-d7ue:/opt/stackdriver/collectd/etc$ sudo service stackdriver-agent restart
[....] Restarting Stackdriver metrics collection agent: stackdriver-agentoption = Interval; value = 60.000000;
Created new plugin context.
option = Interval; value = 60.000000;
Created new plugin context.
option = PIDFile; value = /var/run/stackdriver-agent.pid;
option = Interval; value = 60.000000;
Created new plugin context.
. ok

$ tail -F /var/log/syslog
Aug 28 06:53:01 gke-fatih-standard-fb894cbb-d7ue /USR/SBIN/CRON[21824]: (root) CMD (/etc/supervisor/supervisor_watcher.sh 2>&1 | logger)
Aug 28 06:53:03 gke-fatih-standard-fb894cbb-d7ue collectd[21844]: type = syslog, key = LogLevel, value = info
Aug 28 06:53:03 gke-fatih-standard-fb894cbb-d7ue collectd[21844]: write_gcm: inside module_register for stackdriver_agent/5.5.0-340.wheezy
Aug 28 06:53:03 gke-fatih-standard-fb894cbb-d7ue collectd[21845]: type = syslog, key = LogLevel, value = info
Aug 28 06:53:03 gke-fatih-standard-fb894cbb-d7ue collectd[21845]: write_gcm: inside module_register for stackdriver_agent/5.5.0-340.wheezy
Aug 28 06:53:03 gke-fatih-standard-fb894cbb-d7ue collectd[21846]: Initialization complete, entering read-loop.
Aug 28 06:53:03 gke-fatih-standard-fb894cbb-d7ue collectd[21846]: match_throttle_metadata_keys: 1 history entries, 1 distinct keys, 78 bytes server memory.
Aug 28 06:53:03 gke-fatih-standard-fb894cbb-d7ue collectd[21846]: tcpconns plugin: Reading from netlink succeeded. Will use the netlink method from now on.
Aug 28 06:53:03 gke-fatih-standard-fb894cbb-d7ue collectd[21846]: write_gcm: Asking metadata server for auth token
Aug 28 06:53:04 gke-fatih-standard-fb894cbb-d7ue collectd[21846]: match_throttle_metadata_keys: 2 history entries, 1025 distinct keys, 102801 bytes server memory.

Note that the instance/node itself is monitored correctly; only MongoDB has a problem.
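Since the whole symptom is "no mongo lines in syslog", it can help to check for them explicitly rather than by eye after each restart. The following is a minimal sketch, assuming the agent logs under the `collectd[PID]:` tag seen in the output above; it does not assume any particular wording from the mongodb plugin and simply matches "mongo" case-insensitively. The function name `mongo_plugin_lines` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print collectd syslog lines that mention MongoDB.
# Assumption: the Stackdriver agent logs to syslog with the "collectd[PID]:"
# tag shown in the output above. The mongodb plugin's exact message wording
# is NOT assumed; any collectd line containing "mongo" (any case) matches.
# Exit status: 0 if at least one line matched, non-zero otherwise.
mongo_plugin_lines() {
  log_file="${1:-/var/log/syslog}"   # file to scan; defaults to syslog
  # First keep only collectd lines, then filter those for "mongo".
  grep 'collectd\[' "$log_file" | grep -i 'mongo'
}
```

For example, after `sudo service stackdriver-agent restart`, calling `mongo_plugin_lines` (with read access to /var/log/syslog) and getting no output reproduces the situation described in this question.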

/opt/stackdriver/collectd/etc/collect.d/mongo0.conf:

# scheduled to node: gke-fatih-standard-fb894cbb-d7ue
# This is the monitoring configuration for MongoDB.
# Look for STATS_USER, STATS_PASS, MONGODB_HOST and MONGODB_PORT to adjust your configuration file.
LoadPlugin mongodb
<Plugin "mongodb">
   # When using non-standard MongoDB configurations, replace the below with
   #Host "MONGODB_HOST"
   #Port "MONGODB_PORT"
   # Must use the load balancer because we don't know the fixed nodePort
   Host "xxx"
   Port "27017"

   # If you restricted access to the database, you can set the username and
   # password here:
   User "stats"
   Password "xxx"
</Plugin>
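The header comment of this file says to look for STATS_USER, STATS_PASS, MONGODB_HOST and MONGODB_PORT. A leftover placeholder would not be a syntax error, so collectd would load the file without complaint. A minimal sketch of a text-only check for that, assuming the placeholders appear literally as named in the comment; the function name `check_mongo_conf` is hypothetical, and it does not test connectivity or whether the credentials are valid.

```shell
#!/bin/sh
# Hypothetical helper: flag active (uncommented) Host/Port/User/Password
# directives that still contain a template placeholder named in the file's
# header comment. Commented lines such as '#Host "MONGODB_HOST"' are ignored
# because '#' is not matched by the leading-whitespace pattern.
# Text check only: it does NOT verify connectivity or credentials.
# Exit status: 0 = clean, 1 = placeholder found (lines printed), 2 = unreadable.
check_mongo_conf() {
  conf="$1"
  [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }
  bad=$(grep -E '^[[:space:]]*(Host|Port|User|Password)[[:space:]]' "$conf" \
        | grep -E 'MONGODB_HOST|MONGODB_PORT|STATS_USER|STATS_PASS')
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
  return 0
}
```

For this file that would be `check_mongo_conf /opt/stackdriver/collectd/etc/collect.d/mongo0.conf`; the config above uses real values ("xxx" is redacted in the post), so it would be expected to pass.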

Related: Monitoring MongoDB 3 with StackDriver in GCE

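Google is deprecating its Stackdriver integrations for non-GCP sources (such as Mongo) and moving to the BindPlane MIaaS platform as its supported integration platform for monitoring non-GCP data sources.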

More details can be found here:

https://cloud.google.com/monitoring/agent/plugins/bindplane-transition

and here:

https://bluemedora.com/how-to-monitor-mongodb-bindplane-for-stackdriver-blue-medora/

After running sudo service stackdriver-agent restart once more (I had done it before), and about 30 minutes after the original events, Stackdriver now detects the metrics.

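So if you are sure everything is configured correctly and there are no errors, try restarting stackdriver-agent a few more times and waiting about 30 minutes.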

The absence of anything mongo-related in /var/log/syslog is still a problem. I hope @Corey-Kosak can shed more light on this.

Source: https://serverfault.com/questions/799571