
Failed instance in Google Compute Engine

  • February 17, 2022

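I have a GCE instance that has been running for several years. During the night, the instance was restarted, with the following log entries: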

2022-02-13 04:46:36.370 CET compute.instances.hostError Instance terminated by Compute Engine.
2022-02-13 04:47:08.279 CET compute.instances.automaticRestart Instance automatically restarted by Compute Engine.

But the instance did not come back up.

I can connect to the serial console, which shows this:

serialport: Connected to ***.europe-west1-b.*** port 1 (
[ TIME ] Timed out waiting for device ***
[DEPEND] Dependency failed for File… ***.
[DEPEND] Dependency failed for /data.
[DEPEND] Dependency failed for Local File Systems.
[  OK  ] Stopped Dispatch Password …ts to Console Directory Watch.
[  OK  ] Stopped Forward Password R…uests to Wall Directory Watch.
[  OK  ] Reached target Timers.
        Starting Raise network interfaces...
[  OK  ] Closed Syslog Socket.
[  OK  ] Reached target Login Prompts.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Sockets.
[  OK  ] Started Emergency Shell.
[  OK  ] Reached target Emergency Mode.
        Starting Create Volatile Files and Directories...
[  OK  ] Finished Create Volatile Files and Directories.
        Starting Network Time Synchronization...
        Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Finished Update UTMP about System Boot/Shutdown.
        Starting Update UTMP about System Runlevel Changes...
[  OK  ] Finished Update UTMP about System Runlevel Changes.
[  OK  ] Started Network Time Synchronization.
[  OK  ] Reached target System Time Set.
[  OK  ] Reached target System Time Synchronized.
        Stopping Network Time Synchronization...
[  OK  ] Stopped Network Time Synchronization.
        Starting Network Time Synchronization...
[  OK  ] Started Network Time Synchronization.
[  OK  ] Finished Raise network interfaces.
[  OK  ] Reached target Network.
[  OK  ] Reached target Network is Online.
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to r
Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.
Press Enter to continue.

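It seems that one of the disks could not be attached, but what should I do now? The disk appears to be available as usual in Compute Engine.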
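To make the reading of the console output above explicit, here is a minimal Python sketch that pulls out the "Timed out waiting for device" and "Dependency failed" lines. The function name `summarize_boot_failures` and the two regular expressions are assumptions made for illustration only; they simply match the line shapes seen in this particular log and are not part of any GCE or systemd tooling.

```python
import re

# Patterns match the systemd status lines as they appear in the log above.
DEPEND_RE = re.compile(r"^\[DEPEND\]\s+Dependency failed for (.+?)\.?$")
TIMEOUT_RE = re.compile(r"^\[\s*TIME\s*\]\s+Timed out waiting for device (.+)$")


def summarize_boot_failures(console_text):
    """Collect timed-out devices and failed units from serial console text."""
    timed_out, failed = [], []
    for line in console_text.splitlines():
        line = line.strip()
        match = DEPEND_RE.match(line)
        if match:
            failed.append(match.group(1))
            continue
        match = TIMEOUT_RE.match(line)
        if match:
            timed_out.append(match.group(1))
    return {"timed_out_devices": timed_out, "failed_units": failed}


# Excerpt copied from the serial console output in the question.
sample = """\
[ TIME ] Timed out waiting for device ***
[DEPEND] Dependency failed for /data.
[DEPEND] Dependency failed for Local File Systems.
[  OK  ] Started Emergency Shell.
"""

summary = summarize_boot_failures(sample)
print(summary)
```

In this log the chain runs from a device that never appeared, to the /data mount, to Local File Systems, which is why the boot ended in emergency mode.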

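I'm afraid there is nothing you can do with the affected VM.

In the Host Events documentation or the FAQ, you can find the following information:

A host error (compute.instances.hostError) means that there was a hardware or software issue on the physical machine hosting your VM that caused your VM to crash. A host error that involves total hardware failure or other hardware issues can prevent live migration of your VM.

Even though a VM instance lives in the "cloud", it is still a physical machine that runs your workload. Unfortunately, that machine had a hardware or software failure, and there was nothing you could have done about it.

GCP has a feature called live migration, which can prevent this situation in many cases.

Compute Engine offers live migration to keep your virtual machine instances running even when a host system event occurs, such as a software or hardware update, but I suppose it is now too late to configure it for this event.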

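Live migration keeps your instances running during:

  • Regular infrastructure maintenance and upgrades.
  • Network and power grid maintenance in the data centers.
  • Failed hardware such as memory, CPU, network interface cards, disks, power, and so on. This is done on a best-effort basis; if the hardware fails completely or otherwise prevents live migration, the VM crashes and restarts automatically, and a hostError is logged.

Live migration does not change any attributes or properties of the VM itself. The live migration process just transfers a running VM from one host machine to another host machine within the same zone.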

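Possible workaround

As you mentioned, the disks are persistent and still visible in GCP, so you can try re-attaching them to another VM. A how-to guide can be found in the Creating and attaching a disk documentation.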
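As a rough illustration of the workaround, the following Python sketch builds the gcloud commands for moving a persistent disk from the broken VM to another one. It only prints the commands and does not run them. The names `data-disk`, `old-vm` and `rescue-vm` are hypothetical placeholders; the zone is taken from the serial console line in the question. The second VM has to be in the same zone as the disk, and stopping the old VM first is a cautious choice rather than a documented requirement for a data disk.

```python
import shlex


def reattach_disk_commands(disk, zone, broken_vm, rescue_vm):
    """Return gcloud command lines that move `disk` from one VM to another."""
    zone_flag = f"--zone={zone}"
    disk_flag = f"--disk={disk}"
    return [
        # Stop the VM that is stuck in emergency mode (cautious first step).
        ["gcloud", "compute", "instances", "stop", broken_vm, zone_flag],
        # Detach the persistent disk from the broken VM.
        ["gcloud", "compute", "instances", "detach-disk", broken_vm,
         disk_flag, zone_flag],
        # Attach the same disk to a working VM in the same zone.
        ["gcloud", "compute", "instances", "attach-disk", rescue_vm,
         disk_flag, zone_flag],
    ]


# Hypothetical resource names; replace them with your own before use.
commands = reattach_disk_commands(
    disk="data-disk",
    zone="europe-west1-b",
    broken_vm="old-vm",
    rescue_vm="rescue-vm",
)

for command in commands:
    print(shlex.join(command))
```

After the disk is attached, it still has to be mounted inside the second VM before the data can be read.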

Source: https://serverfault.com/questions/1093513