CentOS

FreeIPA installation fails during CA restart

  • March 7, 2020

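I am trying to set up a simple Vagrant box for testing with FreeIPA. I am using a CentOS 7 image with minimal extras installed on the box, starting from a very simple FreeIPA definition. I have tried plain shell commands and I have also tried ansible-freeipa. In both cases I see the same error, although how often it occurs seems to differ: with plain shell commands it fails only about 50% of the time, but with Ansible it seems to fail 100% of the time.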

The failure gives me an error like the one below.

fatal: [ipaserver.test.hadoop.com]: FAILED! => {"changed": false, "module_stderr": "Shared connection to ipaserver.test.hadoop.com closed.\r\n", "module_stdout": "\u001b[?1034hTraceback (most recent call last):\r\n File "/root/.ansible/tmp/ansible-tmp-1583188576.27-186488091977372/AnsiballZ_ipaserver_setup_ca.py", line 102, in \r\n _ansiballz_main()\r\n File "/root/.ansible/tmp/ansible-tmp-1583188576.27-186488091977372/AnsiballZ_ipaserver_setup_ca.py", line 94, in _ansiballz_main\r\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\r\n File "/root/.ansible/tmp/ansible-tmp-1583188576.27-186488091977372/AnsiballZ_ipaserver_setup_ca.py", line 40, in invoke_module\r\n runpy.run_module(mod_name='ansible.modules.ipaserver_setup_ca', init_globals=None, run_name='main', alter_sys=True)\r\n File "/usr/lib64/python2.7/runpy.py", line 176, in run_module\r\n fname, loader, pkg_name)\r\n File "/usr/lib64/python2.7/runpy.py", line 82, in _run_module_code\r\n mod_name, mod_fname, mod_loader, pkg_name)\r\n File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code\r\n exec code in run_globals\r\n File "/tmp/ansible_ipaserver_setup_ca_payload_Pc9wnM/ansible_ipaserver_setup_ca_payload.zip/ansible/modules/ipaserver_setup_ca.py", line 354, in \r\n File "/tmp/ansible_ipaserver_setup_ca_payload_Pc9wnM/ansible_ipaserver_setup_ca_payload.zip/ansible/modules/ipaserver_setup_ca.py", line 345, in main\r\n File "/usr/lib/python2.7/site-packages/ipaserver/install/ca.py", line 391, in install_step_1\r\n ca.start('pki-tomcat')\r\n File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py", line 464, in start\r\n self.service.start(instance_name, capture_output=capture_output, wait=wait)\r\n File "/usr/lib/python2.7/site-packages/ipaplatform/redhat/services.py", line 192, in start\r\n self.wait_until_running()\r\n File "/usr/lib/python2.7/site-packages/ipaplatform/redhat/services.py", line 186, in wait_until_running\r\n raise RuntimeError('CA did not start in %ss' % timeout)\r\nRuntimeError: CA did not start in 300.0s\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}

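Looking at /var/log/messages, I see the error occurring sometime between 23:25 and 23:27 system time. According to the error, this happens when the CA is restarted. It seems to start fine the first time it is started.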

Mar 2 23:25:42 localhost systemd: Stopped PKI Tomcat Server pki-tomcat.
Mar 2 23:25:43 localhost systemd: Starting PKI Tomcat Server pki-tomcat...
Mar 2 23:26:13 localhost pkidaemon: -----------------------
Mar 2 23:26:13 localhost pkidaemon: Banner is not installed
Mar 2 23:26:13 localhost pkidaemon: -----------------------
Mar 2 23:27:07 localhost pkidaemon: ----------------------
Mar 2 23:27:08 localhost pkidaemon: Enabled all subsystems
Mar 2 23:27:08 localhost pkidaemon: ----------------------
Mar 2 23:27:18 localhost systemd: pki-tomcatd@pki-tomcat.service start-pre operation timed out. Terminating.
Mar 2 23:27:18 localhost systemd: Failed to start PKI Tomcat Server pki-tomcat.
Mar 2 23:27:18 localhost systemd: Unit pki-tomcatd@pki-tomcat.service entered failed state.
Mar 2 23:27:18 localhost systemd: pki-tomcatd@pki-tomcat.service failed.
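
To see how long systemd waited before giving up, the timestamps in the excerpt above can be subtracted. This is only a minimal Python sketch: the helper names `parse_syslog_time` and `seconds_between` are made up for illustration, and the year is assumed to be 2020 because syslog lines do not carry one.

```python
from datetime import datetime

# Two lines copied from the /var/log/messages excerpt above.
START_LINE = "Mar 2 23:25:43 localhost systemd: Starting PKI Tomcat Server pki-tomcat..."
FAIL_LINE = ("Mar 2 23:27:18 localhost systemd: pki-tomcatd@pki-tomcat.service "
             "start-pre operation timed out. Terminating.")


def parse_syslog_time(line, year=2020):
    """Parse the leading 'Mon D HH:MM:SS' stamp; the year is an assumption."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def seconds_between(first, second):
    """Hypothetical helper: elapsed seconds between two syslog lines."""
    return (parse_syslog_time(second) - parse_syslog_time(first)).total_seconds()


if __name__ == "__main__":
    print(seconds_between(START_LINE, FAIL_LINE))
```

For these two lines the difference is 95 seconds, well short of the 300 seconds in the Ansible error, which suggests the unit's own start timeout fired before the installer's wait ran out.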

However, when I look at the PKI logs, there is nothing in this time range. These are the last few lines of /var/log/pki/pki-ca-spawn.20200302231442.log:

2020-03-02 23:18:32 pkispawn : INFO END spawning subsystem 'CA' of instance 'pki-tomcat'
2020-03-02 23:18:32 pkispawn : INFO ... archiving configuration into '/var/log/pki/pki-tomcat/ca/archive/spawn_deployment.cfg.20200302231442'
2020-03-02 23:18:32 pkispawn : INFO ....... cp -p /etc/sysconfig/pki/tomcat/pki-tomcat/ca/deployment.cfg /var/log/pki/pki-tomcat/ca/archive/spawn_deployment.cfg.20200302231442
2020-03-02 23:18:32 pkispawn : DEBUG ........... chmod 660 /var/log/pki/pki-tomcat/ca/archive/spawn_deployment.cfg.20200302231442
2020-03-02 23:18:32 pkispawn : DEBUG ........... chown 17:17 /var/log/pki/pki-tomcat/ca/archive/spawn_deployment.cfg.20200302231442
2020-03-02 23:18:32 pkispawn : INFO ... archiving manifest into '/var/log/pki/pki-tomcat/ca/archive/spawn_manifest.20200302231442'
2020-03-02 23:18:32 pkispawn : INFO ....... cp -p /etc/sysconfig/pki/tomcat/pki-tomcat/ca/manifest /var/log/pki/pki-tomcat/ca/archive/spawn_manifest.20200302231442
2020-03-02 23:18:32 pkispawn : DEBUG ........... chmod 660 /var/log/pki/pki-tomcat/ca/archive/spawn_manifest.20200302231442
2020-03-02 23:18:32 pkispawn : DEBUG ........... chown 17:17 /var/log/pki/pki-tomcat/ca/archive/spawn_manifest.20200302231442

The same goes for /var/log/pki/pki-tomcat/ca/debug:

[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: getConn: mNumConns now 4
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: returnConn: mNumConns now 5
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: In LdapBoundConnFactory::getConn()
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: masterConn is connected: true
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: getConn: conn is connected true
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: getConn: mNumConns now 4
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: returnConn: mNumConns now 5
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: CMSServlet.java: renderTemplate
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: CMSServlet.java: xml parameter detected, returning xml
[02/Mar/2020:23:25:00][http-bio-8080-exec-14]: CMSServlet: curDate=Mon Mar 02 23:25:00 UTC 2020 id=caDisplayCertFromRequest time=144

/var/log/pki/pki-tomcat/ca/system has some errors, but none after 23:25:

0.localhost-startStop-1 - [02/Mar/2020:23:15:08 UTC] [13] [3] authz instance DirAclAuthz initialization failed and skipped, error=Property internaldb.ldapconn.port missing value
0.http-bio-8443-exec-3 - [02/Mar/2020:23:17:53 UTC] [3] [3] CASigningUnit: Object certificate not found. Error Certificate object not found
0.http-bio-8443-exec-3 - [02/Mar/2020:23:17:54 UTC] [11] [3] UGSubsystem: Get User Error netscape.ldap.LDAPException: error result (32); matchedDN = ou=People,o=ipaca
0.Thread-16 - [02/Mar/2020:23:25:00 UTC] [8] [3] Publishing: Could not publish certificate serial number 0x7. Error Failed to publish using rule: No rules enabled

I don't know what is causing this. Any ideas? The Vagrantfile and hosts file are both in the GitHub repository below: https://github.com/davidov541/HadoopOnVagrant/tree/AnsibleRetrofit/FreeIPA

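I ended up increasing the memory of my Vagrant box to 2 GB to fix another problem I was seeing, and after recreating the box 10 to 20 times since yesterday I have not seen this problem again. Based on that, I believe the problem was caused by Tomcat not having enough memory to start what it needed, which led to the behavior we saw.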
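
A quick way to confirm how much memory the guest actually sees before running the installer is to read `MemTotal` from /proc/meminfo. The sketch below is one way to do that in Python; the function names and the 1800 MiB default threshold are my own assumptions (chosen a little below 2 GB because `MemTotal` excludes memory the kernel reserves), not anything FreeIPA documents.

```python
def mem_total_mib(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB
            return kib // 1024
    raise ValueError("MemTotal not found")


def has_enough_memory(meminfo_text, min_mib=1800):
    """Compare MemTotal against min_mib (an assumed threshold, see above)."""
    return mem_total_mib(meminfo_text) >= min_mib
```

On the guest, something like `has_enough_memory(open("/proc/meminfo").read())` would return False on a box that is too small, before the CA restart has a chance to time out.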

The final Ansible error:

'CA did not start in %ss' % timeout)\r\nRuntimeError: CA did not start in 300.0s\r\n"

This reminded me of something that came up on the mailing list, where it was fixed by setting the starting_timeout variable in the script to a higher value.

The solution is described here:

https://www.freeipa.org/page/HowTo/FreeIPA_on_banana_pi

Maybe give that a try. I can't guarantee it will fix it for you, but…
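
To illustrate why a larger timeout can help on slow or memory-starved hardware, here is a minimal sketch of the poll-until-timeout pattern that the traceback points at. It is not FreeIPA's code: `wait_for_service` and its parameters are hypothetical, and only the error message text is borrowed from the traceback above.

```python
import time


def wait_for_service(is_running, timeout=300.0, interval=1.0):
    """Poll is_running() until it returns True or `timeout` seconds pass.

    Illustrative stand-in only; not FreeIPA's implementation.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_running():
            return
        time.sleep(interval)
    # Same message shape as in the traceback above.
    raise RuntimeError('CA did not start in %ss' % timeout)
```

With this shape, a service that needs longer than `timeout` to come up always fails, however healthy it is otherwise, so raising the value only helps when the service would eventually start.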

Quoted from: https://serverfault.com/questions/1005438