WiredTiger storage engine reports a large number of rollbacks in MongoDB

January 31, 2019

We have a MongoDB replica set with three members:

       "members" : [
               {
                       "_id" : 6,
                       "host" : "10.0.0.17:27017",
                       "arbiterOnly" : false,
                       "buildIndexes" : true,
                       "hidden" : false,
                       "priority" : 2,
                       "tags" : {
                       },
                       "slaveDelay" : NumberLong(0),
                       "votes" : 1
               },
               {
                       "_id" : 7,
                       "host" : "10.0.0.18:27017",
                       "arbiterOnly" : false,
                       "buildIndexes" : true,
                       "hidden" : false,
                       "priority" : 2,
                       "tags" : {
                       },
                       "slaveDelay" : NumberLong(0),
                       "votes" : 1
               },
               {
                       "_id" : 8,
                       "host" : "10.0.0.19:27017",
                       "arbiterOnly" : false,
                       "buildIndexes" : true,
                       "hidden" : false,
                       "priority" : 2,
                       "tags" : {
                       },
                       "slaveDelay" : NumberLong(0),
                       "votes" : 1
               }
       ],

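The cluster is under moderate load, no more than a few dozen requests per second. On the primary, db.serverStatus() reports that almost all transactions are rolled back: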

"transaction begins" : 2625009877,
"transaction checkpoint currently running" : 0,
"transaction checkpoint generation" : 22618,
"transaction checkpoint max time (msecs)" : 5849,
"transaction checkpoint min time (msecs)" : 153,
"transaction checkpoint most recent time (msecs)" : 1869,
"transaction checkpoint scrub dirty target" : 0,
"transaction checkpoint scrub time (msecs)" : 0,
"transaction checkpoint total time (msecs)" : 11017082,
"transaction checkpoints" : 22617,
"transaction checkpoints skipped because database was clean" : 0,
"transaction failures due to cache overflow" : 0,
"transaction fsync calls for checkpoint after allocating the transaction ID" : 22617,
"transaction fsync duration for checkpoint after allocating the transaction ID (usecs)" : 354402,
"transaction range of IDs currently pinned" : 0,
"transaction range of IDs currently pinned by a checkpoint" : 0,
"transaction range of IDs currently pinned by named snapshots" : 0,
"transaction range of timestamps currently pinned" : 8589934583,
"transaction range of timestamps pinned by the oldest timestamp" : 8589934583,
"transaction sync calls" : 0,
"transactions committed" : 30213144,
"transactions rolled back" : 2594972913,
"update conflicts" : 578

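Basically, my question is: what is going on here? Is it normal to have so many transactions and so many rollbacks? If not, what is the root cause, and how can it be fixed?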

Update: we upgraded to 3.6.8-2.0 (the latest Percona package in the 3.6 series), and the problem persists.

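Many of the metrics in db.serverStatus().wiredTiger can be confusing because they reflect the metrics and terminology of the underlying WiredTiger storage engine rather than the MongoDB API. Terms such as transactions, sessions, and rollbacks have a storage-internal meaning that differs from the end-user MongoDB features of the same name. Some of the exposed metrics are not very useful for end-user monitoring, but they can give diagnostic insight to developers who are familiar with the underlying storage API.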
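As a rough sketch of how to look at just these counters, the function below picks a few of them out of a serverStatus-shaped document. It assumes the counters sit under wiredTiger.transaction with the names shown in the question's output, and that the values are plain numbers. The function name transactionCounters and the sampleStatus object are made up for illustration; in the mongo shell the input would be the result of db.serverStatus().

```javascript
// Pick a few storage-engine transaction counters out of a
// serverStatus()-shaped document. Hypothetical helper, not a MongoDB API.
function transactionCounters(status) {
  // Assumption: the counters live under wiredTiger.transaction,
  // as in the output quoted in the question.
  const txn = (status.wiredTiger && status.wiredTiger.transaction) || {};
  const names = [
    "transaction begins",
    "transactions committed",
    "transactions rolled back",
    "update conflicts",
  ];
  const picked = {};
  for (const name of names) {
    // Assumption: values are plain numbers; missing counters become null.
    picked[name] = name in txn ? Number(txn[name]) : null;
  }
  return picked;
}

// Stand-in for db.serverStatus(), using the values from the question.
const sampleStatus = {
  wiredTiger: {
    transaction: {
      "transaction begins": 2625009877,
      "transactions committed": 30213144,
      "transactions rolled back": 2594972913,
      "update conflicts": 578,
    },
  },
};

const counters = transactionCounters(sampleStatus);
console.log(counters);
```

In the shell itself, something like transactionCounters(db.serverStatus()) would be the equivalent call, under the same assumptions.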
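What is going on here?
The WiredTiger storage engine uses multiversion concurrency control (MVCC) to give internal threads concurrent access to the data they are reading and writing. The MongoDB server has an integration layer that uses the underlying storage engine API to implement the commands exposed through the MongoDB API (the one used by drivers).
The WiredTiger API has internal sessions and transactions so that internal threads can work with a consistent snapshot of the data. An internal transaction ends either with a commit (data was written) or with a rollback (the transaction was aborted, intentionally or because of an error).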
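Is it normal to have so many transactions and so many rollbacks?
Yes, this is normal. Read-only queries that go through the MongoDB integration layer use the WiredTiger transaction API for consistent reads, but since they never have data to commit, the transaction is intentionally aborted and counted in the "transactions rolled back" metric.
The "transactions rolled back" metric can also increase in other cases, such as write conflicts (concurrent internal updates to the same document, which are retried transparently).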
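To make the counting concrete, here is a toy model of an engine that tracks transaction outcomes the way described above. It is not the WiredTiger API: the ToyEngine class, its methods, and the conflictOnFirstTry option are all invented for illustration, and the real engine is far more involved. The point is only that reads end in a rollback by design, so a read-heavy workload produces a rollback-heavy counter.

```javascript
// Toy model only. None of these names exist in WiredTiger or MongoDB.
class ToyEngine {
  constructor() {
    this.data = new Map();
    this.stats = { begins: 0, committed: 0, rolledBack: 0 };
  }

  // A read opens a transaction for a consistent view and then ends it
  // with a rollback, because there is nothing to commit.
  read(key) {
    this.stats.begins += 1;
    const value = this.data.get(key);
    this.stats.rolledBack += 1;
    return value;
  }

  // A write opens a transaction and commits it. If a conflict is simulated,
  // the first attempt is rolled back and the write is retried once.
  write(key, value, { conflictOnFirstTry = false } = {}) {
    if (conflictOnFirstTry) {
      this.stats.begins += 1;
      this.stats.rolledBack += 1; // the conflicting attempt is abandoned
    }
    this.stats.begins += 1;
    this.data.set(key, value);
    this.stats.committed += 1;
  }
}

const engine = new ToyEngine();
engine.write("a", 1);
engine.write("a", 2, { conflictOnFirstTry: true });
let lastRead;
for (let i = 0; i < 98; i++) {
  lastRead = engine.read("a");
}
console.log(engine.stats);
```

With two writes (one of them retried after a simulated conflict) and 98 reads, nearly every transaction in the toy counter ends as a rollback even though nothing went wrong.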
If not, what is the root cause and how can it be fixed?
This metric should not be a particular focus of concern or monitoring.
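To put the numbers from the question in perspective, the short calculation below compares the counters with each other. The values are copied from the output in the question; the share helper is a hypothetical name. Treating "update conflicts" as the count of write conflicts is an assumption based on the counter's name.

```javascript
// Counter values copied from the serverStatus output in the question.
const begins = 2625009877;
const committed = 30213144;
const rolledBack = 2594972913;
const updateConflicts = 578;

// Hypothetical helper: fraction of `whole` made up by `part`.
function share(part, whole) {
  return whole === 0 ? 0 : part / whole;
}

const rollbackShare = share(rolledBack, begins);
const commitShare = share(committed, begins);
// Assumption: "update conflicts" counts write conflicts.
const conflictShare = share(updateConflicts, rolledBack);

console.log(`rolled back / begins: ${(rollbackShare * 100).toFixed(2)}%`);
console.log(`committed / begins: ${(commitShare * 100).toFixed(2)}%`);
console.log(`update conflicts / rolled back: ${conflictShare.toExponential(1)}`);
```

Under that reading, conflicts account for well under one in a million of the rollbacks, which is consistent with the bulk of them coming from ordinary reads rather than from errors.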

Source: https://serverfault.com/questions/951640
