Cassandra

OpsCenter repair service timing out. Error: Requested range intersects a local range………

  • June 24, 2014

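My production cluster has had the repair service enabled since April 16, with the default 9 days to completion, and the repairs used to complete correctly. Since May 22, however, OpsCenter has been disabling it automatically: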

From /var/log/opscenter/opscenterd.log:

[...]
2014-06-03 21:13:47-0400 [zs_prod] ERROR: Repair task (<Node 10.1.0.22='6417880425364517165'>, (-4019838962446882275L, -4006140687792135587L), set(['zs_logging', 'OpsCenter'])) timed out after 3600 seconds.
2014-06-03 22:16:44-0400 [zs_prod] ERROR: Repair task (<Node 10.1.0.22='6417880425364517165'>, (-4006140687792135587L, -4006140687792135586L), set(['zs_logging', 'OpsCenter'])) timed out after 3600 seconds.
2014-06-03 22:16:44-0400 [zs_prod] ERROR: More than 100 errors during repair service, shutting down repair service
2014-06-03 22:16:44-0400 [zs_prod]  INFO: Stopping repair service
[...]

From /var/log/opscenter/repair_service/zs_prod.log:

[...]
2014-06-03 22:16:44-0400 [zs_prod] ERROR: Repair task (<Node 10.1.0.22='6417880425364517165'>, (-4006140687792135587L, -4006140687792135586L), set(['zs_logging', 'OpsCenter'])) timed out after 3600 seconds.
2014-06-03 22:16:44-0400 [zs_prod] ERROR: Task (<Node 10.1.0.22='6417880425364517165'>, (-4006140687792135587L, -4006140687792135586L), set(['zs_logging', 'OpsCenter'])) has failed 1 times.
2014-06-03 22:16:44-0400 [zs_prod] ERROR: 101 errors have ocurred out of 100 allowed.
2014-06-03 22:16:44-0400 [zs_prod] ERROR: More than 100 errors during repair service, shutting down repair service
2014-06-03 22:16:44-0400 [zs_prod]  INFO: Stopping repair service
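
To see which nodes account for the timeouts that push the repair service past its error limit, the log can be tallied with a short script. The following is a minimal sketch: the regular expression is inferred only from the log lines quoted above (other OpsCenter versions may format them differently), and `count_timeouts` is a hypothetical helper name, not part of OpsCenter.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpts in this post; this is an assumption,
# not a documented OpsCenter log format.
TIMEOUT_RE = re.compile(
    r"ERROR: Repair task \(<Node (?P<ip>[\d.]+)='(?P<token>-?\d+)'>, "
    r"\((?P<start>-?\d+)L, (?P<end>-?\d+)L\), .*\) "
    r"timed out after (?P<secs>\d+) seconds\."
)


def count_timeouts(lines):
    """Return a Counter mapping node IP to its number of timed-out repair tasks."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# Sample input copied from the opscenterd.log excerpt above.
SAMPLE = [
    "2014-06-03 21:13:47-0400 [zs_prod] ERROR: Repair task (<Node 10.1.0.22='6417880425364517165'>, (-4019838962446882275L, -4006140687792135587L), set(['zs_logging', 'OpsCenter'])) timed out after 3600 seconds.",
    "2014-06-03 22:16:44-0400 [zs_prod] ERROR: Repair task (<Node 10.1.0.22='6417880425364517165'>, (-4006140687792135587L, -4006140687792135586L), set(['zs_logging', 'OpsCenter'])) timed out after 3600 seconds.",
    "2014-06-03 22:16:44-0400 [zs_prod] ERROR: More than 100 errors during repair service, shutting down repair service",
]

if __name__ == "__main__":
    for ip, n in count_timeouts(SAMPLE).items():
        print(ip, n)
```

On a real system the same function could be fed the open log file instead of `SAMPLE`, for example `count_timeouts(open("/var/log/opscenter/opscenterd.log"))`.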

On the node where the repair fails, from /var/log/cassandra/system.log:

ERROR [RMI TCP Connection(93502)-10.1.0.22] 2014-06-03 20:12:28,858 StorageService.java (line 2560) Repair session failed:
java.lang.IllegalArgumentException: Requested range intersects a local range but is not fully contained in one; this would lead to imprecise repair
       at org.apache.cassandra.service.ActiveRepairService.getNeighbors(ActiveRepairService.java:164)
       at org.apache.cassandra.repair.RepairSession.<init>(RepairSession.java:128)
       at org.apache.cassandra.repair.RepairSession.<init>(RepairSession.java:117)
       at org.apache.cassandra.service.ActiveRepairService.submitRepairSession(ActiveRepairService.java:97)
       at org.apache.cassandra.service.StorageService.forceKeyspaceRepair(StorageService.java:2620)
       at org.apache.cassandra.service.StorageService$5.runMayThrow(StorageService.java:2556)
       at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
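
The exception message describes a containment check on token ranges: a requested repair range must lie entirely inside one of the node's local ranges, and a range that only partly overlaps one is rejected. The sketch below illustrates that idea under simplifying assumptions. It treats ranges as left-open, right-closed intervals `(left, right]`, ignores ring wraparound, and uses small made-up token values. The function names are hypothetical and this is not Cassandra's actual implementation.

```python
def contains(outer, inner):
    """True if (inner_left, inner_right] lies entirely within (outer_left, outer_right]."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def intersects(a, b):
    """True if the half-open ranges (a_left, a_right] and (b_left, b_right] overlap."""
    return a[0] < b[1] and b[0] < a[1]


def check_requested_range(requested, local_ranges):
    """Return the local range that fully contains `requested`, or None if
    nothing overlaps. Raise ValueError on a partial overlap."""
    for local in local_ranges:
        if contains(local, requested):
            return local
        if intersects(local, requested):
            raise ValueError(
                "Requested range intersects a local range "
                "but is not fully contained in one"
            )
    return None


# Illustrative local ranges only; these are not taken from the cluster above.
LOCAL = [(0, 100), (100, 200)]

if __name__ == "__main__":
    print(check_requested_range((10, 50), LOCAL))    # inside the first local range
    print(check_requested_range((300, 400), LOCAL))  # no overlap with any local range
    try:
        check_requested_range((90, 110), LOCAL)      # straddles the boundary at 100
    except ValueError as exc:
        print("rejected:", exc)
```

In this simplified model, a repair task whose range straddles the boundary between two local ranges is the case that triggers the error.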

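These errors, which occur only while the repair service is running, are the only errors these nodes encounter. Outside of the repair tasks, the Cassandra cluster runs fine.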

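I am running OpsCenter 4.1.2 with a 6-node DSE 4.0.2 cluster installed on Linux virtual machines. The nodes run a vanilla installation of Ubuntu Server 12.04 64-bit, with DSE installed and secured according to the provided installation documentation.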

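I also had this problem on my development cluster for a while (with DSE 4.0.0, 4.0.1 and 4.0.2), but I assumed it was caused by some misconfiguration on my part. There, too, the problem appeared spontaneously at some point.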

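The Cassandra cluster runs very smoothly with good write throughput. It is very stable and has plenty of resources available. We have not noticed any problems in the applications that depend on it.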

This is a known bug in OpsCenter that was fixed in version 4.1.3 (see http://www.datastax.com/documentation/opscenter/4.1/opsc/release_notes/opscReleaseNotes413.html, the last issue listed).

I don't think there is any workaround other than upgrading OpsCenter (which should be easy to do).

Source: https://serverfault.com/questions/604657