
One ColumnFamily places data on only 3 out of 4 nodes

  • January 12, 2012

I have already posted this on the cassandra-user mailing list but haven't had any replies, so I'm wondering whether anyone on serverfault.com has any ideas.

I seem to be running into a rather strange (to me at least!) problem/behaviour with Cassandra.

I'm running a 4-node cluster on Cassandra 0.8.7. For the keyspace in question I have RF=3, SimpleStrategy and multiple ColumnFamilies within the KeySpace. For one of the ColumnFamilies, however, the data seems to be distributed across only 3 of the 4 nodes.

Apart from the ColumnFamily in question, the data on the cluster seems to be spread more or less equally:

# nodetool -h localhost ring
Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                              127605887595351923798765477786913079296     
192.168.81.2    datacenter1 rack1       Up     Normal  7.27 GB         25.00%  0                                           
192.168.81.3    datacenter1 rack1       Up     Normal  7.74 GB         25.00%  42535295865117307932921825928971026432      
192.168.81.4    datacenter1 rack1       Up     Normal  7.38 GB         25.00%  85070591730234615865843651857942052864      
192.168.81.5    datacenter1 rack1       Up     Normal  7.32 GB         25.00%  127605887595351923798765477786913079296     

The relevant parts of the keyspace schema are as follows:

[default@A] show schema;
create keyspace A
 with placement_strategy = 'SimpleStrategy'
 and strategy_options = [{replication_factor : 3}];
[...]
create column family UserDetails
 with column_type = 'Standard'
 and comparator = 'IntegerType'
 and default_validation_class = 'BytesType'
 and key_validation_class = 'BytesType'
 and memtable_operations = 0.571875
 and memtable_throughput = 122
 and memtable_flush_after = 1440
 and rows_cached = 0.0
 and row_cache_save_period = 0
 and keys_cached = 200000.0
 and key_cache_save_period = 14400
 and read_repair_chance = 1.0
 and gc_grace = 864000
 and min_compaction_threshold = 4
 and max_compaction_threshold = 32
 and replicate_on_write = true
 and row_cache_provider = 'ConcurrentLinkedHashCacheProvider';

Now the symptoms: the output of "nodetool -h localhost cfstats" on each node. Note the numbers for node1.

node1

Column Family: UserDetails
SSTable count: 0
Space used (live): 0
Space used (total): 0
Number of Keys (estimate): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0

node2

Column Family: UserDetails
SSTable count: 3
Space used (live): 112952788
Space used (total): 164953743
Number of Keys (estimate): 384
Memtable Columns Count: 159419
Memtable Data Size: 74910890
Memtable Switch Count: 59
Read Count: 135307426
Read Latency: 25.900 ms.
Write Count: 3474673
Write Latency: 0.040 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 120
Key cache hit rate: 0.999971684189041
Row cache: disabled
Compacted row minimum size: 42511
Compacted row maximum size: 74975550
Compacted row mean size: 42364305

node3

Column Family: UserDetails
SSTable count: 3
Space used (live): 112953137
Space used (total): 112953137
Number of Keys (estimate): 384
Memtable Columns Count: 159421
Memtable Data Size: 74693445
Memtable Switch Count: 56
Read Count: 135304486
Read Latency: 25.552 ms.
Write Count: 3474616
Write Latency: 0.036 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 109
Key cache hit rate: 0.9999716840888175
Row cache: disabled
Compacted row minimum size: 42511
Compacted row maximum size: 74975550
Compacted row mean size: 42364305

node4

Column Family: UserDetails
SSTable count: 3
Space used (live): 117070926
Space used (total): 119479484
Number of Keys (estimate): 384
Memtable Columns Count: 159979
Memtable Data Size: 75029672
Memtable Switch Count: 60
Read Count: 135294878
Read Latency: 19.455 ms.
Write Count: 3474982
Write Latency: 0.028 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 119
Key cache hit rate: 0.9999752235777154
Row cache: disabled
Compacted row minimum size: 2346800
Compacted row maximum size: 62479625
Compacted row mean size: 42591803

When I look in the data directory on node1, there are no files at all for the UserDetails ColumnFamily.

I tried running a manual repair, hoping it would cure the situation, but no luck:

# nodetool -h localhost repair A UserDetails
INFO 15:19:54,611 Starting repair command #8, repairing 3 ranges.
INFO 15:19:54,647 Sending AEService tree for #<TreeRequest manual-repair-89c1acb0-184c-438f-bab8-7ceed27980ec, /192.168.81.2, (A,UserDetails), (85070591730234615865843651857942052864,127605887595351923798765477786913079296]>
INFO 15:19:54,742 Endpoints /192.168.81.2 and /192.168.81.3 are consistent for UserDetails on (85070591730234615865843651857942052864,127605887595351923798765477786913079296]
INFO 15:19:54,750 Endpoints /192.168.81.2 and /192.168.81.5 are consistent for UserDetails on (85070591730234615865843651857942052864,127605887595351923798765477786913079296]
INFO 15:19:54,751 Repair session manual-repair-89c1acb0-184c-438f-bab8-7ceed27980ec (on cfs [Ljava.lang.String;@3491507b, range (85070591730234615865843651857942052864,127605887595351923798765477786913079296]) completed successfully
INFO 15:19:54,816 Sending AEService tree for #<TreeRequest manual-repair-6d2438ca-a05c-4217-92c7-c2ad563a92dd, /192.168.81.2, (A,UserDetails), (42535295865117307932921825928971026432,85070591730234615865843651857942052864]>
INFO 15:19:54,865 Endpoints /192.168.81.2 and /192.168.81.4 are consistent for UserDetails on (42535295865117307932921825928971026432,85070591730234615865843651857942052864]
INFO 15:19:54,874 Endpoints /192.168.81.2 and /192.168.81.5 are consistent for UserDetails on (42535295865117307932921825928971026432,85070591730234615865843651857942052864]
INFO 15:19:54,874 Repair session manual-repair-6d2438ca-a05c-4217-92c7-c2ad563a92dd (on cfs [Ljava.lang.String;@7e541d08, range (42535295865117307932921825928971026432,85070591730234615865843651857942052864]) completed successfully
INFO 15:19:54,909 Sending AEService tree for #<TreeRequest manual-repair-98d1a21c-9d6e-41c8-8917-aea70f716243, /192.168.81.2, (A,UserDetails), (127605887595351923798765477786913079296,0]>
INFO 15:19:54,967 Endpoints /192.168.81.2 and /192.168.81.3 are consistent for UserDetails on (127605887595351923798765477786913079296,0]
INFO 15:19:54,974 Endpoints /192.168.81.2 and /192.168.81.4 are consistent for UserDetails on (127605887595351923798765477786913079296,0]
INFO 15:19:54,975 Repair session manual-repair-98d1a21c-9d6e-41c8-8917-aea70f716243 (on cfs [Ljava.lang.String;@48c651f2, range (127605887595351923798765477786913079296,0]) completed successfully
INFO 15:19:54,975 Repair command #8 completed successfully

Since I'm using SimpleStrategy, I would expect the keys to be split more or less evenly across the nodes, but that doesn't seem to be the case.

Has anyone come across similar behaviour? Does anyone have suggestions for what I could do to get some of the data onto node1? Obviously this data split means that node2, node3 and node4 have to do all the read work, which is far from ideal.

Any suggestions are greatly appreciated.

Kind regards, Bart

It turned out to be a problem with the schema: instead of many rows (1 row per user), we had one big row containing more than 800,000 columns.

What I suspect was happening:

  • the row was being kept in the OS memory cache the whole time, so we saw no IO
  • Cassandra then spent all its CPU time serialising the huge row over and over again to get data out of it

We changed the way the application handles this so that it stores a single row per user's details, and the problem went away.
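For illustration, here is a minimal sketch of the two data models using pycassa as the client. The client choice, row keys and values are all assumptions made for the sake of the example (the question does not say how the application writes its data); only keyspace A and the UserDetails column family come from the schema above.

# Hypothetical pycassa sketch contrasting the two data models. Keyspace 'A'
# and column family 'UserDetails' come from the schema above; the client,
# keys and values are invented for illustration.
import pycassa

pool = pycassa.ConnectionPool('A', ['192.168.81.2:9160'])
user_details = pycassa.ColumnFamily(pool, 'UserDetails')

user_id = 42                              # hypothetical user
details_blob = 'serialized user details'  # hypothetical payload

# Problematic model: every user's details written as columns of ONE row.
# All 800,000+ columns live under a single row key, so only that key's
# three replicas ever hold data and the row is handled as one huge unit.
user_details.insert('all_users', {user_id: details_blob})

# Fixed model: one row per user. Different row keys hash to different
# tokens, so data and read load spread across the whole cluster.
user_details.insert('user:%d' % user_id, {1: details_blob})

With one row per user, the row key rather than the column name carries the user id, so the partitioner can spread users over all four nodes.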

SimpleStrategy means that Cassandra distributes data without taking racks, datacenters or any other geography into account. That is important to know when reasoning about data distribution, but it is not enough on its own to explain your situation.

How rows end up distributed across the cluster is also a question of which partitioner you use. The RandomPartitioner hashes row keys before deciding which cluster member should own them; an order-preserving partitioner does not, and that can create hot spots on the cluster (including nodes that get no data at all!) even when the nodes own equal slices of the ring. You can experiment with how Cassandra would distribute different keys by running the following command against one of your nodes, to see which nodes Cassandra thinks a given key (real or hypothetical) belongs to:

nodetool -h localhost getendpoints <keyspace> <cf> <key>

If your other column families spread their data across the cluster correctly, I would take a close look at the partitioner and the keys you are using.
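As a rough do-it-yourself illustration of those two points, the sketch below approximates what the RandomPartitioner does (it derives the token from an MD5 hash of the row key) and then applies SimpleStrategy's placement rule to the ring tokens from the nodetool ring output above: the node owning the token's range takes the first replica, and the next RF-1 nodes clockwise take the rest. The key names are hypothetical, and nodetool getendpoints remains the authoritative check because it uses the cluster's real partitioner and replication settings.

# Approximate, do-it-yourself version of getendpoints for this cluster:
# a RandomPartitioner-style token (MD5 of the key as a positive integer)
# plus SimpleStrategy placement (owner of the token's range and the next
# RF-1 nodes clockwise). The keys below are hypothetical.
import hashlib
from bisect import bisect_left

RF = 3
RING = [  # tokens and addresses from the nodetool ring output above
    (0,                                        '192.168.81.2'),
    (42535295865117307932921825928971026432,   '192.168.81.3'),
    (85070591730234615865843651857942052864,   '192.168.81.4'),
    (127605887595351923798765477786913079296,  '192.168.81.5'),
]
TOKENS = [token for token, _ in RING]

def token_for(key):
    # RandomPartitioner hashes the key with MD5, so consecutive keys
    # scatter evenly over the ring; an order-preserving partitioner
    # would use the key itself and can pile keys into one range.
    digest = hashlib.md5(key.encode()).digest()
    return abs(int.from_bytes(digest, 'big', signed=True))

def endpoints(key, rf=RF):
    # First replica: the node whose token closes the range containing
    # the key's token (wrapping past the last token back to the first).
    # SimpleStrategy adds the next rf-1 nodes clockwise, ignoring racks.
    idx = bisect_left(TOKENS, token_for(key)) % len(RING)
    return [RING[(idx + i) % len(RING)][1] for i in range(rf)]

for key in ('user:1', 'user:2', 'user:3', 'all_users'):
    print(key, endpoints(key))

If the application only ever writes one row key (as with the single huge row described in the fix above), a check like this will always report the same three endpoints, which matches the 3-out-of-4 pattern in the cfstats output.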

Quoted from: https://serverfault.com/questions/340414