One ColumnFamily places data on only 3 out of 4 nodes
I have already posted this on the cassandra-user mailing list but got no replies, so I am wondering whether anyone on serverfault.com has any ideas.
I seem to be hitting a rather strange (at least to me!) problem/behaviour with Cassandra.
I am running a 4-node cluster on Cassandra 0.8.7. For the keyspace in question I have RF=3, SimpleStrategy, and multiple ColumnFamilies within the keyspace. For one of the ColumnFamilies, however, the data appears to be distributed across only 3 of the 4 nodes.
Apart from the problematic ColumnFamily, data on the cluster appears to be more or less evenly spread.
# nodetool -h localhost ring
Address         DC          Rack   Status State   Load     Owns    Token
                                                                   127605887595351923798765477786913079296
192.168.81.2    datacenter1 rack1  Up     Normal  7.27 GB  25.00%  0
192.168.81.3    datacenter1 rack1  Up     Normal  7.74 GB  25.00%  42535295865117307932921825928971026432
192.168.81.4    datacenter1 rack1  Up     Normal  7.38 GB  25.00%  85070591730234615865843651857942052864
192.168.81.5    datacenter1 rack1  Up     Normal  7.32 GB  25.00%  127605887595351923798765477786913079296
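As a side note, the tokens above are an evenly balanced manual assignment over the RandomPartitioner's 0 .. 2^127 range (each node owning exactly 25%); a quick sketch to verify:

```python
# A balanced N-node ring under RandomPartitioner assigns token
# i * 2**127 // N to node i; the ring's token space is 0 .. 2**127 - 1.
RING_SIZE = 2 ** 127

tokens = [i * RING_SIZE // 4 for i in range(4)]
for t in tokens:
    print(t)  # matches the four tokens in the ring output above
```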
The relevant part of the keyspace schema is as follows:
[default@A] show schema;
create keyspace A
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = [{replication_factor : 3}];
[...]
create column family UserDetails
  with column_type = 'Standard'
  and comparator = 'IntegerType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'BytesType'
  and memtable_operations = 0.571875
  and memtable_throughput = 122
  and memtable_flush_after = 1440
  and rows_cached = 0.0
  and row_cache_save_period = 0
  and keys_cached = 200000.0
  and key_cache_save_period = 14400
  and read_repair_chance = 1.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and row_cache_provider = 'ConcurrentLinkedHashCacheProvider';
Now the symptoms - the output of "nodetool -h localhost cfstats" on each node. Note the numbers on node1.
node1
Column Family: UserDetails
SSTable count: 0
Space used (live): 0
Space used (total): 0
Number of Keys (estimate): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0
node2
Column Family: UserDetails
SSTable count: 3
Space used (live): 112952788
Space used (total): 164953743
Number of Keys (estimate): 384
Memtable Columns Count: 159419
Memtable Data Size: 74910890
Memtable Switch Count: 59
Read Count: 135307426
Read Latency: 25.900 ms.
Write Count: 3474673
Write Latency: 0.040 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 120
Key cache hit rate: 0.999971684189041
Row cache: disabled
Compacted row minimum size: 42511
Compacted row maximum size: 74975550
Compacted row mean size: 42364305
node3
Column Family: UserDetails
SSTable count: 3
Space used (live): 112953137
Space used (total): 112953137
Number of Keys (estimate): 384
Memtable Columns Count: 159421
Memtable Data Size: 74693445
Memtable Switch Count: 56
Read Count: 135304486
Read Latency: 25.552 ms.
Write Count: 3474616
Write Latency: 0.036 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 109
Key cache hit rate: 0.9999716840888175
Row cache: disabled
Compacted row minimum size: 42511
Compacted row maximum size: 74975550
Compacted row mean size: 42364305
node4
Column Family: UserDetails
SSTable count: 3
Space used (live): 117070926
Space used (total): 119479484
Number of Keys (estimate): 384
Memtable Columns Count: 159979
Memtable Data Size: 75029672
Memtable Switch Count: 60
Read Count: 135294878
Read Latency: 19.455 ms.
Write Count: 3474982
Write Latency: 0.028 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 119
Key cache hit rate: 0.9999752235777154
Row cache: disabled
Compacted row minimum size: 2346800
Compacted row maximum size: 62479625
Compacted row mean size: 42591803
When I go into the "data" directory on node1, there are no files at all for the UserDetails ColumnFamily.
I tried running a manual repair, hoping it would cure the situation, but no luck.
# nodetool -h localhost repair A UserDetails
INFO 15:19:54,611 Starting repair command #8, repairing 3 ranges.
INFO 15:19:54,647 Sending AEService tree for #<TreeRequest manual-repair-89c1acb0-184c-438f-bab8-7ceed27980ec, /192.168.81.2, (A,UserDetails), (85070591730234615865843651857942052864,127605887595351923798765477786913079296]>
INFO 15:19:54,742 Endpoints /192.168.81.2 and /192.168.81.3 are consistent for UserDetails on (85070591730234615865843651857942052864,127605887595351923798765477786913079296]
INFO 15:19:54,750 Endpoints /192.168.81.2 and /192.168.81.5 are consistent for UserDetails on (85070591730234615865843651857942052864,127605887595351923798765477786913079296]
INFO 15:19:54,751 Repair session manual-repair-89c1acb0-184c-438f-bab8-7ceed27980ec (on cfs [Ljava.lang.String;@3491507b, range (85070591730234615865843651857942052864,127605887595351923798765477786913079296]) completed successfully
INFO 15:19:54,816 Sending AEService tree for #<TreeRequest manual-repair-6d2438ca-a05c-4217-92c7-c2ad563a92dd, /192.168.81.2, (A,UserDetails), (42535295865117307932921825928971026432,85070591730234615865843651857942052864]>
INFO 15:19:54,865 Endpoints /192.168.81.2 and /192.168.81.4 are consistent for UserDetails on (42535295865117307932921825928971026432,85070591730234615865843651857942052864]
INFO 15:19:54,874 Endpoints /192.168.81.2 and /192.168.81.5 are consistent for UserDetails on (42535295865117307932921825928971026432,85070591730234615865843651857942052864]
INFO 15:19:54,874 Repair session manual-repair-6d2438ca-a05c-4217-92c7-c2ad563a92dd (on cfs [Ljava.lang.String;@7e541d08, range (42535295865117307932921825928971026432,85070591730234615865843651857942052864]) completed successfully
INFO 15:19:54,909 Sending AEService tree for #<TreeRequest manual-repair-98d1a21c-9d6e-41c8-8917-aea70f716243, /192.168.81.2, (A,UserDetails), (127605887595351923798765477786913079296,0]>
INFO 15:19:54,967 Endpoints /192.168.81.2 and /192.168.81.3 are consistent for UserDetails on (127605887595351923798765477786913079296,0]
INFO 15:19:54,974 Endpoints /192.168.81.2 and /192.168.81.4 are consistent for UserDetails on (127605887595351923798765477786913079296,0]
INFO 15:19:54,975 Repair session manual-repair-98d1a21c-9d6e-41c8-8917-aea70f716243 (on cfs [Ljava.lang.String;@48c651f2, range (127605887595351923798765477786913079296,0]) completed successfully
INFO 15:19:54,975 Repair command #8 completed successfully
Since I am using SimpleStrategy, I would expect the keys to be split more or less evenly between the nodes, but that does not appear to be the case.
Has anyone seen similar behaviour? Does anyone have suggestions for what I could do to get some of the data onto node1? Obviously this data split means that node2, node3 and node4 have to do all the read work, which is not ideal.
Any suggestions greatly appreciated.
Kind regards, Bart
It turned out to be a problem with the schema - instead of having many rows (1 row per user), we had one huge row with over 800,000 columns.
What I suspect was happening:
- the row was being kept in the OS page cache the whole time, so we were not seeing any IO
- Cassandra was then spending all of its CPU time serialising the huge row over and over again to fetch data from it
We changed the way the application does this so that it stores a single row per user's details, and the problem went away.
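This also explains why node1 was empty: a single row key hashes to a single token, and SimpleStrategy with RF=3 places that row's replicas on the owning node plus the next two nodes clockwise around the ring, so the fourth node never receives it. A minimal offline sketch of that placement logic (assuming the default RandomPartitioner and the token assignments from the question; the row key here is made up for illustration):

```python
import hashlib
from bisect import bisect_left

# Ring from the question's nodetool output: (token, node), RF = 3.
RING = [
    (0, "192.168.81.2"),
    (42535295865117307932921825928971026432, "192.168.81.3"),
    (85070591730234615865843651857942052864, "192.168.81.4"),
    (127605887595351923798765477786913079296, "192.168.81.5"),
]

def rp_token(key: bytes) -> int:
    # RandomPartitioner-style token: absolute value of the signed
    # 128-bit MD5 digest, landing in 0 .. 2**127.
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))

def replicas(key: bytes, rf: int = 3):
    # SimpleStrategy: the node owning the key's token (first node with
    # token >= key token, wrapping), then the next rf-1 nodes clockwise.
    tokens = [t for t, _ in RING]
    idx = bisect_left(tokens, rp_token(key))
    if idx == len(tokens):
        idx = 0
    return [RING[(idx + i) % len(RING)][1] for i in range(rf)]

# One giant row -> exactly three endpoints; the fourth node stays empty.
print(sorted(replicas(b"all-user-details")))
```

With one row per user instead, the many distinct keys hash all over the token space, so every node ends up holding a share of the data.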
SimpleStrategy means that Cassandra distributes the data without regard to racks, data centres, or other aspects of physical placement. That is important background for understanding your data distribution, but not enough on its own to analyse your situation.
How rows are distributed across the cluster is also a question of which partitioner you are using. The RandomPartitioner hashes row keys before deciding which cluster members should own them. An order-preserving partitioner does not, which can create hot spots on the cluster (including nodes that are not used at all!) even when your nodes have equal ring divisions. You can experiment with how Cassandra distributes different keys by running the following command on one of your nodes, to see which nodes Cassandra thinks a given key (real or hypothetical) belongs to:
nodetool -h localhost getendpoints <keyspace> <cf> <key>
If your other column families are distributing their data across the cluster correctly, I would take a close look at the partitioner and the keys you are using.
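To see the difference the partitioner makes without touching a live cluster, here is a small sketch (assuming a RandomPartitioner-style MD5 hash and the token ranges from the question; the `userNNNNN` key format is hypothetical) that counts how 10,000 sequential keys fall across the four primary ranges. The hashed keys spread nearly evenly, whereas an order-preserving partitioner would compare the same sequential keys byte-wise and pile them into one narrow slice of the ring:

```python
import hashlib
from bisect import bisect_left
from collections import Counter

# The four node tokens from the question's ring, in order.
TOKENS = [
    0,
    42535295865117307932921825928971026432,
    85070591730234615865843651857942052864,
    127605887595351923798765477786913079296,
]

def rp_token(key: bytes) -> int:
    # RandomPartitioner-style token: abs of the signed 128-bit MD5 digest.
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))

def primary_index(token: int) -> int:
    # Primary owner: first node whose token is >= the key's token, wrapping.
    i = bisect_left(TOKENS, token)
    return 0 if i == len(TOKENS) else i

# 10,000 sequential user ids, hashed: expect roughly 2,500 per node.
hashed = Counter(primary_index(rp_token(b"user%05d" % i)) for i in range(10000))
print(hashed)
```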