Hadoop

Hadoop HDFS: Setting a file's block size from the command line?

  • September 5, 2011

When I load a file into HDFS, I need to set the file's block size to a value lower than the cluster-wide block size. For example, if HDFS is using 64MB blocks, I may want a large file copied in with 32MB blocks.

I've previously done this from Hadoop job code using the org.apache.hadoop.fs.FileSystem.create() function, but is there a way to do it from the command line?
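
(For context, the programmatic route mentioned above looks roughly like the sketch below. This is a minimal illustration assuming the classic FileSystem.create() overload that takes an explicit blockSize argument; the path, buffer size, and replication values are placeholders, not anything from the cluster described here.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SmallBlockWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // create(path, overwrite, bufferSize, replication, blockSize):
        // the last argument overrides the cluster's default block size.
        FSDataOutputStream out = fs.create(
                new Path("/home/hcoyote/example.dat"), // hypothetical path
                true,              // overwrite an existing file
                4096,              // I/O buffer size, in bytes
                (short) 3,         // replication factor
                32L * 1024 * 1024  // block size: 32MB instead of 64MB
        );
        out.writeBytes("payload goes here\n");
        out.close();
    }
}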

You can do this by setting -Ddfs.block.size=something on the hadoop fs command. For example:

hadoop fs -Ddfs.block.size=1048576  -put ganglia-3.2.0-1.src.rpm /home/hcoyote

As you can see here, the block size was changed to whatever you defined on the command line (in my case the default is 64MB, but here I changed it to 1MB).

:;  hadoop fsck -blocks -files -locations /home/hcoyote/ganglia-3.2.0-1.src.rpm 
FSCK started by hcoyote from /10.1.1.111 for path /home/hcoyote/ganglia-3.2.0-1.src.rpm at Mon Aug 15 14:34:14 CDT 2011
/home/hcoyote/ganglia-3.2.0-1.src.rpm 1376561 bytes, 2 block(s):  OK
0. blk_5365260307246279706_901858 len=1048576 repl=3 [10.1.1.115:50010, 10.1.1.105:50010, 10.1.1.119:50010]
1. blk_-6347324528974215118_901858 len=327985 repl=3 [10.1.1.106:50010, 10.1.1.105:50010, 10.1.1.104:50010]

Status: HEALTHY
Total size:    1376561 B
Total dirs:    0
Total files:   1
Total blocks (validated):  2 (avg. block size 688280 B)
Minimally replicated blocks:   2 (100.0 %)
Over-replicated blocks:    0 (0.0 %)
Under-replicated blocks:   0 (0.0 %)
Mis-replicated blocks:     0 (0.0 %)
Default replication factor:    3
Average block replication: 3.0
Corrupt blocks:        0
Missing replicas:      0 (0.0 %)
Number of data-nodes:      12
Number of racks:       1
FSCK ended at Mon Aug 15 14:34:14 CDT 2011 in 0 milliseconds


The filesystem under path '/home/hcoyote/ganglia-3.2.0-1.src.rpm' is HEALTHY
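
One hedged follow-up: on Hadoop 2.x and later, the dfs.block.size property was deprecated in favor of dfs.blocksize, so on a newer cluster the same upload would presumably look like the line below (same file and destination as above, with the byte value for a 32MB block):

hadoop fs -Ddfs.blocksize=33554432 -put ganglia-3.2.0-1.src.rpm /home/hcoyote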

Source: https://serverfault.com/questions/300135