1. 今天遇到一個情況,就是alluxio不能正常訪問,經(jīng)過日志查看,發(fā)現(xiàn)下面錯誤。
衛(wèi)輝網(wǎng)站制作公司哪家好,找創(chuàng)新互聯(lián)公司!從網(wǎng)頁設(shè)計、網(wǎng)站建設(shè)、微信開發(fā)、APP開發(fā)、響應式網(wǎng)站建設(shè)等網(wǎng)站項目制作,到程序開發(fā),運營維護。創(chuàng)新互聯(lián)公司2013年至今到現(xiàn)在10年的時間,我們擁有了豐富的建站經(jīng)驗和運維經(jīng)驗,來保證我們的工作的順利進行。專注于網(wǎng)站建設(shè)就選創(chuàng)新互聯(lián)公司。
2018-05-14 03:35:58,680 ERROR logger.type (HdfsUnderFileSystem.java:open) - 4 try to open hdfs://sandy-bridge/user/alluxio/journal/FileSystemMaster/completed/log.00000000000000000001 : Cannot obtain block length for LocatedBlock{BP-1941630157-10.16.13.73-1486732586674:blk_1322900685_252817168; getBlockSize()=254; corrupt=false; offset=0; locs=[10.16.13.189:1019, 10.16.13.84:1019, 10.16.13.128:1019]; storageIDs=[DS-30126b4d-afdf-449a-8de1-e479c1abf33d, DS-ed2e905e-fa43-4f51-801f-3305da180d2a, DS-0e1946c8-dccb-4143-8d74-c11d8d429d02]; storageTypes=[DISK, DISK, DISK]} java.io.IOException: Cannot obtain block length for LocatedBlock{BP-1941630157-10.16.13.73-1486732586674:blk_1322900685_252817168; getBlockSize()=254; corrupt=false; offset=0; locs=[10.16.13.189:1019, 10.16.13.84:1019, 10.16.13.128:1019]; storageIDs=[DS-30126b4d-afdf-449a-8de1-e479c1abf33d, DS-ed2e905e-fa43-4f51-801f-3305da180d2a, DS-0e1946c8-dccb-4143-8d74-c11d8d429d02]; storageTypes=[DISK, DISK, DISK]} at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:400) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:305) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:242) at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:235) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1487) at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:302) at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:298) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:298) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766) at alluxio.underfs.hdfs.HdfsUnderFileSystem.open(HdfsUnderFileSystem.java:387) at alluxio.underfs.BaseUnderFileSystem.open(BaseUnderFileSystem.java:124) at alluxio.master.journal.JournalReader.getNextInputStream(JournalReader.java:114) at alluxio.master.journal.JournalTailer.processNextJournalLogFiles(JournalTailer.java:118) at alluxio.master.AbstractMaster.start(AbstractMaster.java:140) at alluxio.master.file.FileSystemMaster.start(FileSystemMaster.java:419) at alluxio.master.DefaultAlluxioMaster.startMasters(DefaultAlluxioMaster.java:263) at alluxio.master.FaultTolerantAlluxioMaster.start(FaultTolerantAlluxioMaster.java:91) at alluxio.ServerUtils.run(ServerUtils.java:38)
2. 首先是懷疑文件log.00000000000000000001損壞,經(jīng)過hfs fsck的檢查,并沒有發(fā)現(xiàn)corruption,但是Total size: 0,這是個問題。
[hdfs@hdfs-namenode hdfs]$ hdfs fsck /user/alluxio/journal/FileSystemMaster/completed/log.00000000000000000001 Connecting to namenode via http://hdfs-namenode.eu-central-1.compute.internal:50070/fsck?ugi=hdfs&path=%2Fuser%2Falluxio%2Fjournal%2FFileSystemMaster%2Fcompleted%2Flog.00000000000000000001 FSCK started by hdfs (auth:KERBEROS_SSL) from /10.16.13.73 for path /user/alluxio/journal/FileSystemMaster/completed/log.00000000000000000001 at Mon May 14 03:53:11 UTC 2018 Status: HEALTHY Total size: 0 B (Total open files size: 254 B) Total dirs: 0 Total files: 0 Total symlinks: 0 (Files currently being written: 1) Total blocks (validated): 0 (Total open file blocks (not validated): 1) Minimally replicated blocks: 0 Over-replicated blocks: 0 Under-replicated blocks: 0 Mis-replicated blocks: 0 Default replication factor: 3 Average block replication: 0.0 Corrupt blocks: 0 Missing replicas: 0 Number of data-nodes: 41 Number of racks: 1 FSCK ended at Mon May 14 03:53:11 UTC 2018 in 1 milliseconds The filesystem under path '/user/alluxio/journal/FileSystemMaster/completed/log.00000000000000000001' is HEALTHY
3. 將這個問題件mv走,再啟動alluxio HA master,啟動成功。
[hdfs@hdfs-namenode hdfs]$ hdfs dfs -mv /user/alluxio/journal/FileSystemMaster/completed/log.00000000000000000001 /user/alluxio/journal/FileSystemMaster/completed/log.00000000000000000001.bak [hdfs@hdfs-namenode hdfs]$ hdfs dfs -ls /user/alluxio/journal/FileSystemMaster/completed/ Found 2 items -rw-r--r-- 3 alluxio alluxio 254 2018-01-29 09:32 /user/alluxio/journal/FileSystemMaster/completed/log.00000000000000000001.bak -rw-r--r-- 3 alluxio alluxio 397 2018-05-14 03:03 /user/alluxio/journal/FileSystemMaster/completed/log.00000000000000000002
4. 其中嘗試過,將文件再mv回來,但是alluxio依然啟動失敗,還是最開始的錯誤。
5. 直接cat這個文件,發(fā)現(xiàn)也不能訪問。
hdfs dfs -cat /user/alluxio/journal/FileSystemMaster/completed/log.00000000000000000001.bak cat: Cannot obtain block length for LocatedBlock{BP-1941630157-10.16.13.73-1486732586674:blk_1322900685_252817168; getBlockSize()=254; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.16.13.189:1019,DS-30126b4d-afdf-449a-8de1-e479c1abf33d,DISK], DatanodeInfoWithStorage[10.16.13.128:1019,DS-0e1946c8-dccb-4143-8d74-c11d8d429d02,DISK], DatanodeInfoWithStorage[10.16.13.84:1019,DS-ed2e905e-fa43-4f51-801f-3305da180d2a,DISK]]}
6. 而正常的文件,輸出如下:
[hdfs@hdfs-namenode hdfs]$ hdfs dfs -cat /user/alluxio/journal/FileSystemMaster/completed/log.00000000000000000002 NOT_PERSISTED(0,@ HPXhdatadownloadz_20180510130731077.zip" NOT_PERSISTED(0,@ HPXhdatadownloadzdatadownload Z 6Perrier_%3F%3F_20180101_20180104_20180510130731077.zip" NOT_PERSISTED(0,@ HPXhdatadownloadzdatadownload Z 6Perrier_%3F%3F_20180101_20180104_20180510130731077.zip" NOT_PERSISTED(0,@ HPXhdatadownloadzdatadownload Z 6Perrier_%3F%3F_20180101_20180104_20180510130731077.zip" NOT_PERSISTED(0,@ HPXhdatadownloadz datadownload Z 6Perrier_%3F%3F_20180101_20180104_20180510130731077.zip"
7. Alluxio master是啟動成功了,但是丟了一部分數(shù)據(jù)。
這個問題,有時間,還要繼續(xù)研究一下,看是否能將數(shù)據(jù)找回。