HBase數(shù)據(jù)導(dǎo)入ImportTsv

ImportTsv 工具是通過map reduce 完成的。所以要啟動(dòng)yarn. 工具要使用jar包，所以注意配置classpath。ImportTsv默認(rèn)是通過hbase api 插入數(shù)據(jù)的

成都創(chuàng)新互聯(lián)公司服務(wù)項(xiàng)目包括昭化網(wǎng)站建設(shè)、昭化網(wǎng)站制作、昭化網(wǎng)頁制作以及昭化網(wǎng)絡(luò)營銷策劃等。多年來，我們專注于互聯(lián)網(wǎng)行業(yè)，利用自身積累的技術(shù)優(yōu)勢(shì)、行業(yè)經(jīng)驗(yàn)、深度合作伙伴關(guān)系等，向廣大中小型企業(yè)、政府機(jī)構(gòu)等提供互聯(lián)網(wǎng)行業(yè)的解決方案，昭化網(wǎng)站推廣取得了明顯的社會(huì)效益與經(jīng)濟(jì)效益。目前，我們服務(wù)的客戶以成都為中心已經(jīng)輻射到昭化省份的部分城市，未來相信會(huì)繼續(xù)擴(kuò)大服務(wù)區(qū)域并繼續(xù)獲得客戶的支持與信任！

[hadoop-user@rhel work]$ cat /home/hadoop-user/.bash_profile

# .bash_profile

# Get the aliases and functions

if [ -f ~/.bashrc ]; then

. ~/.bashrc

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH

JAVA_HOME=/usr/java/jdk1.8.0_171-amd64

PATH=$PATH:$JAVA_HOME/bin

CLASSPATH=$CLASSPATH:$JAVA_HOME/lib

HADOOP_HOME=/home/hadoop-user/hadoop-2.8.0

PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib

HBASE_HOME=/home/hadoop-user/hbase-2.0.0

PATH=$PATH:$HBASE_HOME/bin

CLASSPATH=$CLASSPATH:$HBASE_HOME/lib

ZOOKEEPER_HOME=/home/hadoop-user/zookeeper-3.4.12

PATH=$PATH:$ZOOKEEPER_HOME/bin

PHOENIX_HOME=/home/hadoop-user/apache-phoenix-5.0.0-alpha-HBase-2.0-bin

PATH=$PATH:$PHOENIX_HOME/bin

export PATH

創(chuàng)建表

hbase(main):033:0> create 'test','cf'

創(chuàng)建要導(dǎo)入的文件

[hadoop-user@rhel work]$ cat /home/hadoop-user/work/sample1.csv

row10,"mjj10"

row11,"mjj11"

row12,"mjj12"

row14,"mjj13"

將文件放入hdfs

[hadoop-user@rhel work]$ hdfs dfs -put /home/hadoop-user/work/sample1.csv /sample1.csv

ImportTsv導(dǎo)入命令

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf:a test /sample1.csv

注: HBASE_ROW_KEY表示文件rowid的位置，后面是列的定義。這里意思是導(dǎo)入的列為列族為cf,列名為a。要導(dǎo)入的文件是hdfs中/sample1.csv

幫助中的解釋

Usage: importtsv -Dimporttsv.columns=a,b,c

Imports the given input directory of TSV data into the specified table.

The column names of the TSV data must be specified using the -Dimporttsv.columns

option. This option takes the form of comma-separated column names, where each

column name is either a simple column family, or a columnfamily:qualifier. The special

column name HBASE_ROW_KEY is used to designate that this column should be used

as the row key for each imported record. You must specify exactly one column

to be the row key, and you must specify a column name for every column that exists in the

input data. Another special columnHBASE_TS_KEY designates that this column should be

used as timestamp for each record. Unlike HBASE_ROW_KEY, HBASE_TS_KEY is optional.

You must specify at most one column as timestamp key for each imported record.

Record with invalid timestamps (blank, non-numeric) will be treated as bad record.

Note: if you use this option, then 'importtsv.timestamp' option will be ignored.

注意: ImportTsv導(dǎo)入的內(nèi)容，phoenix看不到。事實(shí)上，hbase創(chuàng)建的表，Phoenix看不到。phoenix創(chuàng)建的表，hbase能看到，但是內(nèi)容是編碼后的內(nèi)容。

importtsv 工具默認(rèn)使用hbase put api導(dǎo)數(shù)據(jù).當(dāng)使用選項(xiàng) -Dimporttsv.bulk.output時(shí)，將會(huì)先生成HFILE文件的內(nèi)部格式的文件。

The importtsv tool, by default, uses the HBase Put API to insert data into the HBase

table using TableOutputFormat in its map phase. But when the -Dimporttsv.bulk.output option is specified, it instead generates HBase internal format (HFile) files on HDFS

by using HFileOutputFormat. Therefore, we can then use the completebulkload tool to load the generated files into a running cluster. The following steps are to use the bulk output and load tools:

生成HFILE格式的文件命令

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=/hfiles_tsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:a test /sample1.csv

注: 生成hfile格式的文件，存放于hdfs中的/hfile_tsv目錄中，目錄會(huì)由命令自己創(chuàng)建。

[hadoop-user@rhel work]$ hdfs dfs -ls /hfiles_tsv/cf

18/06/28 10:49:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 1 items

-rw-r--r-- 1 hadoop-user supergroup 5125 2018-06-28 10:40 /hfiles_tsv/cf/0e466616d42a4a128fb60caa7dbe075a

注: 0e466616d42a4a128fb60caa7dbe075a的命名格式跟WEB中region的命名格式很像

通過

hadoop jar hbase-server-2.0.0.jar completebulkload /hfiles_tsv 'test'

出現(xiàn)異常； Exception in thread "main" java.lang.ClassNotFoundException: completebulkload

HBASE文檔中兩種導(dǎo)入方式:

There are two ways to invoke this utility, with explicit classname and via the driver:

Explicit Classname

$ bin/hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles

Driver

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-VERSION.jar completebulkload

本文標(biāo)題：HBase數(shù)據(jù)導(dǎo)入ImportTsv
標(biāo)題URL：http://weahome.cn/article/ihoiig.html

真实的国产乱ⅩXXX66竹夫人,五月香六月婷婷激情综合,亚洲日本VA一区二区三区,亚洲精品一区二区三区麻豆

HBase數(shù)據(jù)導(dǎo)入ImportTsv

其他資訊

網(wǎng)站制作

企業(yè)服務(wù)

網(wǎng)站建設(shè)

服務(wù)器托管