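Hive stores its data on the Hadoop HDFS file system and organizes metadata access through either the default embedded Derby database or an external database system such as MySQL. The walkthrough below uses a practical example to show how that storage works.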
1. Create a table in Hive, then load a CSV file into it (the source file is Batting.csv; the internal, i.e. managed, table is temp_batting):
hive> create table temp_batting(col_value STRING);
hive> show tables;
OK
temp_batting
...
hive> LOAD DATA INPATH 'hive/data/Batting.csv' OVERWRITE INTO TABLE temp_batting;
2. Query the external MySQL database that holds the Hive metadata; the newly created temp_batting table appears in the TBLS table:
mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+--------------+----------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME     | TBL_TYPE       | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+--------------+----------------+--------------------+--------------------+
|     66 |  1432707070 |     1 |                0 | root  |         0 |    66 | temp_batting | MANAGED_TABLE  | NULL               | NULL               |
+--------+-------------+-------+------------------+-------+-----------+-------+--------------+----------------+--------------------+--------------------+
...
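Look up the table's storage path on HDFS in the SDS table (the row whose SD_ID matches the SD_ID of temp_batting in TBLS):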
mysql> select * from SDS;
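+-------+-------+--------------------------------------------------+---------------+---------------------------+--------------------------------------------------------+-------------+------------------------------------------------------------+----------+
| SD_ID | CD_ID | INPUT_FORMAT                                     | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION                                               | NUM_BUCKETS | OUTPUT_FORMAT                                              | SERDE_ID |
+-------+-------+--------------------------------------------------+---------------+---------------------------+--------------------------------------------------------+-------------+------------------------------------------------------------+----------+
|    66 |    71 | org.apache.hadoop.mapred.TextInputFormat         |               |                           | hdfs://localhost:9000/user/hive/warehouse/temp_batting |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |       66 |
+-------+-------+--------------------------------------------------+---------------+---------------------------+--------------------------------------------------------+-------------+------------------------------------------------------------+----------+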
The LOCATION column gives the path:
hdfs://localhost:9000/user/hive/warehouse/temp_batting
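The two lookups above can be combined into a single join on SD_ID. The following is a minimal sketch of that join, not the way Hive itself reads the metastore. It assumes an in-memory SQLite database as a stand-in for MySQL so that it runs on its own, keeps only the columns needed here, and uses a hypothetical helper name, table_location. The row values are the ones shown in the output above.

```python
import sqlite3

# Assumption: SQLite stands in for the MySQL metastore so the sketch is
# self-contained. Only the columns used by the join are recreated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TBLS (TBL_ID INTEGER, SD_ID INTEGER, TBL_NAME TEXT, TBL_TYPE TEXT);
CREATE TABLE SDS  (SD_ID INTEGER, LOCATION TEXT);
INSERT INTO TBLS VALUES (66, 66, 'temp_batting', 'MANAGED_TABLE');
INSERT INTO SDS  VALUES (66, 'hdfs://localhost:9000/user/hive/warehouse/temp_batting');
""")


def table_location(conn, table_name):
    """Return (TBL_TYPE, LOCATION) for a table name, or None if not found.

    Hypothetical helper: TBLS.SD_ID points at the SDS row that holds the
    storage descriptor, including the HDFS location.
    """
    return conn.execute(
        "SELECT t.TBL_TYPE, s.LOCATION "
        "FROM TBLS t JOIN SDS s ON t.SD_ID = s.SD_ID "
        "WHERE t.TBL_NAME = ?",
        (table_name,),
    ).fetchone()


print(table_location(conn, "temp_batting"))
```

The SELECT statement inside table_location uses only the TBLS and SDS columns shown earlier, so the same join should also work when typed into the mysql client against the real hive database.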
3. Inspect this table path in Hadoop's HDFS file system:
[root@lr rli]# hadoop dfs -ls /user/hive/warehouse
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
...
drwxr-xr-x - root supergroup 0 2015-05-27 14:16 /user/hive/warehouse/temp_batting
...
[root@lr rli]# hadoop dfs -ls /user/hive/warehouse/temp_batting
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 1 items
-rwxr-xr-x 1 root supergroup 6398990 2015-05-27 14:02 /user/hive/warehouse/temp_batting/Batting.csv
The listing shows the loaded file, Batting.csv, and its size (6398990 bytes).
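To make the columns of that listing explicit, here is a small sketch that splits one listing line into named fields. It assumes the eight-field layout seen in the output above (permissions, replication, owner, group, size, date, time, path); LsEntry and parse_ls_line are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class LsEntry(NamedTuple):
    """One line of the listing, reduced to the fields of interest."""
    is_dir: bool
    owner: str
    size: int   # size in bytes; 0 for directories in the output above
    path: str


def parse_ls_line(line: str) -> LsEntry:
    # Assumed field order: permissions, replication, owner, group,
    # size, date, time, path. Splitting at most 7 times keeps a path
    # that contains spaces in one piece.
    perms, _repl, owner, _group, size, _date, _time, path = line.split(None, 7)
    return LsEntry(perms.startswith("d"), owner, int(size), path)


entry = parse_ls_line(
    "-rwxr-xr-x 1 root supergroup 6398990 2015-05-27 14:02 "
    "/user/hive/warehouse/temp_batting/Batting.csv"
)
print(entry.size, entry.path)
```

The directory line from the warehouse listing parses the same way, with the replication field given as "-" and a size of 0.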
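Conclusion:
Hive records each table's storage path, attributes, and other metadata in the associated database system, while the actual data resides in HDFS. When you run a SELECT or a similar operation, Hive generates the corresponding map/reduce jobs to analyze and process that data.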