Hive 3.1.2 Installation Guide

Published: 2023/12/20


1. Install Hive 3.1.2

First, download the Hive installation package from the official Hive download page.

# Unpack the installation package
hadoop@hadoop-master:~$ sudo tar xf apache-hive-3.1.2-bin.tar.gz -C /usr/local/
hadoop@hadoop-master:~$ cd /usr/local/
hadoop@hadoop-master:/usr/local$ sudo mv apache-hive-3.1.2-bin hive
# Give the hadoop user ownership of the hive directory
hadoop@hadoop-master:/usr/local$ sudo chown -R hadoop:hadoop hive

For convenience, add the hive command to the environment variables as follows:

# Configure the environment variables
hadoop@hadoop-master:/usr/local$ vim ~/.bashrc
hadoop@hadoop-master:/usr/local$ tail -3 ~/.bashrc
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/hbase/bin:$HIVE_HOME/bin
export HADOOP_HOME=/usr/local/hadoop
# Apply the environment variables immediately
hadoop@hadoop-master:/usr/local$ source ~/.bashrc

Copy hive-default.xml.template to hive-default.xml:

hadoop@hadoop-master:/usr/local$ cd /usr/local/hive/conf/
hadoop@hadoop-master:/usr/local/hive/conf$ cp -a hive-default.xml.template hive-default.xml

Then create a new configuration file, hive-site.xml, and add the following settings:

hadoop@hadoop-master:/usr/local/hive/conf$ nano hive-site.xml
hadoop@hadoop-master:/usr/local/hive/conf$ cat hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
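The structure of hive-site.xml is simply a list of name/value properties, so it can also be generated and checked programmatically. The following is a minimal Python sketch, not part of the original guide: the helper names build_hive_site and read_hive_site are made up for illustration, and the property values are the ones from the configuration above.

```python
import xml.etree.ElementTree as ET

# Metastore connection settings, taken from the hive-site.xml shown in this guide.
METASTORE_PROPERTIES = {
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",
    "javax.jdo.option.ConnectionPassword": "hive",
}


def build_hive_site(properties):
    """Hypothetical helper: build a <configuration> element with one <property> per entry."""
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return configuration


def read_hive_site(xml_text):
    """Hypothetical helper: parse hive-site.xml text back into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


hive_site_xml = ET.tostring(build_hive_site(METASTORE_PROPERTIES), encoding="unicode")
print(hive_site_xml)
```

Reading the file back with read_hive_site is a quick way to confirm that the user name, password, and JDBC URL match what is later configured in MySQL.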

2. Install and configure MySQL

Here we store the Hive metadata in a MySQL database instead of the Derby database bundled with Hive.

First, install MySQL:

# Update the package index
hadoop@hadoop-master:~$ sudo apt-get update
# Install MySQL
hadoop@hadoop-master:~$ sudo apt-get -y install mysql-server
# Change the MySQL character set
hadoop@hadoop-master:~$ egrep -v "^#|^$" /etc/mysql/mysql.conf.d/mysqld.cnf
......
[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
character_set_server=utf8 # add this line
lc-messages-dir = /usr/share/mysql
......

Unpack the downloaded MySQL JDBC driver package:

# Unpack the package
hadoop@hadoop-master:~$ tar xf mysql-connector-java-5.1.40.tar.gz
# Copy mysql-connector-java-5.1.40-bin.jar into /usr/local/hive/lib
hadoop@hadoop-master:~$ cp -a mysql-connector-java-5.1.40/mysql-connector-java-5.1.40-bin.jar /usr/local/hive/lib/

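Create the hive database

# This hive database corresponds to the "hive" in localhost:3306/hive in hive-site.xml and stores the Hive metadata
mysql> create database hive;
# Allow Hive to connect to MySQL:
# grant all privileges on all tables of all databases to the hive user; the trailing 'hive' is the connection password configured in hive-site.xml
# (GRANT ... IDENTIFIED BY is accepted by MySQL 5.7; MySQL 8.0 removed it, so there the user has to be created with CREATE USER before the grant)
mysql> grant all on *.* to hive@localhost identified by 'hive';
# Reload the MySQL privilege tables
mysql> flush privileges;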

Replace the guava jar

# Check the guava.jar version in share/hadoop/common/lib under the Hadoop installation directory
hadoop@hadoop-master:~$ ll -d /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar
-rw-r--r-- 1 hadoop hadoop 2747878 9月 12 2019 /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar
# Check the guava.jar version in lib under the Hive installation directory
hadoop@hadoop-master:~$ ll -d /usr/local/hive/lib/guava-19.0.jar
-rw-r--r-- 1 hadoop hadoop 2308517 9月 27 2018 /usr/local/hive/lib/guava-19.0.jar
# If the two differ, delete the lower version and copy the higher version in its place
hadoop@hadoop-master:~$ rm -rf /usr/local/hive/lib/guava-19.0.jar
hadoop@hadoop-master:~$ cp -a /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /usr/local/hive/lib/
# Restart the database
hadoop@hadoop-master:~$ sudo systemctl restart mysql
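The "keep the higher guava version" rule can be expressed as a small version comparison. The following Python sketch is only an illustration of that rule: the function names guava_version and jar_to_keep are hypothetical, and the sketch assumes jar names of the form guava-<version>[-suffix].jar, as in the listing above. It does not touch any files.

```python
import re


def guava_version(jar_name):
    """Extract the numeric version from a name such as 'guava-27.0-jre.jar'."""
    match = re.match(r"guava-(\d+(?:\.\d+)*)", jar_name)
    if match is None:
        raise ValueError(f"not a guava jar: {jar_name}")
    # Compare versions as tuples of integers, e.g. (27, 0) > (19, 0).
    return tuple(int(part) for part in match.group(1).split("."))


def jar_to_keep(hadoop_jar, hive_jar):
    """Return the jar with the higher version (the Hadoop jar on a tie)."""
    if guava_version(hive_jar) > guava_version(hadoop_jar):
        return hive_jar
    return hadoop_jar


print(jar_to_keep("guava-27.0-jre.jar", "guava-19.0.jar"))
```

Comparing integer tuples rather than strings matters here: as plain strings, "9.0" would sort after "27.0".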

Use the schematool utility

# Hive now includes an offline tool for manipulating the Hive Metastore schema, called schematool. It can initialize the Metastore schema for the current Hive version, and it can also handle schema upgrades from older versions to newer ones.
hadoop@hadoop-master:~$ cd /usr/local/hive
hadoop@hadoop-master:/usr/local/hive$ ./bin/schematool -dbType mysql -initSchema

Before starting Hive, make sure the Hadoop cluster is running.

# Start Hive
hadoop@hadoop-master:/usr/local/hive$ hive

When Hive is started with MySQL as the metastore database, the following errors may occur. Their solutions are listed below:

[Error 1]

[Error] java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument
[Cause] The guava.jar that Hive depends on and the guava.jar in Hadoop are different versions.
[Solution]
1. Check the guava.jar version in share/hadoop/common/lib under the Hadoop installation directory.
2. Check the guava.jar version in lib under the Hive installation directory.
If the two differ, delete the lower version and copy the higher version in its place. This resolves the problem.

[Error 2]

[Error] org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "VERSION" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations.
[Solution] Go to the Hive installation directory (for example /usr/local/hive) and run:
./bin/schematool -dbType mysql -initSchema

[Error 3]

[Error] When starting Hive, the error "Hive metastore database is not initialized" may appear.
[Cause] Hive or MySQL was installed previously, and after reinstalling them the versions or configuration no longer match.
[Solution] Use schematool, the offline Hive Metastore schema tool described above, to initialize the Metastore schema for the current Hive version. Run the following commands in a terminal:
cd /usr/local/hive
./bin/schematool -dbType mysql -initSchema

After this, Hive should start normally.

Once Hive's interactive environment has started, the following prompt appears:

hive>

You can enter SQL statements at this prompt. To leave the interactive environment, enter:

hive> exit;

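3. Common HiveQL operations

3.1 Hive basic data types

First, a brief overview of the basic data types in HiveQL.

Hive supports primitive and complex data types. The primitive types are mainly the numeric types (INT, FLOAT, DOUBLE), BOOLEAN, and STRING. There are three complex types: ARRAY, MAP, and STRUCT.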

Primitive data types
- TINYINT: 1 byte
- SMALLINT: 2 bytes
- INT: 4 bytes
- BIGINT: 8 bytes
- BOOLEAN: TRUE/FALSE
- FLOAT: 4 bytes, single-precision floating point
- DOUBLE: 8 bytes, double-precision floating point
- STRING: character string
Complex data types
- ARRAY: an ordered sequence of fields
- MAP: an unordered collection of key-value fields
- STRUCT: a group of named fields
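The byte widths of the integer types determine which values each type can hold. As a rough illustration, the Python sketch below derives the value range from the width, relying on the fact that Hive's integer types are signed. The names INTEGER_WIDTHS, signed_range, and smallest_type_for are made up for this example.

```python
# Storage width in bytes of Hive's signed integer types, from the list in this section.
INTEGER_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes):
    """Return (min, max) for a signed two's-complement integer of the given width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_type_for(value):
    """Return the narrowest integer type whose range contains value."""
    for type_name, width in INTEGER_WIDTHS.items():  # ordered narrowest to widest
        low, high = signed_range(width)
        if low <= value <= high:
            return type_name
    raise OverflowError(f"{value} does not fit in BIGINT")


for type_name, width in INTEGER_WIDTHS.items():
    print(type_name, signed_range(width))
```

For example, a column holding values up to a few hundred needs SMALLINT rather than TINYINT, since TINYINT stops at 127.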

3.2 Common HiveQL commands

The commonly used HiveQL commands fall into two groups: data definition and data manipulation. The following sections describe these commands and their usage.

Data definition: used to create, alter, and drop databases, tables, views, functions, and indexes.

Creating, altering, and dropping databases

create database if not exists hive; -- create a database
show databases; -- list the databases in Hive
show databases like 'h.*'; -- list the databases whose names start with h
describe database hive; -- show the location and other details of the hive database
alter database hive set dbproperties('creator'='me'); -- set a key-value property on hive (example key and value)
use hive; -- switch to the hive database
drop database if exists hive; -- drop a database that contains no tables
drop database if exists hive cascade; -- drop a database together with its tables

Note that apart from dbproperties, a database's metadata cannot be changed, including the database name and the directory where the database is stored. There is also no way to delete or reset a database property.

Creating, altering, and dropping tables

-- Create an internal (managed) table
create table if not exists hive.usr(
  name string comment 'username',
  pwd string comment 'password',
  address struct<street:string,city:string,state:string,zip:int> comment 'home address',
  identify map<int,tinyint> comment 'number,sex')
comment 'description of the table'
tblproperties('creator'='me','time'='2016.1.1');

-- Create an external table
create external table if not exists usr2(
  name string,
  pwd string,
  address struct<street:string,city:string,state:string,zip:int>,
  identify map<int,tinyint>)
row format delimited fields terminated by ','
location '/usr/local/hive/warehouse/hive.db/usr';

-- Create a partitioned table
create table if not exists usr3(
  name string,
  pwd string,
  address struct<street:string,city:string,state:string,zip:int>,
  identify map<int,tinyint>)
partitioned by(city string,state string);

-- Copy the schema of the usr table
create table if not exists hive.usr1 like hive.usr;

show tables in hive;
show tables 'u.*'; -- list the tables in hive whose names start with u
describe hive.usr; -- show information about the usr table
alter table usr rename to custom; -- rename the table

-- Add a partition (partition statements need a partitioned table, so usr3 is used here)
alter table usr3 add if not exists partition(city='beijing',state='China') location '/usr/local/hive/warehouse/usr3/China/beijing';
-- Change the partition location
alter table usr3 partition(city='beijing',state='China') set location '/usr/local/hive/warehouse/usr3/CH/beijing';
-- Drop the partition
alter table usr3 drop if exists partition(city='beijing',state='China');

-- Change column information
alter table usr change column pwd password string after address;
alter table usr add columns(hobby string); -- add a column
alter table usr replace columns(uname string); -- remove and replace columns
alter table usr set tblproperties('creator'='liming'); -- change table properties
-- Change storage properties
alter table usr3 partition(city='beijing',state='China') set fileformat sequencefile;

use hive; -- switch to the hive database
drop table if exists usr1; -- drop a table
drop database if exists hive cascade; -- drop a database together with its tables

Creating, altering, and dropping views and indexes

The main syntax is shown below; readers can try it out themselves.

create view view_name as....; -- create a view
alter view view_name set tblproperties(…); -- alter a view

Because views are read-only, the only metadata that can be changed on a view is its tblproperties.

-- Drop a view
drop view if exists view_name;
-- Create an index
create index index_name on table table_name(partition_name/column_name) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild....;

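Here org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler is an index handler, that is, a Java class that implements the index interface. Hive also has other index implementations. Note that index support was removed in Hive 3.0, so the index statements in this section apply only to earlier Hive releases.

alter index index_name on table table_name partition(...) rebuild; -- rebuild the index

With deferred rebuild, the new index starts out empty, and the first build or a later rebuild can be run at any time.

show formatted index on table_name; -- show indexes
drop index if exists index_name on table table_name; -- drop an index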

User-defined functions

Before creating a user-defined function (UDF), take a look at the functions that ship with Hive. The show functions; command lists the names of all functions in Hive:

hive> show functions;
OK
!
!=
$sum0
%
&
*
......

To see how a specific function is used, run describe function followed by the function name:

hive> describe function abs;
OK
abs(x) - returns the absolute value of x
Time taken: 0.027 seconds, Fetched: 1 row(s)

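To write your own UDF, extend the UDF class and implement the evaluate() function, or extend the GenericUDF class and implement the initialize(), evaluate(), and getDisplayString() functions. There are other approaches as well, which interested readers can explore on their own.

To use the UDF in Hive, compile the Java code, package the compiled class files (.class files) into a JAR file, add that JAR file to the classpath in the Hive session, and then define a function that uses the Java class with a create function statement.

add jar <absolute path to the JAR file>;
-- Create the function; the as clause names the Java class that implements it
create temporary function function_name as '<fully qualified class name>';
drop temporary function if exists function_name; -- drop the function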

3.3 Data manipulation

This section covers loading data into tables (or exporting it from tables) and running queries, all of which should feel familiar to anyone who knows SQL.

Loading data into tables

We use simple two-column tables as the example. First create the tables stu and course: stu has the columns id and name, and course has the columns cid and sid.

# Create the hive database and switch to it, so that the tables below are created in hive
hive> create database if not exists hive;
OK
Time taken: 0.186 seconds
hive> use hive;
# Create the table stu
hive> create table stu(id int,name string) row format delimited fields terminated by ' ';
OK
Time taken: 0.082 seconds
# Create the table course
hive> create table course(cid int,sid int) row format delimited fields terminated by ' ';
OK
Time taken: 0.044 seconds

There are two ways to load data into a table: importing from a file and inserting through a query.

1. Import from a file
Suppose the records for the table are stored in the file stu.txt, with the following content:

hadoop@hadoop-master:~$ mkdir -p /usr/local/hadoop/examples
hadoop@hadoop-master:~$ vim /usr/local/hadoop/examples/stu.txt
hadoop@hadoop-master:~$ cat /usr/local/hadoop/examples/stu.txt
1 xiapi
2 xiaoxue
3 qingqing

Now load the data in this file into the table stu:

hive> use hive;
OK
Time taken: 0.027 seconds
hive> load data local inpath '/usr/local/hadoop/examples/stu.txt' overwrite into table stu;
Loading data to table hive.stu
OK
Time taken: 0.173 seconds
hive> select * from stu;
OK
1 xiapi
2 xiaoxue
3 qingqing
Time taken: 0.096 seconds, Fetched: 3 row(s)

If stu.txt is stored on HDFS, the local keyword is not needed.

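2. Insert through a query

The following command creates the table stu1 with the same columns as stu and fills it with the rows selected from stu:

hive> create table stu1 as select id,name from stu;

The command above creates the table and inserts the data in one step. If the table already exists, insert data with the following command:

insert overwrite table stu1 select id,name from stu where (condition);

Here the overwrite keyword replaces the existing data in the table (or partition). Use the into keyword instead to append to the existing content.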
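The difference between overwrite and into can be pictured with a small model in which a table is just a list of rows. This Python sketch is only an analogy, not how Hive stores data, and the function name insert_rows is hypothetical.

```python
def insert_rows(table_rows, new_rows, overwrite):
    """Model of inserting query results into an existing table.

    overwrite=True mimics `insert overwrite table ...` (old rows are replaced);
    overwrite=False mimics `insert into table ...` (new rows are appended).
    """
    if overwrite:
        return list(new_rows)
    return list(table_rows) + list(new_rows)


stu1 = [(1, "xiapi")]
selected = [(2, "xiaoxue"), (3, "qingqing")]

print(insert_rows(stu1, selected, overwrite=True))
print(insert_rows(stu1, selected, overwrite=False))
```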

Exporting data from tables

1. Simply copy the files or directories

The command is:

hadoop fs -cp source_path target_path;

2. Write to a temporary file

The command is:

hive> insert overwrite local directory '/usr/local/hadoop/tmp/stu' select id,name from stu;

Queries

Queries work much as they do in SQL, so we do not go through them in detail. They mainly use select ... from ... where ... statements together with keywords such as group by, having, like, and rlike. Here we briefly cover the case ... when ... then ... expression, joins, and subqueries.

The case ... when ... then ... expression works like an if statement and is used to transform the value of a single column in the query result:

select id,name,case when id=1 then 'first' when id=2 then 'second' else 'third' end from stu;

The result is:

hive> select id,name,case when id=1 then 'first' when id=2 then 'second' else 'third' end from stu;
OK
1 xiapi first
2 xiaoxue second
3 qingqing third
Time taken: 0.108 seconds, Fetched: 3 row(s)
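For readers more used to a programming language than to SQL, the case expression above corresponds to an if/elif/else chain applied to each row. A minimal Python sketch, using the three rows of the stu table (the function name rank_label is made up for this example):

```python
# Rows of the stu table from this guide: (id, name).
stu = [(1, "xiapi"), (2, "xiaoxue"), (3, "qingqing")]


def rank_label(student_id):
    """Equivalent of: case when id=1 then 'first' when id=2 then 'second' else 'third' end."""
    if student_id == 1:
        return "first"
    elif student_id == 2:
        return "second"
    else:
        return "third"


# Equivalent of: select id, name, case ... end from stu
labelled = [(student_id, name, rank_label(student_id)) for student_id, name in stu]
for row in labelled:
    print(*row)
```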

Joins
A join combines the rows of two tables that match on a shared data item. HiveQL has five kinds of join: inner join, left outer join, right outer join, full outer join, and semi join.

1. Inner join (equi-join)
An inner join uses a comparison operator to match rows of the two tables on the values of a column they share.

First, insert the following rows into the course table (left as an exercise):

hive> select * from course;
OK
1 3
2 1
3 1
Time taken: 0.098 seconds, Fetched: 3 row(s)

The following query returns all rows of stu and course that have the same student ID:

hive> select stu.*, course.* from stu join course on(stu.id=course.sid);
......
OK
1 xiapi 2 1
1 xiapi 3 1
3 qingqing 1 3
Time taken: 19.167 seconds, Fetched: 3 row(s)

2. Left join
The result of a left join includes all rows of the left table named in the "LEFT OUTER" clause, not only the rows matched by the join column. If a row of the left table has no matching row in the right table, all selected columns of the right table are NULL in that result row:

hive> select stu.*, course.* from stu left outer join course on(stu.id=course.sid);
....
OK
1 xiapi 2 1
1 xiapi 3 1
2 xiaoxue NULL NULL
3 qingqing 1 3
Time taken: 18.285 seconds, Fetched: 4 row(s)

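3. Right join
A right join is the mirror image of a left outer join and returns all rows of the right table. If a row of the right table has no matching row in the left table, NULL is returned for the left table's columns:

hive> select stu.*, course.* from stu right outer join course on(stu.id=course.sid);
....
OK
3 qingqing 1 3
1 xiapi 2 1
1 xiapi 3 1
Time taken: 17.139 seconds, Fetched: 3 row(s)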

4. Full join
A full join returns all rows of both the left and the right table. When a row has no match in the other table, the other table's selected columns are NULL. Where the tables do have matching rows, the result row contains the data values of both tables:

hive> select stu.*, course.* from stu full outer join course on(stu.id=course.sid);
....
OK
1 xiapi 3 1
1 xiapi 2 1
2 xiaoxue NULL NULL
3 qingqing 1 3
Time taken: 16.741 seconds, Fetched: 4 row(s)

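5. Semi join
The left semi join syntax is a Hive feature that serves as an alternative to in subqueries, which early Hive versions did not support. Note that columns of the joined (right) table cannot appear in the select list; they may appear only in the on clause:

hive> select stu.* from stu left semi join course on(stu.id=course.sid);
....
OK
1 xiapi
3 qingqing
Time taken: 17.892 seconds, Fetched: 2 row(s)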
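To make the difference between the five joins concrete, the following Python sketch reproduces them on the same stu and course rows using plain lists of tuples, with None standing in for NULL. It is a teaching model under simple assumptions (nested loops, join condition stu.id = course.sid), not a description of how Hive executes joins, and all function names are made up for this example. Hive does not guarantee row order, so only the set of rows should be compared with the outputs above.

```python
# Rows from the stu and course tables used in this section.
stu = [(1, "xiapi"), (2, "xiaoxue"), (3, "qingqing")]  # (id, name)
course = [(1, 3), (2, 1), (3, 1)]                       # (cid, sid)


def matches(s, c):
    """Join condition: stu.id = course.sid."""
    return s[0] == c[1]


def inner_join(left, right):
    return [s + c for s in left for c in right if matches(s, c)]


def left_outer_join(left, right):
    rows = []
    for s in left:
        matched = [s + c for c in right if matches(s, c)]
        # Keep unmatched left rows, padding the right-hand columns with None.
        rows.extend(matched or [s + (None, None)])
    return rows


def right_outer_join(left, right):
    rows = []
    for c in right:
        matched = [s + c for s in left if matches(s, c)]
        # Keep unmatched right rows, padding the left-hand columns with None.
        rows.extend(matched or [(None, None) + c])
    return rows


def full_outer_join(left, right):
    unmatched_right = [(None, None) + c for c in right
                       if not any(matches(s, c) for s in left)]
    return left_outer_join(left, right) + unmatched_right


def left_semi_join(left, right):
    # Only left-table columns are returned, one row per left row with a match.
    return [s for s in left if any(matches(s, c) for c in right)]


for label, join in [("inner", inner_join), ("left outer", left_outer_join),
                    ("right outer", right_outer_join), ("full outer", full_outer_join),
                    ("left semi", left_semi_join)]:
    print(label, join(stu, course))
```

Note how the semi join returns xiapi once even though two course rows match, whereas the inner join returns one row per match.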

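Subqueries
Standard SQL supports nested select clauses as subqueries. HiveQL's subquery support is more limited: subqueries in the from clause are always available, and newer Hive versions (0.13 and later) also accept some in and exists subqueries in the where clause.

Note: when defining or operating on tables, do not forget to specify the database you need.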

4. A simple Hive programming exercise

Next we use the word count algorithm as an example of how to use Hive in a concrete application. Word count is also one of the algorithms that best illustrates the MapReduce idea, so we can compare against its MapReduce implementation to show the advantage of using Hive.

The MapReduce word count code can be found, after downloading Hadoop, in the $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar package (the wordcount class), which consists of 63 lines of Java code. We first briefly show how to count word occurrences with the MapReduce wordcount class. The steps are as follows:

1. Create the input directory; the output directory is generated automatically. input is the input directory and output is the output directory. The command is:

hadoop@hadoop-master:~$ hdfs dfs -mkdir input

2. Then create two local test files, file1.txt and file2.txt, to be placed in the input directory:

hadoop@hadoop-master:~$ echo "hello world" > file1.txt
hadoop@hadoop-master:~$ echo "hello hadoop" > file2.txt

3. Upload the files to HDFS:

hadoop@hadoop-master:~$ hdfs dfs -put /home/hadoop/file* input/
hadoop@hadoop-master:~$ hdfs dfs -ls input/
Found 2 items
-rw-r--r-- 1 hadoop supergroup 12 2022-04-25 16:02 input/file1.txt
-rw-r--r-- 1 hadoop supergroup 13 2022-04-25 16:02 input/file2.txt

4. Run the following hadoop command:

hadoop@hadoop-master:~$ cd /usr/local/hadoop/
hadoop@hadoop-master:/usr/local/hadoop$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount input output

5. View the result in the output directory:

hadoop@hadoop-master:/usr/local/hadoop$ hdfs dfs -cat output/*
hadoop 1
hello 2
world 1

Now we implement word count with HiveQL. It takes only the following 7 lines of code, and nothing has to be compiled into a jar to run it. The HiveQL commands are:

hive> create table docs(line string);
OK
hive> load data inpath 'file:///usr/local/hadoop/input' overwrite into table docs;
Loading data to table hive.docs
OK
Time taken: 0.65 seconds
hive> select * from docs;
OK
hello world
hello hadoop
Time taken: 1.06 seconds, Fetched: 2 row(s)
hive> create table word_count as
    > select word, count(1) as count from
    > (select explode(split(line,' '))as word from docs) w
    > group by word
    > order by word;
hive> select * from word_count;
OK
hadoop 1
hello 2
world 1
Time taken: 0.117 seconds, Fetched: 3 row(s)
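The HiveQL query is compact because each clause does one step of the computation. The Python sketch below mirrors those steps on the same two input lines, as a way of reading the query. It is an in-memory illustration only, and the function name word_count is chosen for this example.

```python
from collections import Counter

# The two lines stored in the docs table.
docs = ["hello world", "hello hadoop"]


def word_count(lines):
    # explode(split(line, ' ')): one output row per word of every line.
    words = [word for line in lines for word in line.split(" ")]
    # group by word with count(1): number of occurrences of each word.
    counts = Counter(words)
    # order by word: sort the (word, count) pairs alphabetically.
    return sorted(counts.items())


for word, count in word_count(docs):
    print(word, count)
```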

As shown above, the biggest advantage of Hive is that non-programmers do not need to learn to write Java MapReduce code. They only need to learn HiveQL, which is easy for anyone with an SQL background.

Reference: http://dblab.xmu.edu.cn/blog/2440-2/
