

Pig Installation and Usage

Published: 2023/12/20

This article, collected and organized by 生活随笔, introduces Pig installation and usage; it is shared here as a reference.

Contents

  • Pig installation
  • Basic usage
      • Bug
  • Dataset operations

Pig installation

1. Install and unpack the software on the client host

hadoop@ddai-desktop:~$ cd /opt/
hadoop@ddai-desktop:/opt$ sudo tar xvzf /home/hadoop/pig-0.17.0.tar.gz
hadoop@ddai-desktop:/opt$ sudo chown -R hadoop:hadoop pig-0.17.0/

2. Edit the configuration

hadoop@ddai-desktop:~$ cd /opt/pig-0.17.0/
hadoop@ddai-desktop:/opt/pig-0.17.0$ cd conf/
hadoop@ddai-desktop:/opt/pig-0.17.0/conf$ mv log4j.properties.template log4j.properties
hadoop@ddai-desktop:/opt/pig-0.17.0/conf$ vim pig.properties
pig.logfile=/opt/pig-0.17.0/logs
log4jconf=/opt/pig-0.17.0/conf/log4j.properties
exectype=mapreduce

3. Update the environment variables and apply them

hadoop@ddai-desktop:~$ vim /home/hadoop/.profile
export PIG_HOME=/opt/pig-0.17.0
export PATH=$PATH:$PIG_HOME/bin

hadoop@ddai-desktop:~$ source /home/hadoop/.profile

Running Pig

1. Start the Hadoop services on the master node.
2. Start Pig on the client host (running pig enters the grunt shell).

Basic usage


(1) Create a test directory and upload files to HDFS

grunt> mkdir /test
grunt> copyFromLocal A.txt /test;
grunt> copyFromLocal B.txt /test;
grunt> copyFromLocal TP.txt /test;
grunt> copyFromLocal MP.txt /test;

(2) Load A.txt into relation a; relation b is a's column $0 plus column $1

grunt> a = load '/test/A.txt' using PigStorage(',') as (c1:int,c2:double,c3:float);
grunt> b = foreach a generate $0+$1 as b1;
604000 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
21/08/13 16:29:47 WARN newplan.BaseOperatorPlan: Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
grunt> dump b;
(1.0)
(4.0)
grunt> describe b;
b: {b1: double}
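The arithmetic above can be sketched in Python. The exact contents of A.txt are not shown in the article, so the two rows below are hypothetical, chosen only so the results match the dump output (1.0) and (4.0):

```python
# Hypothetical rows of A.txt: (c1:int, c2:double, c3:float).
a = [(0, 1.0, 0.5), (1, 3.0, 2.5)]

# b = foreach a generate $0 + $1: the int column is implicitly promoted
# to double, which is what the IMPLICIT_CAST_TO_DOUBLE warning reports.
b = [(c1 + c2,) for (c1, c2, c3) in a]
print(b)  # [(1.0,), (4.0,)]
```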


(3) Relation c is b's column b1 minus 1

grunt> c = foreach b generate b1-1;
grunt> dump c;


(4) Relation d outputs, for each row of a, (c1,c2) when the first column is 0 and (c1,c3) otherwise

grunt> d = foreach a generate c1,($0==0?$1:$2);
grunt> dump d;
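Pig's bincond operator `?:` behaves like Python's conditional expression. A sketch, reusing the same hypothetical rows of A.txt as before:

```python
# Hypothetical rows of A.txt: (c1:int, c2:double, c3:float).
a = [(0, 1.0, 0.5), (1, 3.0, 2.5)]

# d = foreach a generate c1, ($0 == 0 ? $1 : $2)
d = [(c1, c2 if c1 == 0 else c3) for (c1, c2, c3) in a]
print(d)  # [(0, 1.0), (1, 2.5)]
```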


(5) Relation f contains the rows of a with c1>0 and c2>1

grunt> f = filter a by c1>0 and c2>1;
grunt> dump f;
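The filter keeps only rows satisfying both predicates, like a guarded comprehension (same hypothetical sample data):

```python
# Hypothetical rows of A.txt: (c1:int, c2:double, c3:float).
a = [(0, 1.0, 0.5), (1, 3.0, 2.5)]

# f = filter a by c1 > 0 and c2 > 1
f = [(c1, c2, c3) for (c1, c2, c3) in a if c1 > 0 and c2 > 1]
print(f)  # [(1, 3.0, 2.5)]
```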


(6) Load the tuple data in TP.txt into tp; relation g is the output generated from tp

grunt> tp = load '/test/TP.txt' as (t:tuple(c1:int,c2:int,c3:int));
grunt> describe tp;
grunt> dump tp;

grunt> g = foreach tp generate t.c1,t.c2,t.c3;
grunt> describe g;
grunt> dump g;


(7) Group g, producing bag data in relation bg

grunt> bg = group g by c1;
grunt> describe bg;
grunt> dump bg;

grunt> illustrate bg;

grunt> x = foreach bg generate g.c1;
grunt> dump x;
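Grouping collects rows that share a key into a bag, much like building a dict of lists in Python. The tuples below are hypothetical stand-ins for TP.txt:

```python
# Hypothetical tuples from TP.txt.
g = [(1, 2, 3), (1, 5, 6), (2, 7, 8)]

# bg = group g by c1: each key maps to a bag of the full tuples.
bg = {}
for row in g:
    bg.setdefault(row[0], []).append(row)

print(bg)  # {1: [(1, 2, 3), (1, 5, 6)], 2: [(2, 7, 8)]}
```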


(8) Load the map data in MP.txt into mp; relation h is the output generated from mp

grunt> mp = load '/test/MP.txt' as (m:map[]);
grunt> describe mp;
mp: {m: map[]}
grunt> h = foreach mp generate m#'Pig';
grunt> describe h;
h: {bytearray}
grunt> dump h;
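The `m#'Pig'` projection looks the key 'Pig' up in each map, like dict.get in Python (a missing key yields null/None). The map values below are hypothetical, since MP.txt's contents are not shown:

```python
# Hypothetical maps loaded from MP.txt.
mp = [{'Pig': 'latin', 'Hive': 'sql'}, {'Pig': 'grunt'}]

# h = foreach mp generate m#'Pig'
h = [m.get('Pig') for m in mp]
print(h)  # ['latin', 'grunt']
```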

Bug

If HDFS is stuck in safe mode, leave it manually:

bin/hadoop dfsadmin -safemode leave    # run from the Hadoop bin directory
# if the environment variables are configured, run:
hadoop dfsadmin -safemode leave

On newer Hadoop releases the equivalent command is hdfs dfsadmin -safemode leave.

解決后

Dataset operations

(1) Load the data

grunt> a = load '/test/A.txt' using PigStorage(',') as (a1:int, a2:int, a3:int);
grunt> b = load '/test/B.txt' using PigStorage(',') as (b1:int, b2:int, b3:int);

(2) Union of a and b

grunt> c = union a, b;
grunt> dump c;


(3) Split c into d and e, where the first column of d is 0 and the first column of e is 1 ($0 denotes the first column of the relation)

grunt> split c into d if $0 == 0, e if $0 == 1;
grunt> dump d;
grunt> dump e;
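Union concatenates the two relations, and split partitions the result by a predicate. A sketch with hypothetical rows for A.txt and B.txt:

```python
# Hypothetical rows of A.txt and B.txt for this section.
a = [(0, 2, 3), (0, 4, 5)]
b = [(1, 6, 7), (1, 8, 9)]

c = a + b  # c = union a, b

# split c into d if $0 == 0, e if $0 == 1
d = [r for r in c if r[0] == 0]
e = [r for r in c if r[0] == 1]
print(d)  # [(0, 2, 3), (0, 4, 5)]
print(e)  # [(1, 6, 7), (1, 8, 9)]
```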



(4) Select a subset of c

grunt> f = filter c by $1 > 3;
grunt> dump f;


(5) Group the data

grunt> g = group c by $2;
grunt> dump g;


(6) Gather all rows into a single group

grunt> h = group c all;
grunt> dump h;

(7) Count the elements in h

grunt> i = foreach h generate COUNT($1);
grunt> dump i;
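`group c all` puts every row into one bag under the key 'all', and COUNT then counts the tuples in that bag. A sketch with hypothetical rows for c:

```python
# Hypothetical rows of c (the union from step 2).
c = [(0, 2, 3), (0, 4, 5), (1, 6, 7), (1, 8, 9)]

# h = group c all: a single tuple of ('all', bag-of-all-rows).
h = [('all', c)]

# i = foreach h generate COUNT($1)
i = [(len(bag),) for (_, bag) in h]
print(i)  # [(4,)]
```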


(8) Join a and b on the condition a.$2 == b.$2

grunt> j = join a by $2, b by $2;
grunt> dump j;
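The join pairs every row of a with every row of b that has the same value in column $2 (an inner equi-join), concatenating the matched tuples. A sketch with hypothetical rows:

```python
# Hypothetical rows of a and b sharing values in column $2.
a = [(0, 1, 2), (1, 3, 2)]
b = [(5, 6, 2), (7, 8, 9)]

# j = join a by $2, b by $2
j = [ra + rb for ra in a for rb in b if ra[2] == rb[2]]
print(j)  # [(0, 1, 2, 5, 6, 2), (1, 3, 2, 5, 6, 2)]
```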

(9) Relation k outputs c's $1 and $1 * $2

grunt> k = foreach c generate $1, $1 * $2;
grunt> dump k;
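This final projection computes a column and a derived product per row; a sketch with hypothetical rows of c:

```python
# Hypothetical rows of c.
c = [(0, 2, 3), (1, 4, 5)]

# k = foreach c generate $1, $1 * $2
k = [(r[1], r[1] * r[2]) for r in c]
print(k)  # [(2, 6), (4, 20)]
```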

Summary

That's all of 生活随笔's collected notes on installing and using Pig; I hope this article helps you solve the problems you run into.
