

Pig Installation and Usage

Published: 2023/12/20

This article, collected and organized by 生活随笔, introduces Pig installation and usage; it is shared here as a reference.

Contents

  • Pig installation
  • Basic usage
      • Bug
  • Dataset operations

Pig installation

1. Install and unpack the software on the client host

hadoop@ddai-desktop:~$ cd /opt/
hadoop@ddai-desktop:/opt$ sudo tar xvzf /home/hadoop/pig-0.17.0.tar.gz
hadoop@ddai-desktop:/opt$ sudo chown -R hadoop:hadoop pig-0.17.0/

2. Edit the configuration

hadoop@ddai-desktop:~$ cd /opt/pig-0.17.0/
hadoop@ddai-desktop:/opt/pig-0.17.0$ cd conf/
hadoop@ddai-desktop:/opt/pig-0.17.0/conf$ mv log4j.properties.template log4j.properties
hadoop@ddai-desktop:/opt/pig-0.17.0/conf$ vim pig.properties
pig.logfile=/opt/pig-0.17.0/logs
log4jconf=/opt/pig-0.17.0/conf/log4j.properties
exectype=mapreduce

3. Update the environment variables and apply them

hadoop@ddai-desktop:~$ vim /home/hadoop/.profile
export PIG_HOME=/opt/pig-0.17.0
export PATH=$PATH:$PIG_HOME/bin

hadoop@ddai-desktop:~$ source /home/hadoop/.profile

Running Pig

1. Start the Hadoop services on the master node.
2. Start Pig on the client host (running pig enters the grunt shell).

Basic usage


(1) Create a test directory and upload files to HDFS

grunt> mkdir /test
grunt> copyFromLocal A.txt /test;
grunt> copyFromLocal B.txt /test;
grunt> copyFromLocal TP.txt /test;
grunt> copyFromLocal MP.txt /test;

(2) Load A.txt into relation a; relation b is a's column $0 plus column $1

grunt> a = load '/test/A.txt' using PigStorage(',') as (c1:int,c2:double,c3:float);
grunt> b = foreach a generate $0+$1 as b1;
604000 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
21/08/13 16:29:47 WARN newplan.BaseOperatorPlan: Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
grunt> dump b;
(1.0)
(4.0)
grunt> describe b;
b: {b1: double}
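The arithmetic above can be sketched in Python. The exact contents of A.txt are not shown in the article, so the two rows below are hypothetical, chosen only so the results match the dump output (1.0) and (4.0):

```python
# Hypothetical rows of A.txt: (c1:int, c2:double, c3:float).
a = [(0, 1.0, 0.5), (1, 3.0, 2.5)]

# b = foreach a generate $0 + $1: the int column is implicitly promoted
# to double, which is what the IMPLICIT_CAST_TO_DOUBLE warning reports.
b = [(c1 + c2,) for (c1, c2, c3) in a]
print(b)  # [(1.0,), (4.0,)]
```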


(3) Relation c is b's column b1 minus 1

grunt> c = foreach b generate b1-1;
grunt> dump c;


(4) Relation d outputs, for each row of a, (c1,c2) when the first column is 0 and (c1,c3) otherwise

grunt> d = foreach a generate c1,($0==0?$1:$2);
grunt> dump d;
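Pig's bincond operator `?:` behaves like Python's conditional expression. A sketch, reusing the same hypothetical rows of A.txt as before:

```python
# Hypothetical rows of A.txt: (c1:int, c2:double, c3:float).
a = [(0, 1.0, 0.5), (1, 3.0, 2.5)]

# d = foreach a generate c1, ($0 == 0 ? $1 : $2)
d = [(c1, c2 if c1 == 0 else c3) for (c1, c2, c3) in a]
print(d)  # [(0, 1.0), (1, 2.5)]
```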


(5) Relation f contains the rows of a with c1>0 and c2>1

grunt> f = filter a by c1>0 and c2>1;
grunt> dump f;
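The filter keeps only rows satisfying both predicates, like a guarded comprehension (same hypothetical sample data):

```python
# Hypothetical rows of A.txt: (c1:int, c2:double, c3:float).
a = [(0, 1.0, 0.5), (1, 3.0, 2.5)]

# f = filter a by c1 > 0 and c2 > 1
f = [(c1, c2, c3) for (c1, c2, c3) in a if c1 > 0 and c2 > 1]
print(f)  # [(1, 3.0, 2.5)]
```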


(6) Load the tuple data in TP.txt into tp; relation g is the output generated from tp

grunt> tp = load '/test/TP.txt' as (t:tuple(c1:int,c2:int,c3:int));
grunt> describe tp;
grunt> dump tp;

grunt> g = foreach tp generate t.c1,t.c2,t.c3;
grunt> describe g;
grunt> dump g;


(7) Group g, producing bag data in relation bg

grunt> bg = group g by c1;
grunt> describe bg;
grunt> dump bg;

grunt> illustrate bg;

grunt> x = foreach bg generate g.c1;
grunt> dump x;
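Grouping collects rows that share a key into a bag, much like building a dict of lists in Python. The tuples below are hypothetical stand-ins for TP.txt:

```python
# Hypothetical tuples from TP.txt.
g = [(1, 2, 3), (1, 5, 6), (2, 7, 8)]

# bg = group g by c1: each key maps to a bag of the full tuples.
bg = {}
for row in g:
    bg.setdefault(row[0], []).append(row)

print(bg)  # {1: [(1, 2, 3), (1, 5, 6)], 2: [(2, 7, 8)]}
```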


(8) Load the map data in MP.txt into mp; relation h is the output generated from mp

grunt> mp = load '/test/MP.txt' as (m:map[]);
grunt> describe mp;
mp: {m: map[]}
grunt> h = foreach mp generate m#'Pig';
grunt> describe h;
h: {bytearray}
grunt> dump h;
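The `m#'Pig'` projection looks the key 'Pig' up in each map, like dict.get in Python (a missing key yields null/None). The map values below are hypothetical, since MP.txt's contents are not shown:

```python
# Hypothetical maps loaded from MP.txt.
mp = [{'Pig': 'latin', 'Hive': 'sql'}, {'Pig': 'grunt'}]

# h = foreach mp generate m#'Pig'
h = [m.get('Pig') for m in mp]
print(h)  # ['latin', 'grunt']
```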

Bug

If HDFS is stuck in safe mode, leave it manually:

bin/hadoop dfsadmin -safemode leave    # run from the Hadoop bin directory
# if the environment variables are configured, run:
hadoop dfsadmin -safemode leave

On newer Hadoop releases the equivalent command is hdfs dfsadmin -safemode leave.

解決后

Dataset operations

(1) Load the data

grunt> a = load '/test/A.txt' using PigStorage(',') as (a1:int, a2:int, a3:int);
grunt> b = load '/test/B.txt' using PigStorage(',') as (b1:int, b2:int, b3:int);

(2) Union of a and b

grunt> c = union a, b;
grunt> dump c;


(3) Split c into d and e, where the first column of d is 0 and the first column of e is 1 ($0 denotes the first column of the relation)

grunt> split c into d if $0 == 0, e if $0 == 1;
grunt> dump d;
grunt> dump e;
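Union concatenates the two relations, and split partitions the result by a predicate. A sketch with hypothetical rows for A.txt and B.txt:

```python
# Hypothetical rows of A.txt and B.txt for this section.
a = [(0, 2, 3), (0, 4, 5)]
b = [(1, 6, 7), (1, 8, 9)]

c = a + b  # c = union a, b

# split c into d if $0 == 0, e if $0 == 1
d = [r for r in c if r[0] == 0]
e = [r for r in c if r[0] == 1]
print(d)  # [(0, 2, 3), (0, 4, 5)]
print(e)  # [(1, 6, 7), (1, 8, 9)]
```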



(4) Select a subset of c

grunt> f = filter c by $1 > 3;
grunt> dump f;


(5) Group the data

grunt> g = group c by $2;
grunt> dump g;


(6) Gather all rows into a single group

grunt> h = group c all;
grunt> dump h;

(7) Count the elements in h

grunt> i = foreach h generate COUNT($1);
grunt> dump i;
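`group c all` puts every row into one bag under the key 'all', and COUNT then counts the tuples in that bag. A sketch with hypothetical rows for c:

```python
# Hypothetical rows of c (the union from step 2).
c = [(0, 2, 3), (0, 4, 5), (1, 6, 7), (1, 8, 9)]

# h = group c all: a single tuple of ('all', bag-of-all-rows).
h = [('all', c)]

# i = foreach h generate COUNT($1)
i = [(len(bag),) for (_, bag) in h]
print(i)  # [(4,)]
```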


(8) Join a and b on the condition a.$2 == b.$2

grunt> j = join a by $2, b by $2;
grunt> dump j;
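The join pairs every row of a with every row of b that has the same value in column $2 (an inner equi-join), concatenating the matched tuples. A sketch with hypothetical rows:

```python
# Hypothetical rows of a and b sharing values in column $2.
a = [(0, 1, 2), (1, 3, 2)]
b = [(5, 6, 2), (7, 8, 9)]

# j = join a by $2, b by $2
j = [ra + rb for ra in a for rb in b if ra[2] == rb[2]]
print(j)  # [(0, 1, 2, 5, 6, 2), (1, 3, 2, 5, 6, 2)]
```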

(9) Relation k outputs c's $1 and $1 * $2

grunt> k = foreach c generate $1, $1 * $2;
grunt> dump k;
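This final projection computes a column and a derived product per row; a sketch with hypothetical rows of c:

```python
# Hypothetical rows of c.
c = [(0, 2, 3), (1, 4, 5)]

# k = foreach c generate $1, $1 * $2
k = [(r[1], r[1] * r[2]) for r in c]
print(k)  # [(2, 6), (4, 20)]
```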

Summary

That's all of 生活随笔's collected notes on installing and using Pig; I hope this article helps you solve the problems you run into.
