Installing and Using RHadoop
Environment: Hortonworks (HDP) 2.3, Ambari 2.1.1, Hadoop 2.7.1
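
1. Download the RHadoop packages
Download the R source tarball from https://cran.r-project.org/src/base/R-3/ and the rmr2, rhdfs, and rhbase packages from GitHub.
These are the versions I downloaded:
https://cran.r-project.org/src/base/R-3/R-3.2.3.tar.gz
https://github.com/RevolutionAnalytics/rmr2/releases/download/3.3.1/rmr2_3.3.1.tar.gz
https://github.com/RevolutionAnalytics/rhdfs/blob/master/build/rhdfs_1.0.8.tar.gz
https://github.com/RevolutionAnalytics/rhbase/blob/master/build/rhbase_1.2.1.tar.gz
Note that the rhdfs and rhbase links are GitHub file pages; use the download link on those pages to get the tarball itself rather than saving the page.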
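When fetching the rhdfs and rhbase links from the command line, wget or curl would save the GitHub page rather than the archive. Here is a minimal sketch, assuming GitHub's usual convention that replacing /blob/ with /raw/ in such a URL yields the file itself. The helper name blob_to_raw is made up for this example.

```shell
# Hypothetical helper: rewrite a GitHub "blob" page URL into its "raw" file URL.
# Assumes the usual https://github.com/<owner>/<repo>/blob/<branch>/<path> layout;
# any other URL is printed unchanged.
blob_to_raw() {
    printf '%s\n' "$1" | sed 's#^\(https://github\.com/[^/]*/[^/]*\)/blob/#\1/raw/#'
}

# Example (needs network access, so it is left commented out):
# wget "$(blob_to_raw https://github.com/RevolutionAnalytics/rhdfs/blob/master/build/rhdfs_1.0.8.tar.gz)"
```

The CRAN and rmr2 release links already point at the files, so they pass through unchanged.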

2. Install R on CentOS 6.5
Install the build dependencies first:
# yum install gcc-gfortran
# yum install gcc gcc-c++
# yum install readline-devel
# yum install libXt-devel

Then unpack, build, and install R from source:
# tar xvf R-3.2.3.tar.gz
# cd R-3.2.3
# ./configure
# make
# make install
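If ./configure fails, it may help to confirm that the compilers are actually on the PATH. A small sketch: missing_cmds is a hypothetical helper, and the command names in the example (gcc, g++, gfortran, make) are my assumption about what the yum packages above provide.

```shell
# Print the name of every command in the argument list that is not on PATH.
# Prints nothing when all of them are found.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example: list whichever build tools are still missing.
# missing_cmds gcc g++ gfortran make
```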
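
3. Check the Java environment variables
RHadoop depends on the rJava package. Before installing rJava, make sure the Java environment variables are configured, then let R set up its connection to the JVM.
Append the following to the end of /etc/profile:
[root@dataserver R-3.2.3]# cat /etc/profile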
########################################
export JAVA_HOME=/usr/java/jdk1.7.0_79
export JRE_HOME=/usr/java/jdk1.7.0_79/jre
export PATH=$JAVA_HOME/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_CMD=/usr/bin/hadoop
export HADOOP_STREAMING=/usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export JAVA_HOME JRE_HOME PATH CLASSPATH
########################################
[root@dataserver R-3.2.3]# source /etc/profile
[root@dataserver R-3.2.3]# R CMD javareconf
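A quick sanity check of these variables can catch a wrong path before it surfaces later as an rJava or rhdfs error. The following is only a sketch: check_rhadoop_env is a hypothetical helper (not part of R or Hadoop), and the checks are simple file tests on the values set in /etc/profile.

```shell
# Report problems with the RHadoop-related variables; prints nothing if all look fine.
check_rhadoop_env() {
    # JAVA_HOME must contain an executable bin/java.
    [ -x "${JAVA_HOME:-}/bin/java" ] ||
        echo "JAVA_HOME: no executable bin/java under '${JAVA_HOME:-}'"
    # HADOOP_CMD must be an executable file (symlinks are followed).
    { [ -f "${HADOOP_CMD:-}" ] && [ -x "${HADOOP_CMD:-}" ]; } ||
        echo "HADOOP_CMD: '${HADOOP_CMD:-}' is not an executable file"
    # HADOOP_STREAMING must point at an existing jar file.
    [ -f "${HADOOP_STREAMING:-}" ] ||
        echo "HADOOP_STREAMING: '${HADOOP_STREAMING:-}' is not a file"
    return 0
}

# Example: run in a fresh login shell so /etc/profile has been read.
# check_rhadoop_env
```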

4. Install the R packages that RHadoop depends on
[root@dataserver R-3.2.3]# R
> install.packages("rJava")
> install.packages("reshape2")
> install.packages("Rcpp")
> install.packages("iterators")
> install.packages("itertools")
> install.packages("digest")
> install.packages("RJSONIO")
> install.packages("functional")
> install.packages("bitops")
> install.packages("caTools")
> quit()
Or, in a single call:
install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "caTools"))
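The same installation can also be scripted from the shell instead of typed at the R prompt. A sketch under stated assumptions: r_install_expr is a hypothetical helper that only builds the R expression, and the repos URL (the CRAN cloud mirror) is my choice, not something from the original setup. Passing repos explicitly avoids the interactive mirror prompt.

```shell
# Build an install.packages() call for the given package names.
# The repos URL is an assumption; substitute any CRAN mirror you prefer.
r_install_expr() {
    list=""
    for pkg in "$@"; do
        # Append "pkg", separating entries with ", " after the first one.
        list="${list:+$list, }\"$pkg\""
    done
    printf 'install.packages(c(%s), repos = "https://cloud.r-project.org")\n' "$list"
}

# Example (needs R and network access, so it is left commented out):
# Rscript -e "$(r_install_expr rJava Rcpp RJSONIO bitops digest functional stringr plyr reshape2 caTools)"
```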

5. Install the RHadoop packages
[root@dataserver R-3.2.3]# export HADOOP_CMD=/usr/bin/hadoop
[root@dataserver R-3.2.3]# export HADOOP_STREAMING=/usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar
[root@dataserver R-3.2.3]# R CMD INSTALL rhdfs_1.0.8.tar.gz
[root@dataserver R-3.2.3]# R CMD INSTALL rmr2_3.3.1.tar.gz
[root@dataserver R-3.2.3]# R CMD INSTALL rhbase_1.2.1.tar.gz
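If R CMD INSTALL rejects one of these archives, one possible cause is a download that saved a web page instead of the tarball. The sketch below checks that with gzip -t; bad_tarballs is a hypothetical helper name.

```shell
# Print the name of every file that is missing or is not valid gzip data.
# Prints nothing when every file passes `gzip -t`.
bad_tarballs() {
    for f in "$@"; do
        gzip -t "$f" >/dev/null 2>&1 || printf '%s\n' "$f"
    done
}

# Example: check the three downloads in the current directory.
# bad_tarballs rhdfs_1.0.8.tar.gz rmr2_3.3.1.tar.gz rhbase_1.2.1.tar.gz
```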

6. Use the RHadoop packages
[root@dataserver R-3.2.3]# R
> library(rhdfs)
> hdfs.init()
> hdfs.ls("/")
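
To use rmr2, quit R, set HADOOP_HOME in the shell, then start R again and load the package:
[root@dataserver R-3.2.3]# export HADOOP_HOME=/usr/hdp/current/hadoop-client
[root@dataserver R-3.2.3]# R
> library(rmr2)
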
A plain R program:
> small.ints = 1:10
> sapply(small.ints, function(x) x^2)
The same computation as an R MapReduce program:
> small.ints = to.dfs(1:10)
> mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
> from.dfs("/tmp/RtmpWnzxl4/file5deb791fcbd5")
The path passed to from.dfs is the temporary output location of that particular mapreduce run, so it will differ on your system.

If the job fails with an exception like this:
Caused by: java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
    ... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 24 more
then the streaming task cannot find Rscript on its PATH. Create a symlink for it:
ln -s /usr/local/bin/Rscript /usr/bin/Rscript
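Since the streaming job launches Rscript on the machine where the task runs, the link presumably has to exist on every node that executes tasks. A sketch of a repeat-safe version of the command above; link_if_missing is a hypothetical helper.

```shell
# Create a symlink at $2 pointing to $1, but only if $1 is executable
# and nothing exists at $2 yet. Safe to run more than once.
link_if_missing() {
    src=$1
    dst=$2
    if [ ! -x "$src" ]; then
        echo "source $src is not executable" >&2
        return 1
    fi
    if [ -e "$dst" ] || [ -L "$dst" ]; then
        return 0   # something is already there; leave it alone
    fi
    ln -s "$src" "$dst"
}

# Example (run as root; paths are the ones used in this article):
# link_if_missing /usr/local/bin/Rscript /usr/bin/Rscript
```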

Installing R on CentOS 7 is much simpler.
The steps are:
yum install epel-release
yum install R
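Because the install route depends on the CentOS major version, a setup script that covers both cases could branch on it. A sketch, assuming release strings of the form found in /etc/centos-release, such as "CentOS release 6.5 (Final)"; release_major is a hypothetical helper.

```shell
# Extract the major version number from a release string.
# Prints nothing if the string has no "release <number>" part.
release_major() {
    printf '%s\n' "$1" | sed -n 's/^.*release \([0-9][0-9]*\).*$/\1/p'
}

# Example (left commented out because it installs packages):
# case "$(release_major "$(cat /etc/centos-release)")" in
#     7) yum install -y epel-release && yum install -y R ;;
#     *) echo "build R from source as described in step 2" ;;
# esac
```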