RHadoop Distributed Installation and Configuration
It is strongly recommended to read the following two documents first:
RHadoop2.0.2u2_Installation_Configuration_for_RedHat.pdf
RHadoop and MapR v2.0.pdf
RHadoop consists of three packages: rhdfs, rmr2, and rhbase.
Download link: https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads
Step 1: Introduction
Basic Hadoop configuration (topology diagram omitted).
The rmr2 package lets MapReduce jobs written in R run on a Hadoop cluster. To run these jobs, Revolution R Enterprise and the rmr2 package must be installed on every task node of the cluster.
The rhdfs package provides connectivity to HDFS. Install rhdfs on the Hadoop NameNode.
The rhbase package provides connectivity to HBase. The package and Revolution R Enterprise must be installed on a node that can access the HBase master. The package communicates with HBase through the Thrift API, so an Apache Thrift server must also be installed on the same node.
Step 2: Installation
Part 1: Install Revolution R Enterprise and rmr2 on all nodes
1. Use the link(s) in your Revolution Analytics welcome letter to download the following installation files.
§ Revo-Ent-6.2.0-RHEL5.tar.gz or Revo-Ent-6.2.0-RHEL6.tar.gz
§ RHadoop-2.0.2u2.tar.gz
2. Unpack the contents of the Revolution R Enterprise installation bundle. At the prompt, type:
tar -xzf Revo-Ent-6.2.0-RHEL5.tar.gz
Note: if you are installing on Red Hat Enterprise Linux 6.x, replace RHEL5 with RHEL6 in the previous tar command.
3. Change directory to the versioned Revolution directory. At the prompt, type:
cd RevolutionR_6.2.0
4. Install Revolution R Enterprise. At the prompt, type:
./install.py --no-ask -d
5. Unpack the contents of the RHadoop installation bundle. At the prompt, type:
cd ..
tar -xzf RHadoop-2.0.2u2.tar.gz
6. Change directory to the versioned RHadoop directory. At the prompt, type:
cd RHadoop_2.0.2
7. Install rmr2 and its dependent R packages. At the prompt, type:
R CMD INSTALL digest_0.6.3.tar.gz plyr_1.8.tar.gz stringr_0.6.2.tar.gz RJSONIO_1.0-.tar.gz Rcpp_0.10.3.tar.gz functional_0.1.tar.gz quickcheck_1.0.tar.gz rmr2_2.0.2.tar.gz
8. Update the environment variables needed by rmr2. The values of these variables will depend on your Hadoop distribution.
HADOOP_CMD – The complete path to the “hadoop” executable
HADOOP_STREAMING – The complete path to the Hadoop Streaming jar file
Examples of both of these environment variables are shown below:
export HADOOP_CMD=/usr/bin/hadoop
export HADOOP_STREAMING=/usr/lib/hadoop/contrib/streaming/hadoop-streaming-<version>.jar
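Before moving on, it can help to confirm that both variables point at real files. A minimal sketch, assuming nothing beyond the two variable names above; check_var is our own helper:

```shell
# Sanity-check the rmr2 environment variables on this node.
check_var() {
  # $1 = variable name, $2 = its value
  if [ -z "$2" ]; then
    echo "$1 is not set"
  elif [ ! -e "$2" ]; then
    echo "$1 points to a missing path: $2"
  else
    echo "$1 OK: $2"
  fi
}

check_var HADOOP_CMD "$HADOOP_CMD"
check_var HADOOP_STREAMING "$HADOOP_STREAMING"
```

Run this on every task node; any "not set" or "missing path" line must be fixed before rmr2 jobs will run.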
Part 2: Install rhdfs
1. Install the rJava packages. At the prompt, type:
R CMD INSTALL rJava_0.9-4.tar.gz
2. Update the environment variable needed by rhdfs. The value will depend on your Hadoop distribution.
HADOOP_CMD – The complete path to the “hadoop” executable
An example of the environment variable is shown below:
export HADOOP_CMD=/usr/bin/hadoop
Note (important!): this environment variable only needs to be set on the nodes that use the rhdfs package (i.e., the edge node described earlier in this document). It is also recommended to add this variable to /etc/profile so that it is available to all users.
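The /etc/profile suggestion can be scripted so that repeated runs do not duplicate the line. A sketch, assuming /usr/bin/hadoop is the right path for your distribution; append_once is our own helper, demonstrated here against a scratch file instead of /etc/profile:

```shell
# Append a line to a profile file only if it is not already present.
append_once() {
  # $1 = line to add, $2 = target file
  grep -qxF "$1" "$2" 2>/dev/null || echo "$1" >> "$2"
}

profile="$(mktemp)"   # substitute /etc/profile when running as root
append_once 'export HADOOP_CMD=/usr/bin/hadoop' "$profile"
append_once 'export HADOOP_CMD=/usr/bin/hadoop' "$profile"   # second call is a no-op
```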
3. Install rhdfs. At the prompt, type:
R CMD INSTALL rhdfs_1.0.5.tar.gz
Part 3: Install rhbase
1. Install Apache Thrift
Important! rhbase requires an Apache Thrift server. If you have not already built and installed Thrift, do so now. Reference: http://thrift.apache.org/
2. Install the build dependencies. At the prompt, type:
yum -y install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel openssl-devel
3. Unpack the Thrift sources. At the prompt, type:
tar -xzf thrift-0.8.0.tar.gz
4. Configure the build. We only need Thrift's C++ interface, so we configure without the Ruby and Python bindings. At the prompt, type:
cd thrift-0.8.0
./configure --without-ruby --without-python
5. Build the thrift library. At the prompt, type:
make
6. Install the thrift library. At the prompt, type:
make install
7. Create a symbolic link to the thrift library so that it can be loaded by the rhbase package. Example of a symbolic link:
ln -s /usr/local/lib/libthrift-0.8.0.so /usr/lib
8. Setup the PKG_CONFIG_PATH environment variable. At the prompt, type :
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig
9. Install the rhbase package. At the prompt, type:
cd ..
R CMD INSTALL rhbase_1.1.tar.gz
Step 3: Test that the packages are configured and working
You should run two sets of tests to verify the configuration. The first set checks that the installed packages can be loaded and initialized.
1. Invoke R. At the prompt, type:
R
2. Load and initialize the rmr2 package, and execute some simple commands
At the R prompt, type the following commands. (Note: the ">" symbol in the following code is the R prompt and should not be typed.)
> library(rmr2)
> from.dfs(to.dfs(1:100))
> from.dfs(mapreduce(to.dfs(1:100)))
If any errors occur, check the following:
a. Revolution R Enterprise is installed on each node in the cluster.
b. Check that rmr2 and its dependent packages are installed on each node in the cluster.
c. Make sure that a link to the Rscript executable is in the PATH on each node in the Hadoop cluster.
d. The user that invoked ‘R’ has read and write permissions to HDFS
e. HADOOP_CMD environment variable is set, exported and its value is the complete path of the??“hadoop” executable.
f. HADOOP_STREAMING environment variable is set, exported and its value is the complete path to??the Hadoop Streaming jar file.
g. If you encounter errors like the following, check the 'stderr' log file for the job and resolve any errors reported there. The easiest way to find the log files is to use the job tracking URL (i.e. http://<my_ip_address>:50030/jobdetails.jsp?jobid=job_201208162037_0011).
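Items c and e in the checklist above can be checked mechanically on each node. A small sketch; check_cmd is our own helper and only inspects the current PATH:

```shell
# Report whether an executable is reachable through the PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found in PATH"
  fi
}

check_cmd Rscript
check_cmd hadoop
```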
3. Load and initialize the rhdfs package.
At the R prompt, type the following commands. (Note: the ">" symbol in the following code is the R prompt and should not be typed.)
> library(rhdfs)
> hdfs.init()
> hdfs.ls("/")
If any error occurs, check the following:
a. rJava package is installed, configured and loaded.
b. HADOOP_CMD is set and exported, and its value is the complete path of the "hadoop" executable.
4. Load and initialize the rhbase package.
At the R prompt, type the following commands. (Note: the ">" symbol in the following code is the R prompt and should not be typed.)
> library(rhbase)
> hb.init()
> hb.list.tables()
If any error occurs, check the following:
a. Thrift Server is running (refer to your Hadoop documentation for more details)
b. The default port for the Thrift server is 9090. Be sure there is no port conflict with other running processes.
c. Check that you are not running the Thrift server in hsha or nonblocking mode. If necessary, use the threadpool command-line parameter to start the server (i.e. /usr/bin/hbase thrift -threadpool start).
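Item b above can be checked with a quick probe of the port. A minimal sketch using bash's /dev/tcp pseudo-device; port_in_use is our own helper, and 9090 is the default Thrift port mentioned above:

```shell
# Return success if something is listening on the given local TCP port.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

if port_in_use 9090; then
  echo "port 9090 is in use"
else
  echo "port 9090 is free"
fi
```

If the port is in use before you start the Thrift server, find and stop the conflicting process or start Thrift on another port.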
5. Using the standard R mechanism for checking packages, you can verify that your configuration is working properly.
Go to the directory containing the R package sources (rmr2, rhdfs, rhbase) and type the following command for each package.
Important! Running the tests for the rmr2 package may take a significant amount of time (hours) to complete.
R CMD check rmr2_2.0.2.tar.gz
R CMD check rhdfs_1.0.5.tar.gz
R CMD check rhbase_1.1.tar.gz
If any error occurs, refer to the troubleshooting information in the previous sections.
Note: errors referring to the missing pdflatex package can be ignored, e.g.:
Error in texi2dvi("Rd2.tex", pdf = (out_ext == "pdf"), quiet = FALSE, :
pdflatex is not available
Error in running tools::texi2dvi