
Hadoop pseudo-distributed setup walkthrough + configuring and using Eclipse

Published: 2025/6/15


Reference: http://hadoop.apache.org/common/docs/current/single_node_setup.html

Environment: WinXP + cygwin + hadoop-0.20.2

Extract Hadoop to E:\hadoop-0.20.2, then edit the following configuration files.

conf/hadoop-env.sh:

# The java implementation to use.  Required.
export JAVA_HOME=/cygdrive/e/Java/jdk1.6.0_29
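The JAVA_HOME value above is the Cygwin-style spelling of a Windows path (here presumably E:\Java\jdk1.6.0_29). The following is a minimal sketch of that mapping, not part of the original setup: to_cygdrive is a hypothetical helper name, and on a real Cygwin install the cygpath -u utility performs the same conversion.

```shell
# to_cygdrive WINDOWS_PATH
# Hypothetical helper: rewrites a path such as E:\Java\jdk1.6.0_29 into
# Cygwin's /cygdrive/<drive>/... form. Assumes the input starts with "X:".
to_cygdrive() {
    win_path=$1
    # First character is the drive letter; Cygwin uses it in lower case.
    drive=$(printf '%s' "$win_path" | cut -c1 | tr 'A-Z' 'a-z')
    # Everything after "X:" with backslashes turned into forward slashes.
    rest=$(printf '%s' "$win_path" | cut -c3- | tr '\\' '/')
    printf '/cygdrive/%s%s\n' "$drive" "$rest"
}

to_cygdrive 'E:\Java\jdk1.6.0_29'
```

For the path used in this article, the helper prints /cygdrive/e/Java/jdk1.6.0_29, which is the value set in hadoop-env.sh.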

conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>

conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>
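To double-check what ended up in these files, the configured values can be read back from the command line. This is only a sketch: get_prop is a hypothetical helper, it assumes each <name> element appears before its <value>, and the demo runs on a scratch copy of core-site.xml instead of the real conf directory.

```shell
# get_prop FILE NAME
# Hypothetical helper: prints the <value> belonging to <name>NAME</name>.
get_prop() {
    awk -v want="$2" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == want) print v }
    ' "$1"
}

# Demo on a throwaway copy of the core-site.xml shown above.
demo_conf=$(mktemp -d)
cat > "$demo_conf/core-site.xml" <<'EOF'
<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>
EOF

uri=$(get_prop "$demo_conf/core-site.xml" fs.default.name)
hostport=${uri#hdfs://}          # strip the scheme, leaving host:port
echo "host=${hostport%%:*} port=${hostport##*:}"
```

On the demo file this prints host=localhost port=9000. The same call with mapred.job.tracker against mapred-site.xml would return the JobTracker address.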

Setup passphraseless ssh

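In the cygwin console, run ssh-host-config and answer yes to every prompt; when asked for the CYGWIN value, enter netsec.

Then start the Cygwin sshd service from the Windows Services panel.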

Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
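One caveat with the cat >> line: running it a second time appends the same key again. A possible idempotent variant is sketched below. authorize_key is a hypothetical helper name, the chmod 600 step is an assumption about sshd's permission checks rather than something the original states, and the demo uses a placeholder string instead of a real key.

```shell
# authorize_key PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Hypothetical helper: appends the public key only if that exact line
# is not already present in the authorized_keys file.
authorize_key() {
    pub=$1
    auth=$2
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF -e "$(cat "$pub")" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    # Assumption: sshd may reject key files that other users can write to.
    chmod 600 "$auth"
}

# Demo in a scratch directory with a placeholder (not a real) key.
demo=$(mktemp -d)
echo 'ssh-dss PLACEHOLDER-KEY-DATA user@localhost' > "$demo/id_dsa.pub"
authorize_key "$demo/id_dsa.pub" "$demo/.ssh/authorized_keys"
authorize_key "$demo/id_dsa.pub" "$demo/.ssh/authorized_keys"
wc -l < "$demo/.ssh/authorized_keys"
```

Although the helper is called twice, the demo file ends up with a single line. With real keys the call would be authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys.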

Execution

Format a new distributed-filesystem:
$ bin/hadoop namenode -format

Start the hadoop daemons:
$ bin/start-all.sh

The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you're done, stop the daemons with:
$ bin/stop-all.sh
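For intuition about what the grep example job computes, a rough local equivalent can be built from standard tools. To the best of my understanding, the job counts how often each string matching the regular expression occurs in the input and lists the matches by count. The sketch below assumes that behavior; local_grep_count is a hypothetical helper, and the sample file is made up for the demo.

```shell
# local_grep_count DIR REGEX
# Hypothetical helper: counts each distinct regex match across the files
# in DIR and prints "count<TAB>match", highest count first.
local_grep_count() {
    # -o: print only the matched text, -h: omit file names, -E: extended regex
    grep -ohE -e "$2" "$1"/* | sort | uniq -c | sort -rn |
        awk '{ printf "%s\t%s\n", $1, $2 }'
}

# Demo on a small made-up input file.
sample=$(mktemp -d)
cat > "$sample/hdfs-site.xml" <<'EOF'
<name>dfs.replication</name>
<description>dfs.replication sets the block replication</description>
<name>dfs.name.dir</name>
EOF

local_grep_count "$sample" 'dfs[a-z.]+'
```

On the demo input this reports dfs.replication with a count of 2 and dfs.name.dir with a count of 1. Running the same helper against a local copy of the conf directory gives a result to compare with the job's output.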

Following the document up to the command $ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' produces this error:

11/11/29 13:01:32 INFO mapred.JobClient: Task Id : attempt_201111291300_0001_m_000014_0, Status : FAILED
java.io.FileNotFoundException: File E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201111291300_0001/attempt_201111291300_0001_m_000014_0/work/tmp does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)

Fix (from http://www.hadoopor.com/archiver/tid-1978.html): add the following property to mapred-site.xml:

<property>
    <name>mapred.child.tmp</name>
    <value>/hadoop-0.20.2/tmp</value>
</property>

The value 'E:\hadoop-0.20.2\tmp' also works.

-------------------------------

If the default tmp directory was changed in core-site.xml at the outset, for example:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop-0.20.2/tmp</value>
</property>

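(Note: using 'E:\hadoop-0.20.2\tmp' here prevents the JobTracker from starting because of a path-name problem.)

then the job hangs, repeating the same log line in an endless loop and making no progress: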

2011-11-29 13:54:56,515 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201111291353_0001_r_000000_0 0.30769235% reduce > copy (12 of 13 at 0.00 MB/s) >
2011-11-29 13:54:59,515 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201111291353_0001_r_000000_0 0.30769235% reduce > copy (12 of 13 at 0.00 MB/s) >
2011-11-29 13:55:05,515 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201111291353_0001_r_000000_0 0.30769235% reduce > copy (12 of 13 at 0.00 MB/s) >

Fix: my guess is that the problem comes from hadoop.tmp.dir and mapred.child.tmp pointing to the same folder. Changing the value of mapred.child.tmp to /hadoop-0.20.2/tasktmp resolved it.
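Putting the two fixes together, mapred-site.xml ends up with both the JobTracker address and a task temp directory that differs from hadoop.tmp.dir. The sketch below writes such a file from the shell. CONF_DIR is a hypothetical variable that defaults to a scratch directory, so running the snippet as-is does not touch a real conf/mapred-site.xml.

```shell
# CONF_DIR is a hypothetical variable; by default it points at a scratch
# directory so this sketch cannot overwrite an existing configuration.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}

# Values are the ones used in this article: the JobTracker address from the
# initial setup plus the relocated mapred.child.tmp.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
     <property>
         <name>mapred.child.tmp</name>
         <value>/hadoop-0.20.2/tasktmp</value>
     </property>
</configuration>
EOF

# Print the line that sets the task temp directory.
grep 'tasktmp' "$CONF_DIR/mapred-site.xml"
```

To apply it to the real installation, CONF_DIR would be set to the conf directory under the Hadoop install path before running the snippet, and the daemons restarted afterwards.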

Configuring and using Hadoop in Eclipse (reposted)

Reference link: Hadoop Learning Log: Running the First MapReduce Program in Eclipse

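1. Copy <Hadoop install directory>/contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar into <Eclipse install directory>/plugins/.

2. Restart Eclipse and configure the Hadoop installation directory.
If the plugin installed successfully, Window --> Preferences now shows a Hadoop Map/Reduce entry. Set the Hadoop installation directory there, then close the dialog.

3. Configure Map/Reduce Locations.
Open the Map/Reduce Locations view from Window --> Show View.
In that view, right-click --> New Hadoop Location to create a new Hadoop location. In the dialog, set a Location name (e.g. myubuntu) as well as the Map/Reduce Master and DFS Master. Their Host and Port are the address and port configured in mapred-site.xml and core-site.xml, respectively (localhost:9001 and localhost:9000 in the configuration above).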

4. Create a new project.
File --> New --> Other --> Map/Reduce Project
The project name is arbitrary, e.g. hadoop-test.
Copy <Hadoop install directory>/src/examples/org/apache/hadoop/examples/WordCount.java into the new project.

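5. Upload a folder of sample data.
The program needs an input folder and an output folder. The output folder is created automatically when the program finishes, so only the input folder has to be provided.
In the current directory (e.g. the Hadoop installation directory), create a folder named input containing two files, file01 and file02, with the following contents:

file01: Hello World Bye World
file02: Hello Hadoop Goodbye Hadoop

Then upload the folder to HDFS under the name used in step 6, for example with bin/hadoop fs -put input input01.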

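6. Run the project.
a. In the new hadoop-test project, select WordCount.java, then right-click --> Run As --> Run Configurations.
b. In the Run Configurations dialog, select Java Application, then right-click --> New. This creates a new application named WordCount.
c. Configure the run arguments: open the Arguments tab and, in Program arguments, enter the input folder to pass to the program followed by the folder where the results should be saved, for example: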

hdfs://localhost:9000/user/Administrator/input01 hdfs://localhost:9000/user/Administrator/output01

7. Click Run to run the program.

8. After the run finishes, view the contents of the generated files from a terminal with bin/hadoop fs -cat output01/*
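As a sanity check on the result of step 8, the word counts for the two sample files from step 5 can be computed locally with standard tools. This is a sketch for comparison only, not the MapReduce program itself: local_wordcount is a hypothetical helper, the files are recreated in a scratch directory, and splitting on single spaces is an assumption that happens to fit this sample data.

```shell
# Recreate the two sample files from step 5 in a scratch directory.
work=$(mktemp -d)
mkdir "$work/input"
echo 'Hello World Bye World' > "$work/input/file01"
echo 'Hello Hadoop Goodbye Hadoop' > "$work/input/file02"

# local_wordcount DIR
# Hypothetical helper: prints "word<TAB>count" for every word in DIR's files.
local_wordcount() {
    # One word per line, sort so equal words are adjacent, then count runs.
    cat "$1"/* | tr -s ' ' '\n' | LC_ALL=C sort | uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}

local_wordcount "$work/input"
```

For this input the helper prints Bye 1, Goodbye 1, Hadoop 2, Hello 2 and World 2, one pair per line. The WordCount job should report the same counts in output01.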
