

Hadoop 2.0.0-alpha: a first-taste install and hello world

Published: 2023/12/4
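This article is for testing and learning only. I do not recommend running 2.0 in production: 2.0 is built on YARN, so Hive, HBase, Mahout, and other tools that depend on map/reduce V1 may not work with Hadoop 2.0, or may behave unexpectedly.
On May 23, Apache released a test version of Hadoop 2.0. I happened to be at home with nothing to do, so I gave map/reduce V2 a quick try.
Environment: a VirtualBox virtual machine running Ubuntu Server 12.04 with openjdk-7.
A brief introduction: 2.0.0 grew out of Hadoop 0.23.x. The jobtracker and tasktracker are gone, or rather, their work has been moved into containers. YARN replaces the original map/reduce runtime.
YARN is billed as the second generation of map/reduce: faster than the first and able to support larger clusters. Hadoop 0.20.x, and the 1.0.x line derived from it, is recommended for clusters of about 3,000 nodes, with a maximum of 4,000. Hadoop 2.0 with YARN claims support for 6,000 to 10,000 nodes and 200,000 CPU cores. In cluster size and computing capacity, that looks like a considerable improvement. It also adds HA (high availability) for the namenode. I say "looks like" because I have not measured speed in a real production environment, and since this test ran in a virtual machine I did not test namenode HA either; I only took a quick look.
The file layout in 2.0 has changed from 1.0 and is clearer. Executables are under bin/, the server start scripts have moved to sbin/, and the jars for map/red, streaming, and pipes are under share/. Everything is easy to find.
After unpacking the tarball, go to etc/hadoop/ and set up the configuration files as for a single-node install. core-site.xml and hdfs-site.xml are there, but mapred-site.xml is gone; yarn-site.xml takes its place.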
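The post does not list the configuration values it used, so the following is only a minimal Python 3 sketch of what the single-node files might contain. The property names fs.defaultFS and dfs.replication are standard Hadoop 2 settings; the values are assumptions (hdfs://localhost:9000 is taken from the job log at the end of this post), and render_site_xml is a hypothetical helper, not part of Hadoop.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a dict of settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Assumed single-node values; adjust them to your own setup.
core_site = render_site_xml({"fs.defaultFS": "hdfs://localhost:9000"})
hdfs_site = render_site_xml({"dfs.replication": 1})

print(core_site)  # contents for etc/hadoop/core-site.xml
print(hdfs_site)  # contents for etc/hadoop/hdfs-site.xml
```

The output is unindented XML; Hadoop does not care about whitespace in these files, so it can be written to disk as is.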
Assuming the single-node configuration is done, go to $HADOOP_HOME/bin/ and run the following:
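./hadoop namenode -format
# Format the namenode first.
cd ../sbin/
# Go to sbin, which holds all the server start scripts.
./hadoop-daemon.sh start namenode
./hadoop-daemon.sh start datanode
./hadoop-daemon.sh start secondarynamenode

# The secondarynamenode is optional and does not affect normal use. (It only performs checkpoints; it is not an HA standby.)
# The next part matters more: 2.0 drops the jobtracker and tasktracker in favor of YARN, so running something like start jobtracker fails with an error.
# Also, hadoop, hdfs, and map/reduce now each have their own scripts, so hadoop-daemon.sh can no longer start everything.
./yarn-daemon.sh start resourcemanager
# This corresponds to the old jobtracker: the process that allocates compute resources. It can run on the same machine as the namenode.
./yarn-daemon.sh start nodemanager
# This corresponds to the old tasktracker; start it on every datanode (slave) server.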
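Run ps aux. If you see 4 java processes (5 if you also started the secondarynamenode), startup succeeded. Visit http://localhost:50070 to check HDFS. Because the jobtracker is gone, there is no port 50030 for viewing jobs any more; more on that another time.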
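Counting java processes by eye is easy to get wrong, so here is a small Python 3 sketch that looks for the daemons' main class names in ps output instead. The class names are the real Hadoop ones; the sample ps text is made up for illustration, and running_daemons is a hypothetical helper.

```python
import re

# Daemons started in the steps above; the secondarynamenode is optional.
EXPECTED = ["NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager"]


def running_daemons(ps_output):
    """Return the expected daemon names whose main class appears in ps output."""
    found = []
    for name in EXPECTED:
        # Require non-letters on both sides so that "NameNode" does not
        # also match inside "SecondaryNameNode".
        pattern = r"(?<![A-Za-z])" + re.escape(name) + r"(?![A-Za-z])"
        if re.search(pattern, ps_output):
            found.append(name)
    return found


# Hypothetical, abbreviated ps aux output for illustration only.
sample = """
root 1201 ... java ... org.apache.hadoop.hdfs.server.namenode.NameNode
root 1302 ... java ... org.apache.hadoop.hdfs.server.datanode.DataNode
root 1410 ... java ... org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
root 1522 ... java ... org.apache.hadoop.yarn.server.nodemanager.NodeManager
"""
print(running_daemons(sample))
```

On a real machine, the text would come from something like subprocess.check_output(["ps", "aux"]) decoded to a string.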
Next, let's write a first map/reduce V2 program. The way the program is written is no different from V1; only the final invocation changes a little. To preserve compatibility, Hadoop 2.0 keeps the user-facing interface the same as before.
Take data like this (fields are tab-separated):
20120503    04    2012-05-03 04:49:22            222.139.35.72    Log_ASF ProductVer="5.12.0425.2111"
20120503    04    2012-05-03 04:49:21            113.232.38.239    Log_ASF ProductVer="5.09.0119.1112"
Suppose only these 2 distinct records occur, 20 lines in total.
As before, the map/red scripts are written in Python:
#!/usr/bin/python
# -*- encoding: UTF-8 -*-
# map.py
import sys

debug = True
# lzo offsets every field index by one when debug is off
if debug:
    lzo = 0
else:
    lzo = 1

count = '0'
for line in sys.stdin:
    try:
        flags = line.rstrip('\n').split('\t')
        # flags[5+lzo] is read below, so at least 6+lzo fields are needed
        if len(flags) < 6 + lzo:
            continue

        stat_date = flags[2 + lzo].split(' ')[0]
        version = flags[5 + lzo].split('"')[1]

        # emit "date,version<TAB>count"; print(x) works in Python 2 and 3
        print(stat_date + ',' + version + '\t' + count)

    except Exception as e:
        # report bad records on stderr so they do not pollute the map output
        sys.stderr.write('%s\n' % e)
------------------------------------------------------------------

#!/usr/bin/python
# -*- encoding: UTF-8 -*-
# red.py
import sys

# dictionary: key -> number of records seen for that key
res = {}

for line in sys.stdin:
    flags = line.rstrip('\n').split('\t')
    # the mapper emits exactly "key<TAB>value"; skip anything else
    if len(flags) != 2:
        continue
    field_key = flags[0]
    res[field_key] = res.get(field_key, 0) + 1

# emit "date,version,count" for each key
for key in res:
    print('%s,%s' % (key, res[key]))
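Before submitting anything to Hadoop, the map, sort, and reduce steps can be tried locally. The following is a minimal Python 3 sketch that reimplements the logic of the two scripts as functions and runs them in the order streaming would. It assumes six tab-separated fields with ProductVer in the last one and an empty fourth field; the sample records and the names map_line, reduce_lines, and run_local are made up for illustration.

```python
def map_line(line):
    """Return 'date,version<TAB>0' for a well-formed record, else None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        return None
    stat_date = fields[2].split(" ")[0]
    version = fields[5].split('"')[1]
    return stat_date + "," + version + "\t0"


def reduce_lines(lines):
    """Count how many times each key occurs in the mapper output."""
    counts = {}
    for line in lines:
        parts = line.split("\t")
        if len(parts) != 2:
            continue
        counts[parts[0]] = counts.get(parts[0], 0) + 1
    return counts


def run_local(records):
    """Map every record, sort the mapper output by key, then reduce."""
    mapped = [out for out in map(map_line, records) if out is not None]
    mapped.sort()  # the framework sorts map output before the reduce phase
    return reduce_lines(mapped)


# Hypothetical records in the assumed six-field layout (fourth field empty).
records = [
    '20120503\t04\t2012-05-03 04:49:22\t\t222.139.35.72\tLog_ASF ProductVer="5.12.0425.2111"\n',
    '20120503\t04\t2012-05-03 04:49:21\t\t113.232.38.239\tLog_ASF ProductVer="5.09.0119.1112"\n',
    '20120503\t04\t2012-05-03 04:49:23\t\t222.139.35.72\tLog_ASF ProductVer="5.12.0425.2111"\n',
    'malformed line\n',
]
result = run_local(records)
for key in sorted(result):
    print(key + "," + str(result[key]))
```

The same check can be done with the real scripts by piping a sample file through the mapper, sort, and the reducer on the command line.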
Then copy the sample data to HDFS:

./hadoop fs -mkdir /tmp
./hadoop fs -copyFromLocal /root/asf /tmp/asf
Run a test; it works just as with earlier Hadoop versions. Either of the two streaming invocations will do:
./hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.0.0-alpha.jar -mapper /opt/hadoop/mrs/map.py -reducer /opt/hadoop/mrs/red.py -input /tmp/asf -output /asf

or

./yarn jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.0.0-alpha.jar -mapper /opt/hadoop/mrs/map.py -reducer /opt/hadoop/mrs/red.py -input /tmp/asf -output /asf
Then:
./hadoop fs -cat /asf/part-00000
2012-05-03,5.09.0119.1112,2
2012-05-03,5.12.0425.2111,18
The result is correct.
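"Correct" here means the per-version counts add up to the 20 input lines. A minimal Python 3 sketch of that check, using the two result lines reported above (parse_result is a hypothetical helper name):

```python
def parse_result(text):
    """Parse 'date,version,count' lines into {(date, version): count}."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()  # also drops any trailing tab or spaces
        if not line:
            continue
        date, version, count = line.rsplit(",", 2)
        totals[(date, version)] = int(count)
    return totals


# The two result lines reported for /asf/part-00000.
output = """2012-05-03,5.09.0119.1112,2
2012-05-03,5.12.0425.2111,18
"""
totals = parse_result(output)
print(sum(totals.values()))  # should equal the number of input lines
```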
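Appendix: the map/reduce V2 execution log. Note that the job ID job_local_0001 and the LocalJobRunner entries indicate that this run used the local job runner rather than YARN containers; submitting to YARN requires setting mapreduce.framework.name to yarn.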
root@localhost:/opt/hadoop/bin# ./yarn jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.0.0-alpha.jar -mapper /opt/hadoop/mrs/map.py -reducer /opt/hadoop/mrs/red.py -input /tmp/asf -output /asf
12/06/01 23:26:40 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
12/06/01 23:26:41 WARN conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
12/06/01 23:26:41 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/06/01 23:26:41 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
12/06/01 23:26:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/06/01 23:26:42 WARN snappy.LoadSnappy: Snappy native library not loaded
12/06/01 23:26:42 INFO mapred.FileInputFormat: Total input paths to process : 1
12/06/01 23:26:42 INFO mapreduce.JobSubmitter: number of splits:1
12/06/01 23:26:42 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
12/06/01 23:26:42 WARN conf.Configuration: mapred.create.symlink is deprecated. Instead, use mapreduce.job.cache.symlink.create
12/06/01 23:26:42 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
12/06/01 23:26:42 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
12/06/01 23:26:42 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
12/06/01 23:26:42 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/06/01 23:26:42 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
12/06/01 23:26:42 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
12/06/01 23:26:42 WARN conf.Configuration: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
12/06/01 23:26:42 WARN conf.Configuration: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
12/06/01 23:26:42 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
12/06/01 23:26:42 WARN mapred.LocalDistributedCacheManager: LocalJobRunner does not support symlinking into current working dir.
12/06/01 23:26:42 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
12/06/01 23:26:42 INFO mapreduce.Job: Running job: job_local_0001
12/06/01 23:26:42 INFO mapred.LocalJobRunner: OutputCommitter set in config null
12/06/01 23:26:42 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
12/06/01 23:26:42 INFO mapred.LocalJobRunner: Waiting for map tasks
12/06/01 23:26:42 INFO mapred.LocalJobRunner: Starting task: attempt_local_0001_m_000000_0
12/06/01 23:26:42 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@52b5ef94
12/06/01 23:26:42 INFO mapred.MapTask: numReduceTasks: 1
12/06/01 23:26:42 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
12/06/01 23:26:42 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
12/06/01 23:26:42 INFO mapred.MapTask: soft limit at 83886080
12/06/01 23:26:42 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
12/06/01 23:26:42 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
12/06/01 23:26:42 INFO streaming.PipeMapRed: PipeMapRed exec [/opt/hadoop/mrs/map.py]
12/06/01 23:26:42 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
12/06/01 23:26:42 WARN conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
12/06/01 23:26:42 WARN conf.Configuration: map.input.start is deprecated. Instead, use mapreduce.map.input.start
12/06/01 23:26:42 WARN conf.Configuration: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
12/06/01 23:26:42 WARN conf.Configuration: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
12/06/01 23:26:42 WARN conf.Configuration: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
12/06/01 23:26:42 WARN conf.Configuration: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
12/06/01 23:26:42 WARN conf.Configuration: map.input.length is deprecated. Instead, use mapreduce.map.input.length
12/06/01 23:26:42 WARN conf.Configuration: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
12/06/01 23:26:42 WARN conf.Configuration: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
12/06/01 23:26:42 WARN conf.Configuration: map.input.file is deprecated. Instead, use mapreduce.map.input.file
12/06/01 23:26:42 WARN conf.Configuration: mapred.job.id is deprecated. Instead, use mapreduce.job.id
12/06/01 23:26:43 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
12/06/01 23:26:43 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
12/06/01 23:26:43 INFO streaming.PipeMapRed: MRErrorThread done
12/06/01 23:26:43 INFO streaming.PipeMapRed: Records R/W=20/1
12/06/01 23:26:43 INFO streaming.PipeMapRed: mapRedFinished
12/06/01 23:26:43 INFO mapred.LocalJobRunner:
12/06/01 23:26:43 INFO mapred.MapTask: Starting flush of map output
12/06/01 23:26:43 INFO mapred.MapTask: Spilling map output
12/06/01 23:26:43 INFO mapred.MapTask: bufstart = 0; bufend = 560; bufvoid = 104857600
12/06/01 23:26:43 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214320(104857280); length = 77/6553600
12/06/01 23:26:43 INFO mapred.MapTask: Finished spill 0
12/06/01 23:26:43 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of committing
12/06/01 23:26:43 INFO mapred.LocalJobRunner: Records R/W=20/1
12/06/01 23:26:43 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/06/01 23:26:43 INFO mapred.LocalJobRunner: Finishing task: attempt_local_0001_m_000000_0
12/06/01 23:26:43 INFO mapred.LocalJobRunner: Map task executor complete.
12/06/01 23:26:43 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@25d71236
12/06/01 23:26:43 INFO mapred.Merger: Merging 1 sorted segments
12/06/01 23:26:43 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 574 bytes
12/06/01 23:26:43 INFO mapred.LocalJobRunner:
12/06/01 23:26:43 INFO streaming.PipeMapRed: PipeMapRed exec [/opt/hadoop/mrs/red.py]
12/06/01 23:26:43 WARN conf.Configuration: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
12/06/01 23:26:43 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/06/01 23:26:43 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
12/06/01 23:26:43 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
12/06/01 23:26:43 INFO streaming.PipeMapRed: Records R/W=20/1
12/06/01 23:26:43 INFO streaming.PipeMapRed: MRErrorThread done
12/06/01 23:26:43 INFO streaming.PipeMapRed: mapRedFinished
12/06/01 23:26:43 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of committing
12/06/01 23:26:43 INFO mapred.LocalJobRunner:
12/06/01 23:26:43 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/06/01 23:26:43 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:9000/asf/_temporary/0/task_local_0001_r_000000
12/06/01 23:26:43 INFO mapred.LocalJobRunner: Records R/W=20/1 > reduce
12/06/01 23:26:43 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/06/01 23:26:43 INFO mapreduce.Job: Job job_local_0001 running in uber mode : false
12/06/01 23:26:43 INFO mapreduce.Job:  map 100% reduce 100%
12/06/01 23:26:43 INFO mapreduce.Job: Job job_local_0001 completed successfully
12/06/01 23:26:43 INFO mapreduce.Job: Counters: 32
    File System Counters
        FILE: Number of bytes read=205938
        FILE: Number of bytes written=452840
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=252230
        HDFS: Number of bytes written=59
        HDFS: Number of read operations=13
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=20
        Map output records=20
        Map output bytes=560
        Map output materialized bytes=606
        Input split bytes=81
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=0
        Reduce input records=20
        Reduce output records=2
        Spilled Records=40
        Shuffled Maps =0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=12
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=396361728
    File Input Format Counters
        Bytes Read=126115
    File Output Format Counters
        Bytes Written=59
12/06/01 23:26:43 INFO streaming.StreamJob: Output directory: /asf
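The counters at the end of the log are the quickest way to confirm what the job did. As a rough Python 3 sketch, the indented "name=value" lines can be pulled into a dictionary; the excerpt below is copied from the log above, and parse_counters is a hypothetical helper, not a Hadoop API.

```python
import re

# An indented line of the form "<name>=<integer>"; group headers have no "=".
COUNTER_RE = re.compile(r"^\s+(?P<name>[^=]+?)\s*=(?P<value>\d+)\s*$")


def parse_counters(log_text):
    """Collect the indented 'name=value' counter lines of a job log."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


# Excerpt of the counters section from the log above.
excerpt = """12/06/01 23:26:43 INFO mapreduce.Job: Counters: 32
    Map-Reduce Framework
        Map input records=20
        Map output records=20
        Reduce input groups=2
        Reduce input records=20
        Reduce output records=2
        Shuffled Maps =0
"""
counters = parse_counters(excerpt)
print(counters["Map input records"], counters["Reduce output records"])
```

Here 20 records went in, the reducer saw 2 distinct keys, and 2 records came out, which agrees with the two result lines.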
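Of course, map/reduce V2 can do more than this and deserves deeper study. Although 2.0 evolved from 0.23, it differs from it in places. For example, 0.23 has an ApplicationManager, which no longer seems to be exposed in 2.0; perhaps it too has been folded into the container. The XML configuration options also appear to differ considerably from 0.20.x, though I have not looked closely. The HA feature runs more than one namenode and supports manually switching from one namenode to another, which is how high availability is achieved; automatic failure detection and failover is reportedly planned. (Multiple namenodes that each manage a separate part of the namespace belong to the related Federation feature rather than HA.)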
