Running the Hadoop wordcount example, and basic HDFS operations
1. Check the Hadoop version
[hadoop@ltt1 sbin]$ hadoop version
Hadoop 2.6.0-cdh5.12.0
Subversion http://github.com/cloudera/hadoop -r dba647c5a8bc5e09b572d76a8d29481c78d1a0dd
Compiled by jenkins on 2017-06-29T11:33Z
Compiled with protoc 2.5.0
From source with checksum 7c45ae7a4592ce5af86bc4598c5b4
This command was run using /home/hadoop/hadoop260/share/hadoop/common/hadoop-common-2.6.0-cdh5.12.0.jar

2. Try out some functionality with the example jar that ships with Hadoop
List the MapReduce example programs supported by hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar:
[hadoop@ltt1 sbin]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar
An example program must be given as the first argument. Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

3. Create a directory on HDFS
hadoop fs -mkdir /input

4. List the HDFS root directory
[hadoop@ltt1 ~]$ hadoop fs -ls /
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2017-09-17 08:11 /input
drwx------ - hadoop supergroup 0 2017-09-17 08:07 /tmp
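As an aside, the hadoop fs commands used in steps 3 and 4 can also be issued programmatically through the HDFS FileSystem Java API. Below is a minimal sketch, assuming core-site.xml is on the classpath so the default Configuration resolves fs.defaultFS to this cluster; the class name HdfsOps is made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsOps {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from core-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Equivalent of: hadoop fs -mkdir /input
    fs.mkdirs(new Path("/input"));

    // Equivalent of: hadoop fs -ls /
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPermission() + "  " + status.getPath());
    }

    fs.close();
  }
}

The same API also covers the upload in the next step: fs.copyFromLocalFile(localPath, hdfsPath) does what hadoop fs -put does.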
5. Upload local files to HDFS
hadoop fs -put $HADOOP_HOME/*.txt /input

6. List the files under /input on HDFS
[hadoop@ltt1 ~]$ hadoop fs -ls /input
Found 3 items
-rw-r--r--   2 hadoop supergroup      85063 2017-09-17 08:15 /input/LICENSE.txt
-rw-r--r--   2 hadoop supergroup      14978 2017-09-17 08:15 /input/NOTICE.txt
-rw-r--r--   2 hadoop supergroup       1366 2017-09-17 08:15 /input/README.txt

7. Run a simple wordcount test
[hadoop@ltt1 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar wordcount /input /output
17/09/17 08:19:12 INFO input.FileInputFormat: Total input paths to process : 3
17/09/17 08:19:13 INFO mapreduce.JobSubmitter: number of splits:3
17/09/17 08:19:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1505605169997_0002
17/09/17 08:19:14 INFO impl.YarnClientImpl: Submitted application application_1505605169997_0002
17/09/17 08:19:14 INFO mapreduce.Job: The url to track the job: http://ltt1.bg.cn:9180/proxy/application_1505605169997_0002/
17/09/17 08:19:14 INFO mapreduce.Job: Running job: job_1505605169997_0002
17/09/17 08:19:27 INFO mapreduce.Job: Job job_1505605169997_0002 running in uber mode : false
17/09/17 08:19:27 INFO mapreduce.Job:  map 0% reduce 0%
17/09/17 08:19:39 INFO mapreduce.Job:  map 33% reduce 0%
17/09/17 08:19:48 INFO mapreduce.Job:  map 100% reduce 0%
17/09/17 08:19:50 INFO mapreduce.Job:  map 100% reduce 100%
17/09/17 08:19:50 INFO mapreduce.Job: Job job_1505605169997_0002 completed successfully
17/09/17 08:19:50 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=42705
        FILE: Number of bytes written=588235
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=101699
        HDFS: Number of bytes written=30167
        HDFS: Number of read operations=12
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=3
        Launched reduce tasks=1
        Data-local map tasks=2
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=47617
        Total time spent by all reduces in occupied slots (ms)=8244
        Total time spent by all map tasks (ms)=47617
        Total time spent by all reduce tasks (ms)=8244
        Total vcore-milliseconds taken by all map tasks=47617
        Total vcore-milliseconds taken by all reduce tasks=8244
        Total megabyte-milliseconds taken by all map tasks=48759808
        Total megabyte-milliseconds taken by all reduce tasks=8441856
    Map-Reduce Framework
        Map input records=2035
        Map output records=14239
        Map output bytes=155828
        Map output materialized bytes=42717
        Input split bytes=292
        Combine input records=14239
        Combine output records=2653
        Reduce input groups=2402
        Reduce shuffle bytes=42717
        Reduce input records=2653
        Reduce output records=2402
        Spilled Records=5306
        Shuffled Maps =3
        Failed Shuffles=0
        Merged Map outputs=3
        GC time elapsed (ms)=881
        CPU time spent (ms)=22320
        Physical memory (bytes) snapshot=690192384
        Virtual memory (bytes) snapshot=10862809088
        Total committed heap usage (bytes)=380243968
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=101407
    File Output Format Counters
        Bytes Written=30167
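For reference, here is roughly what the wordcount job looks like in code. This is a minimal sketch following the standard Hadoop MapReduce tutorial pattern; it is functionally equivalent to the bundled example, not its exact source:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit (word, 1) for every whitespace-separated token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sum the counts per word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on the map side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /input
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The combiner explains the counters above: Combine input records=14239 (all map output) collapses to Combine output records=2653 before the shuffle. Also note that MapReduce will not overwrite an existing output directory, so to re-run the job you must first remove it with hadoop fs -rm -r /output.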
8. View the wordcount output (the full result is long; only part of it is shown here)
[hadoop@ltt1 ~]$ hadoop fs -cat /output/*
worldwide,	4
would	1
writing	2
writing,	4
written	19
xmlenc	1
year	1
you	12
your	5
zlib	1
252.227-7014(a)(1))	1
§	1
“AS	1
“Contributor	1
“Contributor”	1
“Covered	1
“Executable”	1
“Initial	1
“Larger	1
“Licensable”	1
“License”	1
“Modifications”	1
“Original	1
“Participant”)	1
“Patent	1
“Source	1
“Your”)	1
“You”	2
“commercial	3
“control”	1
That completes the walkthrough: with one small wordcount example we have practiced creating a directory on HDFS, uploading files, listing directories, and running a MapReduce job.