Running Hadoop Locally
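Local development is done in IntelliJ on Windows, with no Hadoop installation on the machine. The setup followed a tutorial whose link is missing here.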
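Without a Hadoop installation, a Windows machine still needs `winutils.exe`, which Hadoop looks for in the `bin` folder under `hadoop.home.dir`. The job code in this post points that property at a winutils directory. A minimal sketch of a pre-flight check follows; the class `WinutilsCheck` and its method are hypothetical helpers, not part of the original project, and the default path is only the one used in this post.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not from the original project): verifies a winutils directory. */
class WinutilsCheck {

    /** Returns true if hadoopHome/bin/winutils.exe exists as a regular file. */
    static boolean hasWinutils(Path hadoopHome) {
        return Files.isRegularFile(hadoopHome.resolve("bin").resolve("winutils.exe"));
    }

    public static void main(String[] args) {
        // Assumption: the directory below is the one used in this post; pass your own as the first argument.
        Path home = Paths.get(args.length > 0 ? args[0] : "C:/Users/user/Desktop/tools/winutils");
        if (hasWinutils(home)) {
            // Only set the property once the layout has been confirmed.
            System.setProperty("hadoop.home.dir", home.toString());
            System.out.println("hadoop.home.dir set to " + home);
        } else {
            System.err.println("winutils.exe not found under " + home.resolve("bin"));
        }
    }
}
```

Running the check before submitting the job makes a misplaced `winutils.exe` show up as a clear message rather than as a failure inside Hadoop's startup code.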
Add the dependencies to the local pom.xml:
Write the MapReduce job:
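import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// LogParser and ContentUtils are the project's own helper classes; they are not shown in this post.
public class ETLApp {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // On Windows, Hadoop looks for bin/winutils.exe under hadoop.home.dir.
        System.setProperty("hadoop.home.dir", "C:/Users/user/Desktop/tools/winutils");

        Configuration configuration = new Configuration();

        // Delete the output directory if it already exists; the job fails on an existing output path.
        FileSystem fileSystem = FileSystem.get(configuration);
        Path outputPath = new Path("projectData/input/etl");
        if (fileSystem.exists(outputPath)) {
            fileSystem.delete(outputPath, true);
        }

        Job job = Job.getInstance(configuration);
        job.setJarByClass(ETLApp.class);
        job.setMapperClass(ETLApp.MyMapper.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job, new Path("projectData/raw/trackinfo_20130721.data"));
        FileOutputFormat.setOutputPath(job, outputPath);

        job.waitForCompletion(true);
    }

    static class MyMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

        // Assumption: LogParser has a no-argument constructor and an instance method parse(String).
        private final LogParser logParser = new LogParser();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Parse one raw log line into named fields.
            String log = value.toString();
            Map<String, String> info = logParser.parse(log);

            String ip = info.get("ip");
            String country = info.get("country");
            String province = info.get("province");
            String city = info.get("city");
            String url = info.get("url");
            String time = info.get("time");
            String pageId = ContentUtils.getPageId(url);

            // Emit the cleaned record as tab-separated text (each field, including the last, is followed by a tab).
            StringBuilder builder = new StringBuilder();
            builder.append(ip).append("\t");
            builder.append(country).append("\t");
            builder.append(province).append("\t");
            builder.append(city).append("\t");
            builder.append(url).append("\t");
            builder.append(time).append("\t");
            builder.append(pageId).append("\t");

            context.write(NullWritable.get(), new Text(builder.toString()));
        }
    }
}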
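The mapper writes a tab after every field, so each output record ends with a trailing tab. If that is not wanted downstream, the record can be built by placing separators only between fields. The following is a minimal sketch in plain Java with no Hadoop dependency. The class `RecordJoiner`, the `"-"` placeholder for missing fields, and the sample values are assumptions for illustration, not part of the original project.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper (not from the original project): builds one tab-separated ETL record. */
class RecordJoiner {

    // Field order follows the mapper in this post; pageId is appended last.
    static final List<String> FIELDS =
            Arrays.asList("ip", "country", "province", "city", "url", "time");

    /** Joins the parsed fields and the page id with tabs, without a trailing tab. */
    static String join(Map<String, String> info, String pageId) {
        String head = FIELDS.stream()
                // Assumption: an absent field is written as "-" instead of the string "null".
                .map(field -> info.getOrDefault(field, "-"))
                .collect(Collectors.joining("\t"));
        return head + "\t" + pageId;
    }

    public static void main(String[] args) {
        // Made-up sample values, only to show the output shape.
        Map<String, String> info = new HashMap<>();
        info.put("ip", "1.2.3.4");
        info.put("country", "CN");
        info.put("url", "http://example.com/page/1");
        System.out.println(join(info, "1"));
    }
}
```

Inside the mapper, the result of `RecordJoiner.join(info, pageId)` would replace the `StringBuilder` sequence; whether to drop the trailing tab depends on how the next stage splits the line.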