

Hadoop Programming: Calling HDFS via the API


This series mainly covers the Hadoop family of products. Frequently used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, HCatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, and others.

Since 2011, China has entered a turbulent era of big data, and the Hadoop family of software has claimed a broad share of big-data processing. In the open-source world and among vendors alike, virtually every piece of data software has gravitated toward Hadoop. Hadoop has grown from a niche, elite domain into the standard for big-data development. On top of the original technology, a whole family of products has emerged, continually innovating around the "big data" concept and pushing the technology forward.

As developers in the IT industry, we should keep pace, seize the opportunity, and rise together with Hadoop!

About the author:

  • Zhang Dan (Conan), programmer: Java, R, PHP, Javascript
  • weibo: @Conan_Z
  • blog: http://blog.fens.me
  • email: bsspirit@gmail.com

Please credit the original source when reposting:
http://blog.fens.me/hadoop-hdfs-api/

Preface

HDFS, the Hadoop Distributed File System, is one of Hadoop's core components. To run MapReduce's distributed algorithms, the data must be placed on HDFS beforehand, so operating on HDFS is essential. The Hadoop command line provides a complete set of commands for this, as convenient to use as ordinary Linux commands.

Sometimes, however, we need to access HDFS directly from our own programs; for that, we can operate on HDFS through its API.

Contents

  • System environment
  • ls
  • mkdir
  • rmr
  • copyFromLocal
  • cat
  • copyToLocal
  • Creating a new file and writing content

1. System environment

Hadoop cluster environment:

  • Linux Ubuntu 64bit Server 12.04.2 LTS
  • Java 1.6.0_29
  • Hadoop 1.1.2

How to set up the Hadoop cluster? See the article: Installing Historical Versions of Hadoop.

Development environment:

  • Win7 64bit
  • Java 1.6.0_45
  • Maven 3
  • Hadoop 1.1.2
  • Eclipse Juno Service Release 2

How to set up a Hadoop development environment on Win7 with Maven? See the article: Building a Hadoop Project with Maven.

Note: hadoop-core-1.1.2.jar has been recompiled to fix remote invocation of Hadoop from Windows; see the article: Installing Historical Versions of Hadoop.

The Hadoop command line: java FsShell

~ hadoop fs
Usage: java FsShell
           [-ls <path>]
           [-lsr <path>]
           [-du <path>]
           [-dus <path>]
           [-count[-q] <path>]
           [-mv <src> <dst>]
           [-cp <src> <dst>]
           [-rm [-skipTrash] <path>]
           [-rmr [-skipTrash] <path>]
           [-expunge]
           [-put <localsrc> ... <dst>]
           [-copyFromLocal <localsrc> ... <dst>]
           [-moveFromLocal <localsrc> ... <dst>]
           [-get [-ignoreCrc] [-crc] <src> <localdst>]
           [-getmerge <src> <localdst> [addnl]]
           [-cat <src>]
           [-text <src>]
           [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
           [-moveToLocal [-crc] <src> <localdst>]
           [-mkdir <path>]
           [-setrep [-R] [-w] <rep> <path/file>]
           [-touchz <path>]
           [-test -[ezd] <path>]
           [-stat [format] <path>]
           [-tail [-f] <file>]
           [-chmod [-R] <MODE[,MODE]...> PATH...]
           [-chown [-R] [OWNER][:[GROUP]] PATH...]
           [-chgrp [-R] GROUP PATH...]
           [-help [cmd]]

The listing above shows about 30 commands; in the code below I implement only a subset of them through the HDFS API.

Create a new file, HdfsDAO.java, to wrap the HDFS API calls.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

public class HdfsDAO {

    //HDFS access address
    private static final String HDFS = "hdfs://192.168.1.210:9000/";

    public HdfsDAO(Configuration conf) {
        this(HDFS, conf);
    }

    public HdfsDAO(String hdfs, Configuration conf) {
        this.hdfsPath = hdfs;
        this.conf = conf;
    }

    //HDFS path
    private String hdfsPath;

    //Hadoop configuration
    private Configuration conf;

    //entry point
    public static void main(String[] args) throws IOException {
        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(conf);
        hdfs.mkdirs("/tmp/new/two");
        hdfs.ls("/tmp/new");
    }

    //load the Hadoop configuration files
    public static JobConf config(){
        JobConf conf = new JobConf(HdfsDAO.class);
        conf.setJobName("HdfsDAO");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");
        return conf;
    }

    //API implementations
    public void cat(String remoteFile) throws IOException {...}
    public void mkdirs(String folder) throws IOException {...}
    ...
}

2. The ls operation

Description: list the files in a directory.

Corresponding Hadoop command:

~ hadoop fs -ls /
Found 3 items
drwxr-xr-x   - conan         supergroup          0 2013-10-03 05:03 /home
drwxr-xr-x   - Administrator supergroup          0 2013-10-03 13:49 /tmp
drwxr-xr-x   - conan         supergroup          0 2013-10-03 09:11 /user

Java program:

public void ls(String folder) throws IOException {
    Path path = new Path(folder);
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    FileStatus[] list = fs.listStatus(path);
    System.out.println("ls: " + folder);
    System.out.println("==========================================================");
    for (FileStatus f : list) {
        System.out.printf("name: %s, folder: %s, size: %d\n", f.getPath(), f.isDir(), f.getLen());
    }
    System.out.println("==========================================================");
    fs.close();
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.ls("/");
}

Console output:

ls: /
==========================================================
name: hdfs://192.168.1.210:9000/home, folder: true, size: 0
name: hdfs://192.168.1.210:9000/tmp, folder: true, size: 0
name: hdfs://192.168.1.210:9000/user, folder: true, size: 0
==========================================================
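The shell's -lsr (recursive listing) appears in the command list above but is not implemented in HdfsDAO. A sketch of how it might look with the same FileStatus API is below; the class name, directory names, and use of FileSystem.getLocal (so the example runs against the local filesystem, without a cluster) are my own assumptions, not part of the original code. Pointing the same method at a FileSystem obtained from the hdfsPath URI would list HDFS instead.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LsrDemo {
    // Recursively print every entry under the given path, mirroring `hadoop fs -lsr`.
    public static void lsr(FileSystem fs, Path path) throws IOException {
        for (FileStatus f : fs.listStatus(path)) {
            System.out.printf("name: %s, folder: %s, size: %d%n", f.getPath(), f.isDir(), f.getLen());
            if (f.isDir()) {
                lsr(fs, f.getPath());   // descend into sub-directories
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // LocalFileSystem implements the same FileSystem API, so this runs without a cluster.
        FileSystem fs = FileSystem.getLocal(new Configuration());
        Path base = new Path(System.getProperty("java.io.tmpdir"), "lsr-demo/a/b");
        fs.mkdirs(base);   // create a small tree to walk
        lsr(fs, new Path(System.getProperty("java.io.tmpdir"), "lsr-demo"));
        fs.close();
    }
}
```

The recursion simply re-applies listStatus to each entry that isDir() reports as a directory, which is essentially what FsShell's -lsr does.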

3. The mkdir operation

Description: create a directory; multi-level directories can be created in one call.

Corresponding Hadoop command:

~ hadoop fs -mkdir /tmp/new/one
~ hadoop fs -ls /tmp/new
Found 1 items
drwxr-xr-x   - conan supergroup          0 2013-10-03 15:35 /tmp/new/one

Java program:

public void mkdirs(String folder) throws IOException {
    Path path = new Path(folder);
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    if (!fs.exists(path)) {
        fs.mkdirs(path);
        System.out.println("Create: " + folder);
    }
    fs.close();
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.mkdirs("/tmp/new/two");
    hdfs.ls("/tmp/new");
}

Console output:

Create: /tmp/new/two
ls: /tmp/new
==========================================================
name: hdfs://192.168.1.210:9000/tmp/new/one, folder: true, size: 0
name: hdfs://192.168.1.210:9000/tmp/new/two, folder: true, size: 0
==========================================================

4. The rmr operation

Description: delete directories and files.

Corresponding Hadoop command:

~ hadoop fs -rmr /tmp/new/one
Deleted hdfs://master:9000/tmp/new/one

~ hadoop fs -ls /tmp/new
Found 1 items
drwxr-xr-x   - Administrator supergroup          0 2013-10-03 15:38 /tmp/new/two

Java program:

public void rmr(String folder) throws IOException {
    Path path = new Path(folder);
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    fs.deleteOnExit(path);   // the path is removed when the FileSystem is closed below
    System.out.println("Delete: " + folder);
    fs.close();
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.rmr("/tmp/new/two");
    hdfs.ls("/tmp/new");
}

Console output:

Delete: /tmp/new/two
ls: /tmp/new
==========================================================
==========================================================
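One caveat about the rmr implementation: deleteOnExit() does not delete immediately, it only registers the path to be removed when the FileSystem is closed (or the JVM exits); it works above because fs.close() follows right away. For an explicit, immediate delete the usual call is fs.delete(path, true), where the boolean enables recursive deletion. A minimal sketch, run against the local filesystem so it needs no cluster; the class name and temp-directory paths are hypothetical:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteDemo {
    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.getLocal(new Configuration());
        Path dir = new Path(System.getProperty("java.io.tmpdir"), "delete-demo/one");
        fs.mkdirs(dir);
        // delete(path, recursive): removes the directory and its contents right away,
        // unlike deleteOnExit, which defers the removal until close()
        boolean deleted = fs.delete(new Path(System.getProperty("java.io.tmpdir"), "delete-demo"), true);
        System.out.println("deleted: " + deleted);
        fs.close();
    }
}
```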

5. The copyFromLocal operation

Description: copy a file from the local filesystem to HDFS.

Corresponding Hadoop command:

~ hadoop fs -copyFromLocal /home/conan/datafiles/item.csv /tmp/new/

~ hadoop fs -ls /tmp/new/
Found 1 items
-rw-r--r--   1 conan supergroup        210 2013-10-03 16:07 /tmp/new/item.csv

Java program:

public void copyFile(String local, String remote) throws IOException {
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    fs.copyFromLocalFile(new Path(local), new Path(remote));
    System.out.println("copy from: " + local + " to " + remote);
    fs.close();
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.copyFile("datafile/randomData.csv", "/tmp/new");
    hdfs.ls("/tmp/new");
}

Console output:

copy from: datafile/randomData.csv to /tmp/new
ls: /tmp/new
==========================================================
name: hdfs://192.168.1.210:9000/tmp/new/item.csv, folder: false, size: 210
name: hdfs://192.168.1.210:9000/tmp/new/randomData.csv, folder: false, size: 36655
==========================================================

6. The cat operation

Description: print a file's contents.

Corresponding Hadoop command:

~ hadoop fs -cat /tmp/new/item.csv
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0

Java program:

public void cat(String remoteFile) throws IOException {
    Path path = new Path(remoteFile);
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    FSDataInputStream fsdis = null;
    System.out.println("cat: " + remoteFile);
    try {
        fsdis = fs.open(path);
        IOUtils.copyBytes(fsdis, System.out, 4096, false);
    } finally {
        IOUtils.closeStream(fsdis);
        fs.close();
    }
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.cat("/tmp/new/item.csv");
}

Console output:

cat: /tmp/new/item.csv
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
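IOUtils.copyBytes streams straight to System.out. If you want the file's content in a String instead, for example to parse the CSV, the same open() call can feed any OutputStream. A sketch of that variation; the class name, helper method, and sample file are made up for illustration, and the local filesystem is used so it runs standalone:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatDemo {
    // Read a file on any Hadoop FileSystem fully into a String.
    public static String readToString(FileSystem fs, Path path) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        FSDataInputStream in = fs.open(path);
        try {
            // copy into the in-memory buffer instead of System.out
            IOUtils.copyBytes(in, out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.getLocal(new Configuration());
        Path p = new Path(System.getProperty("java.io.tmpdir"), "cat-demo.csv");
        FSDataOutputStream os = fs.create(p, true);   // overwrite if present
        os.write("1,101,5.0\n".getBytes("UTF-8"));
        os.close();
        System.out.print(readToString(fs, p));
        fs.close();
    }
}
```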

7. The copyToLocal operation

Description: copy a file from HDFS to the local filesystem.

Corresponding Hadoop command:

~ hadoop fs -copyToLocal /tmp/new/item.csv /home/conan/datafiles/tmp/

~ ls -l /home/conan/datafiles/tmp/
-rw-rw-r-- 1 conan conan 210 Oct  3 16:16 item.csv

Java program:

public void download(String remote, String local) throws IOException {
    Path path = new Path(remote);
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    fs.copyToLocalFile(path, new Path(local));
    System.out.println("download: from" + remote + " to " + local);
    fs.close();
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.download("/tmp/new/item.csv", "datafile/download");
    File f = new File("datafile/download/item.csv");
    System.out.println(f.getAbsolutePath());
}

Console output:

2013-10-12 17:17:32 org.apache.hadoop.util.NativeCodeLoader
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
download: from/tmp/new/item.csv to datafile/download
D:\workspace\java\myMahout\datafile\download\item.csv

8. Creating a new file and writing content

Description: create a new file and write content into it.

  • touchz: creates a new, empty file, or updates an existing file's timestamp.
  • There is no shell command that also writes content.

Corresponding Hadoop command:

~ hadoop fs -touchz /tmp/new/empty

~ hadoop fs -ls /tmp/new
Found 3 items
-rw-r--r--   1 conan         supergroup          0 2013-10-03 16:24 /tmp/new/empty
-rw-r--r--   1 conan         supergroup        210 2013-10-03 16:07 /tmp/new/item.csv
-rw-r--r--   3 Administrator supergroup      36655 2013-10-03 16:09 /tmp/new/randomData.csv

~ hadoop fs -cat /tmp/new/empty

Java program:

public void createFile(String file, String content) throws IOException {
    FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
    byte[] buff = content.getBytes();
    FSDataOutputStream os = null;
    try {
        os = fs.create(new Path(file));
        os.write(buff, 0, buff.length);
        System.out.println("Create: " + file);
    } finally {
        if (os != null)
            os.close();
    }
    fs.close();
}

public static void main(String[] args) throws IOException {
    JobConf conf = config();
    HdfsDAO hdfs = new HdfsDAO(conf);
    hdfs.createFile("/tmp/new/text", "Hello world!!");
    hdfs.cat("/tmp/new/text");
}

Console output:

Create: /tmp/new/text
cat: /tmp/new/text
Hello world!!
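As noted above, touchz itself has no counterpart in HdfsDAO. Its Java equivalent is simply creating and immediately closing a zero-length file; FileSystem.setTimes can update the timestamps of a file that already exists. A hedged sketch against the local filesystem so it needs no cluster; the class name and file path are hypothetical:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TouchzDemo {
    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.getLocal(new Configuration());
        Path p = new Path(System.getProperty("java.io.tmpdir"), "touchz-demo-empty");
        // create-and-close yields a zero-length file, like `hadoop fs -touchz`
        fs.create(p, true).close();
        System.out.println("size: " + fs.getFileStatus(p).getLen());
        fs.close();
    }
}
```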

The complete file, HdfsDAO.java:
            https://github.com/bsspirit/maven_mahout_template/blob/mahout-0.8/src/main/java/org/conan/mymahout/hdfs/HdfsDAO.java

Please credit the original source when reposting:
            http://blog.fens.me/hadoop-hdfs-api/
