

Hadoop SequenceFile

Published: 2025/3/11 | Programming Q&A | 47 | 豆豆

This article, collected and compiled by 生活随笔, introduces Hadoop SequenceFile. The editor found it useful and shares it here for reference.

Original Apache documentation: http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/io/SequenceFile.html

Concepts:

SequenceFile is a flat file made up of binary-serialized key/value pairs. It can be used as an input or output format in MapReduce jobs.

Internally, MapReduce stores the temporary output of map tasks as SequenceFiles. SequenceFiles are also commonly generated in a FileSystem as raw input files for map tasks to consume.

Structurally, a SequenceFile consists of a header followed by a series of records.

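The header mainly contains the key class name, the value class name, the compression settings (whether compression is used, whether it is record or block compression, and the codec), and user-defined metadata. It also holds the sync marker; copies of this marker are written throughout the file so that a reader can quickly locate record boundaries.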
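To illustrate how a sync marker lets a reader that has seeked to an arbitrary byte offset find the next record boundary, here is a minimal Java sketch. It is a simplified model, not Hadoop's code: it assumes a fixed 16-byte marker sitting directly in front of a record, and the class `SyncScan` and method `nextRecordStart` are hypothetical names. In Hadoop itself this kind of scan is done by `SequenceFile.Reader.sync(long)`, and the real format also writes an escape value before each marker.

```java
/**
 * Simplified model (not Hadoop's implementation) of resynchronizing on a
 * sync marker. Assumption: the marker is a fixed byte array placed right
 * before a record boundary; the real format also writes an escape int first.
 */
final class SyncScan {
    private SyncScan() {}

    /**
     * Scans forward from {@code from} for the first full copy of {@code sync}
     * and returns the offset just past it (where the next record starts),
     * or -1 if no complete marker is found before the end of {@code data}.
     */
    static int nextRecordStart(byte[] data, byte[] sync, int from) {
        for (int i = Math.max(0, from); i + sync.length <= data.length; i++) {
            int j = 0;
            // Compare the candidate window byte by byte against the marker
            while (j < sync.length && data[i + j] == sync[j]) {
                j++;
            }
            if (j == sync.length) {
                return i + sync.length;
            }
        }
        return -1;
    }
}
```

This is why a reader can start in the middle of a file (for example, at the start of an input split) and still begin at a whole record.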

Each record is stored as a key/value pair. Its bytes are parsed in order as: the record length, the key length, the key, and the value. The layout of the value depends on whether the record is compressed.
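A minimal Java sketch of the uncompressed record layout just described. It assumes raw byte arrays for the key and value instead of Hadoop `Writable` serializations, big-endian 4-byte ints (the byte order `java.io.DataOutput` uses), and a record length that counts only the key and value bytes. `RecordLayout`, `encode` and `decode` are hypothetical names, not Hadoop API.

```java
import java.nio.ByteBuffer;

/**
 * Simplified model of one uncompressed record:
 * [record length: int][key length: int][key bytes][value bytes],
 * where record length = key length + value length (an assumption of this sketch).
 */
final class RecordLayout {
    private RecordLayout() {}

    /** Serializes one record; ByteBuffer writes ints big-endian by default. */
    static byte[] encode(byte[] key, byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(8 + key.length + value.length);
        buf.putInt(key.length + value.length); // record length
        buf.putInt(key.length);                // key length
        buf.put(key);
        buf.put(value);
        return buf.array();
    }

    /** Parses a record produced by encode, returning {key, value}. */
    static byte[][] decode(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int recordLength = buf.getInt();
        int keyLength = buf.getInt();
        byte[] key = new byte[keyLength];
        byte[] value = new byte[recordLength - keyLength]; // the value is whatever remains
        buf.get(key);
        buf.get(value);
        return new byte[][] {key, value};
    }
}
```

Storing the key length separately is what lets a reader split the record body into key and value without knowing anything about the serialized types.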

Compressing data saves disk space and speeds up network transfer. SequenceFile supports two kinds of compression: record compression and block compression.

Record compression, as shown in the figure above, compresses the value of each record individually.
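Record compression can be sketched the same way: the key stays uncompressed and only the value bytes are compressed. This sketch assumes `java.util.zip.Deflater`/`Inflater` as a stand-in for a Hadoop compression codec (Hadoop's `DefaultCodec` is also zlib-based), and `RecordCompression` with its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Simplified model of a record-compressed record: [record length][key length][key][compressed value]. */
final class RecordCompression {
    private RecordCompression() {}

    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[512];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[512];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalArgumentException("truncated compressed value");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Only the value is compressed; the key is written as-is. */
    static byte[] encode(byte[] key, byte[] value) {
        byte[] packed = deflate(value);
        ByteBuffer buf = ByteBuffer.allocate(8 + key.length + packed.length);
        buf.putInt(key.length + packed.length); // record length (key + compressed value)
        buf.putInt(key.length);                 // key length
        buf.put(key);
        buf.put(packed);
        return buf.array();
    }

    /** Skips the uncompressed key and decompresses the value. */
    static byte[] decodeValue(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int recordLength = buf.getInt();
        int keyLength = buf.getInt();
        buf.position(8 + keyLength);
        byte[] packed = new byte[recordLength - keyLength];
        buf.get(packed);
        return inflate(packed);
    }
}
```

Keeping keys uncompressed means a reader can inspect or compare keys without paying the decompression cost for every value.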

Block compression groups a run of records together and compresses them as a single block, as in the figure above.

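A block mainly stores: the number of records in the block, the collection of key lengths, the collection of keys, the collection of value lengths, and the collection of values.
Note: the block size can be set with the io.seqfile.compress.blocksize property, which is the minimum block size for compression in block-compressed SequenceFiles (1000000 bytes by default).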
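The block layout and the size threshold can be tied together in a minimal Java sketch. Assumptions, all loudly simplified: records are buffered column-wise (key lengths, keys, value lengths, values); once the buffered key and value bytes reach a threshold that plays the role of io.seqfile.compress.blocksize, the four sections are each compressed with `java.util.zip.Deflater` and emitted as one block; plain 4-byte ints stand in for the variable-length ints the real format uses; and no sync markers are written. `BlockWriter` and its methods are hypothetical names, not Hadoop API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

/** Simplified model of a block-compressing writer (not Hadoop's implementation). */
final class BlockWriter {
    private final int blockSizeThreshold; // stands in for io.seqfile.compress.blocksize
    private final ByteArrayOutputStream keyLengths = new ByteArrayOutputStream();
    private final ByteArrayOutputStream keys = new ByteArrayOutputStream();
    private final ByteArrayOutputStream valueLengths = new ByteArrayOutputStream();
    private final ByteArrayOutputStream values = new ByteArrayOutputStream();
    private final List<byte[]> blocks = new ArrayList<>();
    private int recordsInBlock = 0;

    BlockWriter(int blockSizeThreshold) {
        this.blockSizeThreshold = blockSizeThreshold;
    }

    /** Buffers one record and flushes a block once enough key/value bytes accumulate. */
    void append(byte[] key, byte[] value) {
        writeInt(keyLengths, key.length);
        keys.write(key, 0, key.length);
        writeInt(valueLengths, value.length);
        values.write(value, 0, value.length);
        recordsInBlock++;
        if (keys.size() + values.size() >= blockSizeThreshold) {
            flushBlock();
        }
    }

    /** Writes any remaining buffered records as a final block and returns all blocks. */
    List<byte[]> finish() {
        if (recordsInBlock > 0) {
            flushBlock();
        }
        return blocks;
    }

    private void flushBlock() {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        writeInt(block, recordsInBlock); // record count, stored uncompressed
        ByteArrayOutputStream[] sections = {keyLengths, keys, valueLengths, values};
        for (ByteArrayOutputStream section : sections) {
            byte[] packed = deflate(section.toByteArray());
            writeInt(block, packed.length); // size of the compressed section
            block.write(packed, 0, packed.length);
            section.reset();
        }
        blocks.add(block.toByteArray());
        recordsInBlock = 0;
    }

    private static void writeInt(ByteArrayOutputStream out, int v) {
        byte[] b = ByteBuffer.allocate(4).putInt(v).array();
        out.write(b, 0, 4);
    }

    private static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[512];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

With a threshold of 100 bytes and records of 50 key/value bytes each, five appends produce blocks of 2, 2 and 1 records. Grouping similar data (all keys, then all values) before compressing is what usually makes block compression more effective than record compression.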

Example code for reading and writing:

Examples are given for both the old API and the new API.

package filedemo;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;
import org.junit.Test;

public class SequenceFileDemo2 {

    @Test
    public void writerOldApi() throws Exception {
        // String uri = "file:///D://B.txt"; // local Windows file
        String uri = "hdfs://hello110:9000/testdata/oldApi.seq";
        Configuration conf = new Configuration();
        // Connect to HDFS as the Linux user "hadoop"
        FileSystem fs = FileSystem.get(new URI(uri), conf, "hadoop");
        Path path = new Path(uri);
        IntWritable key = new IntWritable();
        Text value = new Text();
        Writer writer = null;
        try {
            // Option 1 (deprecated):
            // writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass());
            // Option 2: the Writer constructor (also deprecated in Hadoop 2.x)
            writer = new Writer(fs, conf, path, key.getClass(), value.getClass());
            for (int i = 0; i < 100; i++) {
                key.set(i);
                value.set("now the number is :" + i);
                // getLength() returns the current length of the output file
                System.out.printf("[%s]\t[%s]\t%s\t%s\n", "write", writer.getLength(), key, value);
                writer.append(key, value);
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }

    @Test
    public void readerOldApi() throws Exception {
        // String uri = "file:///D://B.txt"; // local Windows file
        String uri = "hdfs://hello110:9000/testdata/oldApi.seq";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI(uri), conf, "hadoop");
        Path path = new Path(uri);
        IntWritable key = new IntWritable();
        Text value = new Text();
        Reader reader = null;
        try {
            // Reader(FileSystem, Path, Configuration) is deprecated in Hadoop 2.x
            reader = new Reader(fs, path, conf);
            // next() fills key and value, and returns false at end of file
            while (reader.next(key, value)) {
                System.out.printf("[%s]\t%s\t%s\n", "read", "key", key);
                System.out.printf("[%s]\t%s\t%s\n", "read", "value", value);
            }
        } finally {
            IOUtils.closeStream(reader);
        }
    }

    @Test
    public void writerNewApi() throws Exception {
        /*
         * On Windows, when using hdfs:// without specifying a user through FileSystem,
         * HDFS is accessed as the current Windows user. If that user name differs from
         * the Linux user, the following error is thrown:
         *   Exception in thread "main" org.apache.hadoop.security.AccessControlException:
         *   Permission denied: user=xxxx
         * That is why this demo uses file://.
         * Workarounds:
         * 1. Rename the current Windows user to the corresponding Linux Hadoop user
         *    (renaming a user is awkward on Windows 10).
         * 2. Add the current Windows user as a Hadoop user on the Linux side.
         */
        // String uri = "hdfs://hello110:9000/testdata/newApi.seq";
        String uri = "file:///D://newApi.seq";
        Configuration conf = new Configuration();
        Path path = new Path(uri);
        IntWritable key = new IntWritable();
        Text value = new Text();
        Writer writer = null;
        try {
            // New API: the file, key class and value class are passed as Writer options
            writer = SequenceFile.createWriter(conf, Writer.file(path),
                    Writer.keyClass(key.getClass()), Writer.valueClass(value.getClass()));
            for (int i = 0; i < 100; i++) {
                key.set(i);
                value.set("now the new number is :" + i);
                System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
                writer.append(key, value);
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }

    @Test
    public void readerNewApi() throws Exception {
        String uri = "file:///D://newApi.seq";
        Configuration conf = new Configuration();
        Path path = new Path(uri);
        IntWritable key = new IntWritable();
        Text value = new Text();
        Reader reader = null;
        try {
            // New API: the file to read is passed as a Reader option
            reader = new Reader(conf, Reader.file(path));
            while (reader.next(key, value)) {
                System.out.printf("[%s]\t%s\t%s\n", "read", "key", key);
                System.out.printf("[%s]\t%s\t%s\n", "read", "value", value);
            }
        } finally {
            IOUtils.closeStream(reader);
        }
    }
}

Summary

The above is the complete article on Hadoop SequenceFile collected and compiled by 生活随笔. We hope it helps you solve the problems you have run into.

