[Translation] New in Apache HBase: MOB Support (Part 1)
Original post: http://blog.cloudera.com/blog/2015/06/inside-apache-hbases-new-support-for-mobs/
Background: why the HBase MOB feature was designed
Apache HBase is a distributed, scalable, performant, consistent key value database that can store a variety of binary data types. It excels at storing many relatively small values (<10K), and providing low-latency reads and writes.
However, there is a growing demand for storing documents, images, and other moderate objects (MOBs) in HBase while maintaining low latency for reads and writes. One such use case is a bank that stores signed and scanned customer documents. As another example, transport agencies may want to store snapshots of traffic and moving cars. These MOBs are generally write-once.
Unfortunately, performance can degrade in situations where many moderately sized values (100K to 10MB) are stored due to the ever-increasing I/O pressure created by compactions. Consider the case where 1TB of photos from traffic cameras, each 1MB in size, are stored into HBase daily. Parts of the stored files are compacted multiple times via minor compactions and eventually, data is rewritten by major compactions. Along with accumulation of these MOBs, I/O created by compactions will slow down the compactions, further block memstore flushing, and eventually block updates. A big MOB store will trigger frequent region splits, reducing the availability of the affected regions.
In order to address these drawbacks, Cloudera and Intel engineers have implemented MOB support in an HBase branch (hbase-11339: HBase MOB). This branch will be merged to the master in HBase 1.1 or 1.2, and is already present and supported in CDH 5.4.x, as well.
(Translator's note: the merge did not actually land in HBase 1.1 or 1.2; HBASE-11339 was merged into HBase 2.0.0. The feature is available in CDH 5.4.x.)
Operations on MOBs are usually write-intensive, with rare updates or deletes and relatively infrequent reads. MOBs are usually stored together with their metadata. Metadata relating to MOBs may include, for instance, car number, speed, and color. Metadata are very small relative to the MOBs. Metadata are usually accessed for analysis, while MOBs are usually randomly accessed only when they are explicitly requested with row keys.
Users want to read and write the MOBs in HBase with low latency in the same APIs, and want strong consistency, security, snapshot and HBase replication between clusters, and so on. To meet these goals, MOBs were moved out of the main I/O path of HBase and into a new I/O path.
In this post, you will learn about this design approach, and why it was selected.
Analysis of possible approaches
There were a few possible approaches to this problem. The first approach we considered was to store MOBs in HBase with a tuned split and compaction policies—a bigger desired MaxFileSize decreases the frequency of region split, and fewer or no compactions can avoid the write amplification penalty. That approach would improve write latency and throughput considerably. However, along with the increasing number of stored files, there would be too many opened readers in a single store, even more than what is allowed by the OS. As a result, a lot of memory would be consumed and read performance would degrade.
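The tuning described above can be expressed as table attributes. A hedged sketch in the HBase shell, assuming a hypothetical table named traffic_photos (MAX_FILESIZE and COMPACTION_ENABLED are standard HBase table attributes; the values are illustrative):

```
# Raise the desired region size to 100 GB so splits become rare
alter 'traffic_photos', MAX_FILESIZE => '107374182400'
# Disable automatic compactions to avoid write amplification
alter 'traffic_photos', COMPACTION_ENABLED => false
```

As the text notes, this trades read health for write throughput: store files accumulate without compaction, and every read may have to consult many open HFile readers.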
Another approach was to use an HBase + HDFS model to store the metadata and MOBs separately. In this model, a single file is linked by an entry in HBase. This is a client solution, and the transaction is controlled by the client—no HBase-side memories are consumed by MOBs. This approach would work for objects larger than 50MB, but for MOBs, many small files lead to inefficient HDFS usage since the default block size in HDFS is 128MB.
For example, let’s say a NameNode has 48GB of memory and each file is 100KB with three replicas. Each file takes more than 300 bytes in memory, so a NameNode with 48GB memory can hold about 160 million files, which would limit us to only storing 16TB MOB files in total.
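The NameNode capacity estimate above can be checked with a few lines of arithmetic. A minimal sketch in Python using the figures from the text (48 GB of heap, roughly 300 bytes of NameNode memory per file, 100 KB per MOB); the exact quotient is about 172 million files, which the article rounds to 160 million:

```python
# NameNode heap available for file metadata (figure from the text)
heap_bytes = 48 * 1024**3
# Approximate NameNode memory cost per stored file (inode + block info)
bytes_per_file = 300
max_files = heap_bytes // bytes_per_file   # ~172 million files
# Each MOB file is 100 KB, so the total raw MOB capacity is:
total_mob_bytes = max_files * 100 * 1024   # ~16 TiB
print(f"{max_files:,} files, {total_mob_bytes / 1024**4:.1f} TiB of MOBs")
```

The replica count does not enter this estimate directly: replication multiplies block records, but the dominant limit here is the per-file metadata cost.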
As an improvement, we could have assembled the small MOB files into bigger ones—that is, a file could have multiple MOB entries—and stored the offset and length in the HBase table for fast reading. However, maintaining data consistency and managing deleted MOBs and small MOB files in compactions are difficult. Furthermore, if we were to use this approach, we’d have to consider new security policies, lose atomicity properties of writes, and potentially lose the backup and disaster recovery provided by replication and snapshots.
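The "assemble small MOBs into bigger files" idea can be sketched in a few lines. This is an illustrative toy model only, not HBase code: pack_mobs and read_mob are hypothetical helpers showing how the (offset, length) pair stored in an HBase row would locate a payload inside one large packed file:

```python
import io

def pack_mobs(mobs):
    """Append each MOB payload to one container; return (blob, index).

    index maps each key to the (offset, length) pair that an HBase
    cell would store alongside the row's metadata.
    """
    buf = io.BytesIO()
    index = {}
    for key, payload in mobs.items():
        index[key] = (buf.tell(), len(payload))
        buf.write(payload)
    return buf.getvalue(), index

def read_mob(blob, index, key):
    """Random read: seek by offset, slice by length."""
    offset, length = index[key]
    return blob[offset:offset + length]
```

The hard parts the text calls out (consistency, deletes, compacting the packed files, security, write atomicity) are exactly what this toy model leaves out.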
HBase MOB architecture
In the end, because most of the concerns around storing MOBs in HBase involve the I/O created by compactions, the key was to move MOBs out of management by normal regions to avoid region splits and compactions there.
The HBase MOB design is similar to the HBase + HDFS approach in that we store the metadata and MOBs separately. The difference lies in the server-side design: the memstore caches MOBs before they are flushed to disk, the MOBs are written into an HFile called a “MOB file” on each flush, and each MOB file holds multiple entries rather than one HDFS file per MOB. MOB files are stored in a special region. All reads and writes go through the existing HBase APIs.
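In the feature as it eventually shipped (HBase 2.0), a column family is opted into the MOB path declaratively: values above a threshold go to MOB files, while smaller values stay on the normal write path. A sketch in the HBase shell, with illustrative table and family names:

```
# Values in 'photo' larger than 100 KB are routed to MOB files
create 'traffic', {NAME => 'photo', IS_MOB => true, MOB_THRESHOLD => 102400}
# Reads and writes then use the ordinary put/get APIs unchanged.
```

IS_MOB and MOB_THRESHOLD are the column-family properties documented for the released feature; the threshold defaults to 100 KB when unset.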
To be continued in the next post: https://my.oschina.net/u/234661/blog/1553060
Reposted from: https://my.oschina.net/u/234661/blog/1553005