
HBase Optimization in Practice

Published: 2025/3/18


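HBase optimization

1. Garbage collection tuning

When region servers handle heavy loads, memory allocation can no longer safely rely on the JRE's default assumptions about program behavior, so the garbage collection strategy has to be tuned with the options the JRE provides.

Data written by clients is not flushed to disk contiguously, which leaves holes (fragmentation) in the JVM heap.

Young generation: between 128 MB and 512 MB. Old generation: several GB.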

Add the settings to the configuration file hbase-env.sh, using either HBASE_OPTS or HBASE_REGIONSERVER_OPTS (recommended). Recommended configuration:

export HBASE_REGIONSERVER_OPTS=" \
  -Xmx8g \
  -Xms8g \
  -Xmn128m \
  -XX:+UseParNewGC \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps \
  -Xloggc:$HBASE_HOME/logs/gc-${HOSTNAME}-hbase.log"

References:

http://blog.csdn.net/kthq/article/details/8618052

http://swcdxd.iteye.com/blog/1859858
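After changing the JVM options it can help to confirm what a JVM actually started with. The following is a minimal sketch using only standard JDK management APIs; the class name GcSettingsSketch is made up for illustration, and the collector names it prints depend on the JVM version and the flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: reports the collectors and heap limit of the current JVM.
final class GcSettingsSketch {

    // Names of the garbage collectors registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Maximum heap size in MB, which reflects -Xmx when it is set.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("collectors: " + collectorNames());
        System.out.println("max heap (MB): " + maxHeapMb());
    }
}
```

Running it with the same options as the region server (for example by passing them on the java command line) shows whether the intended collectors and heap size took effect.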

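2. HBase compression

Available codecs: GZIP, LZO, and Snappy.

Snappy performs slightly better and is the usual choice.

To make HBase check at startup that the codecs are available, set hbase.regionserver.codecs in hbase-site.xml:

<property>
  <name>hbase.regionserver.codecs</name>
  <value>snappy,lzo</value>
</property>

Enable compression when creating a table: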

hbase> create 'test2', { NAME => 'cf2', COMPRESSION => 'SNAPPY' }

hbase> describe 'test'
DESCRIPTION                                                    ENABLED
 'test', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE',         false
 BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
 VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
 TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.1070 seconds

Or enable it on an existing table:

hbase> disable 'test'
hbase> alter 'test', {NAME => 'cf', COMPRESSION => 'GZ'}
hbase> enable 'test'

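3. Optimizing splits and compactions

3.1 Managing splits

HBase can run into "split/compaction storms": regions that grow at the same rate reach the maximum size at about the same time, so they split, and then compact, together.

Turn off automatic splitting and split regions manually instead.

To disable automatic splitting, set hbase.hregion.max.filesize to a very large value, such as 100 GB. It is not recommended to set it to its absolute maximum value of Long.MAX_VALUE.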

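3.2 Region hotspotting

/* Row key design 1: salting prefix */
long ts = System.currentTimeMillis();
// floorMod keeps the salt in 0..7 even when the hash is negative
byte prefix = (byte) Math.floorMod(Long.hashCode(ts), 8);
byte[] rowkey1 = Bytes.add(new byte[] { prefix }, Bytes.toBytes(ts));

/* Row key design 2: field swap/promotion (put another field ahead of the timestamp) */
String rowkey2 = value + ts; // value: another field of the record, assumed here to be a String

/* Row key design 3: randomization */
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] rowkey3 = md.digest(Bytes.toBytes(ts));

/* Row key design 4: reverse time order (newest rows sort first) */
long rowkey4 = Long.MAX_VALUE - ts;

You can also call move() in the API to move a region to another region server, or unassign() to unassign the regions of the affected table.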
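The four row key designs in 3.2 can also be tried out without HBase on the classpath. The sketch below is an illustration under assumptions: RowKeySketch and its method names are made up, java.nio.ByteBuffer stands in for the HBase Bytes utility (both encode a long as 8 big-endian bytes), and the bucket count of 8 is taken from the salting example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; it uses only the JDK, not the HBase client library.
final class RowKeySketch {
    static final int BUCKETS = 8; // number of salt buckets, as in the salting example

    // Big-endian encoding of a long (8 bytes).
    static byte[] longBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Design 1: one salt byte in [0, BUCKETS) followed by the timestamp.
    static byte[] salted(long ts) {
        byte salt = (byte) Math.floorMod(Long.hashCode(ts), BUCKETS);
        return ByteBuffer.allocate(1 + Long.BYTES).put(salt).putLong(ts).array();
    }

    // Design 2: promote another field ahead of the timestamp.
    static byte[] promoted(String field, long ts) {
        byte[] f = field.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(f.length + Long.BYTES).put(f).putLong(ts).array();
    }

    // Design 3: randomize by hashing the timestamp (MD5 yields 16 bytes).
    static byte[] hashed(long ts) {
        try {
            return MessageDigest.getInstance("MD5").digest(longBytes(ts));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is available on every standard JDK
        }
    }

    // Design 4: reverse timestamp, so newer rows get smaller keys and sort first.
    static long reversed(long ts) {
        return Long.MAX_VALUE - ts;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println("salt bucket: " + salted(now)[0]);
        System.out.println("hashed key length: " + hashed(now).length);
        System.out.println("reversed timestamp: " + reversed(now));
    }
}
```

Salting and hashing spread sequential writes across regions at the cost of range scans by time, while the reversed timestamp only changes the sort order and does not by itself remove a hotspot.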

3.3 Pre-splitting regions

Specify the split points or the number of regions when creating the table:

hbase> create 't1', 'f', SPLITS => ['10', '20', '30']
hbase> create 't14', 'f', SPLITS_FILE => 'splits.txt'
# create table with four regions based on random bytes keys
hbase> create 't2', 'f1', { NUMREGIONS => 4, SPLITALGO => 'UniformSplit' }
# create table with five regions based on hex keys
hbase> create 't3', 'f1', { NUMREGIONS => 5, SPLITALGO => 'HexStringSplit' }

Reference: http://hbase.apache.org/book.html#compression
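To see what evenly spaced split points look like for hex-encoded row keys, here is a small sketch. It is written in the spirit of a hex-string split and is not the HexStringSplit implementation itself; the class SplitPointSketch, the 8-digit key width, and the lowercase formatting are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: computes evenly spaced hex split points.
final class SplitPointSketch {

    // Returns numRegions - 1 split points that divide the key space
    // 00000000..ffffffff into numRegions ranges of (almost) equal size.
    static List<String> hexSplits(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        long space = 1L << 32; // size of the 8-hex-digit key space
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            long point = space * i / numRegions;
            splits.add(String.format("%08x", point));
        }
        return splits;
    }

    public static void main(String[] args) {
        // prints [40000000, 80000000, c0000000]
        System.out.println(hexSplits(4));
    }
}
```

The resulting strings could be passed to the shell through SPLITS or written one per line into a file for SPLITS_FILE.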

4. Load balancing

Use the shell to disable the balancer:

hbase(main):001:0> balance_switch false

true

0 row(s) in 0.3590 seconds

This turns the balancer OFF (the value printed is the previous balancer state). To re-enable it, do:

hbase(main):001:0> balance_switch true

false

0 row(s) in 0.3590 seconds

5. Merging regions

In some special cases, for example after deleting a large amount of data, you may need to merge regions.

$ bin/hbase org.apache.hadoop.hbase.util.Merge <tablename> <region1> <region2>

If you feel you have too many regions and want to consolidate them, Merge is the utility you need. Merge must be run when the cluster is down.

6. Client API tuning

6.1 Disable auto-flush

This applies when there are many write operations.

When performing a lot of Puts, make sure that setAutoFlush is set to false on your Table instance. Otherwise, the Puts will be sent one at a time to the RegionServer. Puts added via table.put(Put) and table.put(List<Put>) wind up in the same write buffer. If autoFlush = false, these messages are not sent until the write buffer is filled. To explicitly flush the messages, call flushCommits. Calling close on the Table instance will invoke flushCommits.
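The effect of auto-flush on the number of requests can be illustrated with a toy model. This is a sketch under assumptions, not HBase code: WriteBufferSketch is a made-up stand-in for a client table, rows are plain strings, and the buffer capacity is counted in puts for simplicity, whereas the real client write buffer is sized in bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a client table; it only counts simulated round trips.
final class WriteBufferSketch {
    private final boolean autoFlush;
    private final int bufferCapacity; // in puts (assumption; the real buffer is sized in bytes)
    private final List<String> buffer = new ArrayList<>();
    private int rpcCount = 0;

    WriteBufferSketch(boolean autoFlush, int bufferCapacity) {
        this.autoFlush = autoFlush;
        this.bufferCapacity = bufferCapacity;
    }

    // Buffers the row; sends immediately when auto-flush is on or the buffer is full.
    void put(String row) {
        buffer.add(row);
        if (autoFlush || buffer.size() >= bufferCapacity) {
            flushCommits();
        }
    }

    // Sends everything currently buffered as one simulated request.
    void flushCommits() {
        if (buffer.isEmpty()) {
            return;
        }
        rpcCount++;
        buffer.clear();
    }

    // Closing flushes whatever is still buffered.
    void close() {
        flushCommits();
    }

    int rpcCount() {
        return rpcCount;
    }

    public static void main(String[] args) {
        WriteBufferSketch eager = new WriteBufferSketch(true, 4);
        WriteBufferSketch buffered = new WriteBufferSketch(false, 4);
        for (int i = 0; i < 10; i++) {
            eager.put("row" + i);
            buffered.put("row" + i);
        }
        eager.close();
        buffered.close();
        System.out.println("autoFlush=true:  " + eager.rpcCount() + " requests");
        System.out.println("autoFlush=false: " + buffered.rpcCount() + " requests");
    }
}
```

With ten puts and a capacity of four, the eager variant sends ten requests and the buffered one sends three, the last of which is triggered by close.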

6.2 Use scanner caching

If HBase is used as an input source for a MapReduce job, for example, make sure that the input Scan instance to the MapReduce job has setCaching set to something much greater than the default (which is 1). Using the default value means that the map task will call back to the region server for every record processed. Setting this value to 500, for example, will transfer 500 rows at a time to the client to be processed.
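The saving from a larger caching value is simple arithmetic: the number of round trips is roughly the row count divided by the caching value, rounded up. The sketch below assumes a hypothetical table of one million rows; ScanCachingSketch and roundTrips are made-up names, and a real scanner may issue a few extra calls (for example to open and close the scanner).

```java
// Hypothetical helper, not part of HBase: estimates scanner round trips.
final class ScanCachingSketch {

    // Approximate number of fetch calls: ceil(rows / caching).
    static long roundTrips(long rows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be at least 1");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // assumed table size for illustration
        System.out.println("caching=1:   " + roundTrips(rows, 1) + " round trips");
        System.out.println("caching=500: " + roundTrips(rows, 500) + " round trips");
    }
}
```

Larger values reduce round trips but make each response bigger and hold more rows in client memory, so the value is a trade-off rather than something to maximize.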

6.3 Restrict the scan scope

6.4 Close ResultScanners

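7. Configuration tuning

7.1 Reduce the ZooKeeper session timeout

zookeeper.session.timeout

The default is three minutes.

7.2 Increase the number of region server handler threads

hbase.regionserver.handler.count

The default is 10.

7.3 Increase the region size

Managing fewer regions lets the cluster run more smoothly.

The default is 256 MB.

7.4 Reduce the maximum number of log files

For write-heavy applications, lowering this value forces the server to flush data to disk more often, and the logs covering data that has already been flushed can then be discarded.

7.5 Enable data compression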
