RGW Bucket Shard Optimization

Published: 2023/12/14

1. Bucket Index Background

The bucket index is a critical data structure in RGW that stores a bucket's object index. By default, the entire index of a single bucket lives in one shard object (shard count 0, with the entries kept as OMAP keys in leveldb). As the number of objects in the bucket grows, that shard object keeps growing too, and once it gets too large it triggers a variety of problems.
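Each shard of a bucket index lives in its own RADOS object in the index pool, named after the bucket instance id (the id shown by `radosgw-admin bucket stats`, as in the reshard example later in this article). As a minimal sketch, the object names to inspect can be generated like this (the bucket id below is just the example id from section 4.2):

```python
def index_object_names(bucket_id, num_shards):
    """Return the RADOS object names holding a bucket's index.

    Unsharded buckets (shard count 0) keep every OMAP key in a single
    ".dir.<bucket_id>" object; sharded buckets append the shard number.
    """
    if num_shards == 0:
        return [".dir.%s" % bucket_id]
    return [".dir.%s.%d" % (bucket_id, shard) for shard in range(num_shards)]

# Each of these names can be fed to
#   rados -p default.rgw.buckets.index listomapkeys <name>
# to count the index entries held by each shard object.
print(index_object_names("0a6967a5-2c76-427a-99c6-8a788ca25034.54133.1", 4))
```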

2. Problems and Failures

2.1 Symptom Description

  • Flapping OSDs when RGW buckets have millions of objects — possible causes:
  • The first issue: when an RGW bucket holds millions of objects, its bucket index shard RADOS objects become very large, with a huge number of OMAP keys stored in leveldb. Operations such as deep-scrub and bucket index listing then take a very long time to complete, which triggers OSD flapping. Without sharding the problem is worse, because a single RADOS index object holds all the OMAP keys. In other words: RGW index data is stored as OMAP in the leveldb on the OSD's node, and once a single bucket reaches millions of objects, deep-scrub and bucket-list operations consume enormous disk resources and the affected OSDs misbehave. Sharding splits a single bucket's index horizontally across multiple OSDs; without it, large buckets easily cause trouble.

  • The second issue: a heavy DELETE workload leaves large amounts of stale data in OMAP, which keeps triggering leveldb compaction. Compaction (data compaction, which is very hard on the disk) is single-threaded in leveldb and poorly suited to this workload; because the OSD is constantly compacting, its op threads easily hit the osd_op_thread_suicide_timeout, the OSD commits suicide, and the OSDs start flapping.

    Common problems include:

  • During scrub or deep-scrub of the index pool, an oversized shard object heavily taxes the underlying storage device, causing I/O request timeouts.
  • When a deep-scrub runs too long, requests get blocked; large numbers of HTTP requests then time out with 50x errors, impacting the availability of the whole RGW service.
  • When a failed disk or OSD needs recovery, recovering a huge shard object can exhaust a storage node's capacity, and OSD response timeouts can even snowball into a cluster-wide avalanche.
    2.2 Root Cause Analysis

    When the OMAP of the OSD hosting a bucket index grows too large, any abnormal crash of that OSD process turns into on-the-spot "firefighting": the OSD service must be restored as fast as possible.
    First determine the size of that OSD's OMAP. If it is too large, the OSD spends a huge amount of time and resources loading the leveldb data at startup and may fail to start at all (timing out and committing suicide).
    Such an OSD also needs a very large amount of memory to start (around 40 GB of physical memory; fall back on swap if that is not available), so be sure to reserve enough memory.



    3. Temporary Workarounds

    3.1 Disable cluster scrub and deep-scrub to improve stability

    $ ceph osd set noscrub
    $ ceph osd set nodeep-scrub

    3.2 Raise timeout parameters to reduce the chance of OSD suicide

    osd_op_thread_timeout = 90               # default is 15
    osd_op_thread_suicide_timeout = 2000     # default is 150

    If filestore op threads are hitting timeouts:

    filestore_op_thread_timeout = 180        # default is 60
    filestore_op_thread_suicide_timeout = 2000   # default is 180

    The same can be done for the recovery threads:

    osd_recovery_thread_timeout = 120        # default is 30
    osd_recovery_thread_suicide_timeout = 2000

    3.3 Manually compact the OMAP

    If the OSD can be stopped, run a compact operation on it. Ceph 0.94.6 or later is recommended; versions below that have a bug. https://github.com/ceph/ceph/pull/7645/files

    A third temporary step can be taken if OSDs have very large OMAP directories (verify with: du -sh /var/lib/ceph/osd/ceph-$id/current/omap): do a manual leveldb compaction for those OSDs, via one of:

  • ceph tell osd.$id compact, or
  • ceph daemon osd.$id compact, or
  • add leveldb_compact_on_mount = true to the [osd.$id] or [osd] section and restart the OSD.

    This makes sure leveldb is compacted before the OSD is brought back up/in, which really helps.

    # Set noout
    $ ceph osd set noout
    # Stop the OSD service
    $ systemctl stop ceph-osd@<osd-id>
    # In ceph.conf, add the following under the matching [osd.id] section
    leveldb_compact_on_mount = true
    # Start the OSD service
    $ systemctl start ceph-osd@<osd-id>
    # Watch progress with `ceph -s`, ideally while also tailing the OSD's log
    # with tail -f. Wait until all PGs are active+clean before continuing.
    $ ceph -s
    # After the compact finishes, confirm the OMAP size:
    $ du -sh /var/lib/ceph/osd/ceph-$id/current/omap
    # Remove the temporary leveldb_compact_on_mount setting from ceph.conf
    # Unset noout (situational; in production it is advisable to keep noout set):
    $ ceph osd unset noout

    4. Permanent Solutions

    4.1 Plan bucket shards in advance

    • Put the index pool on SSDs. This is the precondition for every optimization in this article; without the hardware to back it, the steps below are pointless.

    • Set a sensible shard count per bucket.
      More shards is not always better: too many shards make operations like bucket listing hit a large amount of underlying storage I/O, so some requests take too long.
      The shard count must also account for your OSD failure domains and replica count. For example, suppose the index pool has size 2 and there are 2 racks with 24 OSDs in total. Ideally the 2 replicas of each shard should land in different racks; with 8 shards there are 8*2=16 shard copies to store, and those 16 should spread evenly across the 2 racks. Likewise, going above 24 shards is clearly inappropriate here.

    • Keep the average size of each bucket index shard under control. The current recommendation is 100k–150k object entries per shard; beyond that, the bucket needs a separate reshard operation (note: this is a high-risk operation, use it with caution). For example, if you expect a single bucket to hold at most 1,000,000 objects, then 1,000,000/8 = 125,000, so 8 shards is reasonable. Each OMAP key entry in a shard object takes roughly 200 bytes, so 150000*200/1024/1024 ≈ 28.61 MB; in other words, keep each shard object under about 28 MB.

    • At the application level, cap the number of objects per bucket, aiming for an average of 100k–150k objects per shard.
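    The sizing arithmetic above can be wrapped in a small helper. This is a minimal sketch under the article's assumptions (≈200 bytes per OMAP entry, 100k–150k entries per shard); the function names are ours, not part of any Ceph tooling:

```python
import math

BYTES_PER_OMAP_ENTRY = 200       # rough per-entry size assumed above
MAX_ENTRIES_PER_SHARD = 150000   # recommended upper bound per shard

def recommended_shards(expected_objects, entries_per_shard=125000):
    """Shard count so each shard stays near the recommended entry count."""
    return max(1, int(math.ceil(expected_objects / float(entries_per_shard))))

def shard_size_mb(entries_per_shard):
    """Approximate on-disk size of one shard's OMAP data, in MB."""
    return entries_per_shard * BYTES_PER_OMAP_ENTRY / 1024.0 / 1024.0

# 1,000,000 objects / 125,000 entries per shard -> 8 shards, as in the text
print(recommended_shards(1000000))                     # -> 8
print(round(shard_size_mb(MAX_ENTRIES_PER_SHARD), 2))  # -> 28.61
```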

    4.1.1 Configuring Bucket Index Sharding

    To enable and configure bucket index sharding on all new buckets (reference: redhat-bucket_sharding), use:

    • the rgw_override_bucket_index_max_shards setting for simple configurations,
    • the bucket_index_max_shards setting for federated configurations

    Simple configurations:

    # 1. Set the parameter in the configuration file.
    #    Note that the maximum number of shards is 7877.
    [global]
    rgw_override_bucket_index_max_shards = 10

    # 2. Restart the rgw service for it to take effect
    systemctl restart ceph-radosgw.target

    # 3. Check the number of bucket index shard objects
    rados -p default.rgw.buckets.index ls | wc -l
    1000

    Federated configurations
    In federated configurations, each zone can have a different index_pool setting to manage failover. To configure a consistent shard count for zones in one region, set the bucket_index_max_shards setting in the configuration for that region. To do so:

    # 1. Extract the region configuration to the region.json file:
    $ radosgw-admin region get > region.json
    # 2. In the region.json file, set the bucket_index_max_shards setting for each named zone.
    # 3. Reset the region:
    $ radosgw-admin region set < region.json
    # 4. Update the region map:
    $ radosgw-admin regionmap update --name <name>
    # 5. Replace <name> with the name of the Ceph Object Gateway user, for example:
    $ radosgw-admin regionmap update --name client.rgw.ceph-client
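    Step 2 above edits region.json by hand; the same edit can also be scripted. A minimal sketch using Python's json module (the zone names, region fragment, and shard count below are hypothetical examples, not output from a real cluster):

```python
import json

def set_region_shards(region, num_shards):
    """Set bucket_index_max_shards for every zone in a region config dict."""
    for zone in region.get("zones", []):
        zone["bucket_index_max_shards"] = num_shards
    return region

# Example region fragment; a real one comes from `radosgw-admin region get > region.json`
region = {"name": "default", "zones": [{"name": "zone-a"}, {"name": "zone-b"}]}
region = set_region_shards(region, 10)
print(json.dumps(region, indent=2))
# Write the result to region.json and apply it with `radosgw-admin region set < region.json`
```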

    File upload demo:

    # -*- coding: utf-8 -*-
    # Python 2 script: needs python-boto (yum install python-boto)
    # and filechunkio (pip install filechunkio)
    import boto
    import boto.s3.connection
    from filechunkio import FileChunkIO
    import math
    import threading
    import os
    import Queue


    class Chunk(object):
        num = 0
        offset = 0
        len = 0

        def __init__(self, n, o, l):
            self.num = n
            self.offset = o
            self.length = l


    class CONNECTION(object):
        # chunk size must be at least 8M, otherwise multipart upload fails
        def __init__(self, access_key, secret_key, ip, port, is_secure=False, chrunksize=8 << 20):
            self.conn = boto.connect_s3(
                aws_access_key_id=access_key,
                aws_secret_access_key=secret_key,
                host=ip, port=port,
                is_secure=is_secure,
                calling_format=boto.s3.connection.OrdinaryCallingFormat()
            )
            self.chrunksize = chrunksize
            self.port = port

        # list every bucket and its contents
        def list_all(self):
            all_buckets = self.conn.get_all_buckets()
            for bucket in all_buckets:
                print u'Bucket name: %s' % (bucket.name)
                for key in bucket.list():
                    print ' ' * 5, "%-20s%-20s%-20s%-40s%-20s" % (
                        key.mode, key.owner.id, key.size,
                        key.last_modified.split('.')[0], key.name)

        def list_single(self, bucket_name):
            try:
                single_bucket = self.conn.get_bucket(bucket_name)
            except Exception:
                print 'bucket %s is not exist' % bucket_name
                return
            print u'Bucket name: %s' % (single_bucket.name)
            for key in single_bucket.list():
                print ' ' * 5, "%-20s%-20s%-20s%-40s%-20s" % (
                    key.mode, key.owner.id, key.size,
                    key.last_modified.split('.')[0], key.name)

        # simple download, for files <= 8M
        def dowload_file(self, filepath, key_name, bucket_name):
            all_bucket_name_list = [i.name for i in self.conn.get_all_buckets()]
            if bucket_name not in all_bucket_name_list:
                print 'Bucket %s is not exist,please try again' % (bucket_name)
                return
            bucket = self.conn.get_bucket(bucket_name)
            all_key_name_list = [i.name for i in bucket.get_all_keys()]
            if key_name not in all_key_name_list:
                print 'File %s is not exist,please try again' % (key_name)
                return
            key = bucket.get_key(key_name)
            if not os.path.exists(os.path.dirname(filepath)):
                print 'Filepath %s is not exists, sure to create and try again' % (filepath)
                return
            if os.path.exists(filepath):
                while True:
                    d_tag = raw_input('File %s already exists, sure you want to cover (Y/N)?' % (key_name)).strip()
                    if d_tag not in ['Y', 'N'] or len(d_tag) == 0:
                        continue
                    elif d_tag == 'Y':
                        os.remove(filepath)
                        break
                    elif d_tag == 'N':
                        return
            os.mknod(filepath)
            try:
                key.get_contents_to_filename(filepath)
            except Exception:
                pass

        # simple upload, for files <= 8M
        def upload_file(self, filepath, key_name, bucket_name):
            try:
                bucket = self.conn.get_bucket(bucket_name)
            except Exception:
                print 'bucket %s is not exist' % bucket_name
                tag = raw_input('Do you want to create the bucket %s: (Y/N)?' % bucket_name).strip()
                while tag not in ['Y', 'N']:
                    tag = raw_input('Please input (Y/N)').strip()
                if tag == 'N':
                    return
                elif tag == 'Y':
                    self.conn.create_bucket(bucket_name)
                bucket = self.conn.get_bucket(bucket_name)
            all_key_name_list = [i.name for i in bucket.get_all_keys()]
            if key_name in all_key_name_list:
                while True:
                    f_tag = raw_input(u'File already exists, sure you want to cover (Y/N)?: ').strip()
                    if f_tag not in ['Y', 'N'] or len(f_tag) == 0:
                        continue
                    elif f_tag == 'Y':
                        break
                    elif f_tag == 'N':
                        return
            key = bucket.new_key(key_name)
            if not os.path.exists(filepath):
                print 'File %s does not exist, please make sure you want to upload file path and try again' % (key_name)
                return
            try:
                f = file(filepath, 'rb')
                data = f.read()
                key.set_contents_from_string(data)
            except Exception:
                pass

        def delete_file(self, key_name, bucket_name):
            all_bucket_name_list = [i.name for i in self.conn.get_all_buckets()]
            if bucket_name not in all_bucket_name_list:
                print 'Bucket %s is not exist,please try again' % (bucket_name)
                return
            bucket = self.conn.get_bucket(bucket_name)
            all_key_name_list = [i.name for i in bucket.get_all_keys()]
            if key_name not in all_key_name_list:
                print 'File %s is not exist,please try again' % (key_name)
                return
            key = bucket.get_key(key_name)
            try:
                bucket.delete_key(key.name)
            except Exception:
                pass

        def delete_bucket(self, bucket_name):
            all_bucket_name_list = [i.name for i in self.conn.get_all_buckets()]
            if bucket_name not in all_bucket_name_list:
                print 'Bucket %s is not exist,please try again' % (bucket_name)
                return
            bucket = self.conn.get_bucket(bucket_name)
            try:
                self.conn.delete_bucket(bucket.name)
            except Exception:
                pass

        # build the chunk queue (8 << 20 == 8 * 2**20)
        def init_queue(self, filesize, chunksize):
            chunkcnt = int(math.ceil(filesize * 1.0 / chunksize))
            q = Queue.Queue(maxsize=chunkcnt)
            for i in range(0, chunkcnt):
                offset = chunksize * i
                length = min(chunksize, filesize - offset)
                c = Chunk(i + 1, offset, length)
                q.put(c)
            return q

        # multipart upload worker: upload one chunk at a time from the queue
        def upload_trunk(self, filepath, mp, q, id):
            while not q.empty():
                chunk = q.get()
                fp = FileChunkIO(filepath, 'r', offset=chunk.offset, bytes=chunk.length)
                mp.upload_part_from_file(fp, part_num=chunk.num)
                fp.close()
                q.task_done()

        # stat file size --> create S3 multipart upload --> build the chunk
        # queue --> upload the chunks from worker threads
        def upload_file_multipart(self, filepath, key_name, bucket_name, threadcnt=8):
            filesize = os.stat(filepath).st_size
            try:
                bucket = self.conn.get_bucket(bucket_name)
            except Exception:
                print 'bucket %s is not exist' % bucket_name
                tag = raw_input('Do you want to create the bucket %s: (Y/N)?' % bucket_name).strip()
                while tag not in ['Y', 'N']:
                    tag = raw_input('Please input (Y/N)').strip()
                if tag == 'N':
                    return
                elif tag == 'Y':
                    self.conn.create_bucket(bucket_name)
                bucket = self.conn.get_bucket(bucket_name)
            all_key_name_list = [i.name for i in bucket.get_all_keys()]
            if key_name in all_key_name_list:
                while True:
                    f_tag = raw_input(u'File already exists, sure you want to cover (Y/N)?: ').strip()
                    if f_tag not in ['Y', 'N'] or len(f_tag) == 0:
                        continue
                    elif f_tag == 'Y':
                        break
                    elif f_tag == 'N':
                        return
            mp = bucket.initiate_multipart_upload(key_name)
            q = self.init_queue(filesize, self.chrunksize)
            for i in range(0, threadcnt):
                t = threading.Thread(target=self.upload_trunk, args=(filepath, mp, q, i))
                t.setDaemon(True)
                t.start()
            q.join()
            mp.complete_upload()

        # multipart download worker: fetch one byte range per chunk
        def download_chrunk(self, filepath, key_name, bucket_name, q, id):
            while not q.empty():
                chrunk = q.get()
                offset = chrunk.offset
                length = chrunk.length
                bucket = self.conn.get_bucket(bucket_name)
                resp = bucket.connection.make_request(
                    'GET', bucket_name, key_name,
                    headers={'Range': "bytes=%d-%d" % (offset, offset + length)})
                data = resp.read(length)
                fp = FileChunkIO(filepath, 'r+', offset=chrunk.offset, bytes=chrunk.length)
                fp.write(data)
                fp.close()
                q.task_done()

        def download_file_multipart(self, filepath, key_name, bucket_name, threadcnt=8):
            all_bucket_name_list = [i.name for i in self.conn.get_all_buckets()]
            if bucket_name not in all_bucket_name_list:
                print 'Bucket %s is not exist,please try again' % (bucket_name)
                return
            bucket = self.conn.get_bucket(bucket_name)
            all_key_name_list = [i.name for i in bucket.get_all_keys()]
            if key_name not in all_key_name_list:
                print 'File %s is not exist,please try again' % (key_name)
                return
            key = bucket.get_key(key_name)
            if not os.path.exists(os.path.dirname(filepath)):
                print 'Filepath %s is not exists, sure to create and try again' % (filepath)
                return
            if os.path.exists(filepath):
                while True:
                    d_tag = raw_input('File %s already exists, sure you want to cover (Y/N)?' % (key_name)).strip()
                    if d_tag not in ['Y', 'N'] or len(d_tag) == 0:
                        continue
                    elif d_tag == 'Y':
                        os.remove(filepath)
                        break
                    elif d_tag == 'N':
                        return
            os.mknod(filepath)
            filesize = key.size
            q = self.init_queue(filesize, self.chrunksize)
            for i in range(0, threadcnt):
                t = threading.Thread(target=self.download_chrunk, args=(filepath, key_name, bucket_name, q, i))
                t.setDaemon(True)
                t.start()
            q.join()

        def generate_object_download_urls(self, key_name, bucket_name, valid_time=0):
            all_bucket_name_list = [i.name for i in self.conn.get_all_buckets()]
            if bucket_name not in all_bucket_name_list:
                print 'Bucket %s is not exist,please try again' % (bucket_name)
                return
            bucket = self.conn.get_bucket(bucket_name)
            all_key_name_list = [i.name for i in bucket.get_all_keys()]
            if key_name not in all_key_name_list:
                print 'File %s is not exist,please try again' % (key_name)
                return
            key = bucket.get_key(key_name)
            try:
                key.set_canned_acl('public-read')
                download_url = key.generate_url(valid_time, query_auth=False, force_http=True)
                if self.port != 80:
                    # splice the non-standard port back into the URL
                    x1 = download_url.split('/')[0:3]
                    x2 = download_url.split('/')[3:]
                    s1 = u'/'.join(x1)
                    s2 = u'/'.join(x2)
                    s3 = ':%s/' % (str(self.port))
                    download_url = s1 + s3 + s2
                print download_url
            except Exception:
                pass


    if __name__ == '__main__':
        # Conventions:
        # 1: filepath is the absolute local path (upload source or download target)
        # 2: bucket_name acts as the directory/index name in object storage
        # 3: key_name is the file name/index of the object in object storage
        access_key = "FYT71CYU3UQKVMC8YYVY"
        secret_key = "rVEASbWAytjVLv1G8Ta8060lY3yrcdPTsEL0rfwr"
        ip = '127.0.0.1'
        port = 7480
        conn = CONNECTION(access_key, secret_key, ip, port)

        # List all buckets and the files they contain
        #conn.list_all()

        # Simple upload, for files <= 8M
        #conn.upload_file('/etc/passwd','passwd','test_bucket01')
        conn.upload_file('/tmp/test.log', 'test1', 'test_bucket12')

        # List the files in a single bucket
        conn.list_single('test_bucket12')

        # Simple download, for files <= 8M
        # conn.dowload_file('/lhf_test/test01','passwd','test_bucket01')
        # conn.list_single('test_bucket01')

        # Delete a file
        # conn.delete_file('passwd','test_bucket01')
        # conn.list_single('test_bucket01')

        # Delete a bucket
        # conn.delete_bucket('test_bucket01')
        # conn.list_all()

        # Multipart upload (multithreaded), for files > 8M; the chunk size is
        # adjustable but must not go below 8M, or a chunk-too-small error is raised
        # conn.upload_file_multipart('/etc/passwd','passwd_multi_upload','test_bucket01')
        # conn.list_single('test_bucket01')

        # Multipart download (multithreaded), for files > 8M; same 8M lower bound
        # conn.download_file_multipart('/lhf_test/passwd_multi_dowload','passwd_multi_upload','test_bucket01')

        # Generate a public download URL
        #conn.generate_object_download_urls('passwd_multi_upload','test_bucket01')
        #conn.list_all()

    4.2 Reshard an existing bucket

    To reshard a bucket's index (reference: redhat-bucket_sharding):

    # NOTE: make sure all operations against this bucket have been completely
    # stopped first, then back up the bucket index with:
    $ radosgw-admin bi list --bucket=<bucket_name> > <bucket_name>.list.backup

    # If needed, the index can be restored from the backup with:
    $ radosgw-admin bi put --bucket=<bucket_name> < <bucket_name>.list.backup

    # Check the bucket's index id
    $ radosgw-admin bucket stats --bucket=bucket-maillist
    {
        "bucket": "bucket-maillist",
        "pool": "default.rgw.buckets.data",
        "index_pool": "default.rgw.buckets.index",
        "id": "0a6967a5-2c76-427a-99c6-8a788ca25034.54133.1",    <-- note this id
        "marker": "0a6967a5-2c76-427a-99c6-8a788ca25034.54133.1",
        "owner": "user",
        "ver": "0#1,1#1",
        "master_ver": "0#0,1#0",
        "mtime": "2017-08-23 13:42:59.007081",
        "max_marker": "0#,1#",
        "usage": {},
        "bucket_quota": {
            "enabled": false,
            "max_size_kb": -1,
            "max_objects": -1
        }
    }

    # Reshard the bucket's index. The command below adjusts "bucket-maillist"
    # to 4 shards; note that it prints the old and new bucket instance ids:
    $ radosgw-admin bucket reshard --bucket="bucket-maillist" --num-shards=4
    *** NOTICE: operation will not remove old bucket index objects ***
    *** these will need to be removed manually ***
    old bucket instance id: 0a6967a5-2c76-427a-99c6-8a788ca25034.54133.1
    new bucket instance id: 0a6967a5-2c76-427a-99c6-8a788ca25034.54147.1
    total entries: 3

    # Then purge the old instance id:
    $ radosgw-admin bi purge --bucket="bucket-maillist" --bucket-id=0a6967a5-2c76-427a-99c6-8a788ca25034.54133.1

    # Check the final result
    $ radosgw-admin bucket stats --bucket=bucket-maillist
    {
        "bucket": "bucket-maillist",
        "pool": "default.rgw.buckets.data",
        "index_pool": "default.rgw.buckets.index",
        "id": "0a6967a5-2c76-427a-99c6-8a788ca25034.54147.1",    <-- the id has changed
        "marker": "0a6967a5-2c76-427a-99c6-8a788ca25034.54133.1",
        "owner": "user",
        "ver": "0#2,1#1,2#1,3#2",
        "master_ver": "0#0,1#0,2#0,3#0",
        "mtime": "2017-08-23 14:02:19.961205",
        "max_marker": "0#,1#,2#,3#",
        "usage": {
            "rgw.main": {
                "size_kb": 50,
                "size_kb_actual": 60,
                "num_objects": 3
            }
        },
        "bucket_quota": {
            "enabled": false,
            "max_size_kb": -1,
            "max_objects": -1
        }
    }
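    After a reshard, RGW spreads index entries across the shard objects by hashing each object name modulo the shard count. The sketch below only illustrates that kind of distribution: it uses Python's hashlib rather than RGW's actual internal hash function, so the per-shard counts will not match what RGW produces on a real cluster:

```python
import hashlib
from collections import Counter

def shard_of(key_name, num_shards):
    """Illustrative shard placement: hash the key name, mod the shard count.
    (RGW uses its own hash internally; this is only for intuition.)"""
    digest = hashlib.md5(key_name.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Distribute 10,000 synthetic keys over 4 shards, as in the reshard example;
# a reasonable hash spreads the entries roughly evenly.
counts = Counter(shard_of("obj-%d" % i, 4) for i in range(10000))
for shard, n in sorted(counts.items()):
    print("shard %d: %d entries" % (shard, n))
```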
