

Analysis of Curator, a ZooKeeper Client Library

Published: 2024/2/28 · 豆豆



- Preface
- Overview
- ZooKeeper guarantees
  - Understanding ZooKeeper's sequential consistency
- Pitfalls I hit with ZooKeeper clients before
- Curator connection guarantees
- Connection state monitoring and retry mechanism
- Instance management
- Recipes
  - Basic operations
  - Watches
  - Implemented recipes
    - Elections
    - Locks
    - Counters
    - Caches
    - Nodes/Watchers
    - Queues
    - Transactions
  - Tech notes
- References

Preface

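In my daily work I mainly program in C++, but working in the internet industry you cannot avoid dealing with service registries built on distributed consensus protocols. My company mainly uses ZooKeeper. At first we used the raw client, which caused many problems. After learning that lesson the hard way, we decided to study Curator and build a C++ client based on its ideas. What follows are my notes from reading Curator's code and design documents.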

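This article does not dig deeply into the Java source, for two reasons: first, my Java skills are limited; second, I read Curator's code only to learn its design ideas and use them to inform a C++ version of the library.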

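Overview

ZooKeeper was not designed for high availability, but it achieves very strong consistency through the ZAB protocol, which is why it is often chosen for service registries, configuration centers, distributed locks, and similar scenarios. From a reader's point of view, however, ZooKeeper is an eventually consistent system (a follower may serve data that lags behind the leader), while many real applications need strong consistency.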

The official documentation describes why Curator exists as follows: ZooKeeper is a very low level system that requires users to do a lot of housekeeping. See: Zookeeper FAQ. The Curator Framework is designed to hide as much of the details/tedium of this housekeeping as is possible.

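There are currently two good open-source clients that wrap ZooKeeper's native API: zkClient and Curator. The latter was open-sourced by Netflix, is now maintained under the Apache Software Foundation, and is the choice of the Spring ecosystem.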

ZooKeeper guarantees

According to the official ZooKeeper documentation, ZooKeeper provides the following guarantees:

Sequential Consistency - Updates from a client will be applied in the order that they were sent.
Atomicity - Updates either succeed or fail. No partial results.
Single System Image - A client will see the same view of the service regardless of the server that it connects to. i.e., a client will never see an older view of the system even if the client fails over to a different server with the same session. If a client has already seen newer data and then tries to reconnect to a follower that still holds older data, that follower refuses the connection (the client's last seen zxid is higher than the follower's).
Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.
Timeliness - The clients view of the system is guaranteed to be up-to-date within a certain time bound.
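The zxid rule in the Single System Image guarantee can be modeled in a few lines. This is a minimal sketch under the assumption that each server is described only by the zxid of the last transaction it applied and the client remembers the highest zxid it has seen; `ServerModel` and `ClientModel` are hypothetical names, and the real check happens inside the ZooKeeper server during the session handshake.

```java
import java.util.List;

/** Hypothetical server model: only the last applied zxid matters here. */
class ServerModel {
    final String name;
    final long lastProcessedZxid;

    ServerModel(String name, long lastProcessedZxid) {
        this.name = name;
        this.lastProcessedZxid = lastProcessedZxid;
    }

    /** Refuse a client that has already seen a newer transaction than this server applied. */
    boolean acceptsClient(long clientLastZxidSeen) {
        return clientLastZxidSeen <= lastProcessedZxid;
    }
}

/** Hypothetical client model that fails over between servers with the same session. */
class ClientModel {
    final long lastZxidSeen;

    ClientModel(long lastZxidSeen) {
        this.lastZxidSeen = lastZxidSeen;
    }

    /** Returns the first server willing to accept this client, or null if none is. */
    ServerModel failover(List<ServerModel> servers) {
        for (ServerModel s : servers) {
            if (s.acceptsClient(lastZxidSeen)) {
                return s;
            }
        }
        return null;
    }
}
```

A client that has seen zxid 105 is refused by a follower at 100 and accepted by one at 110, so it can never travel back in time.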

In my experience, ZooKeeper is only an eventually consistent distributed system, and it has historically had bugs that violate distributed consensus, such as "expired ephemeral node reappears after ZK leader change", where an ephemeral node still exists after its session has expired.

Understanding ZooKeeper's sequential consistency

The ZooKeeper Programmer's Guide says:

Sometimes developers mistakenly assume one other guarantee that ZooKeeper does not in fact make. This is:
Simultaneously Consistent Cross-Client Views
ZooKeeper does not guarantee that at every instance in time, two different clients will have identical views of ZooKeeper data. Due to factors like network delays, one client may perform an update before another client gets notified of the change. Consider the scenario of two clients, A and B. If client A sets the value of a znode /a from 0 to 1, then tells client B to read /a, client B may read the old value of 0, depending on which server it is connected to. If it is important that Client A and Client B read the same value, Client B should call the sync() method from the ZooKeeper API method before it performs its read.
So, ZooKeeper by itself doesn’t guarantee that changes occur synchronously across all servers, but ZooKeeper primitives can be used to construct higher level functions that provide useful client synchronization.

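In other words, ZooKeeper does not guarantee that a read from any particular server returns the latest value; it only guarantees that the values on that server are updated in order. To read the latest value, call sync() (zoo_async in the C API) before the get.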
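The A/B scenario from the quote can be simulated without a real ensemble. This is a hypothetical in-memory model (`LeaderModel`, `FollowerModel` are illustrative names): the leader commits writes, a follower applies them lazily, and `sync()` makes the follower catch up to everything the leader had committed when sync was issued. With Curator, as far as I know, `client.sync().forPath(path)` always runs in the background, so a caller should wait for its `BackgroundCallback` before issuing the read.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical leader: holds the committed sequence of values for one znode. */
class LeaderModel {
    final List<String> committed = new ArrayList<>();

    void write(String value) {
        committed.add(value);
    }
}

/** Hypothetical follower: applies committed writes with a delay and serves reads locally. */
class FollowerModel {
    private final LeaderModel leader;
    private int applied; // how many committed writes this follower has applied so far

    FollowerModel(LeaderModel leader) {
        this.leader = leader;
    }

    /** Simulates replication lag: apply at most one pending write. */
    void replicateOne() {
        if (applied < leader.committed.size()) {
            applied++;
        }
    }

    /** Like sync(): catch up to everything committed at the time sync is issued. */
    void sync() {
        applied = leader.committed.size();
    }

    /** Local read; may return an old value. */
    String read() {
        return applied == 0 ? null : leader.committed.get(applied - 1);
    }
}
```

Client B's follower returns the stale "0" until it syncs, after which it returns client A's "1".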

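Pitfalls I hit with ZooKeeper clients before

ZooKeeper session handling

We ignored the CONNECTING event: after the heartbeat between client and server timed out, the service taking part in leader election was not taken offline in time, which resulted in two leaders.
Several threads handled the ZooKeeper connection state, which created several sets of ZooKeeper client threads connected to the servers.
An unreasonable ZooKeeper timeout made reconnects so frequent that they overwhelmed the ZooKeeper servers.
When all ZooKeeper servers were reset (their entire state wiped), the client did not receive an Expired event, and the client I had implemented did not create a new session either. All sessions created by the old client became unusable, and it was stuck in an endless loop of failed reconnects.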
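The second and fourth pitfalls share a fix: whoever decides the session is unusable (an Expired event, or a local decision after endless failed reconnects), exactly one rebuild should happen per failure. A minimal sketch, assuming a hypothetical `SessionManager` whose generation counter identifies the current session; creating the real ZooKeeper handle is replaced by a counter.

```java
/**
 * Hypothetical session manager: a caller records the generation it saw fail, and a rebuild
 * only happens if nobody has replaced that session since, so concurrent detectors of the
 * same failure produce one new session instead of several.
 */
class SessionManager {
    private long generation;      // identifies the current session
    private int sessionsCreated;  // stand-in for "close old handle, open a new one"

    synchronized long currentGeneration() {
        return generation;
    }

    synchronized int sessionsCreated() {
        return sessionsCreated;
    }

    /** Rebuild only if the session the caller saw fail is still the current one. */
    synchronized boolean rebuildIfStale(long observedGeneration) {
        if (observedGeneration != generation) {
            return false; // someone already replaced the broken session
        }
        generation++;
        sessionsCreated++;
        return true;
    }
}
```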

Multithreading races

ZooKeeper's own completion thread (do_completion) invokes watcher callbacks, which raced with business threads and caused core dumps.

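Synchronous API

The synchronous API has no timeout, so if the ZooKeeper servers are in a bad state, every thread calling a synchronous ZooKeeper API hangs.
A poorly designed API for business code called the synchronous API during initialization, which caused a deadlock.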
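One way to keep a blocking call from hanging its caller forever is to run it on a separate executor and stop waiting after a deadline. A minimal sketch; `BoundedCall` is a hypothetical helper, not part of ZooKeeper or Curator. Note that it only unblocks the caller: the worker is interrupted, but a call that ignores interruption keeps running.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: bound how long a caller waits for a possibly-blocking call. */
final class BoundedCall {
    private BoundedCall() {}

    /** Returns empty on timeout (a null result also maps to empty in this sketch). */
    static <T> Optional<T> call(ExecutorService executor, Callable<T> blockingCall, long timeoutMs) {
        Future<T> future = executor.submit(blockingCall);
        try {
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the caller stops waiting either way
            return Optional.empty();
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause()); // the call itself failed
        }
    }
}
```

Using a dedicated executor (rather than the common pool) keeps stuck calls from starving unrelated work.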

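Curator connection guarantees

Curator monitors all ZooKeeper connections, and every operation has a retry mechanism, so Curator guarantees that:

All Curator operations (create, get, sync, etc.) are executed only after the ZooKeeper connection has been established.

All Curator operations correctly handle ZooKeeper session loss/expiry events through the retry mechanism.

If the current session is lost, Curator operations can keep retrying until they succeed.

All Curator clients handle ZooKeeper connection problems in a reasonable way.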
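The retry guarantee boils down to a loop around each operation. Curator's own version is `RetryLoop.callWithRetry(...)`; the sketch below is a simplified, hypothetical model of the idea, with a stand-in `ConnectionLossException` and a hypothetical `SimpleRetryPolicy` interface. Only connection-level failures are retried; any other error propagates immediately.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for ZooKeeper's connection-loss error. */
class ConnectionLossException extends Exception {
    ConnectionLossException(String msg) {
        super(msg);
    }
}

/** Hypothetical policy: whether to try again and how long to pause first. */
interface SimpleRetryPolicy {
    boolean allowRetry(int retryCount);

    long sleepMs(int retryCount);
}

final class RetryLoopModel {
    private RetryLoopModel() {}

    static <T> T callWithRetry(SimpleRetryPolicy policy, Callable<T> op) throws Exception {
        int retryCount = 0;
        while (true) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                // Retry only connection-level failures, and only while the policy allows it.
                if (!policy.allowRetry(retryCount)) {
                    throw e;
                }
                Thread.sleep(policy.sleepMs(retryCount));
                retryCount++;
            }
        }
    }
}
```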

Connection state monitoring and retry mechanism

ConnectionStateListener

Curator will set the LOST state when it believes that the ZooKeeper session has expired. ZooKeeper connections have a session. When the session expires, clients must take appropriate action. In Curator, this is complicated by the fact that Curator internally manages the ZooKeeper connection. Curator will set the LOST state when any of the following occurs: a) ZooKeeper returns a Watcher.Event.KeeperState.Expired or KeeperException.Code.SESSIONEXPIRED; b) Curator closes the internally managed ZooKeeper instance; c) The session timeout elapses during a network partition. It is possible to get a RECONNECTED state after this but you should still consider any locks, etc. as dirty/unstable.

checkSessionExpiration: if no event at all is received from the ZooKeeper server within a certain period, the current connection is considered expired.

Instance management

Class constructor

```java
/**
 * @param ensembleProvider    the ensemble provider (supplies the server connection string)
 * @param sessionTimeoutMs    session timeout (the session timeout we configure)
 * @param connectionTimeoutMs connection timeout (how long to wait for a connection before giving up)
 * @param watcher             default watcher or null
 * @param retryPolicy         the retry policy to use (e.g. RetryForever)
 */
public CuratorZookeeperClient(EnsembleProvider ensembleProvider, int sessionTimeoutMs,
                              int connectionTimeoutMs, Watcher watcher, RetryPolicy retryPolicy)
{
    this(new DefaultZookeeperFactory(), ensembleProvider, sessionTimeoutMs, connectionTimeoutMs,
         watcher, retryPolicy, false);
}
```
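The `retryPolicy` argument is also what keeps reconnects from hammering the servers (the third pitfall above). The sketch below is a hypothetical backoff policy in the spirit of Curator's `ExponentialBackoffRetry`: the sleep is a random multiple of a base interval, the range of multipliers doubles with each retry, and the result is capped. It is an illustration, not Curator's exact code.

```java
import java.util.Random;

/** Hypothetical exponential backoff with jitter and a cap on the sleep time. */
class BackoffModel {
    private final long baseSleepMs;
    private final long maxSleepMs;
    private final int maxRetries;
    private final Random random;

    BackoffModel(long baseSleepMs, long maxSleepMs, int maxRetries, Random random) {
        this.baseSleepMs = baseSleepMs;
        this.maxSleepMs = maxSleepMs;
        this.maxRetries = Math.min(maxRetries, 29); // keeps 1 << (retryCount + 1) a positive int
        this.random = random;
    }

    boolean allowRetry(int retryCount) {
        return retryCount < maxRetries;
    }

    /** Meant for retryCount values accepted by allowRetry. */
    long sleepMs(int retryCount) {
        // Multiplier is drawn from [1, 2^(retryCount+1) - 1], so the window doubles per retry.
        int multiplier = Math.max(1, random.nextInt(1 << (retryCount + 1)));
        return Math.min(baseSleepMs * multiplier, maxSleepMs);
    }
}
```

The random jitter matters as much as the growth: without it, every client that lost the same server would retry in lockstep.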

ConnectionStateListener is invoked on every state transition.

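Recipes

Curator provides different recipes for different functions, and each one encapsulates many low-level ZooKeeper calls. For leader election, for example, a recipe first creates the required path and then creates and watches the election znodes.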

Most Curator recipes will autocreate parent nodes of paths given to the recipe as CreateMode

Basic operations

Curator uses a fluent-style API and offers both synchronous and asynchronous (BackgroundCallback) variants of each operation.
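The real Curator call looks like `client.create().creatingParentsIfNeeded().withMode(CreateMode.EPHEMERAL).forPath(path, data)`, and adding `.inBackground(callback)` before `forPath` makes it asynchronous. To show how one builder can serve both modes, here is a hypothetical in-memory model (`FluentStoreModel` is an illustrative name, not Curator's implementation).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical in-memory store exposing a Curator-style fluent create() builder. */
class FluentStoreModel {
    final Map<String, byte[]> nodes = new ConcurrentSkipListMap<>();

    CreateBuilder create() {
        return new CreateBuilder();
    }

    class CreateBuilder {
        private boolean createParents;
        private Executor background;       // null means "run synchronously"
        private Consumer<String> callback; // receives the created path, or null on failure

        CreateBuilder creatingParentsIfNeeded() {
            createParents = true;
            return this;
        }

        CreateBuilder inBackground(Executor executor, Consumer<String> onDone) {
            background = executor;
            callback = onDone;
            return this;
        }

        /** Synchronous mode returns the path; background mode returns null and calls back. */
        String forPath(String path, byte[] data) {
            if (background == null) {
                return doCreate(path, data);
            }
            background.execute(() -> {
                String created;
                try {
                    created = doCreate(path, data);
                } catch (IllegalStateException e) {
                    created = null; // this sketch reports failures as null
                }
                callback.accept(created);
            });
            return null;
        }

        private String doCreate(String path, byte[] data) {
            synchronized (nodes) {
                String parent = path.substring(0, path.lastIndexOf('/'));
                if (!parent.isEmpty() && !nodes.containsKey(parent)) {
                    if (!createParents) {
                        throw new IllegalStateException("NoNode: " + parent);
                    }
                    // Create each missing ancestor, top-down, with empty data.
                    for (int i = path.indexOf('/', 1); i > 0; i = path.indexOf('/', i + 1)) {
                        nodes.putIfAbsent(path.substring(0, i), new byte[0]);
                    }
                }
                if (nodes.putIfAbsent(path, data) != null) {
                    throw new IllegalStateException("NodeExists: " + path);
                }
                return path;
            }
        }
    }
}
```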

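Watches

Listeners added with addListener do not need to be added again after each event.

Native ZooKeeper supports event listening by registering a Watcher, but developers have to re-register it over and over (a Watcher is one-shot: registered once, fired once). A Cache is Curator's wrapper around event listening; it can be seen as a local cached view of the watched data, and it handles the repeated re-registration automatically. Curator provides three kinds of Watcher (Cache) for monitoring node changes, which report the following state changes:

ZooKeeper goes down: type=CONNECTION_SUSPENDED, and after a while type=CONNECTION_LOST
ZooKeeper restarts: type=CONNECTION_RECONNECTED, data=null
A child node is updated: type=CHILD_UPDATED
A child node is deleted: type=CHILD_REMOVED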
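The difference between a raw one-shot watcher and a cache that re-arms itself can be modeled in a few lines. Both classes below are hypothetical; Curator's real caches (NodeCache, PathChildrenCache, TreeCache, and CuratorCache in 5.x) additionally handle initial population and connection events. Note that a re-read returns the latest value, so updates that land between an event and the re-read can be coalesced: a cache gives you a current view, not a change log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical store where, as in ZooKeeper, a watch fires at most once and is then gone. */
class OneShotWatchStore {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    String getAndWatch(String path, Consumer<String> watcher) {
        watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
        return data.get(path);
    }

    void set(String path, String value) {
        data.put(path, value);
        List<Consumer<String>> fired = watches.remove(path); // removed before firing: one-shot
        if (fired != null) {
            fired.forEach(w -> w.accept(path));
        }
    }
}

/** Hypothetical cache: every event triggers a re-read that also re-registers the watch. */
class NodeCacheModel {
    private final OneShotWatchStore store;
    private final String path;
    String current;

    NodeCacheModel(OneShotWatchStore store, String path) {
        this.store = store;
        this.path = path;
    }

    void start() {
        refresh(path);
    }

    private void refresh(String p) {
        current = store.getAndWatch(p, this::refresh); // read and re-arm in one step
    }
}
```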

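Implemented recipes

Curator provides a variety of recipes that upper-layer business code can use directly.

Elections

LeaderSelector: the current node remains leader for as long as takeLeadership has not returned. Internally it is implemented with InterProcessMutex.
LeaderLatch: once a leader is elected, it does not give up leadership unless a client dies and triggers a new election.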
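The two election styles differ mainly in when leadership ends. A hypothetical single-process model (no ZooKeeper; the deque stands in for the ordered sequential znodes): in the LeaderSelector style, returning from the leadership task hands leadership to the next candidate, and with requeueing (Curator's `LeaderSelector.autoRequeue()`) the old leader rejoins at the back; in the LeaderLatch style, the head stays leader until it calls `close()` or its session dies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of LeaderSelector-style and LeaderLatch-style leadership. */
class ElectionModel {
    private final Deque<String> queue = new ArrayDeque<>(); // stands in for sequential znodes

    void join(String candidate) {
        queue.addLast(candidate);
    }

    /** The lowest-ordered candidate is the leader. */
    String leader() {
        return queue.peekFirst();
    }

    /** Selector style: each round the head leads, its task returns, and leadership moves on. */
    List<String> runSelectorRounds(int rounds, boolean requeue) {
        List<String> leaders = new ArrayList<>();
        for (int i = 0; i < rounds && !queue.isEmpty(); i++) {
            String head = queue.pollFirst();
            leaders.add(head); // takeLeadership(...) would run here and then return
            if (requeue) {
                queue.addLast(head);
            }
        }
        return leaders;
    }

    /** Latch style: leadership only changes when the holder leaves. */
    void close(String candidate) {
        queue.remove(candidate);
    }
}
```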

Locks

Distributed locks.
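InterProcessMutex follows the classic ZooKeeper lock recipe: each contender creates an ephemeral sequential node, the lowest sequence holds the lock, and every other contender watches only its immediate predecessor, so a release wakes one waiter instead of all of them. The hypothetical in-memory model below shows that ordering plus per-owner re-entrancy (in Curator, re-entrancy is per thread; the `owner` string stands in for the thread).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical in-memory model of the sequential-node lock recipe. */
class LockModel {
    private final TreeSet<String> children = new TreeSet<>();       // ordered lock nodes
    private final Map<String, String> nodeOf = new HashMap<>();     // owner -> its node
    private final Map<String, Integer> holdCount = new HashMap<>(); // re-entrancy per owner
    private int nextSequence;

    /** True if owner now holds the lock; otherwise its node stays queued. */
    boolean tryAcquire(String owner) {
        String node = nodeOf.get(owner);
        if (node == null) {
            node = String.format("lock-%010d", nextSequence++); // like a sequential znode name
            children.add(node);
            nodeOf.put(owner, node);
        }
        if (!node.equals(children.first())) {
            return false; // queued; would watch watchedPredecessor(owner)
        }
        holdCount.merge(owner, 1, Integer::sum); // first or re-entrant acquire
        return true;
    }

    /** The node a waiting owner watches (null for the holder); owner must have called tryAcquire. */
    String watchedPredecessor(String owner) {
        return children.lower(nodeOf.get(owner));
    }

    /** Assumes owner currently holds the lock; deletes its node on the last release. */
    void release(String owner) {
        int remaining = holdCount.merge(owner, -1, Integer::sum);
        if (remaining == 0) {
            holdCount.remove(owner);
            children.remove(nodeOf.remove(owner));
        }
    }
}
```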

Counters

Because ZooKeeper writes are forwarded to the leader while reads can be served by any follower, I am not sure whether this counter can suffer from stale reads.
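One way to reason about this doubt: as far as I know, Curator's DistributedAtomicLong first tries an optimistic update, reading the value and its version and then writing with that version as a precondition (`setData().withVersion(...)`), retrying on a version mismatch. Under that scheme a stale read can only cause a failed attempt and a retry, never a lost update, although a plain read of the counter can still return an old value. The classes below are a hypothetical in-memory model of the scheme.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical versioned znode: a write succeeds only if the caller's version matches. */
class VersionedNode {
    static final class Stat {
        final long value;
        final int version;

        Stat(long value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    private final AtomicReference<Stat> state = new AtomicReference<>(new Stat(0, 0));

    Stat get() {
        return state.get();
    }

    /** Like a version-checked setData: false on a version mismatch. */
    boolean setIfVersion(long newValue, int expectedVersion) {
        Stat cur = state.get();
        if (cur.version != expectedVersion) {
            return false;
        }
        return state.compareAndSet(cur, new Stat(newValue, expectedVersion + 1));
    }
}

final class OptimisticCounter {
    private OptimisticCounter() {}

    /** Read, compute, write-with-version; retry when another writer got in first. */
    static long increment(VersionedNode node, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            VersionedNode.Stat seen = node.get(); // in real ZooKeeper this may be stale
            if (node.setIfVersion(seen.value + 1, seen.version)) {
                return seen.value + 1;
            }
        }
        throw new IllegalStateException("too much contention");
    }
}
```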

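Caches

There are caches at different levels, such as a single node, a path's children, and a whole tree. A cache registers watchers, and when a node changes, Curator updates the cache promptly.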

Nodes/Watchers

This mainly refers to creating persistent nodes together with their corresponding watchers.

Queues

ZooKeeper sequential nodes can themselves be used as a queue.
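A sequential-node queue is simple to sketch: producers create nodes whose names end in a 10-digit zero-padded counter (the format ZooKeeper uses for sequential nodes), and a consumer takes the lowest name and deletes it. The class below is a hypothetical in-memory model; with real ZooKeeper, a consumer whose delete fails with NoNode lost the race to another consumer and moves on to the next child. The tech notes later in this article explain why this only suits small queues.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical in-memory model of a FIFO queue built on sequential znodes. */
class SequentialQueueModel {
    private final TreeMap<String, String> children = new TreeMap<>();
    private int counter;

    /** Zero padding makes lexicographic order equal to numeric (FIFO) order. */
    synchronized String offer(String payload) {
        String name = String.format("qn-%010d", counter++);
        children.put(name, payload);
        return name;
    }

    /** Take the lowest sequence; removing it is what claims the item. */
    synchronized Optional<String> poll() {
        Map.Entry<String, String> head = children.pollFirstEntry();
        return head == null ? Optional.empty() : Optional.of(head.getValue());
    }
}
```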

Transactions

Omitted.

Tech notes

All watcher events should be executed on the same thread, and resources accessed from that thread should be locked (ideally the ZooKeeper library does this itself, on its own thread).
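A common way to follow this rule is to do no work on the ZooKeeper event thread and hand every callback, together with business requests, to one dedicated thread. A minimal sketch; `SingleThreadDispatcher` is hypothetical. In Curator, listeners can also be registered with an explicit executor through `Listenable.addListener(listener, executor)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical dispatcher: watcher callbacks and business requests are both posted to one
 * thread, so the state they touch is never accessed concurrently and needs no extra lock.
 */
class SingleThreadDispatcher implements AutoCloseable {
    private final ExecutorService loop = Executors.newSingleThreadExecutor();
    private final List<String> events = new ArrayList<>(); // only touched on the loop thread

    /** Called from the ZooKeeper event thread: just hand off. */
    void onWatchEvent(String path) {
        loop.execute(() -> events.add("watch:" + path));
    }

    /** Called from business threads. */
    void submitBusiness(String what) {
        loop.execute(() -> events.add("biz:" + what));
    }

    /** Runs after all previously queued tasks and returns a copy taken on the loop thread. */
    List<String> snapshot() {
        return CompletableFuture.supplyAsync(() -> new ArrayList<>(events), loop).join();
    }

    @Override
    public void close() {
        loop.shutdown();
    }
}
```

Without the hand-off, concurrent `add` calls on a plain `ArrayList` could lose entries or corrupt it; with it, every update is serialized.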

Take the session lifecycle seriously: if the session has expired, reconnect with a new session, and every operation tied to the expired session should fail. Sessions and ephemeral nodes are bound together: when a session expires, its ephemeral nodes are gone.
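The session rules above can be captured in a small hypothetical model (`SessionServerModel` is illustrative): ephemeral nodes belong to a session id, expiry deletes them, further operations on the expired session fail, and recovery means opening a new session and re-creating the ephemerals.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical server-side model of session-bound ephemeral nodes. */
class SessionServerModel {
    private final Map<String, Long> ephemeralOwner = new HashMap<>(); // path -> owning session
    private final Set<Long> live = new HashSet<>();
    private long nextSessionId = 1;

    long openSession() {
        long id = nextSessionId++;
        live.add(id);
        return id;
    }

    void createEphemeral(long sessionId, String path) {
        if (!live.contains(sessionId)) {
            throw new IllegalStateException("SessionExpired"); // no silent success
        }
        if (ephemeralOwner.putIfAbsent(path, sessionId) != null) {
            throw new IllegalStateException("NodeExists: " + path);
        }
    }

    /** Expiry ends the session and deletes every ephemeral node it owned. */
    void expire(long sessionId) {
        live.remove(sessionId);
        ephemeralOwner.values().removeIf(owner -> owner == sessionId);
    }

    boolean exists(String path) {
        return ephemeralOwner.containsKey(path);
    }
}
```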

ZooKeeper lets you save the session id and password, so that a new connection can reuse the previous session.

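ZooKeeper is not suitable as a message queue, because:

ZooKeeper limits the size of a single piece of data to about 1 MB by default.
Too many children under one node greatly hurt ZooKeeper's performance.
Large znodes also hurt performance.
Large znodes can make a ZooKeeper server restart take 10 to 15 minutes.
ZooKeeper keeps its entire data tree in memory (disk only holds snapshots and transaction logs), so it cannot store much.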

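It is best to drive the ZooKeeper client from a single thread rather than concurrently; concurrent use brings too many critical-section and race problems.

Curator session lifecycle management:

CONNECTED: received when the first connection is established successfully.
READ_ONLY: indicates that the current connection is in read-only mode.
SUSPENDED: the connection is currently broken (a KeeperState.Disconnected event was received, meaning Curator is not connected to any ZooKeeper server). Leader election, distributed locks, and similar operations should pause when they receive SUSPENDED until the connection is restored. Curator officially recommends treating SUSPENDED as a complete connection loss, i.e., as soon as SUSPENDED arrives, assume that all the ephemeral nodes you registered are already gone.
LOST: Curator enters LOST in the following cases:
- Curator receives an EXPIRED event from the ZooKeeper server.
- Curator itself closes the current ZooKeeper session.
- Curator concludes that the server already considers the current session expired. Since Curator 3.x, Curator runs its own timer: if no successful reconnect arrives within a certain time after SUSPENDED (by default the negotiated session timeout), Curator assumes the session has timed out on the server side and enters LOST.
RECONNECTED: the connection was re-established successfully.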
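In Curator these states arrive through `ConnectionStateListener.stateChanged(CuratorFramework client, ConnectionState newState)`, registered with `client.getConnectionStateListenable().addListener(...)`. The sketch below is a hypothetical leader-side handler that follows the recommendation above; `ConnState` merely mirrors the names of Curator's `ConnectionState` values, and stepping down on READ_ONLY is a design choice of this sketch.

```java
/** Mirrors the names of Curator's ConnectionState values; this enum itself is hypothetical. */
enum ConnState { CONNECTED, READ_ONLY, SUSPENDED, LOST, RECONNECTED }

/** Hypothetical handler: stop leading on trouble and never resume leadership blindly. */
class LeaderStateHandler {
    boolean actingAsLeader;
    boolean mustVerifyEphemerals;

    void onLeadershipGranted() {
        actingAsLeader = true;
    }

    void stateChanged(ConnState s) {
        switch (s) {
            case SUSPENDED:
            case LOST:
            case READ_ONLY: // design choice here: no writes possible, so stop leading too
                actingAsLeader = false;       // treat as a complete connection loss
                mustVerifyEphemerals = true;  // assume our ephemeral nodes may be gone
                break;
            case RECONNECTED:
                // Do not resume automatically: re-check ephemerals and win the election again.
                break;
            default: // CONNECTED
                break;
        }
    }
}
```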

Curator's advice on when the LOST state is entered:

When Curator receives a KeeperState.Disconnected message it changes its state to SUSPENDED (see TN12, errors, etc.). As always, our recommendation is to treat SUSPENDED as a complete connection loss. Exit all locks, leaders, etc. That said, since 3.x, Curator tries to simulate session expiration by starting an internal timer when KeeperState.Disconnected is received. If the timer expires before the connection is repaired, Curator changes its state to LOST and injects a session end into the managed ZooKeeper client connection. The duration of the timer is set to the value of the “negotiated session timeout” by calling ZooKeeper#getSessionTimeout().
The astute reader will realize that setting the timer to the full value of the session timeout may not be the correct value. This is due to the fact that the server closes the connection when 2/3 of a session have already elapsed. Thus, the server may close a session well before Curator’s timer elapses. This is further complicated by the fact that the client has no way of knowing why the connection was closed. There are at least three possible reasons for a client connection to close:

The server has not received a heartbeat within 2/3 of a session
The server crashed
Some kind of general TCP error which causes a connection to fail

In situation 1, the correct value for Curator's timer is 1/3 of a session - i.e. Curator should switch to LOST if the connection is not repaired within 1/3 of a session as 2/3 of the session has already lapsed from the server's point of view. In situations 2 and 3 however, Curator's timer should be the full value of the session (possibly plus some "slop" value). In truth, there is no way to completely emulate in the client the session timing as managed by the ZooKeeper server. So, again, our recommendation is to treat SUSPENDED as complete connection loss.

By default Curator uses 100% of the session timeout as the SUSPENDED-to-LOST transition time, but users can configure it, for example to 33% of the session timeout, to cover the first situation described above.
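The timer arithmetic is easy to check with a small hypothetical model. In the Curator versions I know of, the percentage is set with `CuratorFrameworkFactory.builder().simulatedSessionExpirationPercent(int)` (default 100); check the javadoc of your version. `SimulatedExpiry` below is illustrative and uses an explicit clock value so it can be tested deterministically.

```java
/**
 * Hypothetical model of the client-side SUSPENDED-to-LOST timer: after a disconnect,
 * report LOST once percent% of the negotiated session timeout has elapsed without a reconnect.
 */
class SimulatedExpiry {
    private final long negotiatedSessionTimeoutMs;
    private final int percent;
    private long disconnectedAtMs = -1; // -1 means currently connected

    SimulatedExpiry(long negotiatedSessionTimeoutMs, int percent) {
        if (percent <= 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in (0, 100]");
        }
        this.negotiatedSessionTimeoutMs = negotiatedSessionTimeoutMs;
        this.percent = percent;
    }

    long lostAfterMs() {
        return negotiatedSessionTimeoutMs * percent / 100;
    }

    void onDisconnected(long nowMs) {
        if (disconnectedAtMs < 0) {
            disconnectedAtMs = nowMs; // the timer starts at the first disconnect only
        }
    }

    void onReconnected() {
        disconnectedAtMs = -1;
    }

    /** CONNECTED, SUSPENDED until the deadline passes, then LOST. */
    String state(long nowMs) {
        if (disconnectedAtMs < 0) {
            return "CONNECTED";
        }
        return nowMs - disconnectedAtMs >= lostAfterMs() ? "LOST" : "SUSPENDED";
    }
}
```

With a 30 s negotiated timeout, the default gives LOST after 30 000 ms, while 33% gives LOST after 9 900 ms.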

References

- A detailed guide to using ZooKeeper with the Apache Curator framework
- A detailed guide to the ZooKeeper client Curator
- Lessons learned with ZooKeeper and Curator
- Welcome to Curator
- Curator Error Handling
- Recipes: the use cases Curator supports, such as elections, counters, and inter-process locks
- A ZooKeeper-based distributed mutex: InterProcessMutex
- how to properly recreate ephemeral nodes and reset watches after a session expiry
