

RabbitMQ Network Partitions

Published: 2024/4/11

The author's new books, 《深入理解Kafka:核心設計與實踐原理》 and 《RabbitMQ實戰指南》, are now available; readers are also welcome to follow the author's WeChat public account: 朱小廝的博客.


Clustering and Network Partitions
RabbitMQ clusters do not tolerate network partitions well. If you are thinking of clustering across a WAN, don’t. You should use federation or the shovel instead.
However, sometimes accidents happen. This page documents how to detect network partitions, some of the bad effects that may happen during partitions, and how to recover from them.
RabbitMQ stores information about queues, exchanges, bindings etc in Erlang’s distributed database, Mnesia. Many of the details of what happens around network partitions are related to Mnesia’s behaviour.



Detecting network partitions
Mnesia will typically determine that a node is down if another node is unable to contact it for a minute or so (see the page on net_ticktime). If two nodes come back into contact, both having thought the other is down, Mnesia will determine that a partition has occurred. This will be written to the RabbitMQ log in a form like:

=ERROR REPORT==== 15-Oct-2012::18:02:30 ===
Mnesia(rabbit@smacmullen): ** ERROR ** mnesia_event got
    {inconsistent_database, running_partitioned_network, hare@smacmullen}

RabbitMQ nodes will record whether this event has ever occurred while the node is up, and expose this information through rabbitmqctl cluster_status and the management plugin.
rabbitmqctl cluster_status will normally show an empty list for partitions:

# rabbitmqctl cluster_status
Cluster status of node rabbit@smacmullen ...
[{nodes,[{disc,[hare@smacmullen,rabbit@smacmullen]}]},
 {running_nodes,[rabbit@smacmullen,hare@smacmullen]},
 {partitions,[]}]
...done.

However, if a network partition has occurred then information about partitions will appear there:

# rabbitmqctl cluster_status
Cluster status of node rabbit@smacmullen ...
[{nodes,[{disc,[hare@smacmullen,rabbit@smacmullen]}]},
 {running_nodes,[rabbit@smacmullen,hare@smacmullen]},
 {partitions,[{rabbit@smacmullen,[hare@smacmullen]},
              {hare@smacmullen,[rabbit@smacmullen]}]}]
...done.

The management plugin API will return partition information for each node under partitions in /api/nodes. The management plugin UI will show a large red warning on the overview page if a partition has occurred.
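As an illustrative sketch (not part of the original text), the same partition information can be read programmatically from `/api/nodes`. The base URL, port 15672, and guest/guest credentials below are default-configuration assumptions:

```python
import base64
import json
import urllib.request

def fetch_nodes(base_url="http://localhost:15672", user="guest", password="guest"):
    """Fetch the node list from the management API's /api/nodes endpoint."""
    req = urllib.request.Request(base_url + "/api/nodes")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def partitioned_peers(nodes):
    """Map each node name to the peers it reports being partitioned from;
    an empty result means no partition has been observed."""
    return {n["name"]: n["partitions"] for n in nodes if n.get("partitions")}
```

On a healthy cluster `partitioned_peers()` returns an empty dict; after a partition it mirrors the `partitions` entries shown by `rabbitmqctl cluster_status`.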



During a network partition
While a network partition is in place, the two (or more!) sides of the cluster can evolve independently, with both sides thinking the other has crashed. Queues, bindings and exchanges can be created or deleted separately. Mirrored queues which are split across the partition will end up with one master on each side of the partition, again with both sides acting independently. Other undefined and weird behaviour may occur.
It is important to understand that when network connectivity is restored, this state of affairs persists. The cluster will continue to act in this way until you take action to fix it.



Partitions caused by suspend / resume
While we refer to “network” partitions, really a partition is any case in which the different nodes of a cluster can have communication interrupted without any node failing. In addition to network failures, suspending and resuming an entire OS can also cause partitions when used against running cluster nodes - as the suspended node will not consider itself to have failed, or even stopped, but the other nodes in the cluster will consider it to have done so.
While you could suspend a cluster node by running it on a laptop and closing the lid, the most common reason for this to happen is for a virtual machine to have been suspended by the hypervisor. While it’s fine to run RabbitMQ clusters in virtualised environments, you should make sure that VMs are not suspended while running. Note that some virtualisation features such as migration of a VM from one host to another will tend to involve the VM being suspended.
Partitions caused by suspend and resume will tend to be asymmetrical - the suspended node will not necessarily see the other nodes as having gone down, but will be seen as down by the rest of the cluster. This has particular implications for pause_minority mode.



Recovering from a network partition
To recover from a network partition, first choose one partition which you trust the most. This partition will become the authority for the state of Mnesia to use; any changes which have occurred on other partitions will be lost.
Stop all nodes in the other partitions, then start them all up again. When they rejoin the cluster they will restore state from the trusted partition.
Finally, you should also restart all the nodes in the trusted partition to clear the warning.
It may be simpler to stop the whole cluster and start it again; if so make sure that the first node you start is from the trusted partition.
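The procedure above can be sketched as an ordered command plan. The node names and the use of `stop_app`/`start_app` (rather than full OS-level restarts) are illustrative assumptions, not prescribed by the text:

```python
def recovery_plan(trusted, untrusted):
    """Sketch the partition-recovery procedure as an ordered command list:
    stop every node in the untrusted partitions, start them again (they
    restore state from the trusted side on rejoin), then restart the
    trusted nodes to clear the partition warning."""
    plan = [f"rabbitmqctl -n {n} stop_app" for n in untrusted]
    plan += [f"rabbitmqctl -n {n} start_app" for n in untrusted]
    for n in trusted:
        plan += [f"rabbitmqctl -n {n} stop_app", f"rabbitmqctl -n {n} start_app"]
    return plan
```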



Automatically handling partitions
RabbitMQ also offers three ways to deal with network partitions automatically: pause-minority mode, pause-if-all-down mode and autoheal mode. (The default behaviour is referred to as ignore mode).
In pause-minority mode RabbitMQ will automatically pause cluster nodes which determine themselves to be in a minority (i.e. equal to or fewer than half the total number of nodes) after seeing other nodes go down. In CAP terms it therefore chooses consistency over availability. This ensures that in the event of a network partition, at most the nodes in a single partition will continue to run. The minority nodes will pause as soon as a partition starts, and will start again when the partition ends.
In pause-if-all-down mode, RabbitMQ will automatically pause cluster nodes which cannot reach any of the listed nodes. In other words, all the listed nodes must be down for RabbitMQ to pause a cluster node. This is close to pause-minority mode; however, it allows an administrator to decide which nodes to prefer, instead of relying on the context. For instance, if the cluster is made of two nodes in rack A and two nodes in rack B, and the link between racks is lost, pause-minority mode will pause all nodes. In pause-if-all-down mode, if the administrator listed the two nodes in rack A, only the nodes in rack B will pause. Note that it is possible for the listed nodes to be split across both sides of a partition: in this situation, no node will pause. That is why there is an additional ignore/autoheal argument to indicate how to recover from the partition.
In autoheal mode RabbitMQ will automatically decide on a winning partition if a partition is deemed to have occurred, and will restart all nodes that are not in the winning partition. Unlike pause_minority mode it therefore takes effect when a partition ends, rather than when one starts.
The winning partition is the one which has the most clients connected (or if this produces a draw, the one with the most nodes; and if that still produces a draw then one of the partitions is chosen in an unspecified way).
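The tie-break just described can be expressed as a small sketch. The data shape below is hypothetical (RabbitMQ makes this decision internally); it only illustrates the ordering of the criteria:

```python
def pick_winning_partition(partitions):
    """Choose the winning partition: most connected clients first, then
    most nodes; any remaining tie is effectively arbitrary (here, the
    first candidate encountered wins, mirroring the 'unspecified' case)."""
    return max(partitions, key=lambda p: (p["clients"], len(p["nodes"])))
```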
You can enable one of these modes by setting the configuration parameter cluster_partition_handling for the rabbit application in your configuration file to:
● pause_minority
● {pause_if_all_down, [nodes], ignore | autoheal}
● autoheal
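For example, a classic rabbitmq.config (the Erlang-term format used by RabbitMQ versions contemporary with this article) might set one of these values as follows; the node names in the pause_if_all_down alternative are placeholders:

```erlang
%% rabbitmq.config
[
  {rabbit, [
    {cluster_partition_handling, pause_minority}
    %% Alternatives:
    %%   {cluster_partition_handling,
    %%     {pause_if_all_down, ['rabbit@nodeA1', 'rabbit@nodeA2'], autoheal}}
    %%   {cluster_partition_handling, autoheal}
  ]}
].
```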


Which mode should I pick?
It’s important to understand that allowing RabbitMQ to deal with network partitions automatically does not make them less of a problem. Network partitions will always cause problems for RabbitMQ clusters; you just get some degree of choice over what kind of problems you get. As stated in the introduction, if you want to connect RabbitMQ clusters over generally unreliable links, you should use federation or the shovel.
With that said, you might wish to pick a recovery mode as follows:
● ignore - Your network really is reliable. All your nodes are in a rack, connected with a switch, and that switch is also the route to the outside world. You don’t want to run any risk of any of your cluster shutting down if any other part of it fails (or you have a two node cluster).
● pause_minority - Your network is maybe less reliable. You have clustered across 3 AZs in EC2, and you assume that only one AZ will fail at once. In that scenario you want the remaining two AZs to continue working and the nodes from the failed AZ to rejoin automatically and without fuss when the AZ comes back.
● autoheal - Your network may not be reliable. You are more concerned with continuity of service than with data integrity. You may have a two node cluster.


More about pause-minority mode
The Erlang VM on the paused nodes will continue running but the nodes will not listen on any ports or do any other work. They will check once per second to see if the rest of the cluster has reappeared, and start up again if it has.
Note that nodes will not enter the paused state at startup, even if they are in a minority then. It is expected that any such minority at startup is due to the rest of the cluster not having been started yet.
Also note that RabbitMQ will pause nodes which are not in a strict majority of the cluster - i.e. containing more than half of all nodes. It is therefore not a good idea to enable pause-minority mode on a cluster of two nodes, since in the event of any network partition or node failure, both nodes will pause. However, pause_minority mode is likely to be safer than ignore mode for clusters of more than two nodes, especially if the most likely form of network partition is that a single minority of nodes drops off the network.
Finally, note that pause_minority mode will do nothing to defend against partitions caused by cluster nodes being suspended. This is because the suspended node will never see the rest of the cluster vanish, so will have no trigger to disconnect itself from the cluster.
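The "strict majority" rule above can be made concrete with a small sketch (illustrative only; the actual check is internal to RabbitMQ):

```python
def pauses_in_minority(visible, cluster_size):
    """A node running pause_minority pauses unless the nodes it can see
    (including itself) form a strict majority: more than half the cluster."""
    return not (2 * visible > cluster_size)
```

In a two-node cluster each node sees only itself during a partition (1 of 2, not a strict majority), so both nodes pause - which is exactly why the mode is discouraged for two-node clusters.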




