Dubbo with a Redis registry: ReconnectTimerTask keeps reconnecting to a provider
Problem description: with the Redis registry, the Dubbo consumer keeps retrying to reconnect to a Dubbo provider and logs the following error:
[DUBBO] Fail to connect to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [10.1.1.12:0 -> /10.1.1.228:20888]], dubbo version: 2.7.3, current host: 10.1.1.12
2019-08-30 20:33:52.283 [/] httpwrapper [dubbo-client-idleCheck-thread-1] ERROR o.a.d.r.e.s.h.ReconnectTimerTask - [DUBBO] Fail to connect to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [10.1.1.12:0 -> /10.1.1.228:20888]], dubbo version: 2.7.3, current host: 10.1.1.12
org.apache.dubbo.remoting.RemotingException: client(url: dubbo://10.1.1.228:20888/com.cxq56.service.GeoService?actives=0&anyhost=true&application=httpwrapper&async=false&bean.name=providers:dubbo:com.cxq56.service.GeoService&check=false&cluster=failover&codec=dubbo&default.deprecated=false&default.dynamic=false&default.register=true&default.retries=1&default.timeout=10000&deprecated=false&dubbo=2.0.2&dynamic=false&generic=false&heartbeat=60000&interface=com.cxq56.service.GeoService&lazy=false&loadbalance=random&methods=createForbiddenGeo,calculatedDistance,createSiteInfo,getSiteAndDistance,getAllGeoByCityId,searchForPOI,createGeo&pid=1&qos.enable=false&register=true&register.ip=10.1.1.12&release=2.7.1&remote.application=geo-provider&retries=0&revision=1.0-SNAPSHOT&shutwait=40000&side=consumer&sticky=false&timeout=3000&timestamp=1567049198218&validation=false) failed to connect to server /10.1.1.228:20888 client-side timeout 3000ms (elapsed: 3000ms) from netty client 10.1.1.12 using dubbo version 2.7.3
at org.apache.dubbo.remoting.transport.netty4.NettyClient.doConnect(NettyClient.java:171)
at org.apache.dubbo.remoting.transport.AbstractClient.connect(AbstractClient.java:190)
at org.apache.dubbo.remoting.transport.AbstractClient.reconnect(AbstractClient.java:246)
at org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeClient.reconnect(HeaderExchangeClient.java:155)
at org.apache.dubbo.remoting.exchange.support.header.ReconnectTimerTask.doTask(ReconnectTimerTask.java:49)
at org.apache.dubbo.remoting.exchange.support.header.AbstractTimerTask.run(AbstractTimerTask.java:87)
at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:648)
at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:727)
at org.apache.dubbo.common.timer.HashedWheelTimer$Worker.run(HashedWheelTimer.java:449)
at java.lang.Thread.run(Thread.java:748)
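To state the conclusion first, this exception shows up in two situations:
1. When the Dubbo consumer starts, it picks up some provider registrations that have already expired and tries to connect to them.
2. While the Dubbo consumer and provider are both running normally, the provider suddenly shuts down abnormally, so the consumer keeps trying to reconnect. (Of course, an abnormal shutdown is normally not acceptable in the first place.)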
Background:
1. Dubbo has its own mechanism for checking service availability: ReconnectTimerTask periodically reconnects to keep the service reachable.
2. When Dubbo registers with the Redis registry, it writes an entry into a Redis hash. The key is built from the service interface name, e.g. "/dubbo/com.sample.configmgmt.api.CustomerServiceApi" (for provider registrations the category is appended, giving ".../providers"). The field is the provider URL with the service's basic information, e.g. "dubbo://127.0.0.1:20881/com.sample.configmgmt.api.OrgServiceApi?accepts=0&accesslog=true&anyhost=true&application=configmgmt-service&bean.name=ServiceBean:com.cxq56.configmgmt.api.OrgServiceApi&cluster=failover&deprecated=false&dubbo=2.0.2&dump.directory=/tmp&dynamic=true&generic=false&interface=com.cxq56.configmgmt.api.OrgServiceApi&loadbalance=random&methods=createOrg,getOrg,updateOrg,listOrg,pageOrg&pid=1&register=true&release=2.7.3&retries=0&revision=1.0-SNAPSHOT&shutwait=40000&side=provider&timeout=3000&timestamp=1567416954354&token=true". The value is an expiry timestamp that a separate thread keeps refreshing. When another service looks up available Dubbo services through the registry, an entry whose timestamp is earlier than the current time is treated as expired and unavailable.
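To make the expiry rule concrete, here is a minimal sketch that models the registry hash in memory instead of in Redis. `InMemoryRegistryHash` is a hypothetical class, not Dubbo's API: `register` plays the role of HSET (field = provider URL, value = now + expire period), a live provider's refresher thread would simply call it again, and the expire period is just a constructor argument chosen by the caller.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model of the registry hash (NOT Dubbo's API):
 * key -> (provider URL -> expiry time in epoch millis).
 */
class InMemoryRegistryHash {
    private final Map<String, Map<String, Long>> hashes = new HashMap<>();
    private final long expirePeriodMillis;

    InMemoryRegistryHash(long expirePeriodMillis) {
        this.expirePeriodMillis = expirePeriodMillis;
    }

    /**
     * Like HSET key field value, where the value is now + expirePeriod.
     * A live provider's refresher thread would call this again periodically,
     * pushing the expiry forward; a crashed provider stops doing so.
     */
    void register(String key, String providerUrl, long nowMillis) {
        hashes.computeIfAbsent(key, k -> new HashMap<>())
              .put(providerUrl, nowMillis + expirePeriodMillis);
    }

    /** Expired means the stored expiry is earlier than now, or the entry is missing. */
    boolean isExpired(String key, String providerUrl, long nowMillis) {
        Map<String, Long> fields = hashes.get(key);
        if (fields == null) {
            return true;
        }
        Long expiry = fields.get(providerUrl);
        return expiry == null || expiry < nowMillis;
    }
}
```

With a 60 000 ms period, an entry registered at time 0 is still valid at 60 000 ms and counts as expired from 60 001 ms on, unless the provider refreshes it again.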
For the second case, when a new provider registers and a notify message is sent, the Notify thread runs a series of operations in which RegistryDirectory refreshes the list of currently available providers (refreshInvoker). This cancels the ReconnectTimerTask belonging to the expired Dubbo service.
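The effect described here (the latest provider list replaces the old one, and the reconnect task of a provider that disappeared is cancelled) can be modelled as a set difference. The sketch below is a simplified, hypothetical model, not the code of `RegistryDirectory.refreshInvoker`: `ProviderTaskDirectory` and `ReconnectHandle` are made-up names, and cancelling a handle stands in for whatever teardown cancels the real timer task.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model: one reconnect task per provider address. */
class ProviderTaskDirectory {
    /** Stand-in for a timer task handle; a cancelled task stops retrying. */
    static final class ReconnectHandle {
        private boolean cancelled;
        void cancel() { cancelled = true; }
        boolean isCancelled() { return cancelled; }
    }

    private final Map<String, ReconnectHandle> tasks = new HashMap<>();

    /** Apply a notification carrying the full, current list of provider addresses. */
    void refresh(Set<String> latestAddresses) {
        // Cancel the tasks of providers that are no longer listed
        Set<String> removed = new HashSet<>(tasks.keySet());
        removed.removeAll(latestAddresses);
        for (String address : removed) {
            tasks.remove(address).cancel();
        }
        // Create tasks for newly listed providers; existing ones are kept as-is
        for (String address : latestAddresses) {
            tasks.computeIfAbsent(address, a -> new ReconnectHandle());
        }
    }

    ReconnectHandle taskFor(String address) { return tasks.get(address); }

    Set<String> activeAddresses() { return new HashSet<>(tasks.keySet()); }
}
```

The key point of the model is that cancellation only happens when a notification arrives: if no new notify is ever sent, a stale provider's task is never removed.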
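After checking, the problem we hit belongs to the first case: when the Dubbo consumer started, it still tried to connect to the provider registrations it fetched, even though the timestamps of some of them had already expired. This was completely different from what I expected, and it turned out to be caused by a key parameter set incorrectly on the Dubbo provider side.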
HeaderExchangeClient.java: starts the heartbeat and reconnect mechanisms
private final Client client;
private final ExchangeChannel channel;
// Timer shared by all clients (static); ticks once per second on "dubbo-client-idleCheck" threads
private static final HashedWheelTimer IDLE_CHECK_TIMER = new HashedWheelTimer(
        new NamedThreadFactory("dubbo-client-idleCheck", true), 1, TimeUnit.SECONDS, TICKS_PER_WHEEL);
private HeartbeatTimerTask heartBeatTimerTask;
private ReconnectTimerTask reconnectTimerTask;

public HeaderExchangeClient(Client client, boolean startTimer) {
    Assert.notNull(client, "Client can't be null");
    this.client = client;
    this.channel = new HeaderExchangeChannel(client);
    if (startTimer) {
        URL url = client.getUrl();
        // Schedule the periodic reconnect and heartbeat tasks for this client
        startReconnectTask(url);
        startHeartBeatTask(url);
    }
}
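As you can see, a HashedWheelTimer drives the periodic checks. If the reconnect task fails, it prints the exception log shown above, and a failure does not stop the retries: it keeps trying indefinitely. This raises a question: does the consumer try to reconnect to every historical registration left in Redis?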
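The retry-forever behaviour comes from the task being re-run on every tick with no attempt limit. A minimal sketch of the same pattern, assuming a JDK `ScheduledExecutorService` in place of Dubbo's `HashedWheelTimer` and two hypothetical callbacks (`isConnected`, `tryConnect`) in place of the real client: a failed attempt is only logged, and the only way to stop the loop is to cancel the returned future.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of an uncapped reconnect loop (not Dubbo's code). */
class ForeverReconnectLoop {
    private final BooleanSupplier isConnected; // reports the channel state
    private final BooleanSupplier tryConnect;  // returns true if a connect attempt succeeded
    private final AtomicInteger failedAttempts = new AtomicInteger();

    ForeverReconnectLoop(BooleanSupplier isConnected, BooleanSupplier tryConnect) {
        this.isConnected = isConnected;
        this.tryConnect = tryConnect;
    }

    /** One tick: reconnect only if disconnected; a failure is logged, never fatal. */
    void runOnce() {
        try {
            if (!isConnected.getAsBoolean() && !tryConnect.getAsBoolean()) {
                int n = failedAttempts.incrementAndGet();
                System.err.println("Fail to connect, attempt #" + n);
            }
        } catch (RuntimeException e) {
            // An exception escaping a scheduleAtFixedRate task would suppress all later runs
            failedAttempts.incrementAndGet();
            System.err.println("Fail to connect: " + e);
        }
    }

    /** Runs runOnce at a fixed rate; nothing here stops it except cancelling the future. */
    ScheduledFuture<?> start(ScheduledExecutorService timer, long periodMillis) {
        return timer.scheduleAtFixedRate(this::runOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    int failedAttempts() {
        return failedAttempts.get();
    }
}
```

For example, `loop.start(Executors.newSingleThreadScheduledExecutor(), 1000)` against a provider that is gone logs one failure per second until `future.cancel(false)` is called. Catching exceptions inside the tick matters because `scheduleAtFixedRate` silently stops scheduling a task once a run throws.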
To find out, we set a breakpoint and traced the call chain upward. When the Dubbo consumer starts, it registers its own consumer information in Redis and also fetches all provider registrations by interface name, then filters them in RedisRegistry.java (we use the Redis registry). The code is as follows:
RedisRegistry.java: implements register, unregister, subscribe and unsubscribe
private void doNotify(Jedis jedis, Collection<String> keys, URL url, Collection<NotifyListener> listeners) {
    if (keys == null || keys.isEmpty()
            || listeners == null || listeners.isEmpty()) {
        return;
    }
    long now = System.currentTimeMillis();
    List<URL> result = new ArrayList<>();
    List<String> categories = Arrays.asList(url.getParameter(CATEGORY_KEY, new String[0]));
    String consumerService = url.getServiceInterface();
    for (String key : keys) {
        // Skip keys that belong to a different service interface
        if (!ANY_VALUE.equals(consumerService)) {
            String providerService = toServiceName(key);
            if (!providerService.equals(consumerService)) {
                continue;
            }
        }
        // Skip categories the consumer did not subscribe to
        String category = toCategoryName(key);
        if (!categories.contains(ANY_VALUE) && !categories.contains(category)) {
            continue;
        }
        List<URL> urls = new ArrayList<>();
        Map<String, String> values = jedis.hgetAll(key);
        if (CollectionUtils.isNotEmptyMap(values)) {
            for (Map.Entry<String, String> entry : values.entrySet()) {
                URL u = URL.valueOf(entry.getKey());
                // Keep this registered URL (it will be connected/reconnected later) if
                // dynamic is false OR its expiry time is not earlier than the current time
                if (!u.getParameter(DYNAMIC_KEY, true)
                        || Long.parseLong(entry.getValue()) >= now) {
                    if (UrlUtils.isMatch(url, u)) {
                        urls.add(u);
                    }
                }
            }
        }
        // No usable provider for this category: notify with an empty:// URL instead
        if (urls.isEmpty()) {
            urls.add(URLBuilder.from(url)
                    .setProtocol(EMPTY_PROTOCOL)
                    .setAddress(ANYHOST_VALUE)
                    .setPath(toServiceName(key))
                    .addParameter(CATEGORY_KEY, category)
                    .build());
        }
        result.addAll(urls);
        if (logger.isInfoEnabled()) {
            logger.info("redis notify: " + key + " = " + urls);
        }
    }
    if (CollectionUtils.isEmpty(result)) {
        return;
    }
    for (NotifyListener listener : listeners) {
        notify(url, listener, result);
    }
}
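The keep/drop condition inside the loop above can be isolated as a one-line predicate. This is a paraphrase for illustration; `RegistryEntryFilter` and `keep` are hypothetical names, not part of Dubbo.

```java
/** Hypothetical paraphrase of the entry filter used in doNotify above. */
final class RegistryEntryFilter {
    private RegistryEntryFilter() {}

    /**
     * Kept if the entry is not dynamic, or if its expiry is not earlier than now.
     * Equivalently: dropped only when dynamic == true AND expiry < now.
     */
    static boolean keep(boolean dynamic, long expiryMillis, long nowMillis) {
        return !dynamic || expiryMillis >= nowMillis;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        System.out.println(keep(true, now + 1, now));  // fresh dynamic entry: true
        System.out.println(keep(true, now - 1, now));  // stale dynamic entry: false
        System.out.println(keep(false, now - 1, now)); // stale non-dynamic entry: true
    }
}
```

The third case is the one that matters here: with `dynamic=false`, the expiry value is never consulted, so however old the entry is, it reaches the consumer.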
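So a provider entry is filtered out only when dynamic is true and its expiry time is earlier than the current time. Old registrations will almost always have an expiry time earlier than now (this is dirty data, which graceful shutdown and dubbo monitor can both remove). The root cause is the dynamic parameter: the provider used Dubbo 2.7.1, which has a bug that makes dynamic default to false, and that directly causes the problem here (the URL in the log above indeed carries dynamic=false and release=2.7.1). Moreover, the official documentation explains dynamic as "whether the service is registered dynamically; if set to false, it is shown as disabled after registration and must be enabled manually, and when the provider stops it is not unregistered automatically and must be disabled manually", but it does not mention that the consumer will keep reconnecting.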
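To spot registrations affected by this, one could scan a registry hash for entries that are already expired but still pass the filter because their URL says `dynamic=false`. The sketch below is a hypothetical diagnostic working on plain strings (provider URL to expiry millis, as returned by HGETALL); its URL parsing is a simplified assumption, not Dubbo's `URL.valueOf`, and it ignores the `default.dynamic` key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical diagnostic: expired entries that consumers would still receive. */
final class StaleStaticEntryFinder {
    private StaleStaticEntryFinder() {}

    /** Reads the "dynamic" query parameter; missing or empty defaults to true. */
    static boolean isDynamic(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return true;
        }
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("dynamic")) {
                String value = pair.substring(eq + 1);
                return value.isEmpty() || Boolean.parseBoolean(value);
            }
        }
        return true;
    }

    /** Entries whose expiry is earlier than now but which are kept because dynamic=false. */
    static List<String> staleButKept(Map<String, String> hash, long nowMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : hash.entrySet()) {
            boolean expired = Long.parseLong(entry.getValue()) < nowMillis;
            if (expired && !isDynamic(entry.getKey())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Every URL this reports is a provider the consumer will keep trying to reconnect to; per the documentation quoted above, such entries have to be removed manually.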
The reconnect code is as follows:
/**
 * ReconnectTimerTask
 */
public class ReconnectTimerTask extends AbstractTimerTask {

    private static final Logger logger = LoggerFactory.getLogger(ReconnectTimerTask.class);

    private final int idleTimeout;

    public ReconnectTimerTask(ChannelProvider channelProvider, Long heartbeatTimeoutTick, int idleTimeout) {
        super(channelProvider, heartbeatTimeoutTick);
        this.idleTimeout = idleTimeout;
    }

    // In 2.7.3 this runs once per minute by default
    @Override
    protected void doTask(Channel channel) {
        try {
            Long lastRead = lastRead(channel);
            Long now = now();

            // Rely on reconnect timer to reconnect when AbstractClient.doConnect fails to init the connection
            // If the channel is already disconnected, reconnect immediately
            if (!channel.isConnected()) {
                try {
                    logger.info("Initial connection to " + channel);
                    ((Client) channel).reconnect();
                } catch (Exception e) {
                    logger.error("Fail to connect to " + channel, e);
                }
            // check pong at client
            // If the channel is still connected but nothing has been read for longer than idleTimeout, reconnect
            } else if (lastRead != null && now - lastRead > idleTimeout) {
                logger.warn("Reconnect to channel " + channel + ", because heartbeat read idle time out: "
                        + idleTimeout + "ms");
                try {
                    ((Client) channel).reconnect();
                } catch (Exception e) {
                    logger.error(channel + "reconnect failed during idle time.", e);
                }
            }
        } catch (Throwable t) {
            logger.warn("Exception when reconnect to remote channel " + channel.getRemoteAddress(), t);
        }
    }
}
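To make the branch logic of doTask easy to check, here is the same decision pulled out into a pure function. `ReconnectDecision` and its `Action` enum are hypothetical names used only for illustration; the branches mirror the code above, and the actual reconnect call and logging are left out.

```java
/** Hypothetical pure-function version of the per-tick decision in doTask. */
final class ReconnectDecision {
    enum Action { NONE, RECONNECT_DISCONNECTED, RECONNECT_IDLE }

    private ReconnectDecision() {}

    /**
     * @param connected         whether the channel currently reports itself as connected
     * @param lastReadMillis    time of the last read, or null if nothing has been read yet
     * @param nowMillis         current time
     * @param idleTimeoutMillis read-idle threshold
     */
    static Action decide(boolean connected, Long lastReadMillis, long nowMillis, int idleTimeoutMillis) {
        if (!connected) {
            // Disconnected: try again on every tick, with no retry cap
            return Action.RECONNECT_DISCONNECTED;
        }
        if (lastReadMillis != null && nowMillis - lastReadMillis > idleTimeoutMillis) {
            // Connected but silent for longer than the idle timeout: force a reconnect
            return Action.RECONNECT_IDLE;
        }
        return Action.NONE;
    }
}
```

For a provider that no longer exists, `connected` never becomes true, so every tick yields `RECONNECT_DISCONNECTED` and the error log repeats until the task itself is cancelled.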