A ZooKeeper Migration Failure
A while ago a colleague migrated our ZooKeeper cluster (used by both Kafka and Flume). Afterwards, every Flume producer and consumer started failing with the same errors:
```
07 Jan 2014 01:19:32,571 INFO  [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.startConnect:1058)  - Opening socket connection to server xxx:2181
07 Jan 2014 01:19:32,572 INFO  [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.primeConnection:947)  - Socket connection established to xxx:2181, initiating session
07 Jan 2014 01:19:32,573 INFO  [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.run:1183)  - Unable to read additional data from server sessionid 0x142f42b91871911, likely server has closed socket, closing socket connection and attempting reconnect
07 Jan 2014 01:19:32,845 INFO  [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.startConnect:1058)  - Opening socket connection to server xxx:2181
```
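The clients were stuck in a loop: connect, fail, retry. My colleague said network connectivity had been verified long before, so I checked the server-side log and found: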
```
2014-01-06 23:59:59,987 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /xxx:45282
2014-01-06 23:59:59,987 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@793] - Connection request from old client xxx:45282; will be dropped if server is in r-o mode
2014-01-06 23:59:59,987 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@812] - Refusing session request for client xxx:45282 as it has seen zxid 0x60fd15564 our last zxid is 0x10000000f client must try another server
2014-01-06 23:59:59,987 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client xxx:45282 (no session established for client)
2014-01-06 23:59:59,989 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from xxx:45285
```
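So Flume was still holding the zxid it had seen on the old cluster (0x60fd15564), while the new cluster's last zxid (0x10000000f) had essentially started over from zero. Because the client claimed to have seen a newer zxid than the server had, the server refused the session and closed the connection. The check that produces this message in ZooKeeperServer is: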
```java
// From ZooKeeperServer (source of the "Refusing session request" log line):
// a client that has already seen a newer zxid than this server is refused.
if (connReq.getLastZxidSeen() > zkDb.dataTree.lastProcessedZxid) {
    String msg = "Refusing session request for client "
        + cnxn.getRemoteSocketAddress()
        + " as it has seen zxid 0x"
        + Long.toHexString(connReq.getLastZxidSeen())
        + " our last zxid is 0x"
        + Long.toHexString(getZKDatabase().getDataTreeLastProcessedZxid())
        + " client must try another server";
    LOG.info(msg);
    throw new CloseRequestException(msg);
}
```
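To see why the two zxids from the server log compare the way they do, recall that a zxid packs the leader epoch into its high 32 bits and a per-epoch transaction counter into its low 32 bits. The small sketch below decodes both values and replays the same `>` comparison; the class and method names are our own, for illustration only, not part of ZooKeeper.

```java
// Illustrative sketch (hypothetical class): split a zxid into epoch and counter
// and replay the server's "client has seen a newer zxid" comparison.
public class ZxidDemo {

    // High 32 bits: leader epoch, bumped on every leader election.
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits: transaction counter within the current epoch.
    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long clientSeen = 0x60fd15564L; // from the server log: zxid the Flume client has seen
        long serverLast = 0x10000000fL; // from the server log: new cluster's last zxid

        System.out.printf("client: epoch=%d counter=0x%x%n", epoch(clientSeen), counter(clientSeen));
        System.out.printf("server: epoch=%d counter=0x%x%n", epoch(serverLast), counter(serverLast));

        // Same condition as the ZooKeeperServer check: a client ahead of the server is refused.
        boolean refused = clientSeen > serverLast;
        System.out.println("refused = " + refused);
    }
}
```

The client has seen epoch 6 while the freshly built cluster is only at epoch 1, so the comparison fails no matter how many times the client retries.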
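Later I asked my colleague how the migration had been done: they started a new cluster, shut down the old one, and at the same time ran an haproxy on one of the old cluster's IP:2181 addresses, proxying to the new cluster, expecting this to make the migration transparent =.=. In fact this triggered ZooKeeper bug ZOOKEEPER-832: each long-running Flume client kept reconnecting with the zxid it had seen on the old cluster, was refused every time, and retried indefinitely. Only restarting Flume fixed it, because a restarted process creates a new ZooKeeper client that has not yet seen any zxid.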
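The endless reconnect loop and the restart fix can be modeled in a few lines. This is a hypothetical simulation, not ZooKeeper's client code: it assumes only that a long-lived client reuses its last-seen zxid on every reconnect, and that a freshly created client starts from 0.

```java
// Hypothetical model of the failure mode; not ZooKeeper's real client or server code.
public class ReconnectLoopDemo {

    // Mirrors the server-side check: accept only if the client is not ahead of the server.
    static boolean acceptsSession(long serverLastZxid, long clientLastZxidSeen) {
        return clientLastZxidSeen <= serverLastZxid;
    }

    // Returns the attempt number on which the session is accepted, or -1 if never.
    static int attemptsUntilAccepted(long serverLastZxid, long clientLastZxidSeen, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (acceptsSession(serverLastZxid, clientLastZxidSeen)) {
                return attempt;
            }
            // Refused: the socket is closed and the client reconnects
            // with the same lastZxidSeen, since nothing resets it.
        }
        return -1;
    }

    public static void main(String[] args) {
        long newClusterZxid = 0x10000000fL;  // new cluster, per the server log
        long staleClientZxid = 0x60fd15564L; // zxid carried over from the old cluster

        System.out.println("stale client: " + attemptsUntilAccepted(newClusterZxid, staleClientZxid, 1000));
        // A restarted Flume process builds a fresh client whose last-seen zxid is 0.
        System.out.println("restarted client: " + attemptsUntilAccepted(newClusterZxid, 0L, 1000));
    }
}
```

With the zxids from the logs above, the stale client is never accepted (`-1`), while the restarted one is accepted on its first attempt.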
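The correct approach is to add the new nodes to the old cluster, change the Flume configuration to point at the new nodes, wait a while (Flume reloads its configuration automatically), and only then shut down the old cluster. Because the new nodes join the existing ensemble and sync its data, their zxids continue from the old ensemble's instead of starting over, so no client ever meets a server that is behind it and the problem is not triggered.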
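Before retiring the old nodes, it may be worth confirming that the new nodes have actually caught up. A minimal sketch, assuming the `Zxid: 0x…` line printed by ZooKeeper's `srvr` four-letter command; `CutoverCheck`, `parseZxid` and `safeToRetireOld` are hypothetical names, and the sample text is illustrative rather than captured from this incident.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-cutover check: parses srvr-style text, performs no network I/O.
public class CutoverCheck {

    // Matches a line such as "Zxid: 0x10000000f" (assumed srvr output format).
    private static final Pattern ZXID_LINE =
            Pattern.compile("^Zxid: 0x([0-9a-fA-F]+)$", Pattern.MULTILINE);

    // Returns the zxid reported in one node's srvr output, if present.
    static Optional<Long> parseZxid(String srvrOutput) {
        Matcher m = ZXID_LINE.matcher(srvrOutput);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(Long.parseUnsignedLong(m.group(1), 16));
    }

    // The old nodes should only be retired once every new node has reached
    // at least the zxid last reported by the old ensemble.
    static boolean safeToRetireOld(long oldEnsembleZxid, long[] newNodeZxids) {
        for (long zxid : newNodeZxids) {
            if (zxid < oldEnsembleZxid) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative srvr-style text for one new node; not real output.
        String newNodeSrvr = "Mode: follower\nZxid: 0x60fd15570\n";
        long newNodeZxid = parseZxid(newNodeSrvr).orElseThrow(IllegalStateException::new);

        long oldEnsembleZxid = 0x60fd15564L; // illustrative: the client's last-seen zxid from the log
        System.out.println("safe to retire old nodes: "
                + safeToRetireOld(oldEnsembleZxid, new long[] {newNodeZxid}));
    }
}
```

How each node's `srvr` text is fetched (for example `echo srvr | nc host 2181`) is left out; the check itself only compares numbers, so it works the same with any other way of reading each node's zxid.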
This article is reposted from MIKE老畢's 51CTO blog. Original link: http://blog.51cto.com/boylook/1365364. To repost it, please contact the original author.