Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
java: reduce scary exception spew and improve logging
In real environments there are various types of failures that are "expected". In particular, it's not unlikely for the client to occasionally connect to a server which has gone down or which is not the leader for a given tablet anymore. In that case, we should make sure that the logging output is easy to read and doesn't include irrelevant stack traces stemming from Netty internals. This patch addresses several of those cases. In particular, it gets rid of "unexpected exception" messages (and associated stacks) in the case of "connection refused" errors. It also helpfully includes the underlying error when invalidating the tablet cache for a location. There's a relatively simple new unit test, and I also ran some before/after comparisons using the "loadgen" sample from the kudu-examples repo against a local cluster with one TS: Case 1: Master down when we start the client: ============================================================ Before the patch -------------- [New I/O worker apache#12] WARN org.apache.kudu.client.ConnectToCluster - Error receiving response from localhost:7051 org.apache.kudu.client.RecoverableException: connection closed at org.apache.kudu.client.Connection.channelClosed(Connection.java:254) at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) at org.apache.kudu.client.Connection.handleUpstream(Connection.java:238) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.channelClosed(ReadTimeoutHandler.java:176) at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) at org.apache.kudu.shaded.org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) at org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493) at org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468) at org.apache.kudu.shaded.org.jboss.netty.channel.Channels$6.run(Channels.java:457) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) [New I/O boss apache#17] ERROR org.apache.kudu.client.Connection - [peer master-localhost:7051(localhost:7051)] unexpected exception from downstream on [id: 0xf9759051] java.net.ConnectException: Connection refused: localhost/127.0.0.1:7051 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) [New I/O worker apache#12] WARN org.apache.kudu.client.ConnectToCluster - Unable to find the leader master localhost:7051; will retry ... repeats with exponential backoff After the patch -------------- [New I/O boss apache#17] INFO org.apache.kudu.client.Connection - Failed to connect to peer master-localhost:7051(localhost:7051): Connection refused: localhost/127.0.0.1:7051 [New I/O worker apache#1] INFO org.apache.kudu.client.ConnectToCluster - Unable to connect to master localhost:7051: Connection refused: localhost/127.0.0.1:7051 [New I/O worker apache#1] WARN org.apache.kudu.client.ConnectToCluster - Unable to find the leader master localhost:7051; will retry ... repeats with exponential backoff Case 2: Tserver is down ============================================================ Before the patch ---------------- [New I/O worker apache#11] INFO org.apache.kudu.client.AsyncKuduClient - Removing server fc9f6ab955cc47a8ab3653d0170305b0 from this tablet's cache 9da164fbcd93404ea8e38e9491eb3fa8 [New I/O boss apache#17] ERROR org.apache.kudu.client.Connection - [peer fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050)] unexpected exception from downstream on [id: 0x270e3a82] java.net.ConnectException: Connection refused: todd-laptop/127.0.1.1:7050 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) ... repeats with exponential backoff After the patch --------------- [New I/O boss apache#17] INFO org.apache.kudu.client.Connection - Failed to connect to peer fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050): Connection refused: todd-laptop/127.0.1.1:7050 [New I/O worker apache#15] INFO org.apache.kudu.client.AsyncKuduClient - Invalidating location fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050) for tablet 478ddbaafb5b494ea0de8e2d0ed45a00: Connection refused: todd-laptop/127.0.1.1:7050 ... repeats with exponential backoff Case 3: A tablet is bad ============================================================ In this case I manually delete a non-replicated tablet that is being written to using 'kudu remote_replica delete'. This is unrecoverable since there is no other replica. Before the patch ----------------- [New I/O worker apache#10] INFO org.apache.kudu.client.AsyncKuduClient - Removing server fc9f6ab955cc47a8ab3653d0170305b0 from this tablet's cache a633f4a0c14b4404a8ae1f825b378867 ... repeats with exponential backoff After the patch ---------------- [New I/O worker apache#9] INFO org.apache.kudu.client.AsyncKuduClient - Invalidating location 36622c46b25f4da9ada43f8591728053(todd-laptop:7050) for tablet c0b9488597e8447595ad0ee0a3378f95: Tablet not RUNNING: STOPPED ... repeats with exponential backoff Change-Id: I4b8be871693fecc1ee46a4e238dd2ed8f0730d4b Reviewed-on: http://gerrit.cloudera.org:8080/9644 Reviewed-by: Grant Henke <[email protected]> Tested-by: Grant Henke <[email protected]>
- Loading branch information