[java] Attempt to deflake SecureKuduSinkTest

Occasionally, SecureKuduSinkTest fails when running with TSAN binaries
because of the following sequence of operations:

1. The Kerberos ticket lifetime is set to 10s.
2. The test sets up a mini kudu cluster. This first sets up the KDC,
   which creates credentials for all of the Kudu servers and kinits
   using test user credentials for the test process.
3. The setup of the cluster takes > 10s.
4. At the end of the cluster setup, the test checks that setup
   succeeded in part by issuing a ListTabletServers RPC. This fails
   because the test user ticket has expired.
5. The test fails because it can't set up the cluster.

The failure looks like

  21:50:06.500 [ERROR - main] (RetryRule.java:217) org.apache.kudu.flume.sink.SecureKuduSinkTest.testEventsWithShortTickets: failed attempt 1
  java.io.IOException: ListTabletServers RPC failed: Client connection negotiation failed: client connection to 127.12.111.60:36425: server requires authentication, but client does not have Kerberos credentials available
    at org.apache.kudu.test.cluster.MiniKuduCluster.sendRequestToCluster(MiniKuduCluster.java:169) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
    at org.apache.kudu.test.cluster.MiniKuduCluster.start(MiniKuduCluster.java:234) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
    at org.apache.kudu.test.cluster.MiniKuduCluster.access$300(MiniKuduCluster.java:71) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
    at org.apache.kudu.test.cluster.MiniKuduCluster$MiniKuduClusterBuilder.build(MiniKuduCluster.java:658) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
    at org.apache.kudu.test.KuduTestHarness.before(KuduTestHarness.java:140) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46) ~[junit-4.12.jar:4.12]
    at org.apache.kudu.test.junit.RetryRule$RetryStatement.doOneAttempt(RetryRule.java:215) [kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
    at org.apache.kudu.test.junit.RetryRule$RetryStatement.evaluate(RetryRule.java:232) [kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
    ...

This patch attempts to deflake the test a bit by doubling the ticket
lifetime to 20s. It also raises the renewal lifetime to 35s from 30s,
to provide a bit of extra time between the ticket expiring and when
Flume needs to renew the ticket.

Before, the test waited 2x the renewable ticket lifetime. I made it so
the test waits until the renewable ticket lifetime plus one second has
passed, including the time spent in the test so far.

I tried to test this on dist-test using TSAN binaries. With the new
patch I saw 0/1000 failures, but without it I saw 830/1000 failures.
That's *way* flakier than any previous indication, so I don't trust
those results. The failures I sampled did seem to be related to the
same issue, but it was ConnectToCluster RPCs failing instead.

Change-Id: Icc936878d7f1496905e83ddaf93b9b049f417f72
Reviewed-on: http://gerrit.cloudera.org:8080/13454
Reviewed-by: Adar Dembo <[email protected]>
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
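
The revised wait described in the message (sleep until the renewable ticket lifetime plus one second has elapsed, counting time already spent in the test) might be sketched roughly as follows. This is an illustration only, not the code from the patch: the class `TicketWait`, the method `remainingWait`, and the constant name are hypothetical, and only the 35s value comes from the message.

```java
import java.time.Duration;

// Hypothetical helper; names are illustrative and not taken from the Kudu source.
final class TicketWait {
    // Renewable lifetime quoted in the commit message.
    static final Duration RENEWABLE_LIFETIME = Duration.ofSeconds(35);

    /**
     * Returns how much longer to sleep so that the total time since the
     * test started reaches the renewable lifetime plus one second.
     * Returns zero if that point has already passed.
     */
    static Duration remainingWait(Duration elapsed, Duration renewableLifetime) {
        Duration target = renewableLifetime.plusSeconds(1);
        Duration remaining = target.minus(elapsed);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }

    public static void main(String[] args) {
        // Assume 12s were already spent on cluster setup and test work.
        Duration elapsed = Duration.ofSeconds(12);
        System.out.println(remainingWait(elapsed, RENEWABLE_LIFETIME).getSeconds()); // prints 24
    }
}
```

Under these assumptions, a test that has already run for 12s sleeps another 24s, whereas a fixed wait of 2x the renewable lifetime would be 70s at the 35s setting regardless of elapsed time.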