Skip to content

Commit

Permalink
[java] Attempt to deflake SecureKuduSinkTest
Browse files Browse the repository at this point in the history
Occasionally, SecureKuduSinkTest fails when running with TSAN binaries
because of the following sequence of operations:

1. The Kerberos ticket lifetime is set to 10s.
2. The test sets up a mini kudu cluster. This first sets up the KDC,
   which creates credentials for all of the Kudu servers and kinits
   using test user credentials for the test process.
3. The setup of the cluster takes > 10s.
4. At the end of the cluster setup, the test checks that setup succeeded
   in part by issuing a ListTabletServers RPC. This fails because the test
   user ticket has expired.
5. The test fails because it can't set up the cluster.

The failure looks like

21:50:06.500 [ERROR - main] (RetryRule.java:217) org.apache.kudu.flume.sink.SecureKuduSinkTest.testEventsWithShortTickets: failed attempt 1
java.io.IOException: ListTabletServers RPC failed: Client connection negotiation failed: client connection to 127.12.111.60:36425: server requires authentication, but client does not have Kerberos credentials available
	at org.apache.kudu.test.cluster.MiniKuduCluster.sendRequestToCluster(MiniKuduCluster.java:169) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
	at org.apache.kudu.test.cluster.MiniKuduCluster.start(MiniKuduCluster.java:234) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
	at org.apache.kudu.test.cluster.MiniKuduCluster.access$300(MiniKuduCluster.java:71) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
	at org.apache.kudu.test.cluster.MiniKuduCluster$MiniKuduClusterBuilder.build(MiniKuduCluster.java:658) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
	at org.apache.kudu.test.KuduTestHarness.before(KuduTestHarness.java:140) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46) ~[junit-4.12.jar:4.12]
	at org.apache.kudu.test.junit.RetryRule$RetryStatement.doOneAttempt(RetryRule.java:215) [kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
	at org.apache.kudu.test.junit.RetryRule$RetryStatement.evaluate(RetryRule.java:232) [kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
        ...

This patch attempts to deflake the test a bit by doubling the ticket
lifetime to 20s. It also raises the renewal lifetime to 35s from 30s,
to provide a bit of extra time between the ticket expiring and when
Flume needs to renew the ticket. Before, the test waited 2x the
renewable ticket lifetime. I made it so the test waits until the
renewable ticket lifetime plus one second has passed, including the
time spent in the test so far.

I tried to test this on dist-test using TSAN binaries. With the new
patch I saw 0/1000 failures, but without it I saw 830/1000 failures.
That's *way* flakier than any previous indication, so I don't trust
those results. The failures I sampled did seem to be related to the same
issue, but it was ConnectToCluster RPCs failing instead.

Change-Id: Icc936878d7f1496905e83ddaf93b9b049f417f72
Reviewed-on: http://gerrit.cloudera.org:8080/13454
Reviewed-by: Adar Dembo <[email protected]>
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
  • Loading branch information
wdberkeley committed May 30, 2019
1 parent a1bbd3d commit 3cbc025
Showing 1 changed file with 9 additions and 3 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,8 @@
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
Expand All @@ -49,8 +51,8 @@

public class SecureKuduSinkTest {
private static final Logger LOG = LoggerFactory.getLogger(SecureKuduSinkTest.class);
private static final int TICKET_LIFETIME_SECONDS = 10;
private static final int RENEWABLE_LIFETIME_SECONDS = 30;
private static final int TICKET_LIFETIME_SECONDS = 20;
private static final int RENEWABLE_LIFETIME_SECONDS = 35;

private static final MiniKuduClusterBuilder clusterBuilder = KuduTestHarness.getBaseClusterBuilder()
.kdcTicketLifetime(TICKET_LIFETIME_SECONDS + "s")
Expand All @@ -68,6 +70,7 @@ public void clearTicketCacheProperty() {

@Test
public void testEventsWithShortTickets() throws Exception {
Instant start = Instant.now();
LOG.info("Creating new table...");
ArrayList<ColumnSchema> columns = new ArrayList<>(1);
columns.add(new ColumnSchema.ColumnSchemaBuilder("payload", Type.BINARY).key(true).build());
Expand All @@ -88,7 +91,10 @@ public void testEventsWithShortTickets() throws Exception {
processEvents(sink, 0, eventCount / 2);

LOG.info("Waiting for tickets to expire");
TimeUnit.SECONDS.sleep(RENEWABLE_LIFETIME_SECONDS * 2);
Duration elapsedSoFar = Duration.between(Instant.now(), start);
TimeUnit.MILLISECONDS.sleep(1000 * (RENEWABLE_LIFETIME_SECONDS + 1) - elapsedSoFar.toMillis());
// At this point, the ticket will have been outstanding for at least
// (RENEWABLE_LIFETIME_SECONDS + 1) seconds-- so the sink will need to reacquire a ticket.

LOG.info("Testing events after ticket renewal.");
processEvents(sink, eventCount / 2, eventCount);
Expand Down

0 comments on commit 3cbc025

Please sign in to comment.