Skip to content

Commit

Permalink
[mini-cluster] fix GetLeaderMasterIndex()
Browse files Browse the repository at this point in the history
Prior to this patch, when running external mini-cluster in
BindMode::UNIQUE_LOOPBACK mode, in rare cases masters might be bound
to the same port number at different loopback IP addresses.  In such
cases, GetLeaderMasterIndex() might return incorrect results.

I saw a manifestation of such an issue with failure of the
MultiMasterIdleConnectionsITest.ClientReacquiresAuthnToken scenario at
http://dist-test.cloudera.org/job?job_id=jenkins-slave.1592866280.18987

I think in that run the leader master index was detected as 0 (due to
the issue this patch fixes), but in reality the index was 1, so all
the attempts to start an election with current leader master didn't
change the leadership role, where the leader master output messages
like the following for every RunLeaderElectionRequest() call:

  I0622 22:58:08.909113 20844 raft_consensus.cc:461] T 00000000000000000000000000000000 P d31fb2325643457e8787dc92f8c3a4cf [term 1 LEADER]: Not starting forced leader election -- already a leader
  src/kudu/integration-tests/auth_token_expire-itest.cc:587: Failure

Eventually, the test failed with the message:
  Expected: (former_leader_master_idx) != (idx), actual: 1 vs 1

Change-Id: I3f445e587fe2152efbb67e4a36d36c760b486dac
Reviewed-on: http://gerrit.cloudera.org:8080/16106
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Alexey Serbin <[email protected]>
  • Loading branch information
alexeyserbin committed Jun 24, 2020
1 parent 2bd3d82 commit cd37d7c
Showing 1 changed file with 10 additions and 2 deletions.
12 changes: 10 additions & 2 deletions src/kudu/mini-cluster/external_mini_cluster.cc
Original file line number Diff line number Diff line change
Expand Up @@ -757,16 +757,18 @@ Status ExternalMiniCluster::GetLeaderMasterIndex(int* idx) {
Synchronizer sync;
vector<pair<Sockaddr, string>> addrs_with_names;
Sockaddr leader_master_addr;
string leader_master_hostname;
MonoTime deadline = MonoTime::Now() + MonoDelta::FromSeconds(5);

for (const scoped_refptr<ExternalMaster>& master : masters_) {
addrs_with_names.emplace_back(master->bound_rpc_addr(), master->bound_rpc_addr().host());
}
const auto& cb = [&](const Status& status,
const pair<Sockaddr, string>& leader_master,
const master::ConnectToMasterResponsePB& resp) {
const master::ConnectToMasterResponsePB& /*resp*/) {
if (status.ok()) {
leader_master_addr = leader_master.first;
leader_master_hostname = leader_master.second;
}
sync.StatusCB(status);
};
Expand All @@ -782,7 +784,13 @@ Status ExternalMiniCluster::GetLeaderMasterIndex(int* idx) {
RETURN_NOT_OK(sync.Wait());
bool found = false;
for (int i = 0; i < masters_.size(); i++) {
if (masters_[i]->bound_rpc_hostport().port() == leader_master_addr.port()) {
const auto& bound_hp = masters_[i]->bound_rpc_hostport();
// If using BindMode::UNIQUE_LOOPBACK mode, in rare cases different masters
// might bind to different local IP addresses but use same port numbers.
// So, it's necessary to check both the returned hostnames and IP addresses
// to point to leader master.
if (bound_hp.port() == leader_master_addr.port() &&
bound_hp.host() == leader_master_hostname) {
found = true;
*idx = i;
break;
Expand Down

0 comments on commit cd37d7c

Please sign in to comment.