Skip to content

Commit

Permalink
Reduce severity of "Error trying to read ahead of the log" log message
Browse files Browse the repository at this point in the history
In clusters under load, it is typical to see the ERROR log dominated by
messages like the following:

E1221 09:09:13.869918 143384 consensus_queue.cc:440] T 1d030514317942ec9d7796a8a029dace P a4eea0affa4545879c8988b24d8cb2bb [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 6620 is ahead of the local log (next sequential op: 6620). Destination peer: Peer: c0a2e34b708845efb8f090459815272c, Is new: false, Last received: 2995.6619, Next index: 6621, Last known committed idx: 6620, Last exchange result: ERROR, Needs tablet copy: false

This message is logged at the error level, and looks a little bit scary,
but it is in fact harmless. Here's what happens:

1. A leader neglects its duties and the followers elect a new leader.
2. The new leader manages to replicate more ops (usually just the NO_OP
   asserting leadership).
3. The leader of the previous term tries to replicate an op to a peer in
   the new term.
4. From the response, it founds out that
   a. It is in an earlier term, so it should step down and increment its
      term.
   b. The last op the peer saw is (leader's index + k) for some k > 0
      (usually k = 1). So the leader will attempt to send up ops of
      index up through (leader's index + k).
5. The term change is asynchronous, and before it happens the leader
   tries to prepare a new request to the peer whose log is ahead of the
   local log. This causes the ERROR message.
6. The term change happens. The leader steps down, and life goes on.

Note that the leader should also have received a VoteRequest with the
new term and an UpdateConsensus with the new term from the leader. In
general, this message appears only when the leader is under enough
stress to lose its leadership and be unable to service some consensus
RPCs in a timely manner. It is not in an of itself a problem, and it's a
decent indicator of stress on the leader, so I am leaving the message
but downgrading it to INFO level.

See KUDU-1078 for more information about this situation, especially its
previous causes (which were at one time actually harmful).

Change-Id: Ib841a52354e5c71450f3e364ab2fdee51ce2adb4
Reviewed-on: http://gerrit.cloudera.org:8080/12185
Reviewed-by: Adar Dembo <[email protected]>
Reviewed-by: Mike Percy <[email protected]>
Tested-by: Will Berkeley <[email protected]>
  • Loading branch information
wdberkeley committed Jan 9, 2019
1 parent 17b5efd commit 6103f87
Showing 1 changed file with 6 additions and 5 deletions.
11 changes: 6 additions & 5 deletions src/kudu/consensus/consensus_queue.cc
Original file line number Diff line number Diff line change
Expand Up @@ -696,11 +696,12 @@ Status PeerMessageQueue::RequestForPeer(const string& uuid,
}
if (s.IsIncomplete()) {
// IsIncomplete() means that we tried to read beyond the head of the log
// (in the future). See KUDU-1078.
LOG_WITH_PREFIX_UNLOCKED(ERROR) << "Error trying to read ahead of the log "
<< "while preparing peer request: "
<< s.ToString() << ". Destination peer: "
<< peer_copy.ToString();
// (in the future). This is usually a sign that this peer is under load
// and is about to step down as leader. See KUDU-1078.
LOG_WITH_PREFIX_UNLOCKED(INFO) << "Error trying to read ahead of the log "
<< "while preparing peer request: "
<< s.ToString() << ". Destination peer: "
<< peer_copy.ToString();
return s;
}
LOG_WITH_PREFIX_UNLOCKED(FATAL) << "Error reading the log while preparing peer request: "
Expand Down

0 comments on commit 6103f87

Please sign in to comment.