1.4.0-RC2

This fixes an issue discovered on a cluster due to the following
sequence of events:

- a block manager compacts a metadata file while starting up
- when it reopens the metadata file after replacing it with the
  compacted one, it gets a file_cache hit. Thus, the WritablePBContainer
  continues to write to the _deleted_ file instead of the compacted one.
  Metadata entries written at this point are lost (which could cause
  block loss in the case of lost CREATE records, or dangling blocks in
  the case of lost DELETEs).
- if the server continues to run for a while, the FD will be evicted
  from the cache and eventually re-opened. At that point, a further
  DELETE record could end up writing to an offset past the end of the
  file, since the write offset was incremented by the "lost" records
  above.
- on the next restart, the metadata file would have a "gap" of zero
  bytes, which would surface as a checksum failure and failure to start
  up.
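
The sequence above can be sketched outside of Kudu with plain POSIX file
calls. This is only an illustration, not the block manager's code: the
function name, the file names, and the record payloads are all made up,
and it assumes a POSIX system where a file can be renamed over while a
descriptor to the old file is still open.

```python
import os


def demo_stale_handle(dirpath):
    """Replay the failure with a hypothetical 'metadata' file in dirpath."""
    path = os.path.join(dirpath, "metadata")
    with open(path, "wb") as f:
        f.write(b"AAAA")  # records that exist before compaction

    # Stands in for the descriptor held by the file cache.
    stale_fd = os.open(path, os.O_WRONLY)

    # Compaction: write a smaller file and rename it over the original.
    tmp = path + ".compacted"
    with open(tmp, "wb") as f:
        f.write(b"AA")
    os.replace(tmp, path)

    # The writer tracks its own offset, starting at the compacted size.
    offset = os.path.getsize(path)

    # Cache hit: this write goes to the old, now unlinked file.
    os.pwrite(stale_fd, b"LOST", offset)
    offset += 4
    with open(path, "rb") as f:
        after_lost = f.read()  # the record never reached the new file

    # Eviction followed by a real reopen of the new file.
    os.close(stale_fd)
    fd = os.open(path, os.O_WRONLY)
    os.pwrite(fd, b"DEL!", offset)  # offset is now past the end of the file
    os.close(fd)

    with open(path, "rb") as f:
        final = f.read()  # contains a run of zero bytes before b"DEL!"
    return after_lost, final
```

Calling demo_stale_handle() on an empty scratch directory returns the
file contents after the lost write and after the late write. The second
value has a zero-filled gap where the lost record should have been.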

The fix is relatively simple: when we replace the metadata file we need
to invalidate and evict the cache entry so that when we "reopen", it
actually starts appending to the _new_ file and not the old deleted one.
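
A minimal sketch of the idea, assuming a toy path-to-descriptor cache.
FileCache, replace_metadata(), and run() are hypothetical names for this
illustration and are not Kudu's file_cache API. It assumes POSIX rename
semantics.

```python
import os
import tempfile


class FileCache:
    """Toy path -> descriptor cache (hypothetical, not Kudu's file_cache)."""

    def __init__(self):
        self._fds = {}

    def open(self, path):
        # Return the cached descriptor if present, else open and cache it.
        fd = self._fds.get(path)
        if fd is None:
            fd = os.open(path, os.O_WRONLY | os.O_APPEND)
            self._fds[path] = fd
        return fd

    def invalidate(self, path):
        # Drop and close the cached descriptor so the next open() is real.
        fd = self._fds.pop(path, None)
        if fd is not None:
            os.close(fd)


def replace_metadata(cache, path, compacted_bytes, invalidate=True):
    """Swap in the compacted file, optionally invalidating the cache entry."""
    tmp = path + ".compacted"
    with open(tmp, "wb") as f:
        f.write(compacted_bytes)
    os.replace(tmp, path)
    if invalidate:
        cache.invalidate(path)


def run(invalidate):
    """Return the on-disk metadata after compaction plus one append."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "metadata")
        with open(path, "wb") as f:
            f.write(b"old-records;")
        cache = FileCache()
        cache.open(path)  # populate the cache before compaction
        replace_metadata(cache, path, b"compacted;", invalidate)
        os.write(cache.open(path), b"new-record;")  # the "reopen" and append
        with open(path, "rb") as f:
            data = f.read()
        cache.invalidate(path)  # close the descriptor before cleanup
        return data
```

With invalidate=True the append lands in the new file. With
invalidate=False the cached descriptor still points at the deleted file
and the appended record never shows up on disk.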

The bulk of the changes here are to tests:
- the stress test now enforces a minimum number of live blocks before it
  starts deleting them. It also compacts more aggressively and uses a
  smaller cache. With these changes, I was sometimes able to reproduce
  the issue.
- a more targeted test issues a canned sequence of block creations and
  deletions that reliably reproduces the above issue.

Change-Id: I491eacbad4750efedea854a2cc35b8ec994f9077
Reviewed-on: http://gerrit.cloudera.org:8080/7113
Reviewed-by: Adar Dembo <[email protected]>
Tested-by: Kudu Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/7125