Add retry to tolerate the offload index file read failure (apache#12452)
* Add retry to tolerate the offload index file read failure
---

*Motivation*

We met a ReadLedgerMetadata exception when reading the index
file. The index file was read with a single `read` call, which
may return fewer bytes than requested, so the metadata bytes can
be incomplete and the metadata read fails. We need to ensure
either that all the data is read from the stream or that the
stream has ended. `readFully` keeps reading until the buffer is
full and throws an EOF exception if the stream ends first, so we
need to use `readFully` instead of `read`.
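
The distinction matters because `InputStream.read(byte[])` may return
after filling only part of the buffer, while `DataInputStream.readFully`
either fills the whole buffer or throws an `EOFException`. A minimal
standalone sketch of the difference (buffer sizes and stream contents
are made up for illustration, not taken from the patch):

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[16]; // pretend the stream was truncated at 16 bytes

        // read(byte[]) may return fewer bytes than the buffer holds; the contract
        // only guarantees at least one byte (or -1 at end of stream). A short read
        // leaves the tail of `buf` unfilled -- the bug pattern this commit fixes.
        try (InputStream in = new ByteArrayInputStream(data)) {
            byte[] buf = new byte[32];
            int n = in.read(buf);
            System.out.println("read() returned " + n + " of " + buf.length + " bytes");
        }

        // readFully either fills the whole buffer or throws EOFException, so a
        // truncated stream surfaces as an error instead of silently bad metadata.
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[32];
            dis.readFully(buf); // throws: only 16 bytes are available
        } catch (EOFException e) {
            System.out.println("readFully threw EOFException on the truncated stream");
        }
    }
}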

Add retry logic to tolerate failures caused by the network.
Because the stream comes over HTTP, it may break in some cases.
A small immediate retry avoids handing the failure back to the
dispatcher's backoff.
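
In isolation, the retry is just a bounded loop with no delay between
attempts that rethrows the last failure once the attempts run out.
A simplified sketch of that pattern with generic names (not the
patch's actual types):

import java.io.IOException;

public final class BoundedRetry {

    @FunctionalInterface
    interface IoSupplier<T> {
        T get() throws IOException;
    }

    // Runs the action up to maxAttempts times with no backoff in between;
    // if every attempt fails, the last IOException is rethrown so the caller
    // (here, the dispatcher) can apply its own backoff and reschedule.
    static <T> T runWithRetry(int maxAttempts, IoSupplier<T> action) throws IOException {
        if (maxAttempts <= 0) {
            throw new IllegalArgumentException("maxAttempts must be positive");
        }
        IOException lastException = null;
        while (maxAttempts-- > 0) {
            try {
                return action.get();
            } catch (IOException e) {
                lastException = e;
            }
        }
        throw lastException;
    }
}

A hypothetical call site would look like
`runWithRetry(3, () -> readIndexBlock(blobStore, bucket, indexKey))`,
where `readIndexBlock` stands in for the blob fetch and parse done in
the patch below.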

*Modifications*

- Replace the `read` call with `readFully`
- Add a small retry around building the index block

* Add comments and enrich log
zymap authored Oct 22, 2021
1 parent b4d05ac commit 33bcc17
Showing 2 changed files with 27 additions and 11 deletions.
@@ -224,12 +224,32 @@ public static ReadHandle open(ScheduledExecutorService executor,
                                   VersionCheck versionCheck,
                                   long ledgerId, int readBufferSize)
             throws IOException {
-        Blob blob = blobStore.getBlob(bucket, indexKey);
-        versionCheck.check(indexKey, blob);
-        OffloadIndexBlockBuilder indexBuilder = OffloadIndexBlockBuilder.create();
-        OffloadIndexBlock index;
-        try (InputStream payLoadStream = blob.getPayload().openStream()) {
-            index = (OffloadIndexBlock) indexBuilder.fromStream(payLoadStream);
-        }
+        int retryCount = 3;
+        OffloadIndexBlock index = null;
+        IOException lastException = null;
+        // The following retry tolerates index file read failures caused by transient network issues.
+        // If the read cannot recover within the retries, we throw the exception and the dispatcher
+        // will schedule the next read.
+        // We avoid a backoff between retries because it would introduce a concurrent operation;
+        // we don't want that complexity, and in most cases this loop should not retry at all.
+        while (retryCount-- > 0) {
+            Blob blob = blobStore.getBlob(bucket, indexKey);
+            versionCheck.check(indexKey, blob);
+            OffloadIndexBlockBuilder indexBuilder = OffloadIndexBlockBuilder.create();
+            try (InputStream payLoadStream = blob.getPayload().openStream()) {
+                index = (OffloadIndexBlock) indexBuilder.fromStream(payLoadStream);
+            } catch (IOException e) {
+                // retry to tolerate read failures caused by transient network issues
+                log.warn("Failed to get index block from the offloaded index file {}, still have {} retries left",
+                        indexKey, retryCount, e);
+                lastException = e;
+                continue;
+            }
+            lastException = null;
+            break;
+        }
+        if (lastException != null) {
+            throw lastException;
+        }

         BackedInputStream inputStream = new BlobStoreBackedInputStreamImpl(blobStore, bucket, key,
Expand Down
@@ -338,11 +338,7 @@ private OffloadIndexBlock fromStream(DataInputStream dis) throws IOException {
         int segmentMetadataLength = dis.readInt();

         byte[] metadataBytes = new byte[segmentMetadataLength];
-
-        if (segmentMetadataLength != dis.read(metadataBytes)) {
-            log.error("Read ledgerMetadata from bytes failed");
-            throw new IOException("Read ledgerMetadata from bytes failed");
-        }
+        dis.readFully(metadataBytes);
         this.segmentMetadata = parseLedgerMetadata(metadataBytes);

         for (int i = 0; i < indexEntryCount; i++) {

