Add retry to tolerate the offload index file read failure (apache#12452)
* Add retry to tolerate the offload index file read failure
---

*Motivation*

We met a ReadLedgerMetadata exception when reading the index
file. The index file was read with a single `read` call, which
may return fewer bytes than requested, so the metadata bytes can
be incomplete and the metadata read fails. We need to ensure
either that all the data is read from the stream or that the
stream has ended. `readFully` keeps reading until the buffer is
full and throws an EOF exception if the stream ends first, so we
need to use `readFully` instead of `read`.
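
The distinction matters because `InputStream.read(byte[])` may return
after filling only part of the buffer, while `DataInputStream.readFully`
either fills the whole buffer or throws an `EOFException`. A minimal
standalone sketch of the difference (buffer sizes and stream contents
are made up for illustration, not taken from the patch):

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[16]; // pretend the stream was truncated at 16 bytes

        // read(byte[]) may return fewer bytes than the buffer holds; the contract
        // only guarantees at least one byte (or -1 at end of stream). A short read
        // leaves the tail of `buf` unfilled -- the bug pattern this commit fixes.
        try (InputStream in = new ByteArrayInputStream(data)) {
            byte[] buf = new byte[32];
            int n = in.read(buf);
            System.out.println("read() returned " + n + " of " + buf.length + " bytes");
        }

        // readFully either fills the whole buffer or throws EOFException, so a
        // truncated stream surfaces as an error instead of silently bad metadata.
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[32];
            dis.readFully(buf); // throws: only 16 bytes are available
        } catch (EOFException e) {
            System.out.println("readFully threw EOFException on the truncated stream");
        }
    }
}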

Add retry logic to tolerate failures caused by the network.
Because the stream comes over HTTP, it may break in some cases.
A small immediate retry avoids handing the failure back to the
dispatcher's backoff.
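
In isolation, the retry is just a bounded loop with no delay between
attempts that rethrows the last failure once the attempts run out.
A simplified sketch of that pattern with generic names (not the
patch's actual types):

import java.io.IOException;

public final class BoundedRetry {

    @FunctionalInterface
    interface IoSupplier<T> {
        T get() throws IOException;
    }

    // Runs the action up to maxAttempts times with no backoff in between;
    // if every attempt fails, the last IOException is rethrown so the caller
    // (here, the dispatcher) can apply its own backoff and reschedule.
    static <T> T runWithRetry(int maxAttempts, IoSupplier<T> action) throws IOException {
        if (maxAttempts <= 0) {
            throw new IllegalArgumentException("maxAttempts must be positive");
        }
        IOException lastException = null;
        while (maxAttempts-- > 0) {
            try {
                return action.get();
            } catch (IOException e) {
                lastException = e;
            }
        }
        throw lastException;
    }
}

A hypothetical call site would look like
`runWithRetry(3, () -> readIndexBlock(blobStore, bucket, indexKey))`,
where `readIndexBlock` stands in for the blob fetch and parse done in
the patch below.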

*Modifications*

- Replace the `read` call with `readFully`
- Add a small retry around building the index block

* Add comments and enrich log
zymap authored Oct 22, 2021
1 parent b4d05ac commit 33bcc17
Showing 2 changed files with 27 additions and 11 deletions.
@@ -224,12 +224,32 @@ public static ReadHandle open(ScheduledExecutorService executor,
                                   VersionCheck versionCheck,
                                   long ledgerId, int readBufferSize)
             throws IOException {
-        Blob blob = blobStore.getBlob(bucket, indexKey);
-        versionCheck.check(indexKey, blob);
-        OffloadIndexBlockBuilder indexBuilder = OffloadIndexBlockBuilder.create();
-        OffloadIndexBlock index;
-        try (InputStream payLoadStream = blob.getPayload().openStream()) {
-            index = (OffloadIndexBlock) indexBuilder.fromStream(payLoadStream);
-        }
+        int retryCount = 3;
+        OffloadIndexBlock index = null;
+        IOException lastException = null;
+        // The following retry tolerates index file read failures caused by transient network issues.
+        // If the read cannot recover within the retries, we throw the exception and the dispatcher
+        // will schedule the next read.
+        // We avoid a backoff between retries because it would introduce a concurrent operation;
+        // we don't want that complexity, and in most cases this loop should not retry at all.
+        while (retryCount-- > 0) {
+            Blob blob = blobStore.getBlob(bucket, indexKey);
+            versionCheck.check(indexKey, blob);
+            OffloadIndexBlockBuilder indexBuilder = OffloadIndexBlockBuilder.create();
+            try (InputStream payLoadStream = blob.getPayload().openStream()) {
+                index = (OffloadIndexBlock) indexBuilder.fromStream(payLoadStream);
+            } catch (IOException e) {
+                // retry to tolerate read failures caused by transient network issues
+                log.warn("Failed to get index block from the offloaded index file {}, still have {} retries left",
+                        indexKey, retryCount, e);
+                lastException = e;
+                continue;
+            }
+            lastException = null;
+            break;
+        }
+        if (lastException != null) {
+            throw lastException;
+        }

         BackedInputStream inputStream = new BlobStoreBackedInputStreamImpl(blobStore, bucket, key,
Expand Down
@@ -338,11 +338,7 @@ private OffloadIndexBlock fromStream(DataInputStream dis) throws IOException {
         int segmentMetadataLength = dis.readInt();

         byte[] metadataBytes = new byte[segmentMetadataLength];
-
-        if (segmentMetadataLength != dis.read(metadataBytes)) {
-            log.error("Read ledgerMetadata from bytes failed");
-            throw new IOException("Read ledgerMetadata from bytes failed");
-        }
+        dis.readFully(metadataBytes);
         this.segmentMetadata = parseLedgerMetadata(metadataBytes);

         for (int i = 0; i < indexEntryCount; i++) {

