Disable buffering to disk for Azure output streams
apache/hadoop@acffe20 introduced logic that makes ABFS output streams buffer blocks to disk by default.
Trino picked up this behavior with the Hadoop upgrade in trinodb@343b908. We want to disable it and keep the legacy behavior of holding the blocks in memory.
anusudarsan authored and wendigo committed Jul 27, 2023
1 parent 02f1a63 commit 39458df
Showing 1 changed file with 5 additions and 0 deletions.
@@ -27,6 +27,8 @@

import static com.google.common.base.Preconditions.checkArgument;
import static java.lang.String.format;
import static org.apache.hadoop.fs.azurebfs.constants.ConfigurationKeys.DATA_BLOCKS_BUFFER;
import static org.apache.hadoop.fs.store.DataBlocks.DATA_BLOCKS_BUFFER_ARRAY;

public class TrinoAzureConfigurationInitializer
        implements ConfigurationInitializer
@@ -117,6 +119,9 @@ public void initializeConfiguration(Configuration config)

        // do not rely on information returned from local system about users and groups
        config.set("fs.azure.skipUserGroupMetadataDuringInitialization", "true");

        // disable buffering of Azure output stream blocks to disk (the default is DATA_BLOCKS_BUFFER_DISK)
        config.set(DATA_BLOCKS_BUFFER, DATA_BLOCKS_BUFFER_ARRAY);
    }

private static Optional<String> dropEmpty(Optional<String> optional)
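
The effect of the added `config.set` call can be illustrated with a minimal sketch. It uses a plain `Map` as a hypothetical stand-in for Hadoop's `Configuration` so that it runs without Hadoop on the classpath. The string values of the three constants are assumptions about what the Hadoop constants resolve to and should be checked against the Hadoop version in use; `AbfsBufferSketch` and `effectiveBuffer` are illustrative names that do not appear in the commit.

```java
import java.util.HashMap;
import java.util.Map;

public class AbfsBufferSketch
{
    // Assumed values of ConfigurationKeys.DATA_BLOCKS_BUFFER and the
    // DataBlocks buffer-type constants; verify against your Hadoop version.
    public static final String DATA_BLOCKS_BUFFER = "fs.azure.data.blocks.buffer";
    public static final String DATA_BLOCKS_BUFFER_DISK = "disk";
    public static final String DATA_BLOCKS_BUFFER_ARRAY = "array";

    // Stand-in for the line added to initializeConfiguration in this commit:
    // explicitly select the in-memory (array) buffer.
    public static void initializeConfiguration(Map<String, String> config)
    {
        config.put(DATA_BLOCKS_BUFFER, DATA_BLOCKS_BUFFER_ARRAY);
    }

    // Hypothetical lookup: when the key is unset, fall back to the disk
    // default that the commit message describes.
    public static String effectiveBuffer(Map<String, String> config)
    {
        return config.getOrDefault(DATA_BLOCKS_BUFFER, DATA_BLOCKS_BUFFER_DISK);
    }

    public static void main(String[] args)
    {
        Map<String, String> config = new HashMap<>();
        System.out.println("before: " + effectiveBuffer(config)); // prints "before: disk"
        initializeConfiguration(config);
        System.out.println("after: " + effectiveBuffer(config)); // prints "after: array"
    }
}
```

Setting the key explicitly, rather than relying on the library default, keeps the in-memory behavior stable even if the default changes again in a later Hadoop release.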
