HADOOP-18073. S3A: Upgrade AWS SDK to V2 (apache#5995)
This patch migrates the S3A connector to use the V2 AWS SDK.

This is a significant change at the source code level.
Any applications using the internal extension/override points in
the filesystem connector are likely to break.

This includes but is not limited to:
- Code invoking methods on the S3AFileSystem class
  whose signatures used classes from the V1 SDK.
- The ability to define the factory for the `AmazonS3` client, and
  to retrieve it from the S3AFileSystem. There is a new factory
  API and a special interface S3AInternals to access a limited
  set of internal classes and operations (see the sketch after this list).
- Delegation token and auditing extensions.
- Classes trying to integrate with the AWS SDK.
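
For code which previously fetched the `AmazonS3` client from the filesystem, a
minimal sketch of the new access path is shown below. The accessor names used
here (`getS3AInternals()`, `getAmazonS3Client(String)`) are assumptions drawn
from this description; consult the aws_sdk_upgrade document for the definitive
interface.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
import software.amazon.awssdk.services.s3.S3Client;

public class S3AInternalsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path bucketRoot = new Path("s3a://example-bucket/");   // hypothetical bucket

    try (FileSystem fs = FileSystem.newInstance(bucketRoot.toUri(), conf)) {
      S3AFileSystem s3a = (S3AFileSystem) fs;

      // Assumed accessor names: the V2 S3Client is now reached through the
      // restricted S3AInternals interface rather than a public getter.
      S3Client s3 = s3a.getS3AInternals().getAmazonS3Client("diagnostics");

      // Use the client for low-level calls the FileSystem API does not expose.
      System.out.println(s3.serviceName());
    }
  }
}
```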

All standard V1 credential providers listed in the option 
fs.s3a.aws.credentials.provider will be automatically remapped to their
V2 equivalent.
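
For example, a configuration which still names the V1 environment-variable
provider keeps working after the upgrade because the class name is remapped to
its V2 counterpart at startup. The property below mirrors the core-default.xml
change later in this patch and is shown only as an illustration.

```xml
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <!-- V1 class name: remapped automatically to
       software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider -->
  <value>com.amazonaws.auth.EnvironmentVariableCredentialsProvider</value>
</property>
```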

Other V1 Credential Providers are supported, but only if the V1 SDK is
added back to the classpath.  
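
To keep such a provider working, the V1 SDK must be restored by the application
or cluster deployment; a sketch of what that might look like with Maven is
below. The artifact and version match the provided-scope declaration later in
this patch, but whether aws-java-sdk-core is sufficient (rather than the full
V1 bundle) depends on what the provider actually uses, so treat the choice as
an assumption.

```xml
<!-- Restore V1 SDK classes for a non-standard V1 credential provider
     (version taken from hadoop-project/pom.xml in this patch). -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-core</artifactId>
  <version>1.12.499</version>
</dependency>
```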

The SDK signing plugin has changed; all signers written against the V1 SDK
are incompatible. There is no support for the legacy S3 "v2" signing
algorithm (AWS Signature Version 2).

Finally, the aws-sdk-bundle JAR has been replaced by the shaded V2
equivalent, "bundle.jar", which is now exported by the hadoop-aws module.
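
For downstream builds this means a dependency on hadoop-aws now pulls in the V2
bundle transitively instead of aws-java-sdk-bundle; a hedged sketch of such a
declaration follows (the Hadoop version property is a placeholder).

```xml
<!-- Downstream project: bundle.jar (software.amazon.awssdk:bundle) now arrives
     transitively because hadoop-aws declares it at compile scope. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>${hadoop.version}</version>  <!-- placeholder -->
</dependency>
```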

Consult the document aws_sdk_upgrade for the full details.

Contributed by Ahmar Suhail + some bits by Steve Loughran
steveloughran authored Sep 11, 2023
1 parent e4550e1 commit 81d90fd
Showing 205 changed files with 8,987 additions and 5,900 deletions.
1 change: 1 addition & 0 deletions LICENSE-binary
@@ -364,6 +364,7 @@ org.objenesis:objenesis:2.6
org.xerial.snappy:snappy-java:1.1.10.1
org.yaml:snakeyaml:2.0
org.wildfly.openssl:wildfly-openssl:1.1.3.Final
software.amazon.awssdk:bundle:jar:2.20.128


--------------------------------------------------------------------------------
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/statistics/StoreStatisticNames.java
@@ -407,6 +407,10 @@ public final class StoreStatisticNames {
public static final String MULTIPART_UPLOAD_LIST
= "multipart_upload_list";

/** Probe for store region: {@value}. */
public static final String STORE_REGION_PROBE
= "store_region_probe";

private StoreStatisticNames() {
}

hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
@@ -1387,61 +1387,31 @@
<description>AWS secret key used by S3A file system. Omit for IAM role-based or provider-based authentication.</description>
</property>

<property>
<name>fs.s3a.session.token</name>
<description>Session token, when using org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
as one of the providers.
</description>
</property>

<property>
<name>fs.s3a.aws.credentials.provider</name>
<value>
org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider,
org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,
com.amazonaws.auth.EnvironmentVariableCredentialsProvider,
software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider,
org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider
</value>
<description>
Comma-separated class names of credential provider classes which implement
com.amazonaws.auth.AWSCredentialsProvider.
software.amazon.awssdk.auth.credentials.AwsCredentialsProvider.

When S3A delegation tokens are not enabled, this list will be used
to directly authenticate with S3 and other AWS services.
When S3A Delegation tokens are enabled, depending upon the delegation
token binding it may be used
to communicate with the STS endpoint to request session/role
credentials.

These are loaded and queried in sequence for a valid set of credentials.
Each listed class must implement one of the following means of
construction, which are attempted in order:
* a public constructor accepting java.net.URI and
org.apache.hadoop.conf.Configuration,
* a public constructor accepting org.apache.hadoop.conf.Configuration,
* a public static method named getInstance that accepts no
arguments and returns an instance of
com.amazonaws.auth.AWSCredentialsProvider, or
* a public default constructor.

Specifying org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider allows
anonymous access to a publicly accessible S3 bucket without any credentials.
Please note that allowing anonymous access to an S3 bucket compromises
security and therefore is unsuitable for most use cases. It can be useful
for accessing public data sets without requiring AWS credentials.

If unspecified, then the default list of credential provider classes,
queried in sequence, is:
* org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider: looks
for session login secrets in the Hadoop configuration.
* org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider:
Uses the values of fs.s3a.access.key and fs.s3a.secret.key.
* com.amazonaws.auth.EnvironmentVariableCredentialsProvider: supports
configuration of AWS access key ID and secret access key in
environment variables named AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
and AWS_SESSION_TOKEN as documented in the AWS SDK.
* org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider: picks up
IAM credentials of any EC2 VM or AWS container in which the process is running.
</description>
</property>
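
As an illustration of the construction contract described in the property above
(this sketch is not part of core-default.xml), a custom provider exposing the
preferred (URI, Configuration) constructor and implementing the V2 interface
might look like the hypothetical class below; the class name and the
configuration keys it reads are invented for the example.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.AwsCredentials;
import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;

/** Hypothetical provider reading a static key pair from the Hadoop configuration. */
public class ExampleConfigCredentialsProvider implements AwsCredentialsProvider {

  private final String accessKey;
  private final String secretKey;

  // Public (URI, Configuration) constructor: the first construction
  // mechanism listed in the property description.
  public ExampleConfigCredentialsProvider(URI fsUri, Configuration conf) {
    this.accessKey = conf.getTrimmed("example.access.key", "");
    this.secretKey = conf.getTrimmed("example.secret.key", "");
  }

  @Override
  public AwsCredentials resolveCredentials() {
    return AwsBasicCredentials.create(accessKey, secretKey);
  }
}
```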

<property>
<name>fs.s3a.session.token</name>
<description>Session token, when using org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
as one of the providers.
</description>
</property>

@@ -1539,10 +1509,10 @@
Note: for job submission to actually collect these tokens,
Kerberos must be enabled.

Options are:
Bindings available in hadoop-aws are:
org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding
org.apache.hadoop.fs.s3a.auth.delegation.FullCredentialsTokenBinding
and org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenBinding
org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenBinding
</description>
</property>

22 changes: 20 additions & 2 deletions hadoop-project/pom.xml
@@ -184,6 +184,8 @@
<surefire.fork.timeout>900</surefire.fork.timeout>
<aws-java-sdk.version>1.12.499</aws-java-sdk.version>
<hsqldb.version>2.7.1</hsqldb.version>
<aws-java-sdk-v2.version>2.20.128</aws-java-sdk-v2.version>
<aws.eventstream.version>1.0.1</aws.eventstream.version>
<frontend-maven-plugin.version>1.11.2</frontend-maven-plugin.version>
<jasmine-maven-plugin.version>2.1</jasmine-maven-plugin.version>
<phantomjs-maven-plugin.version>0.7</phantomjs-maven-plugin.version>
@@ -1128,15 +1130,31 @@
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-bundle</artifactId>
<artifactId>aws-java-sdk-core</artifactId>
<version>${aws-java-sdk.version}</version>
<exclusions>
<exclusion>
<groupId>io.netty</groupId>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>bundle</artifactId>
<version>${aws-java-sdk-v2.version}</version>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>software.amazon.eventstream</groupId>
<artifactId>eventstream</artifactId>
<version>${aws.eventstream.version}</version>
</dependency>
<dependency>
<groupId>org.apache.mina</groupId>
<artifactId>mina-core</artifactId>
5 changes: 5 additions & 0 deletions hadoop-tools/hadoop-aws/dev-support/findbugs-exclude.xml
@@ -64,6 +64,11 @@
<Field name="futurePool"/>
<Bug pattern="IS2_INCONSISTENT_SYNC"/>
</Match>
<Match>
<Class name="org.apache.hadoop.fs.s3a.S3AFileSystem"/>
<Field name="s3AsyncClient"/>
<Bug pattern="IS2_INCONSISTENT_SYNC"/>
</Match>
<Match>
<Class name="org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$BucketInfo"/>
<Method name="run"/>
16 changes: 15 additions & 1 deletion hadoop-tools/hadoop-aws/pom.xml
@@ -494,11 +494,25 @@
<scope>test</scope>
<type>test-jar</type>
</dependency>

<!-- The v1 SDK is used at compilation time for adapter classes in
org.apache.hadoop.fs.s3a.adapter. It is not needed at runtime
unless a non-standard v1 credential provider is declared. -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-bundle</artifactId>
<artifactId>aws-java-sdk-core</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>bundle</artifactId>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>software.amazon.eventstream</groupId>
<artifactId>eventstream</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.assertj</groupId>
<artifactId>assertj-core</artifactId>
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/AWSBadRequestException.java
@@ -18,7 +18,7 @@

package org.apache.hadoop.fs.s3a;

import com.amazonaws.AmazonServiceException;
import software.amazon.awssdk.awscore.exception.AwsServiceException;

/**
* A 400 "Bad Request" exception was received.
@@ -36,7 +36,7 @@ public class AWSBadRequestException extends AWSServiceIOException {
* @param cause the underlying cause
*/
public AWSBadRequestException(String operation,
AmazonServiceException cause) {
AwsServiceException cause) {
super(operation, cause);
}
}
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/AWSClientIOException.java
@@ -18,34 +18,40 @@

package org.apache.hadoop.fs.s3a;

import com.amazonaws.AmazonClientException;
import com.amazonaws.SdkBaseException;
import software.amazon.awssdk.core.exception.SdkException;
import org.apache.hadoop.util.Preconditions;

import java.io.IOException;

/**
* IOException equivalent of an {@link AmazonClientException}.
* IOException equivalent of an {@link SdkException}.
*/
public class AWSClientIOException extends IOException {

private final String operation;

public AWSClientIOException(String operation,
SdkBaseException cause) {
SdkException cause) {
super(cause);
Preconditions.checkArgument(operation != null, "Null 'operation' argument");
Preconditions.checkArgument(cause != null, "Null 'cause' argument");
this.operation = operation;
}

public AmazonClientException getCause() {
return (AmazonClientException) super.getCause();
public SdkException getCause() {
return (SdkException) super.getCause();
}

@Override
public String getMessage() {
return operation + ": " + getCause().getMessage();
}

/**
* Query inner cause for retryability.
* @return what the cause says.
*/
public boolean retryable() {
return getCause().retryable();
}
}
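
The new retryable() probe added above simply forwards to the wrapped
SdkException; a minimal sketch of how a caller might consult it is shown here.
The single manual retry is illustrative only, since S3A callers normally rely
on the connector's own retry policy.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.AWSClientIOException;

public class RetryableProbeSketch {

  /** Open a file, retrying once if the wrapped SDK exception reports the failure as retryable. */
  static FSDataInputStream openWithOneRetry(FileSystem fs, Path path) throws IOException {
    try {
      return fs.open(path);
    } catch (AWSClientIOException e) {
      if (e.retryable()) {
        // The inner SdkException considers this failure transient.
        return fs.open(path);
      }
      throw e;
    }
  }
}
```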
(Remaining changed files not shown.)
