Change Amazon SDK dependencies back to 'provided'
See discussion at databricks/spark-redshift#64 (comment)

Author: Josh Rosen <[email protected]>

Closes snowflakedb#70 from JoshRosen/mark-deps-as-provided-again.
JoshRosen committed Sep 3, 2015
1 parent e9f2686 commit 6b855e8
Showing 2 changed files with 13 additions and 2 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -33,6 +33,8 @@ You will also need to provide a JDBC driver that is compatible with Redshift. Am

**Note on Hadoop versions**: This library depends on [`spark-avro`](https://github.com/databricks/spark-avro), which should be downloaded automatically because it is declared as a dependency. However, you may need to provide the `avro-mapred` dependency that matches your Hadoop distribution. In most deployments this dependency will be provided automatically by your cluster's Spark assemblies, and no additional action will be required.
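For example, a minimal sketch of what you might add to your own build if your cluster does not provide `avro-mapred` (the version and `hadoop2` classifier here are assumptions; match them to your Hadoop distribution):

```scala
// Hypothetical addition to your own build.sbt; the version and classifier
// are assumptions -- match them to your Hadoop distribution.
libraryDependencies += "org.apache.avro" % "avro-mapred" % "1.7.7" classifier "hadoop2"
```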

**Note on Amazon SDK dependency**: This library declares a `provided` dependency on components of the AWS Java SDK. In most cases, these libraries will be provided by your deployment environment. However, if you get ClassNotFoundExceptions for Amazon SDK classes then you will need to add explicit dependencies on `com.amazonaws.aws-java-sdk-core` and `com.amazonaws.aws-java-sdk-s3` as part of your build / runtime configuration. See the comments in `project/SparkRedshiftBuild.scala` for more details.
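For example, a minimal sbt sketch for your own build (the version shown matches the one this project compiles against; adjust it to your environment):

```scala
// Hypothetical additions to your own build.sbt, only needed if your
// deployment environment does not already provide the AWS SDK classes.
libraryDependencies ++= Seq(
  "com.amazonaws" % "aws-java-sdk-core" % "1.9.40",
  "com.amazonaws" % "aws-java-sdk-s3" % "1.9.40"
)
```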

## Usage

### Data Sources API
13 changes: 11 additions & 2 deletions project/SparkRedshiftBuild.scala
@@ -55,8 +55,17 @@ object SparkRedshiftBuild extends Build {
   resolvers +=
     "Spark 1.5.0 RC2 Staging" at "https://repository.apache.org/content/repositories/orgapachespark-1141",
   libraryDependencies ++= Seq(
-    "com.amazonaws" % "aws-java-sdk-core" % "1.9.40",
-    "com.amazonaws" % "aws-java-sdk-s3" % "1.9.40",
+    // These Amazon SDK dependencies are marked as 'provided' in order to reduce the risk of
+    // dependency conflicts with other user libraries. In many environments, such as EMR and
+    // Databricks, the Amazon SDK will already be on the classpath. In other cases, the SDK is
+    // likely to be provided via a dependency on the S3NativeFileSystem. If this were not marked
+    // as provided, then we would have to worry about the SDK's own dependencies evicting
+    // earlier versions of those dependencies that are required by the end user's own code.
+    // There's a trade-off here and we've chosen to err on the side of minimizing dependency
+    // conflicts for the majority of users while adding a minor inconvenience (adding one extra
+    // dependency by hand) for a smaller set of users.
+    "com.amazonaws" % "aws-java-sdk-core" % "1.9.40" % "provided",
+    "com.amazonaws" % "aws-java-sdk-s3" % "1.9.40" % "provided",
     // We require spark-avro, but avro-mapred must be provided to match the Hadoop version.
     // In most cases, avro-mapred will be provided as part of the Spark assembly JAR.
     "com.databricks" %% "spark-avro" % "1.0.0",
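If you are unsure whether your environment already provides the SDK, a small hedged check along these lines can help (`AmazonS3Client` is a class shipped in `aws-java-sdk-s3`; the object name here is made up for illustration):

```scala
// Sketch: detect whether the AWS SDK is already on the classpath.
object AwsSdkCheck {
  def main(args: Array[String]): Unit =
    try {
      Class.forName("com.amazonaws.services.s3.AmazonS3Client")
      println("AWS SDK found on the classpath; no extra dependency is needed.")
    } catch {
      case _: ClassNotFoundException =>
        println("AWS SDK missing: add aws-java-sdk-core and aws-java-sdk-s3 to your build.")
    }
}
```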
