[SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module
### What changes were proposed in this pull request?

This PR aims to add `gcs-connector` shaded jar to `hadoop-cloud` module.

### Why are the changes needed?

To support Google Cloud Storage more easily.

### Does this PR introduce _any_ user-facing change?

Only one shaded jar file is added when the distribution is built with `-Phadoop-cloud`.

```
$ ls -alh gcs*
-rw-r--r--  1 dongjoon  staff    32M Aug 31 11:14 gcs-connector-hadoop3-2.2.7-shaded.jar
```

### How was this patch tested?

**BUILD**

```
$ dev/make-distribution.sh -Phadoop-cloud
```

**RUN**

```
$ export KEYFILE=YOUR-credentials.json
$ export EMAIL=$(jq -r '.client_email' < $KEYFILE)
$ export PRIVATE_KEY_ID=$(jq -r '.private_key_id' < $KEYFILE)
$ export PRIVATE_KEY="$(jq -r '.private_key' < $KEYFILE)"
$ bin/spark-shell \
    -c spark.hadoop.fs.gs.auth.service.account.email=$EMAIL \
    -c spark.hadoop.fs.gs.auth.service.account.private.key.id=$PRIVATE_KEY_ID \
    -c spark.hadoop.fs.gs.auth.service.account.private.key="$PRIVATE_KEY"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/08/31 11:56:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1661972165062).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.4.0-SNAPSHOT
      /_/

Using Scala version 2.12.16 (OpenJDK 64-Bit Server VM, Java 17.0.4)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.text("gs://apache-spark-bucket/README.md").count()
res0: Long = 124

scala> spark.read.orc("examples/src/main/resources/users.orc").write.orc("gs://apache-spark-bucket/users.orc")

scala> spark.read.orc("gs://apache-spark-bucket/users.orc").show()
+------+--------------+----------------+
|  name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa|          null|  [3, 9, 15, 20]|
|   Ben|           red|              []|
+------+--------------+----------------+
```

Closes apache#37745 from dongjoon-hyun/SPARK-33605.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>