[SPARK-49550][FOLLOWUP][SQL][DOC] Switch Hadoop to 3.4.1 in IsolatedClientLoader and docs

### What changes were proposed in this pull request?

Switch Hadoop to 3.4.1 in `IsolatedClientLoader` and docs.

### Why are the changes needed?

Make the Hadoop version consistent in the code and docs.

### Does this PR introduce _any_ user-facing change?

Docs are updated.

### How was this patch tested?

Pass GHA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48890 from pan3793/SPARK-49550-followup.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
pan3793 authored and LuciferYang committed Nov 20, 2024
1 parent 8791767 commit b7cf448
Showing 6 changed files with 7 additions and 11 deletions.
2 changes: 1 addition & 1 deletion assembly/README
@@ -9,4 +9,4 @@ This module is off by default. To activate it specify the profile in the command

If you need to build an assembly for a different version of Hadoop the
hadoop-version system property needs to be set as in this example:
- -Dhadoop.version=3.4.0
+ -Dhadoop.version=3.4.1
7 changes: 2 additions & 5 deletions docs/building-spark.md
@@ -72,14 +72,11 @@ This will build Spark distribution along with Python pip and R packages. For mor

## Specifying the Hadoop Version and Enabling YARN

- You can specify the exact version of Hadoop to compile against through the `hadoop.version` property.
-
- You can enable the `yarn` profile and optionally set the `yarn.version` property if it is different
- from `hadoop.version`.
+ You can enable the `yarn` profile and specify the exact version of Hadoop to compile against through the `hadoop.version` property.

Example:

- ./build/mvn -Pyarn -Dhadoop.version=3.4.0 -DskipTests clean package
+ ./build/mvn -Pyarn -Dhadoop.version=3.4.1 -DskipTests clean package

## Building With Hive and JDBC Support

2 changes: 1 addition & 1 deletion docs/running-on-kubernetes.md
@@ -236,7 +236,7 @@ A typical example of this using S3 is via passing the following options:

```
...
- --packages org.apache.hadoop:hadoop-aws:3.4.0
+ --packages org.apache.hadoop:hadoop-aws:3.4.1
--conf spark.kubernetes.file.upload.path=s3a://<s3-bucket>/path
--conf spark.hadoop.fs.s3a.access.key=...
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
2 changes: 1 addition & 1 deletion docs/running-on-yarn.md
@@ -33,7 +33,7 @@ Please see [Spark Security](security.html) and the specific security sections in

# Launching Spark on YARN

- Apache Hadoop does not support Java 17 as of 3.4.0, while Apache Spark requires at least Java 17 since 4.0.0, so a different JDK should be configured for Spark applications.
+ Apache Hadoop does not support Java 17 as of 3.4.1, while Apache Spark requires at least Java 17 since 4.0.0, so a different JDK should be configured for Spark applications.
Please refer to [Configuring different JDKs for Spark Applications](#configuring-different-jdks-for-spark-applications) for details.

Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory which contains the (client side) configuration files for the Hadoop cluster.
3 changes: 1 addition & 2 deletions pom.xml
@@ -127,7 +127,6 @@
<!-- SPARK-41247: When updating `protobuf.version`, also need to update `protoVersion` in `SparkBuild.scala` -->
<protobuf.version>4.28.3</protobuf.version>
<protoc-jar-maven-plugin.version>3.11.4</protoc-jar-maven-plugin.version>
- <yarn.version>${hadoop.version}</yarn.version>
<zookeeper.version>3.9.3</zookeeper.version>
<curator.version>5.7.1</curator.version>
<hive.group>org.apache.hive</hive.group>
@@ -1418,7 +1417,7 @@
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client-minicluster</artifactId>
- <version>${yarn.version}</version>
+ <version>${hadoop.version}</version>
<scope>test</scope>
</dependency>
<!-- End of Hadoop 3.x dependencies -->
@@ -66,7 +66,7 @@ private[hive] object IsolatedClientLoader extends Logging {
case e: RuntimeException if e.getMessage.contains("hadoop") =>
// If the error message contains hadoop, it is probably because the hadoop
// version cannot be resolved.
- val fallbackVersion = "3.4.0"
+ val fallbackVersion = "3.4.1"
logWarning(log"Failed to resolve Hadoop artifacts for the version " +
log"${MDC(HADOOP_VERSION, hadoopVersion)}. We will change the hadoop version from " +
log"${MDC(HADOOP_VERSION, hadoopVersion)} to " +