[SPARK-40677][CONNECT][FOLLOWUP] Refactor shade relocation/rename rules

### What changes were proposed in this pull request?
The main change of this PR is to refactor the shade relocation/rename rules based on the output of `mvn dependency:tree -pl connector/connect`, to
ensure that Maven and sbt produce the assembly jar according to the same rules.

The main part of the `mvn dependency:tree -pl connector/connect` output is as follows:

```
[INFO] +- com.google.guava:guava:jar:31.0.1-jre:compile
[INFO] |  +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:compile
[INFO] |  +- org.checkerframework:checker-qual:jar:3.12.0:compile
[INFO] |  +- com.google.errorprone:error_prone_annotations:jar:2.7.1:compile
[INFO] |  \- com.google.j2objc:j2objc-annotations:jar:1.3:compile
[INFO] +- com.google.guava:failureaccess:jar:1.0.1:compile
[INFO] +- com.google.protobuf:protobuf-java:jar:3.21.1:compile
[INFO] +- io.grpc:grpc-netty:jar:1.47.0:compile
[INFO] |  +- io.grpc:grpc-core:jar:1.47.0:compile
[INFO] |  |  +- com.google.code.gson:gson:jar:2.9.0:runtime
[INFO] |  |  +- com.google.android:annotations:jar:4.1.1.4:runtime
[INFO] |  |  \- org.codehaus.mojo:animal-sniffer-annotations:jar:1.19:runtime
[INFO] |  +- io.netty:netty-codec-http2:jar:4.1.72.Final:compile
[INFO] |  |  \- io.netty:netty-codec-http:jar:4.1.72.Final:compile
[INFO] |  +- io.netty:netty-handler-proxy:jar:4.1.72.Final:runtime
[INFO] |  |  \- io.netty:netty-codec-socks:jar:4.1.72.Final:runtime
[INFO] |  +- io.perfmark:perfmark-api:jar:0.25.0:runtime
[INFO] |  \- io.netty:netty-transport-native-unix-common:jar:4.1.72.Final:runtime
[INFO] +- io.grpc:grpc-protobuf:jar:1.47.0:compile
[INFO] |  +- io.grpc:grpc-api:jar:1.47.0:compile
[INFO] |  |  \- io.grpc:grpc-context:jar:1.47.0:compile
[INFO] |  +- com.google.api.grpc:proto-google-common-protos:jar:2.0.1:compile
[INFO] |  \- io.grpc:grpc-protobuf-lite:jar:1.47.0:compile
[INFO] +- io.grpc:grpc-services:jar:1.47.0:compile
[INFO] |  \- com.google.protobuf:protobuf-java-util:jar:3.19.2:runtime
[INFO] +- io.grpc:grpc-stub:jar:1.47.0:compile
[INFO] +- org.spark-project.spark:unused:jar:1.0.0:compile
```

The new shade rules exclude the following jars:

- Scala-related jars
- Netty-related jars
- jars that previously only sbt included: pmml-model-*.jar, findbugs jsr305-*.jar, spark unused-1.0.0.jar

After this PR, Maven shade will include the following jars:

```
[INFO] --- maven-shade-plugin:3.2.4:shade (default)  spark-connect_2.12 ---
[INFO] Including com.google.guava:guava:jar:31.0.1-jre in the shaded jar.
[INFO] Including com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava in the shaded jar.
[INFO] Including org.checkerframework:checker-qual:jar:3.12.0 in the shaded jar.
[INFO] Including com.google.errorprone:error_prone_annotations:jar:2.7.1 in the shaded jar.
[INFO] Including com.google.j2objc:j2objc-annotations:jar:1.3 in the shaded jar.
[INFO] Including com.google.guava:failureaccess:jar:1.0.1 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java:jar:3.21.1 in the shaded jar.
[INFO] Including io.grpc:grpc-netty:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-core:jar:1.47.0 in the shaded jar.
[INFO] Including com.google.code.gson:gson:jar:2.9.0 in the shaded jar.
[INFO] Including com.google.android:annotations:jar:4.1.1.4 in the shaded jar.
[INFO] Including org.codehaus.mojo:animal-sniffer-annotations:jar:1.19 in the shaded jar.
[INFO] Including io.perfmark:perfmark-api:jar:0.25.0 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-api:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-context:jar:1.47.0 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-common-protos:jar:2.0.1 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf-lite:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-services:jar:1.47.0 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java-util:jar:3.19.2 in the shaded jar.
[INFO] Including io.grpc:grpc-stub:jar:1.47.0 in the shaded jar.
```

sbt assembly will include the following jars:

```
[debug] Including from cache: j2objc-annotations-1.3.jar
[debug] Including from cache: guava-31.0.1-jre.jar
[debug] Including from cache: protobuf-java-3.21.1.jar
[debug] Including from cache: grpc-services-1.47.0.jar
[debug] Including from cache: failureaccess-1.0.1.jar
[debug] Including from cache: grpc-stub-1.47.0.jar
[debug] Including from cache: perfmark-api-0.25.0.jar
[debug] Including from cache: annotations-4.1.1.4.jar
[debug] Including from cache: listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
[debug] Including from cache: animal-sniffer-annotations-1.19.jar
[debug] Including from cache: checker-qual-3.12.0.jar
[debug] Including from cache: grpc-netty-1.47.0.jar
[debug] Including from cache: grpc-api-1.47.0.jar
[debug] Including from cache: grpc-protobuf-lite-1.47.0.jar
[debug] Including from cache: grpc-protobuf-1.47.0.jar
[debug] Including from cache: grpc-context-1.47.0.jar
[debug] Including from cache: grpc-core-1.47.0.jar
[debug] Including from cache: protobuf-java-util-3.19.2.jar
[debug] Including from cache: error_prone_annotations-2.10.0.jar
[debug] Including from cache: gson-2.9.0.jar
[debug] Including from cache: proto-google-common-protos-2.0.1.jar
```
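
The two listings above can be compared mechanically. The following is a minimal sketch, not part of this PR: the object name `CompareAssemblyContents` and the helper `jarNames` are hypothetical, and the regular expressions assume the log line shapes shown above. It normalizes both formats to `artifactId-version` and prints the entries that appear on only one side. Run on the full logs above, it would flag `error_prone_annotations`, which Maven reports as 2.7.1 and sbt as 2.10.0.

```scala
object CompareAssemblyContents {
  // Matches maven-shade-plugin lines such as
  // "[INFO] Including io.grpc:grpc-stub:jar:1.47.0 in the shaded jar."
  private val MavenLine =
    """.*Including [^:\s]+:([^:\s]+):jar:(\S+) in the shaded jar\.""".r

  // Matches sbt-assembly debug lines such as
  // "[debug] Including from cache: grpc-stub-1.47.0.jar"
  private val SbtLine = """.*Including from cache: (\S+)\.jar""".r

  /** Normalizes either log format to "artifactId-version"; other lines are dropped. */
  def jarNames(log: Seq[String]): Set[String] = log.collect {
    case MavenLine(artifact, version) => s"$artifact-$version"
    case SbtLine(name) => name
  }.toSet

  def main(args: Array[String]): Unit = {
    // Two sample lines from each listing; paste the full logs here to compare everything.
    val maven = jarNames(Seq(
      "[INFO] Including io.grpc:grpc-stub:jar:1.47.0 in the shaded jar.",
      "[INFO] Including com.google.errorprone:error_prone_annotations:jar:2.7.1 in the shaded jar."))
    val sbt = jarNames(Seq(
      "[debug] Including from cache: grpc-stub-1.47.0.jar",
      "[debug] Including from cache: error_prone_annotations-2.10.0.jar"))

    println(s"only in maven: ${(maven -- sbt).toSeq.sorted.mkString(", ")}")
    println(s"only in sbt:   ${(sbt -- maven).toSeq.sorted.mkString(", ")}")
  }
}
```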

All the dependencies mentioned above are relocated to the `org.sparkproject.connect` package according to the new rules to avoid conflicts with other third-party dependencies.
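
To illustrate what a relocation rule does to a class name, here is a simplified sketch. It is not how maven-shade-plugin or sbt-assembly are implemented: `RelocationSketch`, `Rule` and `relocate` are hypothetical names, and the "first matching prefix wins" behaviour is an assumption of this model. It also suggests why narrow patterns such as `com.google.gson` and `com.google.rpc` are safer than one broad `com.google` pattern, whose result would depend on where it sits in the rule list.

```scala
object RelocationSketch {
  /** A simplified prefix rule: classes under `from` move under `to`. */
  final case class Rule(from: String, to: String)

  /** Applies the first rule whose prefix matches; unmatched names are returned unchanged. */
  def relocate(rules: Seq[Rule], className: String): String =
    rules.collectFirst {
      case Rule(from, to) if className == from || className.startsWith(from + ".") =>
        to + className.stripPrefix(from)
    }.getOrElse(className)

  def main(args: Array[String]): Unit = {
    val base = "org.sparkproject.connect"

    // Narrow, non-overlapping prefixes: the result does not depend on rule order.
    val narrow = Seq(
      Rule("com.google.gson", s"$base.gson"),
      Rule("com.google.rpc", s"$base.google_protos.rpc"))
    println(relocate(narrow, "com.google.gson.Gson"))
    println(relocate(narrow, "com.google.rpc.Status"))

    // Hypothetical broad rule placed first: in this model it shadows the gson rule,
    // so the target package changes with the position of the rule.
    val broadFirst = Rule("com.google", s"$base.google") +: narrow
    println(relocate(broadFirst, "com.google.gson.Gson"))
  }
}
```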

### Why are the changes needed?
Refactor the shade relocation/rename rules to ensure that Maven and sbt produce the assembly jar according to the same rules.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes apache#38162 from LuciferYang/SPARK-40677-FOLLOWUP.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
LuciferYang authored and HyukjinKwon committed Oct 11, 2022
1 parent 4eb0edf commit e927a7e
Showing 2 changed files with 82 additions and 14 deletions.
66 changes: 53 additions & 13 deletions connector/connect/pom.xml
@@ -268,11 +268,13 @@
as assembly build.
-->
<include>com.google.android:annotations</include>
<include>com.google.api.grpc:proto-google-common-proto</include>
<include>com.google.api.grpc:proto-google-common-protos</include>
<include>io.perfmark:perfmark-api</include>
<include>org.codehaus.mojo:animal-sniffer-annotations</include>
<include>com.google.errorprone:error_prone_annotations</include>
<include>com.google.j2objc:j2objc-annotations</include>
<include>org.checkerframework:checker-qual</include>
<include>com.google.code.gson:gson</include>
</includes>
</artifactSet>
<relocations>
@@ -303,28 +305,66 @@
</relocation>

<relocation>
<pattern>com.google.android</pattern>
<shadedPattern>${spark.shade.packageName}.connect.android</shadedPattern>
<pattern>android.annotation</pattern>
<shadedPattern>${spark.shade.packageName}.connect.android_annotation</shadedPattern>
</relocation>
<relocation>
<pattern>com.google.api.grpc</pattern>
<shadedPattern>${spark.shade.packageName}.connect.api</shadedPattern>
<pattern>io.perfmark</pattern>
<shadedPattern>${spark.shade.packageName}.connect.io_perfmark</shadedPattern>
</relocation>
<relocation>
<pattern>io.perfmark</pattern>
<shadedPattern>${spark.shade.packageName}.connect.perfmark</shadedPattern>
<pattern>org.codehaus.mojo.animal_sniffer</pattern>
<shadedPattern>${spark.shade.packageName}.connect.animal_sniffer</shadedPattern>
</relocation>
<relocation>
<pattern>com.google.j2objc.annotations</pattern>
<shadedPattern>${spark.shade.packageName}.connect.j2objc_annotations</shadedPattern>
</relocation>
<relocation>
<pattern>com.google.errorprone.annotations</pattern>
<shadedPattern>${spark.shade.packageName}.connect.errorprone_annotations</shadedPattern>
</relocation>
<relocation>
<pattern>org.checkerframework</pattern>
<shadedPattern>${spark.shade.packageName}.connect.checkerframework</shadedPattern>
</relocation>
<relocation>
<pattern>com.google.gson</pattern>
<shadedPattern>${spark.shade.packageName}.connect.gson</shadedPattern>
</relocation>

<!--
For `com.google.api.grpc:proto-google-common-protos`, do not directly define pattern
as `common.google`, otherwise, otherwise, the relocation result may be uncertain due
to the change of rule order.
-->
<relocation>
<pattern>com.google.api</pattern>
<shadedPattern>${spark.shade.packageName}.connect.google_protos.api</shadedPattern>
</relocation>
<relocation>
<pattern>com.google.cloud</pattern>
<shadedPattern>${spark.shade.packageName}.connect.google_protos.cloud</shadedPattern>
</relocation>
<relocation>
<pattern>com.google.geo</pattern>
<shadedPattern>${spark.shade.packageName}.connect.google_protos.geo</shadedPattern>
</relocation>
<relocation>
<pattern>com.google.logging</pattern>
<shadedPattern>${spark.shade.packageName}.connect.google_protos.logging</shadedPattern>
</relocation>
<relocation>
<pattern>org.codehaus.mojo</pattern>
<shadedPattern>${spark.shade.packageName}.connect.mojo</shadedPattern>
<pattern>com.google.longrunning</pattern>
<shadedPattern>${spark.shade.packageName}.connect.google_protos.longrunning</shadedPattern>
</relocation>
<relocation>
<pattern>com.google.errorprone</pattern>
<shadedPattern>${spark.shade.packageName}.connect.errorprone</shadedPattern>
<pattern>com.google.rpc</pattern>
<shadedPattern>${spark.shade.packageName}.connect.google_protos.rpc</shadedPattern>
</relocation>
<relocation>
<pattern>com.com.google.j2objc</pattern>
<shadedPattern>${spark.shade.packageName}.connect.j2objc</shadedPattern>
<pattern>com.google.type</pattern>
<shadedPattern>${spark.shade.packageName}.connect.google_protos.type</shadedPattern>
</relocation>
</relocations>
</configuration>
30 changes: 29 additions & 1 deletion project/SparkBuild.scala
@@ -655,19 +655,47 @@ object SparkConnect {

(assembly / logLevel) := Level.Info,

// Exclude `scala-library` from assembly.
(assembly / assemblyPackageScala / assembleArtifact) := false,

// Exclude `pmml-model-*.jar`, `scala-collection-compat_*.jar`,`jsr305-*.jar` and
// `netty-*.jar` and `unused-1.0.0.jar` from assembly.
(assembly / assemblyExcludedJars) := {
val cp = (assembly / fullClasspath).value
cp filter { v =>
val name = v.data.getName
name.startsWith("pmml-model-") || name.startsWith("scala-collection-compat_") ||
name.startsWith("jsr305-") || name.startsWith("netty-") || name == "unused-1.0.0.jar"
}
},

(assembly / assemblyShadeRules) := Seq(
ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.grpc.@0").inAll,
ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.guava.@1").inAll,
ShadeRule.rename("com.google.thirdparty.**" -> "org.sparkproject.connect.guava.@1").inAll,
ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1").inAll,
ShadeRule.rename("android.annotation.**" -> "org.sparkproject.connect.android_annotation.@1").inAll,
ShadeRule.rename("io.perfmark.**" -> "org.sparkproject.connect.io_perfmark.@1").inAll,
ShadeRule.rename("org.codehaus.mojo.animal_sniffer.**" -> "org.sparkproject.connect.animal_sniffer.@1").inAll,
ShadeRule.rename("com.google.j2objc.annotations.**" -> "org.sparkproject.connect.j2objc_annotations.@1").inAll,
ShadeRule.rename("com.google.errorprone.annotations.**" -> "org.sparkproject.connect.errorprone_annotations.@1").inAll,
ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.checkerframework.@1").inAll,
ShadeRule.rename("com.google.gson.**" -> "org.sparkproject.connect.gson.@1").inAll,
ShadeRule.rename("com.google.api.**" -> "org.sparkproject.connect.google_protos.api.@1").inAll,
ShadeRule.rename("com.google.cloud.**" -> "org.sparkproject.connect.google_protos.cloud.@1").inAll,
ShadeRule.rename("com.google.geo.**" -> "org.sparkproject.connect.google_protos.geo.@1").inAll,
ShadeRule.rename("com.google.logging.**" -> "org.sparkproject.connect.google_protos.logging.@1").inAll,
ShadeRule.rename("com.google.longrunning.**" -> "org.sparkproject.connect.google_protos.longrunning.@1").inAll,
ShadeRule.rename("com.google.rpc.**" -> "org.sparkproject.connect.google_protos.rpc.@1").inAll,
ShadeRule.rename("com.google.type.**" -> "org.sparkproject.connect.google_protos.type.@1").inAll
),

(assembly / assemblyMergeStrategy) := {
case m if m.toLowerCase(Locale.ROOT).endsWith("manifest.mf") => MergeStrategy.discard
// Drop all proto files that are not needed as artifacts of the build.
case m if m.toLowerCase(Locale.ROOT).endsWith(".proto") => MergeStrategy.discard
case _ => MergeStrategy.first
},
}
)
}
