[SPARK-42656][CONNECT][FOLLOWUP] Spark Connect Shell
### What changes were proposed in this pull request?
Add a `spark-connect-shell` script that starts the Spark shell with Spark Connect enabled.
Add a `-Pconnect` profile to include Spark Connect in distribution builds.
Simplify the dev shell scripts to use the `-Pconnect` profile.

### Why are the changes needed?
Allow users to try out Spark Connect easily.

### Does this PR introduce _any_ user-facing change?
Yes. Adds a new `spark-connect-shell` script and a `-Pconnect` build option.

### How was this patch tested?
Manually tested.
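
For reference, the intended flow is build-then-run; a minimal sketch from the Spark repo root, using the same commands the new scripts document:

```bash
# Build Spark with the Spark Connect profile so the plugin classes are available.
./build/sbt -Pconnect package

# Start a Spark shell with the Spark Connect server plugin enabled.
./bin/spark-connect-shell
```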

Closes apache#40305 from zhenlineo/connect-shell.

Authored-by: Zhen Li <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
zhenlineo authored and hvanhovell committed Mar 7, 2023
1 parent c3a09e2 commit 2e7207f
Showing 7 changed files with 85 additions and 7 deletions.
10 changes: 10 additions & 0 deletions assembly/pom.xml
@@ -152,6 +152,16 @@
         </dependency>
       </dependencies>
     </profile>
+    <profile>
+      <id>connect</id>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.spark</groupId>
+          <artifactId>spark-connect_${scala.binary.version}</artifactId>
+          <version>${project.version}</version>
+        </dependency>
+      </dependencies>
+    </profile>
    <profile>
      <id>kubernetes</id>
      <dependencies>
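
The new `connect` profile mirrors the existing optional assembly profiles such as `kubernetes`; activating it pulls the `spark-connect` artifact into the assembly. For example, with the Maven command this PR also documents:

```bash
# Build a distribution that bundles the Spark Connect server jar.
./build/mvn -Pconnect -DskipTests clean package
```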
27 changes: 27 additions & 0 deletions bin/spark-connect-shell
@@ -0,0 +1,27 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# The shell script to start a spark-shell with spark connect enabled.

if [ -z "${SPARK_HOME}" ]; then
source "$(dirname "$0")"/find-spark-home
fi

# This requires building Spark with `-Pconnect`, e.g. `build/sbt -Pconnect package`
exec "${SPARK_HOME}"/bin/spark-shell --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin "$@"
13 changes: 6 additions & 7 deletions connector/connect/bin/spark-connect
@@ -17,17 +17,16 @@
 # limitations under the License.
 #
 
+# Start the spark-connect server with server logs printed to standard output. The script rebuilds the
+# server dependencies and starts the server at the default port. This can be used to debug the client
+# during client development.
+
 # Go to the Spark project root directory
 FWDIR="$(cd "`dirname "$0"`"/../../..; pwd)"
 cd "$FWDIR"
 export SPARK_HOME=$FWDIR
 
-SCALA_BINARY_VER=`grep "scala.binary.version" "${SPARK_HOME}/pom.xml" | head -n1 | awk -F '[<>]' '{print $3}'`
-SCALA_ARG=$(if [ "${SCALA_BINARY_VER}" == "2.13" ]; then echo "-Pscala-2.13"; else echo ""; fi)
-
 # Build the jars needed for spark submit and spark connect
-build/sbt "${SCALA_ARG}" -Phive package
-
-CONNECT_JAR=`ls "${SPARK_HOME}"/connector/connect/server/target/scala-"${SCALA_BINARY_VER}"/spark-connect-assembly*.jar | paste -sd ',' -`
+build/sbt -Phive -Pconnect package
 
-exec "${SPARK_HOME}"/bin/spark-submit "$@" --class org.apache.spark.sql.connect.SimpleSparkConnectService "$CONNECT_JAR"
+exec "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.sql.connect.SimpleSparkConnectService "$@"
1 change: 1 addition & 0 deletions connector/connect/bin/spark-connect-scala-client.sc
@@ -22,6 +22,7 @@ val sessionBuilder = SparkSession.builder()
 val spark = if (conStr.isEmpty) sessionBuilder.build() else sessionBuilder.remote(conStr).build()
 import spark.implicits._
 import spark.sql
+println("Spark session available as 'spark'.")
 println(
   """
     |   _____                  __      ______                            __
32 changes: 32 additions & 0 deletions connector/connect/bin/spark-connect-shell
@@ -0,0 +1,32 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# The spark connect shell for development. This shell script builds the spark connect server with
# all dependencies and starts the server at the default port.
# Use `/bin/spark-connect-shell` instead if rebuilding the dependency jars is not needed.

# Go to the Spark project root directory
FWDIR="$(cd "`dirname "$0"`"/../../..; pwd)"
cd "$FWDIR"
export SPARK_HOME=$FWDIR

# Build the jars needed for spark shell and spark connect
build/sbt -Phive -Pconnect package

exec "${SPARK_HOME}"/bin/spark-shell --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin "$@"
4 changes: 4 additions & 0 deletions docs/building-spark.md
@@ -119,6 +119,10 @@ For instance, you can build the Spark Streaming module using:
 
 where `spark-streaming_{{site.SCALA_BINARY_VERSION}}` is the `artifactId` as defined in `streaming/pom.xml` file.
 
+## Building with Spark Connect support
+
+    ./build/mvn -Pconnect -DskipTests clean package
+
 ## Continuous Compilation
 
 We use the scala-maven-plugin which supports incremental and continuous compilation. E.g.
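
A note for sbt users: the dev scripts above use the sbt equivalent of this documented Maven command:

```bash
# sbt equivalent used by the dev scripts in this PR.
./build/sbt -Phive -Pconnect package
```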
5 changes: 5 additions & 0 deletions repl/src/main/scala-2.12/org/apache/spark/repl/Main.scala
@@ -121,6 +121,11 @@ object Main extends Logging {
       sparkContext = sparkSession.sparkContext
       sparkSession
     } catch {
+      case e: ClassNotFoundException if isShellSession && e.getMessage.contains(
+          "org.apache.spark.sql.connect.SparkConnectPlugin") =>
+        logError("Failed to load spark connect plugin.")
+        logError("You need to build Spark with -Pconnect.")
+        sys.exit(1)
       case e: Exception if isShellSession =>
         logError("Failed to initialize Spark session.", e)
         sys.exit(1)
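
The new catch branch turns an otherwise-cryptic `ClassNotFoundException` into an actionable hint. A hypothetical way to hit it is starting the connect shell against a build that skipped the profile:

```bash
# Package without -Pconnect, so SparkConnectPlugin is absent...
./build/sbt package
# ...then this logs "Failed to load spark connect plugin." and exits with status 1.
./bin/spark-connect-shell
```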
