Spark Connect

This module contains the implementation of Spark Connect, which is a logical plan facade for the implementation in Spark. Spark Connect is integrated directly into the Spark build.

The documentation linked here is written for developers of Spark Connect and is not intended as end-user documentation.

Development Topics

Guidelines for new clients

When contributing a new client, please be aware that we strive for a common user experience across all languages. Please follow the guidelines below:

Python client development

Python-specific development guidelines are located in python/docs/source/development/testing.rst, which is published under the Development tab of the PySpark documentation.

To generate the Python client code from the proto files:

First, make sure you have a Python environment with the required dependencies installed. Specifically, install black and the dependencies listed in the "Spark Connect python proto generation plugin (optional)" section of dev/requirements.txt:

pip install -r dev/requirements.txt

Install buf (for example, with Homebrew):

brew install bufbuild/buf/buf

Generate the Python files by running:

dev/connect-gen-protos.sh
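The generation steps above depend on several command-line tools being available. As a minimal sketch, a preflight check could report which of them are missing before the script is run. The helper name missing_tools is hypothetical and not part of the Spark repository, and the tool list in the usage comment is an assumption based on the steps above.

```shell
# missing_tools TOOL...: print the names of the given tools that are not on PATH.
# Hypothetical helper for illustration; prints an empty line when nothing is missing.
missing_tools() {
  local missing=""
  local tool
  for tool in "$@"; do
    # command -v succeeds only if the tool can be resolved by the shell.
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Strip the leading space left by the concatenation above.
  echo "${missing# }"
}

# Example usage from the Spark root directory (tool list is an assumption):
# missing=$(missing_tools python3 pip buf black)
# if [ -n "$missing" ]; then
#   echo "Missing tools: $missing" >&2
# else
#   dev/connect-gen-protos.sh
# fi
```

Checking up front keeps a missing buf or black installation from surfacing as a less obvious failure partway through generation.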

Build with user-defined protoc and protoc-gen-grpc-java

The official protoc and protoc-gen-grpc-java binaries may not run in some compilation environments, for example on CentOS 6 or CentOS 7, where the default glibc version is older than 2.14. In that case, the connect module can be compiled and tested with user-defined protoc and protoc-gen-grpc-java binaries, specified as follows:

export SPARK_PROTOC_EXEC_PATH=/path-to-protoc-exe
export CONNECT_PLUGIN_EXEC_PATH=/path-to-protoc-gen-grpc-java-exe
./build/mvn -Phive -Puser-defined-protoc clean package

or

export SPARK_PROTOC_EXEC_PATH=/path-to-protoc-exe
export CONNECT_PLUGIN_EXEC_PATH=/path-to-protoc-gen-grpc-java-exe
./build/sbt -Puser-defined-protoc clean package

The user-defined protoc and protoc-gen-grpc-java binaries can be built from source in the user's compilation environment. For the compilation steps, refer to the protobuf and grpc-java projects.
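Because the build reads both binary locations from environment variables, it can help to confirm that each variable is set and points to an executable file before starting a long build. The following is a sketch only: check_exec_var is a hypothetical helper, not something provided by Spark, and it assumes the two variable names shown above.

```shell
# check_exec_var NAME: succeed only if the environment variable NAME is set
# and points to an executable regular file. Hypothetical helper for illustration.
# NAME must be a trusted variable name, since it is expanded through eval.
check_exec_var() {
  local name="$1"
  local path
  # Read the value of the variable whose name is stored in $name.
  eval "path=\${$name:-}"
  if [ -z "$path" ]; then
    echo "$name is not set" >&2
    return 1
  fi
  if [ ! -f "$path" ] || [ ! -x "$path" ]; then
    echo "$name=$path is not an executable file" >&2
    return 1
  fi
  return 0
}

# Example usage from the Spark root directory (Maven variant from the text above):
# check_exec_var SPARK_PROTOC_EXEC_PATH &&
#   check_exec_var CONNECT_PLUGIN_EXEC_PATH &&
#   ./build/mvn -Phive -Puser-defined-protoc clean package
```

The same check applies unchanged to the sbt variant, since both builds read the same two variables.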