Built on the Apache Spark DataSourceV2 API.
See the documentation for how to use this connector.
- Java 8 or 11
- Scala 2.12 or 2.13
- Apache Spark 3.2 or 3.3
Notes:
- As of 0.5.0, this connector switched from the raw ClickHouse gRPC client to the official ClickHouse Java client, which adds HTTP protocol support and extends the range of supported ClickHouse Server versions.
- Due to limited developer resources, the project currently focuses only on Spark 3.3 support. As a result, you may find features that are documented but do not work on Spark 3.2, or that perform significantly worse than on Spark 3.3. If you run into this, the preferred approach is to send a PR that backports the patch from the Spark 3.3 module to the Spark 3.2 module. Opening an issue to request a backport is also fine; I will check the issue list and fix some of the important ones when I have time.
Build without tests
./gradlew clean build -x test
The project uses Testcontainers and Docker Compose for integration tests. Install Docker and Docker Compose before running the tests, and see the Testcontainers documentation if you would like to run the tests against a remote Docker daemon.
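As a rough pre-flight check for the Docker requirement, something like the following could be run before the tests. This is only a sketch: `have_cmd` and `check_docker_prereqs` are hypothetical helper names, not part of the project, and the check assumes Docker Compose is installed either as a standalone `docker-compose` binary or as the `docker compose` CLI plugin.

```shell
# Hypothetical helper (not part of the project): succeed if a command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical pre-flight check for the integration test prerequisites.
check_docker_prereqs() {
  if ! have_cmd docker; then
    echo "docker not found on PATH" >&2
    return 1
  fi
  # Assumption: Compose is either a standalone binary or a docker CLI plugin.
  if have_cmd docker-compose || docker compose version >/dev/null 2>&1; then
    return 0
  fi
  echo "Docker Compose not found" >&2
  return 1
}
```

Calling `check_docker_prereqs && ./gradlew clean test` would then skip the test run when either tool is missing.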
Run all tests
./gradlew clean test
Run all tests with Spark 3.2 and Scala 2.13
./gradlew clean test -Dspark_binary_version=3.2 -Dscala_binary_version=2.13
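To cover more than one version pair, the same two properties can be varied in a loop. The sketch below only prints the commands; `test_cmd` and `print_test_matrix` are hypothetical helper names, and it assumes that every Spark and Scala combination from the requirements list builds, which this README does not state explicitly.

```shell
# Hypothetical helper: print the Gradle test command for one Spark/Scala pair.
test_cmd() {
  echo "./gradlew clean test -Dspark_binary_version=$1 -Dscala_binary_version=$2"
}

# Print one command per combination of the versions listed in the requirements.
# Assumption: all four combinations are buildable.
print_test_matrix() {
  for spark in 3.2 3.3; do
    for scala in 2.12 2.13; do
      test_cmd "$spark" "$scala"
    done
  done
}
```

Run `print_test_matrix` to review the commands, or pipe its output to `sh` from the project root to execute them in order.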
Run a single test
./gradlew test --tests=ConvertDistToLocalWriteSuite
For developers and users on ARM platforms, e.g. Apple Silicon or Kunpeng chips, you may not be able to run the TPC-DS integration tests that use gRPC locally, because ClickHouse does not provide gRPC support in the official ARM image.
As a workaround, you can set the environment variable CLICKHOUSE_IMAGE to use a custom image which supports gRPC on ARM platform for testing.
export CLICKHOUSE_IMAGE=pan3793/clickhouse-server:22.5.1-alpine-arm-grpc
./gradlew clean test
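If the same shell setup is shared between ARM and non-ARM machines, the override can be made conditional on the machine type. This is a minimal sketch: `arm_clickhouse_image` is a hypothetical helper name, and it assumes `uname -m` reports `arm64` (macOS on Apple Silicon) or `aarch64` (Linux on ARM) on the affected machines.

```shell
# Hypothetical helper: print the custom gRPC-enabled image for ARM machine
# types, and print nothing for any other architecture.
arm_clickhouse_image() {
  case "$1" in
    arm64|aarch64) echo "pan3793/clickhouse-server:22.5.1-alpine-arm-grpc" ;;
    *) ;;
  esac
}

# Export CLICKHOUSE_IMAGE only when the current machine is ARM.
image="$(arm_clickhouse_image "$(uname -m)")"
if [ -n "$image" ]; then
  export CLICKHOUSE_IMAGE="$image"
fi
```

After this runs, `./gradlew clean test` picks up the custom image on ARM and leaves the default untouched elsewhere.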