Nebula Exchange (Exchange for short) is an Apache Spark application. It is used to migrate cluster data in bulk from Spark to Nebula Graph in a distributed environment. It supports migration of batch data and streaming data in various formats.
Exchange 2.0 supports only Nebula Graph 2.0. To import data into Nebula Graph v1.x, use Nebula Exchange v1.0.
- Package the latest Exchange
$ git clone https://github.com/vesoft-inc/nebula-exchange.git
$ cd nebula-exchange/nebula-exchange
$ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true
After packaging, the newly generated nebula-exchange-2.5-SNAPSHOT.jar is in the nebula-exchange/nebula-exchange/target/ directory.
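
To confirm that the build produced a JAR before moving on, a small check such as the following may help. This is a minimal sketch: `find_exchange_jar` is a hypothetical helper name, not part of Exchange, and the default directory `target` assumes you are still inside nebula-exchange/nebula-exchange.

```shell
# Hypothetical helper (not part of Exchange): print the first Exchange JAR
# found in a build directory, or return non-zero if there is none.
# Assumption: the default "target" is relative to nebula-exchange/nebula-exchange.
find_exchange_jar() {
  target_dir="${1:-target}"
  for jar in "$target_dir"/nebula-exchange-*.jar; do
    # Skip auxiliary artifacts that a build may also produce.
    case "$jar" in
      *-sources.jar|*-javadoc.jar) continue ;;
    esac
    # An unmatched glob stays literal, so confirm the file really exists.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no nebula-exchange JAR found in $target_dir" >&2
  return 1
}
```

Calling `find_exchange_jar` with no argument checks `./target`; pass another directory as the first argument to look elsewhere.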
- Download from Maven repository
release version: https://repo1.maven.org/maven2/com/vesoft/nebula-exchange/
snapshot version: https://oss.sonatype.org/content/repositories/snapshots/com/vesoft/nebula-exchange/
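
If you prefer to fetch a release JAR from the command line, the URL can be assembled from the release repository path above. The sketch below assumes the standard Maven repository layout (`<base>/<version>/<artifact>-<version>.jar`); `exchange_jar_url` is a hypothetical helper name, and whether a given version is actually published should be checked in the repository listing.

```shell
# Hypothetical helper: build the download URL for a released Exchange JAR.
# Assumption: the repository follows the standard Maven layout
#   <base>/<version>/nebula-exchange-<version>.jar
exchange_jar_url() {
  version="$1"
  base="https://repo1.maven.org/maven2/com/vesoft/nebula-exchange"
  printf '%s/%s/nebula-exchange-%s.jar\n' "$base" "$version" "$version"
}

# Example (needs curl and network access, so it is left commented out):
#   curl -fLO "$(exchange_jar_url 2.5.0)"
```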
Import command:
$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange-2.5.0.jar -c /path/to/application.conf
If your data source is Hive, the import command is:
$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange-2.5.0.jar -c /path/to/application.conf -h
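
Both commands above assume that `SPARK_HOME` is set and that the JAR and configuration file exist at the given paths. A pre-flight check along these lines can catch a wrong path before Spark starts. It is only a sketch; `check_exchange_inputs` is a hypothetical name and not something Exchange provides.

```shell
# Hypothetical pre-flight check (not part of Exchange).
# Prints the first problem found to stderr and returns 1; returns 0 if all is well.
check_exchange_inputs() {
  jar="$1"
  conf="$2"
  # spark-submit must exist and be executable under SPARK_HOME.
  if [ -z "${SPARK_HOME:-}" ] || [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "spark-submit not found; check SPARK_HOME" >&2
    return 1
  fi
  if [ ! -f "$jar" ]; then
    echo "Exchange JAR not found: $jar" >&2
    return 1
  fi
  if [ ! -f "$conf" ]; then
    echo "configuration file not found: $conf" >&2
    return 1
  fi
  return 0
}

# Example:
#   check_exchange_inputs nebula-exchange-2.5.0.jar /path/to/application.conf
```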
Note: To submit Exchange in yarn-cluster mode, use the following command:
$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange \
--master yarn-cluster \
--files application.conf \
--conf spark.driver.extraClassPath=./ \
--conf spark.executor.extraClassPath=./ \
nebula-exchange-2.5.0.jar \
-c application.conf
Note: When using Exchange to generate SST files, add the spark.sql.shuffle.partitions configuration for Spark's shuffle operation:
$SPARK_HOME/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange \
--master local \
--conf spark.sql.shuffle.partitions=200 \
nebula-exchange-2.5.0.jar \
-c application.conf
For more details about Exchange, refer to Exchange 2.0.
The version correspondence between Nebula Exchange and Nebula Graph is as follows:
Nebula Exchange Version | Nebula Version |
---|---|
2.0.0 | 2.0.0, 2.0.1 |
2.0.1 | 2.0.0, 2.0.1 |
2.1.0 | 2.0.0, 2.0.1 |
2.5.0 | 2.5.0, 2.5.1 |
2.5.1 | 2.5.0, 2.5.1 |
2.6.0 | 2.6.0 |
2.5-SNAPSHOT | nightly |
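
The table can also be expressed as a lookup for use in deployment scripts. The sketch below uses only the rows listed above. `exchange_version_for` is a hypothetical helper name, and where several Exchange releases match a Nebula Graph version, returning the newest listed one is a choice made for this example rather than an official recommendation.

```shell
# Hypothetical helper: map a Nebula Graph version to a compatible Exchange
# version, using only the rows of the correspondence table.
# Where several Exchange versions match, the newest listed one is returned.
exchange_version_for() {
  case "$1" in
    2.0.0|2.0.1) echo "2.1.0" ;;        # Exchange 2.0.0 and 2.0.1 are also listed
    2.5.0|2.5.1) echo "2.5.1" ;;        # Exchange 2.5.0 is also listed
    2.6.0)       echo "2.6.0" ;;
    nightly)     echo "2.5-SNAPSHOT" ;;
    *)
      echo "no Exchange version listed for Nebula Graph $1" >&2
      return 1
      ;;
  esac
}

# Example:
#   exchange_version_for 2.5.0
```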
- Supports importing vertex data with String and Integer type IDs.
- Supports importing data of the Null, Date, DateTime, and Time types (DateTime uses UTC, not local time).
- Supports importing data from other Hive sources besides Hive on Spark.
- Supports recording and retrying the INSERT statement after failures during data import.
- Supports SST import, but does not yet support default property values.
Refer to application.conf as an example to edit the configuration file.