Nebula Graph CSV importer written in Go. This tool reads local CSV files and writes their data into Nebula storage.
You can run this tool from source code or with Docker.
You should start a Nebula server with docker-compose or an rpm installation. Also make sure the corresponding space, tags and edge types have been created in Nebula.
Nebula-importer reads a YAML configuration file to get the connection information for the graph server, the tag/edge schema, and so on.
Here is an example configuration file. The options are described in the table below.
version: v1rc1
description: example
clientSettings:
  concurrency: 4 # number of graph clients
  connection:
    user: user
    password: password
    address: 127.0.0.1:3699
logPath: ./err/test.log
files:
  - path: ./edge.csv
    failDataPath: ./err/edge.csv
    batchSize: 100
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      space: test
      type: edge
      edge:
        name: edge_name
        withRanking: true
        props:
          - name: prop_name
            type: string
  - path: ./vertex.csv
    failDataPath: ./err/vertex.csv
    batchSize: 100
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      space: test
      type: vertex
      vertex:
        tags:
          - name: tag1
            props:
              - name: prop1
                type: int
              - name: prop2
                type: timestamp
          - name: tag2
            props:
              - name: prop3
                type: double
              - name: prop4
                type: string
In this example, nebula-importer imports two CSV data files, edge.csv and vertex.csv, in turn.
options | description | default |
---|---|---|
version | Configuration file version | v1rc1 |
description | Description of this configuration file | "" |
clientSettings | Graph client settings | - |
clientSettings.concurrency | Number of graph clients | 4 |
clientSettings.connection | Connection options of graph client | - |
clientSettings.connection.user | Username | user |
clientSettings.connection.password | Password | password |
clientSettings.connection.address | Address of the graph server | 127.0.0.1:3699 |
logPath | Path of log file | "" |
files | File list to be imported | - |
files[0].path | File path | "" |
files[0].type | File type | csv |
files[0].csv | CSV file options | - |
files[0].csv.withHeader | Whether csv file has header | false |
files[0].csv.withLabel | Whether csv file has +/- label to represent insert/delete operation | false |
files[0].schema | Schema definition for this file data | - |
files[0].schema.space | Space name created in nebula | "" |
files[0].schema.type | Schema type: vertex or edge | vertex |
files[0].schema.edge | Edge options | - |
files[0].schema.edge.name | Edge name in above space | "" |
files[0].schema.edge.withRanking | Whether this edge has ranking | false |
files[0].schema.edge.props | Properties of the edge | - |
files[0].schema.edge.props[0].name | Property name | "" |
files[0].schema.edge.props[0].type | Property type | "" |
files[0].schema.vertex | Vertex options | - |
files[0].schema.vertex.tags | Vertex tags options | - |
files[0].schema.vertex.tags[0].name | Vertex tag name | "" |
files[0].schema.vertex.tags[0].props | Vertex tag's properties | - |
files[0].schema.vertex.tags[0].props[0].name | Vertex tag's property name | "" |
files[0].schema.vertex.tags[0].props[0].type | Vertex tag's property type | "" |
files[0].failDataPath | Failed data file path | "" |
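As an illustration of how the options in the table fit together, here is a minimal Go sketch of one files entry and a consistency check. The type names (File, Schema, Edge, Tag, Prop) and the validateFile helper are hypothetical and are not taken from the importer's source; the defaults applied (csv for the file type, vertex for the schema type) are the ones listed in the table.

```go
package main

import (
	"errors"
	"fmt"
)

// Prop mirrors one props entry (name and type). Hypothetical type.
type Prop struct {
	Name string
	Type string
}

// Tag mirrors one entry of files[].schema.vertex.tags.
type Tag struct {
	Name  string
	Props []Prop
}

// Edge mirrors files[].schema.edge.
type Edge struct {
	Name        string
	WithRanking bool
	Props       []Prop
}

// Schema mirrors files[].schema.
type Schema struct {
	Space string
	Type  string // "vertex" or "edge"; empty means the default, "vertex"
	Edge  *Edge
	Tags  []Tag
}

// File mirrors one entry of the files list.
type File struct {
	Path         string
	FailDataPath string
	Type         string // empty means the default, "csv"
	WithHeader   bool
	WithLabel    bool
	Schema       Schema
}

// validateFile is a hypothetical check, not the importer's own validation.
func validateFile(f File) error {
	if f.Path == "" {
		return errors.New("files[].path must not be empty")
	}
	if f.Type != "" && f.Type != "csv" {
		return fmt.Errorf("unsupported file type %q", f.Type)
	}
	if f.Schema.Space == "" {
		return errors.New("files[].schema.space must not be empty")
	}
	switch f.Schema.Type {
	case "", "vertex":
		if len(f.Schema.Tags) == 0 {
			return errors.New("a vertex schema needs at least one tag")
		}
	case "edge":
		if f.Schema.Edge == nil || f.Schema.Edge.Name == "" {
			return errors.New("an edge schema needs an edge name")
		}
	default:
		return fmt.Errorf("schema.type must be vertex or edge, got %q", f.Schema.Type)
	}
	return nil
}

func main() {
	// Same values as the edge.csv entry in the example configuration.
	edgeFile := File{
		Path:         "./edge.csv",
		FailDataPath: "./err/edge.csv",
		Type:         "csv",
		Schema: Schema{
			Space: "test",
			Type:  "edge",
			Edge: &Edge{
				Name:        "edge_name",
				WithRanking: true,
				Props:       []Prop{{Name: "prop_name", Type: "string"}},
			},
		},
	}
	if err := validateFile(edgeFile); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("edge file entry looks consistent")
}
```

Loading the YAML itself would need a YAML library, which this sketch leaves out on purpose.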
Two CSV data formats will be supported in the future. For now, please use the first format, which has no header line in the CSV data file.
In a vertex CSV data file, the first column is either a label (+/-) or the vid. The vid is the first column when the csv.withLabel option is set to false.
The property values follow the vid, and their order must match the order of props in the configuration.
1,2,this is a property string
2,4,yet another property string
With label, + means insert and - means delete.
In a row labeled -, only the vid of the vertex you want to delete is needed.
+,1,2,this is a property string
-,1
+,2,4,yet another property string
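The vertex row layout described above can be sketched in Go as follows. This is only an illustration of the column order, assuming rows are read with the standard encoding/csv package; VertexRow and parseVertexRow are hypothetical names, not part of the importer.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// VertexRow is a hypothetical holder for one parsed vertex line.
type VertexRow struct {
	Delete bool     // true when the row is labeled "-"
	VID    string   // vertex id
	Props  []string // property values, in the order of props in the configuration
}

// parseVertexRow splits one CSV record into label, vid and property values.
func parseVertexRow(record []string, withLabel bool) (VertexRow, error) {
	var row VertexRow
	if withLabel {
		if len(record) == 0 {
			return row, fmt.Errorf("empty record")
		}
		switch record[0] {
		case "+": // insert
		case "-": // delete
			row.Delete = true
		default:
			return row, fmt.Errorf("unknown label %q", record[0])
		}
		record = record[1:]
	}
	if len(record) == 0 {
		return row, fmt.Errorf("missing vid")
	}
	row.VID = record[0]
	if !row.Delete {
		row.Props = record[1:]
	}
	return row, nil
}

func main() {
	data := "+,1,2,this is a property string\n-,1\n+,2,4,yet another property string\n"
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = -1 // delete rows carry only the vid, so row lengths differ
	records, err := r.ReadAll()
	if err != nil {
		panic(err)
	}
	for _, rec := range records {
		row, err := parseVertexRow(rec, true)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%+v\n", row)
	}
}
```

Setting FieldsPerRecord to a negative value matters here: with the default, encoding/csv rejects any row whose field count differs from the first row.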
The edge CSV data file format is similar to the vertex format. The difference is that the vertex vid is replaced by the source vertex vid, the destination vertex vid and the edge ranking.
Without a label column, src_vid, dst_vid and ranking are always the first three columns in the CSV data file.
1,2,0,first property value
1,3,2,prop value
The ranking column is optional. If the edge has no ranking (withRanking is false), leave the column out of the data file.
1,2,first property value
1,3,prop value
With label:
+,1,2,0,first property value
+,1,3,2,prop value
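A similar sketch for edge rows, showing how the withLabel and withRanking options shift the columns. EdgeRow and parseEdgeRow are hypothetical names; treating the ranking as a 64-bit integer is an assumption based on the integer values in the examples, and no special handling of rows labeled - is attempted because the text above does not describe it for edges.

```go
package main

import (
	"fmt"
	"strconv"
)

// EdgeRow is a hypothetical holder for one parsed edge line.
type EdgeRow struct {
	Delete  bool
	SrcVID  string
	DstVID  string
	Ranking int64 // assumed to be an integer; stays 0 when withRanking is false
	Props   []string
}

// parseEdgeRow reads [label,] src_vid, dst_vid, [ranking,] props... from a record.
func parseEdgeRow(record []string, withLabel, withRanking bool) (EdgeRow, error) {
	var row EdgeRow
	if withLabel {
		if len(record) == 0 {
			return row, fmt.Errorf("empty record")
		}
		switch record[0] {
		case "+": // insert
		case "-": // delete
			row.Delete = true
		default:
			return row, fmt.Errorf("unknown label %q", record[0])
		}
		record = record[1:]
	}
	if len(record) < 2 {
		return row, fmt.Errorf("need src_vid and dst_vid")
	}
	row.SrcVID, row.DstVID = record[0], record[1]
	record = record[2:]
	if withRanking {
		if len(record) == 0 {
			return row, fmt.Errorf("missing ranking column")
		}
		rank, err := strconv.ParseInt(record[0], 10, 64)
		if err != nil {
			return row, fmt.Errorf("invalid ranking %q: %v", record[0], err)
		}
		row.Ranking = rank
		record = record[1:]
	}
	row.Props = record
	return row, nil
}

func main() {
	// With ranking: src_vid, dst_vid, ranking, property value.
	withRank, err := parseEdgeRow([]string{"1", "3", "2", "prop value"}, false, true)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", withRank)

	// Without ranking: src_vid, dst_vid, property value.
	noRank, err := parseEdgeRow([]string{"1", "3", "prop value"}, false, false)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", noRank)
}
```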
CSV files with a header line are not supported yet. For now, please remove the header from your CSV data file.
_src,_dst,_ranking,prop1,prop2
...
_src and _dst represent the edge's source and destination vertex ids. The _ranking column holds the edge ranking value.
_vid,tag1.prop1,tag2.prop2,tag1.prop3,tag2.prop4
...
The _vid column represents the globally unique vertex id.
This tool depends on Go 1.13, so make sure you have installed Go first.
Use git to clone this project to a local directory, then run cmd/importer.go with the --config parameter.
$ git clone https://github.com/vesoft-inc/nebula-importer.git
$ cd nebula-importer/cmd
$ go run importer.go --config /path/to/yaml/config/file
With Docker, you can easily import local data into Nebula without a Go runtime environment.
$ docker run --rm -ti \
    --network=host \
    -v {your-config-file}:/root/{your-config-file} \
    -v {your-csv-data-dir}:/root/{your-csv-data-dir} \
    vesoft/nebula-importer \
    --config /root/{your-config-file}
All log output is written to the logPath file set in the configuration.
- Summary statistics of response
- Write error log and data
- Configure file
- Concurrent request to Graph server
- Create space and tag/edge automatically
- Configure retry option for Nebula client
- Support edge rank
- Support label for add/delete(+/-) in first column
- Support column header in first line
- Support vid partition
- Support multi-tags insertion in vertex
- Provide docker image and usage
- Make header adapt to props order defined in schema of configure file
- Handle string column in nice way
- Update concurrency and batch size online
- Count duplicate vids
- Support VID generation automatically
- Output logs to file