jievince/nebula-importer

 
 

Nebula-importer

Introduction

Nebula Graph CSV importer written in Go. This tool reads local CSV files and writes their data into Nebula storage.

You can run this tool from source or with Docker.

Before importing, start a Nebula server, either with docker-compose or from an rpm installation. Also make sure the corresponding space, tags and edge types have already been created in Nebula.

Prepare configuration file

Nebula-importer reads a YAML configuration file to get the connection settings for the graph server, the tag/edge schema, and so on.

Here's an example of a configuration file.

See the descriptions below.

version: v1rc1
description: example
clientSettings:
  concurrency: 4 # number of graph clients
  connection:
    user: user
    password: password
    address: 127.0.0.1:3699
logPath: ./err/test.log
files:
  - path: ./edge.csv
    failDataPath: ./err/edge.csv
    batchSize: 100
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      space: test
      type: edge
      edge:
        name: edge_name
        withRanking: true
        props:
          - name: prop_name
            type: string
  - path: ./vertex.csv
    failDataPath: ./err/vertex.csv
    batchSize: 100
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      space: test
      type: vertex
      vertex:
        tags:
          - name: tag1
            props:
              - name: prop1
                type: int
              - name: prop2
                type: timestamp
          - name: tag2
            props:
              - name: prop3
                type: double
              - name: prop4
                type: string

In this example, nebula-importer imports two CSV data files, edge.csv and vertex.csv, in turn.

Configuration Properties

| Option | Description | Default |
| --- | --- | --- |
| version | Configuration file version | v1rc1 |
| description | Description of this configuration file | "" |
| clientSettings | Graph client settings | - |
| clientSettings.concurrency | Number of graph clients | 4 |
| clientSettings.connection | Connection options of graph client | - |
| clientSettings.connection.user | Username | user |
| clientSettings.connection.password | Password | password |
| clientSettings.connection.address | Address of graph client | 127.0.0.1:3699 |
| logPath | Path of log file | "" |
| files | File list to be imported | - |
| files[0].path | File path | "" |
| files[0].type | File type | csv |
| files[0].csv | CSV file options | - |
| files[0].csv.withHeader | Whether csv file has header | false |
| files[0].csv.withLabel | Whether csv file has +/- label to represent delete/insert operation | false |
| files[0].schema | Schema definition for this file data | - |
| files[0].schema.space | Space name created in nebula | "" |
| files[0].schema.type | Schema type: vertex or edge | vertex |
| files[0].schema.edge | Edge options | - |
| files[0].schema.edge.name | Edge name in above space | "" |
| files[0].schema.edge.withRanking | Whether this edge has ranking | false |
| files[0].schema.edge.props | Properties of the edge | - |
| files[0].schema.edge.props[0].name | Property name | "" |
| files[0].schema.edge.props[0].type | Property type | "" |
| files[0].schema.vertex | Vertex options | - |
| files[0].schema.vertex.tags | Vertex tags options | - |
| files[0].schema.vertex.tags[0].name | Vertex tag name | "" |
| files[0].schema.vertex.tags[0].props | Vertex tag's properties | - |
| files[0].schema.vertex.tags[0].props[0].name | Vertex tag's property name | "" |
| files[0].schema.vertex.tags[0].props[0].type | Vertex tag's property type | "" |
| files[0].failDataPath | Failed data file path | "" |
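
To make the defaults listed above concrete, here is a minimal Go sketch that fills unset fields with those values. The struct and function names (Config, File, applyDefaults and so on) are hypothetical and cover only part of the configuration; they are not taken from the importer's source, and the sketch does not parse YAML.

```go
package main

import "fmt"

// Hypothetical types mirroring part of the YAML layout.
// They are assumptions for illustration, not the importer's own structs.
type Connection struct {
	User     string
	Password string
	Address  string
}

type ClientSettings struct {
	Concurrency int
	Connection  Connection
}

type Schema struct {
	Space string
	Type  string // "vertex" or "edge"
}

type File struct {
	Path   string
	Type   string
	Schema Schema
}

type Config struct {
	Version        string
	ClientSettings ClientSettings
	Files          []File
}

// applyDefaults fills unset fields with the defaults from the properties list.
func applyDefaults(c *Config) {
	if c.Version == "" {
		c.Version = "v1rc1"
	}
	if c.ClientSettings.Concurrency <= 0 {
		c.ClientSettings.Concurrency = 4
	}
	conn := &c.ClientSettings.Connection
	if conn.User == "" {
		conn.User = "user"
	}
	if conn.Password == "" {
		conn.Password = "password"
	}
	if conn.Address == "" {
		conn.Address = "127.0.0.1:3699"
	}
	for i := range c.Files {
		f := &c.Files[i]
		if f.Type == "" {
			f.Type = "csv"
		}
		if f.Schema.Type == "" {
			f.Schema.Type = "vertex"
		}
	}
}

func main() {
	cfg := Config{Files: []File{{Path: "./vertex.csv", Schema: Schema{Space: "test"}}}}
	applyDefaults(&cfg)
	fmt.Printf("%+v\n", cfg)
}
```

Options without a usable default, such as files[0].path or files[0].schema.space, are left empty here and would have to be supplied by the configuration file.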

CSV Data Example

Two CSV data formats will be supported in the future. For now, please use the first format, which has no header line in the CSV data file.

Without Header Line

Vertex

In a vertex CSV data file, the first column is either a label (+/-) or the vid. If the label option csv.withLabel is set to false, the vid is the first column. The property values follow the vid, and their order must match the order of props in the configuration.

1,2,this is a property string
2,4,yet another property string

With label:

  • +: Insert
  • -: Delete

A row labeled - needs only the vid of the vertex you want to delete.

+,1,2,this is a property string
-,1
+,2,4,yet another property string
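
The vertex row layout described above can be sketched in Go as follows. This is only an illustration under stated assumptions: VertexRow and parseVertexRow are hypothetical names, property values are kept as raw strings, and the importer's real parsing code may differ.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"strings"
)

// VertexRow is a hypothetical holder for one parsed line.
type VertexRow struct {
	Delete bool     // true for rows labeled "-"
	VID    string   // vertex id
	Props  []string // property values, in configuration order
}

// parseVertexRow interprets one CSV record: an optional +/- label,
// then the vid, then the property values.
func parseVertexRow(record []string, withLabel bool) (VertexRow, error) {
	var row VertexRow
	if withLabel {
		if len(record) == 0 {
			return row, errors.New("empty record")
		}
		switch record[0] {
		case "+": // insert, nothing to set
		case "-":
			row.Delete = true
		default:
			return row, fmt.Errorf("unknown label %q", record[0])
		}
		record = record[1:]
	}
	if len(record) == 0 {
		return row, errors.New("missing vid")
	}
	row.VID = record[0]
	row.Props = record[1:]
	return row, nil
}

func main() {
	data := "+,1,2,this is a property string\n-,1\n+,2,4,yet another property string\n"
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = -1 // delete rows are shorter than insert rows
	records, err := r.ReadAll()
	if err != nil {
		panic(err)
	}
	for _, rec := range records {
		row, err := parseVertexRow(rec, true)
		if err != nil {
			panic(err)
		}
		fmt.Printf("delete=%v vid=%s props=%q\n", row.Delete, row.VID, row.Props)
	}
}
```

Setting FieldsPerRecord to -1 matters for labeled files, because encoding/csv otherwise expects every record to have as many fields as the first one.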

Edge

The edge CSV data file format is similar to the vertex format. The difference is that the single vertex vid is replaced by the source vertex vid, the destination vertex vid and the edge ranking.

Without a label column, src_vid, dst_vid and ranking are always the first three columns in the CSV data file.

1,2,0,first property value
1,3,2,prop value

The ranking column is not required. If you don't need it, leave it out of the data file.

1,2,first property value
1,3,prop value

With label:

+,1,2,0,first property value
+,1,3,2,prop value
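
The edge layout can be sketched in the same way. Again, EdgeRow and parseEdgeRow are hypothetical names. Storing the ranking as an int64 and requiring src_vid and dst_vid on every row, including labeled ones, are assumptions made for this sketch rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// EdgeRow is a hypothetical holder for one parsed edge line.
type EdgeRow struct {
	Delete  bool
	Src     string
	Dst     string
	Ranking int64 // assumed integer; stays 0 when withRanking is false
	Props   []string
}

// parseEdgeRow interprets one CSV record: an optional +/- label,
// then src_vid, dst_vid, an optional ranking, then property values.
func parseEdgeRow(record []string, withLabel, withRanking bool) (EdgeRow, error) {
	var row EdgeRow
	if withLabel {
		if len(record) == 0 {
			return row, errors.New("empty record")
		}
		switch record[0] {
		case "+": // insert, nothing to set
		case "-":
			row.Delete = true
		default:
			return row, fmt.Errorf("unknown label %q", record[0])
		}
		record = record[1:]
	}
	need := 2 // src_vid and dst_vid
	if withRanking {
		need = 3
	}
	if len(record) < need {
		return row, fmt.Errorf("expected at least %d columns, got %d", need, len(record))
	}
	row.Src, row.Dst = record[0], record[1]
	if withRanking {
		rank, err := strconv.ParseInt(record[2], 10, 64)
		if err != nil {
			return row, fmt.Errorf("bad ranking %q: %w", record[2], err)
		}
		row.Ranking = rank
	}
	row.Props = record[need:]
	return row, nil
}

func main() {
	row, err := parseEdgeRow([]string{"+", "1", "3", "2", "prop value"}, true, true)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", row)
}
```

The withRanking flag has to match the file: with ranking enabled, the third column is always read as the ranking, so a file without that column would have its first property value misread.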

With Header Line

This feature is not supported yet. For now, please remove the header from your CSV data file.

Edge

_src,_dst,_ranking,prop1,prop2
...

_src and _dst represent the source and destination vertex ids of the edge. The _ranking column holds the edge ranking value.

Vertex

_vid,tag1.prop1,tag2.prop2,tag1.prop3,tag2.prop4
...

The _vid column represents the globally unique vertex id.
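
Since header support is not implemented, the following Go sketch is purely hypothetical. It shows one way such a header line could be split into reserved columns (those starting with an underscore) and tag.prop columns. The Column type, the parseHeader function and the rule that every underscore-prefixed name is reserved are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Column describes one header cell. Hypothetical type for illustration.
type Column struct {
	Special string // "_vid", "_src", "_dst" or "_ranking"; empty for property columns
	Tag     string // tag name for "tag.prop" columns; empty otherwise
	Prop    string // property name; empty for reserved columns
}

// parseHeader classifies each header cell by its naming convention.
func parseHeader(header []string) ([]Column, error) {
	cols := make([]Column, 0, len(header))
	for _, h := range header {
		switch {
		case h == "":
			return nil, errors.New("empty header cell")
		case strings.HasPrefix(h, "_"):
			cols = append(cols, Column{Special: h})
		case strings.Contains(h, "."):
			parts := strings.SplitN(h, ".", 2)
			cols = append(cols, Column{Tag: parts[0], Prop: parts[1]})
		default:
			// Plain names, as in the edge header, are property names.
			cols = append(cols, Column{Prop: h})
		}
	}
	return cols, nil
}

func main() {
	header := strings.Split("_vid,tag1.prop1,tag2.prop2,tag1.prop3,tag2.prop4", ",")
	cols, err := parseHeader(header)
	if err != nil {
		panic(err)
	}
	for _, c := range cols {
		fmt.Printf("%+v\n", c)
	}
}
```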

Usage

From Sources

This tool depends on Go 1.13, so make sure you have installed Go first.

Use git to clone this project to a local directory, then run cmd/importer.go with the --config parameter.

$ git clone https://github.com/vesoft-inc/nebula-importer.git
$ cd nebula-importer/cmd
$ go run importer.go --config /path/to/yaml/config/file

Docker

With Docker, you can easily import local data into Nebula without a Go runtime environment.

$ docker run --rm -ti \
    --network=host \
    -v {your-config-file}:/root/{your-config-file} \
    -v {your-csv-data-dir}:/root/{your-csv-data-dir} \
    vesoft/nebula-importer \
    --config /root/{your-config-file}

Log

All log output is written to the file given by logPath in the configuration.

TODO

  • Summary statistics of response
  • Write error log and data
  • Configuration file
  • Concurrent request to Graph server
  • Create space and tag/edge automatically
  • Configure retry option for Nebula client
  • Support edge rank
  • Support label for add/delete(+/-) in first column
  • Support column header in first line
  • Support vid partition
  • Support multi-tags insertion in vertex
  • Provide docker image and usage
  • Make header adapt to props order defined in schema of configuration file
  • Handle string column in a nice way
  • Update concurrency and batch size online
  • Count duplicate vids
  • Support VID generation automatically
  • Output logs to file
