huyph/stellar-ingest

stellar-ingest - Stellar data ingestion module

Build status

Branch   Test           Coverage
master   Build Status   Coverage Status
devel    Build Status   Coverage Status

Introduction

This repository hosts stellar-ingest, a module of the Stellar - Graph Analytics platform developed by CSIRO Data61. This module takes care of ingesting relational data, stored as CSV files, into a graph, for further processing by Stellar.

If you are interested in running the entire Stellar platform, please refer to the instructions on the main Stellar repository.

License

Copyright 2017-2018 CSIRO Data61

Licensed under the Apache License, Version 2.0 (the "License"); you may not use the files included in this repository except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Obtaining stellar-ingest

The remainder of this page describes how to build and run stellar-ingest from its source code.

Additionally, stellar-ingest is available as a Docker image. See details at the dedicated documentation page.

Compilation

stellar-ingest is a Clojure program and uses Leiningen as its build tool.

To create a standalone executable (jar file), run the following command from the root of the stellar-ingest repository:

lein uberjar

Usage

Currently, stellar-ingest offers two main interfaces to perform graph ingestion:

  • a command line interface (CLI), invoking stellar-ingest as a utility for every new ingestion, passing parameters on the command line;
  • a RESTful API, starting stellar-ingest as a server and making ingestion requests via HTTP.

In both cases, stellar-ingest requires the input CSV files and a JSON schema file that describes the data sources, the graph schema and the mapping between them.

The resulting graph is represented using the Extended Property Graph Model (EPGM) and stored using its JSON serialization format. For additional details see also the stellar-utils documentation.

Note: the Stellar platform includes a Python client library, which allows access to all modules from Python scripts, using the REST API. An example, which includes data ingestion, can be found here.

CLI ingestion

Once the program has been compiled, from the repository root issue:

java \
  -cp ./target/uberjar/stellar-ingest-0.1.0-standalone.jar \
  stellar_ingest.schema \
  path/to/schema_file.json \
  path/to/output_directory \
  arbitrary-graph-label

A ready-to-use example is provided with the source code repository. It represents a (tiny) film database (FilmDB), in the spirit of the Internet Movie Database (IMDB), that can be used to create a graph linking films with actors, non-acting staff and production companies.

cd resources/examples/imdb_norm

java \
  -cp ../../../target/uberjar/stellar-ingest-0.1.0-standalone.jar \
  stellar_ingest.schema \
  imdb_norm_schema.json \
  imdb_output \
  imdb

A directory imdb_output will be created, which contains an EPGM graph with label imdb.
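
The CLI invocation above can be wrapped in a small shell function that checks its arguments before starting the JVM. This is only a sketch: the function name stellar_ingest_cli and the STELLAR_INGEST_JAR variable are hypothetical and not part of stellar-ingest, and the jar path assumes the uberjar built in the Compilation section.

```shell
# Hypothetical wrapper around the CLI invocation shown above.
# Usage: stellar_ingest_cli SCHEMA_FILE OUTPUT_DIR GRAPH_LABEL

# Assumed jar location (relative to the repository root); override if needed.
STELLAR_INGEST_JAR="${STELLAR_INGEST_JAR:-./target/uberjar/stellar-ingest-0.1.0-standalone.jar}"

stellar_ingest_cli() {
  # The CLI takes exactly three positional parameters.
  if [ "$#" -ne 3 ]; then
    echo "usage: stellar_ingest_cli SCHEMA_FILE OUTPUT_DIR GRAPH_LABEL" >&2
    return 2
  fi
  # Fail early if the schema file is missing, before starting the JVM.
  if [ ! -f "$1" ]; then
    echo "schema file not found: $1" >&2
    return 1
  fi
  java -cp "$STELLAR_INGEST_JAR" stellar_ingest.schema "$1" "$2" "$3"
}
```

From resources/examples/imdb_norm, with STELLAR_INGEST_JAR pointing at ../../../target/uberjar/stellar-ingest-0.1.0-standalone.jar, the FilmDB example would then be run as: stellar_ingest_cli imdb_norm_schema.json imdb_output imdb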

REST ingestion

To run stellar-ingest in server mode and access the REST API issue this command:

java \
  -cp ./target/uberjar/stellar-ingest-0.1.0-standalone.jar \
  stellar_ingest.rest

After stellar_ingest.rest it is possible to specify a port number for the server to listen on. The default port is 3000. Along with REST requests, this port also serves an API documentation page. To see it, point a web browser to:

http://localhost:3000
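
As a sketch of running the server on a non-default port, the two helpers below append the port after stellar_ingest.rest and build the matching base URL. The helper names start_ingest_server and ingest_base_url are hypothetical, and port 8080 in the usage line is an arbitrary choice.

```shell
# Hypothetical helper: print the base URL of a server listening on the
# given port (3000, the documented default, when no port is given).
ingest_base_url() {
  printf 'http://localhost:%s\n' "${1:-3000}"
}

# Hypothetical helper: start the server from the repository root,
# passing the port number after stellar_ingest.rest.
start_ingest_server() {
  java \
    -cp ./target/uberjar/stellar-ingest-0.1.0-standalone.jar \
    stellar_ingest.rest "${1:-3000}"
}
```

For example, start_ingest_server 8080 would start the server in the foreground, and ingest_base_url 8080 prints the address of the documentation page to open in a browser.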

Graph ingestion is triggered by an HTTP POST request to the endpoint ingestor/ingest. The request body, encoded in JSON, is composed of the graph schema, as used in the CLI ingestion process, enriched by a few elements:

{
  "abortUrl": "https://requestb.in/HERE-YOUR-BIN",
  "completeUrl": "https://requestb.in/HERE-YOUR-BIN",
  "sessionId": "example-session",
  "label": "imdb",
  "output": "imdb_output",
  
  "sources": [ ... ],
  "mapping": { ... },
  "graphSchema": { ... }
}

You can try it on the included FilmDB example. An additional JSON file, containing the extra required elements, is provided (imdb_norm_schema_rest.json).

cd resources/examples/imdb_norm

java \
  -cp ../../../target/uberjar/stellar-ingest-0.1.0-standalone.jar \
  stellar_ingest.rest

# In a second terminal, from the same directory:
curl -X POST \
     --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     -d '@imdb_norm_schema_rest.json' \
     'http://localhost:3000/ingestor/ingest'
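
When scripting the request, it can help to check the HTTP status instead of reading the response by eye. The sketch below wraps the curl call above in a hypothetical function post_ingest. It assumes the server answers an accepted request with a 2xx status, which this page does not state, so treat the success check as an assumption to verify against the API documentation page.

```shell
# Hypothetical helper: POST a request body file to ingestor/ingest and
# report the HTTP status code.
# Usage: post_ingest BODY_FILE [PORT]
post_ingest() {
  body_file="$1"
  port="${2:-3000}"   # documented default port

  if [ ! -f "$body_file" ]; then
    echo "request body not found: $body_file" >&2
    return 1
  fi

  # -s silences progress output, -o /dev/null discards the response body,
  # -w '%{http_code}' prints only the status code.
  status=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
    --header 'Content-Type: application/json' \
    --header 'Accept: application/json' \
    -d "@$body_file" \
    "http://localhost:$port/ingestor/ingest") || {
      echo "could not reach server on port $port" >&2
      return 1
    }

  echo "HTTP status: $status"
  # Assumption: any 2xx status means the request was accepted.
  case "$status" in
    2*) return 0 ;;
    *)  return 1 ;;
  esac
}
```

With the server running, post_ingest imdb_norm_schema_rest.json would submit the example request from resources/examples/imdb_norm.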

Documentation

Further documentation is provided in the doc directory of this repository.
