This repository hosts stellar-ingest, a module of the Stellar - Graph Analytics platform developed by CSIRO Data61. This module takes care of ingesting relational data, stored as CSV files, into a graph, for further processing by Stellar.
If you are interested in running the entire Stellar platform, please refer to the instructions on the main Stellar repository.
Copyright 2017-2018 CSIRO Data61
Licensed under the Apache License, Version 2.0 (the "License"); you may not use the files included in this repository except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
The remainder of this page describes how to build and run stellar-ingest from its source code.
Additionally, stellar-ingest is available as a Docker image. See details at the dedicated documentation page.
stellar-ingest is a Clojure program and uses Leiningen as its build tool.
To create a standalone executable (jar file), run the following command from the root of the stellar-ingest code repository:
lein uberjar
Currently, stellar-ingest offers two main interfaces to perform graph ingestion:
- a command line interface (CLI), invoking stellar-ingest as a utility for every new ingestion, passing parameters on the command line;
- a RESTful API, starting stellar-ingest as a server and making ingestion requests via HTTP.
In both cases the following pieces of information are required for stellar-ingest to operate:
- input CSV files (see details on format);
- a JSON-encoded graph schema, including mappings between CSV columns and graph elements.
The resulting graph is represented using the Extended Property Graph Model (EPGM) and stored using its JSON serialization format. For additional details see also the stellar-utils documentation.
Note: the Stellar platform includes a Python client library, which allows access to all modules from Python scripts, using the REST API. An example, which includes data ingestion, can be found here.
Once the program has been compiled, from the repository root issue:
java \
-cp ./target/uberjar/stellar-ingest-0.1.0-standalone.jar \
stellar_ingest.schema \
path/to/schema_file.json \
path/to/output_directory \
arbitrary-graph-label
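When ingestion has to be scripted, the same invocation can be assembled programmatically. The following is a minimal Python sketch, not part of stellar-ingest: the helper names build_ingest_command and run_ingest are hypothetical, the jar path and main class are copied from the command above, and it assumes the program reports failure through a non-zero exit code.

```python
import subprocess

# Jar path and main class as used in the command above.
JAR = "./target/uberjar/stellar-ingest-0.1.0-standalone.jar"
MAIN_CLASS = "stellar_ingest.schema"


def build_ingest_command(schema_file, output_dir, label, jar=JAR):
    """Return the argument list for one CLI ingestion run (hypothetical helper)."""
    return [
        "java",
        "-cp", jar,
        MAIN_CLASS,
        str(schema_file),   # JSON-encoded graph schema with CSV mappings
        str(output_dir),    # directory that will receive the EPGM graph
        label,              # arbitrary graph label
    ]


def run_ingest(schema_file, output_dir, label, jar=JAR):
    """Run the ingestion; raises CalledProcessError on a non-zero exit code.

    Assumption: stellar-ingest signals failure through its exit code.
    """
    subprocess.run(build_ingest_command(schema_file, output_dir, label, jar), check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.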
A ready-to-use example is provided with the source code repository. It represents a (tiny) film database (FilmDB), in the spirit of the Internet Movie Database (IMDB), that can be used to create a graph linking films with actors, non-acting staff and production companies.
cd resources/examples/imdb_norm
java \
-cp ../../../target/uberjar/stellar-ingest-0.1.0-standalone.jar \
stellar_ingest.schema \
imdb_norm_schema.json \
imdb_output \
imdb
A directory imdb_output will be created, containing an EPGM graph with label imdb.
To run stellar-ingest in server mode and access the REST API issue this command:
java \
-cp ./target/uberjar/stellar-ingest-0.1.0-standalone.jar \
stellar_ingest.rest
After stellar_ingest.rest it is possible to specify a port number for the server to listen on. The default port is 3000. Along with REST requests, this port also serves an API documentation page. To see it, point a web browser to:
http://localhost:3000
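As an illustration of the optional port argument, the sketch below builds the server command line and the matching documentation URL in Python. The helper names build_server_command and docs_url are hypothetical; the only behaviour assumed is what is described above, namely that the port is given as a single argument after stellar_ingest.rest and defaults to 3000.

```python
import subprocess

JAR = "./target/uberjar/stellar-ingest-0.1.0-standalone.jar"
DEFAULT_PORT = 3000  # default port stated in this README


def build_server_command(port=None, jar=JAR):
    """Return the argument list that starts the REST server (hypothetical helper)."""
    argv = ["java", "-cp", jar, "stellar_ingest.rest"]
    if port is not None:
        # The port number follows the main class name.
        argv.append(str(port))
    return argv


def docs_url(port=DEFAULT_PORT, host="localhost"):
    """URL of the API documentation page served on the same port."""
    return f"http://{host}:{port}"


def start_server(port=None, jar=JAR):
    """Start the server as a background process and return its Popen handle."""
    return subprocess.Popen(build_server_command(port, jar))
```

For example, start_server(8080) would launch the server on port 8080, and docs_url(8080) gives the address to open in a browser.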
Graph ingestion is triggered by an HTTP POST request to the endpoint ingestor/ingest. The request body, encoded in JSON, is composed of the graph schema, as used in the CLI ingestion process, enriched by a few elements:
{
  "abortUrl": "https://requestb.in/HERE-YOUR-BIN",
  "completeUrl": "https://requestb.in/HERE-YOUR-BIN",
  "sessionId": "example-session",
  "label": "imdb",
  "output": "imdb_output",
  "sources": [ ... ],
  "mapping": { ... },
  "graphSchema": { ... }
}
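One way to produce such a body from an existing CLI schema is sketched below in Python. The function build_request_body is a hypothetical helper, not part of stellar-ingest, and it assumes that the additional elements are exactly the five fields listed before sources in the example above.

```python
import json


def build_request_body(schema, label, output, session_id, complete_url, abort_url):
    """Return a REST request body: the CLI schema plus the extra elements.

    `schema` is the parsed CLI schema (a dict). It is copied, not modified.
    """
    body = dict(schema)  # shallow copy keeps the caller's schema untouched
    body.update({
        "abortUrl": abort_url,
        "completeUrl": complete_url,
        "sessionId": session_id,
        "label": label,
        "output": output,
    })
    return body


def write_request_body(body, path):
    """Serialize the body to a JSON file that can be posted to the endpoint."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(body, f, indent=2)
```

In practice the schema would be loaded with json.load from the CLI schema file, and the two URLs would point to endpoints able to receive the completion or abort notification.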
You can try it on the included FilmDB example. An additional JSON file, containing the extra required elements, is provided (imdb_norm_schema_rest.json).
cd resources/examples/imdb_norm
java \
-cp ../../../target/uberjar/stellar-ingest-0.1.0-standalone.jar \
stellar_ingest.rest
curl -X POST \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
-d '@imdb_norm_schema_rest.json' \
'http://localhost:3000/ingestor/ingest'
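The same request can be issued from a script. The following Python sketch uses only the standard library and mirrors the curl call above; the helper names make_ingest_request and post_ingest are hypothetical, and the format of the server's response is not assumed, so it is returned as raw text.

```python
import urllib.request

# Endpoint used in the curl example above (default port 3000).
INGEST_URL = "http://localhost:3000/ingestor/ingest"


def make_ingest_request(body_file, url=INGEST_URL):
    """Build (but do not send) the POST request for a JSON body file."""
    with open(body_file, "rb") as f:
        data = f.read()
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


def post_ingest(body_file, url=INGEST_URL):
    """Send the request; return the HTTP status code and the raw response text."""
    with urllib.request.urlopen(make_ingest_request(body_file, url)) as resp:
        return resp.status, resp.read().decode("utf-8")
```

With the server running, post_ingest("imdb_norm_schema_rest.json") called from resources/examples/imdb_norm would submit the example. urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so callers may want to catch it.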
Further documentation is provided in the doc directory of this repository.