Update README.md

BillSwirrl authored Jul 11, 2018
1 parent 932a45d commit 8c923bf
Showing 1 changed file with 43 additions and 6 deletions.

This project transforms tables of observations and reference data into [rdf data cube](https://www.w3.org/TR/vocab-data-cube/) resources specified as [csvw](https://github.com/w3c/csvw).

## How to run table2qb

_To be updated once issues [47](https://github.com/Swirrl/table2qb/issues/47) and [45](https://github.com/Swirrl/table2qb/issues/45) are done._

```
BASE_URI=your_domain java -jar target/table2qb-0.1.3-SNAPSHOT-standalone.jar exec pipeline --input-csv input_file --column-config config_file --output-file output_file
```

### pipeline

This parameter must be one of:

* cube-pipeline
* components-pipeline
* codelist-pipeline

### input_file

A csv file of the correct structure: its contents must correspond to the chosen pipeline (see the sections below describing the structure required for each). How each column of the input is interpreted is determined by the config file (`columns.csv`).

### config_file

A csv file (conventionally `columns.csv`) declaring the columns that may appear in the input data and how each should be interpreted. See the [Configuration](#configuration) section below for the expected structure.

### output_file

The output of the process: a single RDF file, serialised as Turtle.
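Concretely, a cube-pipeline run might look like the following; the file names and base URI here are illustrative, not prescribed by table2qb:

```shell
BASE_URI=http://example.org \
java -jar target/table2qb-0.1.3-SNAPSHOT-standalone.jar exec cube-pipeline \
  --input-csv observations.csv \
  --column-config columns.csv \
  --output-file cube.ttl
```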


### Observation Data

The observation input table should be arranged as [tidy-data](http://vita.had.co.nz/papers/tidy-data.pdf), i.e. one row per observation and one column per component (dimension, attribute or measure). The output is a set of csvw documents - i.e. csv with json-ld metadata - that can be translated into RDF via a [csv2rdf](http://www.w3.org/TR/csv2rdf/) processor. The outputs that make up the cube are:

We also provide a set of `skos:ConceptScheme`s enumerating all of the codes used in each of the components (via `used-codes-scheme.json` and `used-codes-concepts.json`). These are useful for navigating within a cube by using the marginals - in other words this saves you from having to scan through all of the observations in order to establish the extent of the cube.
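As a sketch, a tidy observations input could look like this - the column names and codes are hypothetical, not mandated by table2qb:

```
Geography,Date,Flow,Measure Type,Unit,Value
W06000022,2016,Exports,GBP Total,GBP Million,40
W06000022,2016,Imports,GBP Total,GBP Million,38
```

Each row is one observation; each column must match a component declared in the column configuration.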

### Definition of components

The project provides pipelines for preparing reference data. These can be used for managing reference data across multiple `qb:DataSet`s.

- Components: given a tidy-data input of one component per row, this pipeline creates a `components.csv` file and a `components.json` for creating `qb:ComponentProperty`s in an `owl:Ontology`. Note that the components are the dimensions, attributes and measures themselves, whereas the component-specifications are what link them to a given data-structure-definition.
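For illustration, a components input of one component per row might look like this (the header names and URI are assumed, not confirmed by this README):

```
Label,Description,Component Type,Codelist
Flow,Direction of trade,Dimension,http://example.org/def/concept-scheme/flow
```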


### Definition of code-lists

- Codelists: given a tidy-data input of one code per row, this pipeline creates a `codelist.csv` file and a `codelist.json` for creating `skos:Concept`s in a `skos:ConceptScheme`. Note that these codelists describe the universal set of codes that may be the object of a component (making it a `qb:CodedProperty`), not the (sub)set that have been used within a cube.
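As an illustrative sketch (header names assumed, not taken from this document), a codelist input of one code per row might be:

```
Label,Notation,Parent Notation
Exports,exports,
Imports,imports,
```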

## Configuration
The dataset should have the following columns:
- `property_template` - the predicate used to attach the (cell) values to the observations
- `value_template` - the URI template applied to the cell values
- `datatype` - as per csvw:datatype, how the cell value should be parsed (typically `string` for everything except the value column which will be `number`)
- `value_transformation` - an optional transformation applied to the cell value before it is substituted into the URI template; the options are believed to be `slugize` (turn a label into a URL-safe slug) and `unitize` (a slug variant for units)
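To illustrate, a single configuration row for a hypothetical `Flow` dimension might look like the following; the full header row is assumed here, not taken from this document:

```
title,name,component_attachment,property_template,value_template,datatype,value_transformation
Flow,flow,qb:dimension,http://example.org/def/property/flow,http://example.org/def/concept/flow/{flow},string,slugize
```

Here `{flow}` in the value template would be filled from the (slugged) cell value.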

This initial draft also includes several conventions in the code that ought to be generalised to configuration - particularly how cell values are slugged.


## Example

The [./examples/employment](./examples/employment) directory provides a full example and instructions for running it.

## How to compile table2qb

table2qb is written in Clojure, so you will need Java and [Leiningen](https://leiningen.org/) installed. Build the standalone jar with:

```
lein uberjar
```

## Documentation

There's more background in the [doc](/doc) folder:

- the [requirements](/doc/requirements.md) explain some of the challenges faced and the contexts where this will need to work
- the [architecture](/doc/architecture.md) overview describes the overall process of creating rdf-cubes from tabular inputs

## License
