Improved and fixed documentation. Thanks to @mwainwright
domoritz authored and amercader committed Oct 12, 2012
1 parent a33b2df commit 529ec77
Showing 1 changed file with 47 additions and 48 deletions.
95 changes: 47 additions & 48 deletions doc/datastore.rst
@@ -14,36 +14,35 @@ Installation and Configuration
.. warning:: Make sure that you follow the steps below and that the settings are correct. Wrong settings could lead to serious security issues.

The DataStore in previous lives required a custom setup of ElasticSearch and Nginx,
but that is no more, as it the relational database management system PostgreSQL.
However, you should set-up a separate database for the datastore
and create a read-only user to make your CKAN installation save.
but that is no more, as it now uses the relational database management system PostgreSQL.
However, you should set up a separate database for the DataStore
and create a read-only user to make your CKAN and DataStore installation safe.

In your ``config`` file ensure that the datastore extension is enabled::

ckan.plugins = datastore

Also ensure that the ``ckan.datastore.write_url`` and ``ckan.datastore.read_url`` variables are set::

ckan.datastore.write_url = postgresql://ckanuser:pass@localhost/datastore
ckan.datastore.write_url = postgresql://writeuser:pass@localhost/datastore
ckan.datastore.read_url = postgresql://readonlyuser:pass@localhost/datastore

A few things have to be kept in mind
A few things have to be kept in mind:

* The datastore cannot be on the CKAN database (except for testing)
* The write user (i.e. ``ckanuser``) and read-only user (i.e. ``readonlyuser``) cannot be the same
* The DataStore cannot be on the CKAN database (except for testing)
* The write user (i.e. ``writeuser``) and read-only user (i.e. ``readonlyuser``) cannot be the same

To create a new database and a read-only user, use the provided paster commands after you have set the right database URLs.::
To create a new database with a write user and read-only user, use this paster command::

paster datastore create-db SQL_SUPER_USER
paster datastore create-read-only-user SQL_SUPER_USER
paster datastore create-all SQL_SUPER_USER

To test the setup you can create a new datastore, to do so you can run the following command::
To test the set-up you can create a new DataStore. To do so you can run the following command::

curl -X POST http://127.0.0.1:5000/api/3/action/datastore_create -H "Authorization: {YOUR-API-KEY}" -d '{"resource_id": "{RESOURCE-ID}", "fields": [ {"id": "a"}, {"id": "b"} ], "records": [ { "a": 1, "b": "xyz"}, {"a": 2, "b": "zzz"} ]}'

A table named after the resource id should have been created on your datastore
database, and visiting this URL should return a response from the datastore with
the previous records::
A table named after the resource id should have been created on your DataStore
database. Visiting the following URL should return a response from the DataStore with
the records inserted above::

http://127.0.0.1:5000/api/3/action/datastore_search?resource_id={RESOURCE_ID}
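
The same round trip can be sketched in Python using only the standard library. This is a minimal illustration and not part of CKAN: the helper names ``build_create_request`` and ``build_search_url`` are hypothetical, and the site URL, API key and resource id are the placeholders used in the commands above. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

# Placeholder values (assumptions); replace with your own site URL,
# API key and resource id.
CKAN_URL = "http://127.0.0.1:5000"
API_KEY = "YOUR-API-KEY"
RESOURCE_ID = "RESOURCE-ID"


def build_create_request(resource_id, fields, records):
    """Build (but do not send) a POST request for datastore_create."""
    payload = {"resource_id": resource_id, "fields": fields, "records": records}
    return urllib.request.Request(
        CKAN_URL + "/api/3/action/datastore_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": API_KEY},
        method="POST",
    )


def build_search_url(resource_id):
    """Build the datastore_search URL used to read the records back."""
    query = urllib.parse.urlencode({"resource_id": resource_id})
    return CKAN_URL + "/api/3/action/datastore_search?" + query


create_request = build_create_request(
    RESOURCE_ID,
    fields=[{"id": "a"}, {"id": "b"}],
    records=[{"a": 1, "b": "xyz"}, {"a": 2, "b": "zzz"}],
)
search_url = build_search_url(RESOURCE_ID)
```

Passing ``create_request`` to ``urllib.request.urlopen`` would send it to a running CKAN instance; fetching ``search_url`` afterwards should return the two records.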

@@ -55,11 +54,11 @@ The DataStore is distinct but complementary to the FileStore (see
:doc:`filestore`). In contrast to the FileStore, which provides 'blob'
storage of whole files with no way to access or query parts of that file, the
DataStore is like a database in which individual data elements are accessible
and queryable. To illustrate this distinction consider storing a spreadsheet
file like a CSV or Excel document. In the FileStore this filed would be stored
and queryable. To illustrate this distinction, consider storing a spreadsheet
file like a CSV or Excel document. In the FileStore this file would be stored
directly. To access it you would download the file as a whole. By contrast, if
the spreadsheet data is stored in the DataStore one would be able to access
individual spreadsheet rows via a simple web-api as well as being able to make
the spreadsheet data is stored in the DataStore, one would be able to access
individual spreadsheet rows via a simple web API, as well as being able to make
queries over the spreadsheet contents.


@@ -71,7 +70,7 @@ uploaded to the :doc:`FileStore <filestore>`) to be automatically added to the
DataStore. This requires some processing, to extract the data from your files
and to add it to the DataStore in the format the DataStore can handle.

This task of automatically parsing and then adding data to the datastore is
This task of automatically parsing and then adding data to the DataStore is
performed by a DataStorer, a queue process that runs asynchronously and can be
triggered by uploads or other activities. The DataStorer is an extension and can
be found, along with installation instructions, at: https://github.com/okfn/ckanext-datastorer
@@ -80,18 +79,18 @@ be found, along with installation instructions, at: https://github.com/okfn/ckan
The DataStore Data API
======================

The DataStore's Data API, which derives from the underlying data-table,
The DataStore's Data API, which derives from the underlying data table,
is RESTful and JSON-based with extensive query capabilities.

Each resource in a CKAN instance can have an associated DataStore 'table'. The
basic API for accessing the DataStore is detailed below. For a detailed
basic API for accessing the DataStore is outlined below. For a detailed
tutorial on using this API see :doc:`using-data-api`.


API Reference
-------------

.. note:: Lists can always be expressed in different ways. It is possible to use lists, comma separated strings or single items. These are valid lists: ``['foo', 'bar']``, ``foo, bar``, ``"foo", "bar"`` and ``foo``.
.. note:: Lists can always be expressed in different ways. It is possible to use lists, comma separated strings or single items. These are valid lists: ``['foo', 'bar']``, ``'foo, bar'``, ``"foo", "bar"`` and ``'foo'``.
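
As a rough illustration of how these spellings relate, the following sketch normalises each of them into a Python list. The helper ``as_list`` is hypothetical and is not CKAN's own parsing code; it only shows one way a client might treat the forms as equivalent.

```python
def as_list(value):
    """Normalise the accepted list spellings into a list of strings.

    Illustrative only: this is an assumption about equivalence,
    not the DataStore's actual implementation.
    """
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    # Split on commas, then trim whitespace and any surrounding quotes.
    parts = [part.strip().strip("'\"") for part in str(value).split(",")]
    # Drop empty entries left by stray commas.
    return [part for part in parts if part]


examples = [["foo", "bar"], "foo, bar", '"foo", "bar"', "foo"]
normalised = [as_list(example) for example in examples]
```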


datastore_create
@@ -103,7 +102,7 @@ The datastore_create API endpoint allows a user to post JSON data to be stored a
resource_id: resource_id, # the data is going to be stored against.
aliases: # list of names for read only aliases to the resource
fields: [] # a list of dictionaries of fields/columns and their extra metadata.
records: [] # a list of dictionaries of the data, eg: [{"dob": "2005", "some_stuff": ['a', b']}, ..]
records: [] # a list of dictionaries of the data, eg: [{"dob": "2005", "some_stuff": ['a', 'b']}, ..]
primary_key: # list of fields that represent a unique key
indexes: # indexes on table
}
@@ -115,42 +114,42 @@ See :ref:`fields` and :ref:`records` for details on how to lay out records.
datastore_delete
~~~~~~~~~~~~~~~~

The datastore_delete API endpoint allows a user to delete from a resource. The JSON for searching must be in the following form::
The datastore_delete API endpoint allows a user to delete records from a resource. The JSON for deleting must be in the following form::

{
resource_id: resource_id # the data that is going to be deleted.
filter: # dictionary of matching conditions to delete
# e.g {'key1': 'a. 'key2': 'b'}
# e.g {'key1': 'a', 'key2': 'b'}
# this will be equivalent to "delete from table where key1 = 'a' and key2 = 'b' "
}
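
The matching rule can be sketched locally in Python. This is a minimal illustration under stated assumptions: ``build_delete_payload`` and ``matches`` are hypothetical helpers, the ``filter`` key is taken from the form shown above, and the list ``rows`` merely stands in for a DataStore table to show that the conditions are combined with "and".

```python
import json


def build_delete_payload(resource_id, conditions):
    """Build the JSON body for datastore_delete using the 'filter' key."""
    return json.dumps({"resource_id": resource_id, "filter": conditions})


def matches(record, conditions):
    """Return True if the record equals every condition (conditions are ANDed)."""
    return all(record.get(key) == value for key, value in conditions.items())


# Local stand-in for a DataStore table, used only to show which rows would go.
rows = [
    {"key1": "a", "key2": "b"},
    {"key1": "a", "key2": "c"},
]
conditions = {"key1": "a", "key2": "b"}

# Rows that would survive "delete ... where key1 = 'a' and key2 = 'b'".
remaining = [row for row in rows if not matches(row, conditions)]
body = build_delete_payload("RESOURCE-ID", conditions)
```

Only the first row satisfies both conditions, so only that row would be deleted.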


datastore_upsert
~~~~~~~~~~~~~~~~

The datastore_upsert API endpoint allows a user to add data to an existing datastore resource. In order for the upsert and update to work a unique key has to defined via the datastore_create API endpoint command.
The datastore_upsert API endpoint allows a user to add or edit records in an existing DataStore resource. In order for the ``upsert`` and ``update`` methods to work, a unique key has to be defined via the datastore_create API endpoint command.
The JSON for upserting must be in the following form::

{
resource_id: resource_id # resource id that the data is going to be stored under.
records: [] # a list of dictionaries of the data, eg: [{"dob": "2005", "some_stuff": ['a', b']}, ..]
records: [] # a list of dictionaries of the data, eg: [{"dob": "2005", "some_stuff": ['a', 'b']}, ..]
method: # the method to use to put the data into the datastore
# possible options: upsert (default), insert, update
}

upsert
``upsert``
Update if record with same key already exists, otherwise insert. Requires unique key.
insert
Insert only. This method is faster that upsert because checks are omitted. Does *not* require a unique key.
update
Update only. Exception will occur if the key that should be updated does not exist. Requires unique key.
``insert``
Insert only. This method is faster than upsert, but will fail if any inserted record matches an existing one. Does *not* require a unique key.
``update``
Update only. An exception will occur if the key that should be updated does not exist. Requires unique key.
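
The difference between the three methods can be modelled with an in-memory dictionary. This is only a sketch of the semantics described above, not the DataStore's implementation: ``apply_records`` is a hypothetical helper, it always identifies rows by a key (even for ``insert``, which the DataStore does not require), and merging new values into an existing row is an assumption.

```python
def apply_records(table, records, key, method="upsert"):
    """Apply records to an in-memory table (a dict keyed by the unique key)."""
    if method not in ("upsert", "insert", "update"):
        raise ValueError("unknown method: %s" % method)
    for record in records:
        exists = record[key] in table
        if method == "insert" and exists:
            # insert fails when the record matches an existing one
            raise KeyError("duplicate key: %r" % record[key])
        if method == "update" and not exists:
            # update fails when there is nothing to update
            raise KeyError("missing key: %r" % record[key])
        # Merge into the existing row, or start a new one (assumed behaviour).
        table.setdefault(record[key], {}).update(record)
    return table


table = {}
apply_records(table, [{"id": 1, "dob": "2005"}], key="id", method="insert")
# The default method is upsert: id 1 is updated, id 2 is inserted.
apply_records(table, [{"id": 1, "dob": "2006"}, {"id": 2, "dob": "2010"}], key="id")
```

Calling ``apply_records`` with ``method="update"`` for an ``id`` that is not yet in ``table`` raises ``KeyError`` in this model, mirroring the exception described for ``update``.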

.. _datastore_search:

datastore_search
~~~~~~~~~~~~~~~~

The datastore_search API endpoint allows a user to search data at a resource.
The datastore_search API endpoint allows a user to search data in a resource.
The JSON for searching must be in the following form::

{
@@ -171,7 +170,7 @@ The JSON for searching must be in the following form::
datastore_search_sql
~~~~~~~~~~~~~~~~~~~~

The datastore_search_sql API endpoint allows a user to search data at a resource or connect multiple resources with join expressions. The underlying SQL engine is the `PostgreSQL engine <http://www.postgresql.org/docs/9.1/interactive/sql/.html>`_. The JSON for searching must be in the following form::
The datastore_search_sql API endpoint allows a user to search data in a resource or connect multiple resources with join expressions. The underlying SQL engine is the `PostgreSQL engine <http://www.postgresql.org/docs/9.1/interactive/sql/.html>`_. The JSON for searching must be in the following form::

{
sql: # a single sql select statement
@@ -183,9 +182,9 @@ The datastore_search_sql API endpoint allows a user to search data at a resource
datastore_search_htsql
~~~~~~~~~~~~~~~~~~~~~~

.. note:: HTSQL is not in the core datastore and has to be installed as an extension. The extension is available on https://github.com/okfn/ckanext-htsql.
.. note:: HTSQL is not in the core DataStore. To use it, it is necessary to install the ckanext-htsql extension available at https://github.com/okfn/ckanext-htsql.

The datastore_search_htsql API endpoint allows a user to search data at a resource using the `HTSQL <http://htsql.org/doc/>`_ query expression language. The JSON for searching must be in the following form::
The datastore_search_htsql API endpoint allows a user to search data in a resource using the `HTSQL <http://htsql.org/doc/>`_ query expression language. The JSON for searching must be in the following form::

{
htsql: # a htsql query statement.
@@ -196,14 +195,14 @@ The datastore_search_htsql API endpoint allows a user to search data at a resour
Fields
~~~~~~

Fields define the column names and the type of the data in a column. They are defined as an array of fields. One field is defined as follows::
Fields define the column names and the type of the data in a column. A field is defined as follows::

{
"id": # a string which defines the column name
"type": # the data type for the column
}

Field **types are optional** and will be guessed by the provided data. However, setting the types ensures that future inserts to not fail because of wrong types. See :ref:`valid-types` for details on which types are valid.
Field **types are optional** and will be guessed by the DataStore from the provided data. However, setting the types ensures that future inserts will not fail because of wrong types. See :ref:`valid-types` for details on which types are valid.

Example::

@@ -223,7 +222,7 @@ Example::
Records
~~~~~~~

Records are defined as an array of records. One record is the data to be inserted in a table and is defined as follows::
A record is the data to be inserted in a table and is defined as follows::

{
"<id>": # data to be set
@@ -235,7 +234,7 @@ Example::
[
{
"foo": 100,
"bar": "I'm a text."
"bar": "Here's some text"
},
{
"foo": 42
@@ -247,16 +246,16 @@ Example::
Field types
-----------

The datastore supports all types supported by PostgreSQL as well as a few additions. A list of the PostgreSQL types can be found in the `type section of the documentation`_. Below you can find a list of the most common data types. The ``json`` type has been added as a storage for nested data.
The DataStore supports all types supported by PostgreSQL as well as a few additions. A list of the PostgreSQL types can be found in the `type section of the documentation`_. Below you can find a list of the most common data types. The ``json`` type has been added as a storage for nested data.

.. _type section of the documentation: http://www.postgresql.org/docs/9.1/static/datatype.html


text
Arbitrary text data, e.g. ``I'm a text``.
Arbitrary text data, e.g. ``Here's some text``.
json
Arbitrary nested JSON data, e.g. ``{"foo": 42, "bar": [1, 2, 3]}``.
Please note that this type is a custom type that is wrapped by the datastore.
Please note that this type is a custom type that is wrapped by the DataStore.
date
Date without time, e.g. ``2012-5-25``.
time
@@ -268,23 +267,23 @@ int
float
Floats, e.g. ``1.61803``.
bool
Boolen values, e.g. ``true``, ``0``
Boolean values, e.g. ``true``, ``0``


You can find more information about the formatting of dates in the `date/time types section of the documentation`_.
You can find more information about the formatting of dates in the `date/time types section of the PostgreSQL documentation`_.

.. _date/time types section of the documentation: http://www.postgresql.org/docs/9.1/static/datatype-datetime.html
.. _date/time types section of the PostgreSQL documentation: http://www.postgresql.org/docs/9.1/static/datatype-datetime.html


Table aliases
-------------

Resources in the datastore can have multiple aliases that are easier to remember than the resource id. Aliases can be created and edited with the datastore_create API endpoint. All aliases can be found in a special view called ``_table_metadata``.
A resource in the DataStore can have multiple aliases that are easier to remember than the resource id. Aliases can be created and edited with the datastore_create API endpoint. All aliases can be found in a special view called ``_table_metadata``.
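
A possible way to set and use an alias is sketched below. The helper names, the alias ``population_2012`` and the site URL are hypothetical. Using an alias in place of the resource id, and reading ``_table_metadata`` through datastore_search, are assumptions based on the description above rather than confirmed behaviour.

```python
import json
import urllib.parse

CKAN_URL = "http://127.0.0.1:5000"  # assumed local CKAN instance


def build_alias_payload(resource_id, aliases):
    """Build a datastore_create JSON body that sets aliases for a resource."""
    return json.dumps({"resource_id": resource_id, "aliases": aliases})


def build_search_url(resource_id_or_alias):
    """Build a datastore_search URL for a resource id or (assumed) an alias."""
    query = urllib.parse.urlencode({"resource_id": resource_id_or_alias})
    return CKAN_URL + "/api/3/action/datastore_search?" + query


alias_body = build_alias_payload("RESOURCE-ID", ["population_2012"])
alias_url = build_search_url("population_2012")
# Assumption: the _table_metadata view can be searched like a resource.
metadata_url = build_search_url("_table_metadata")
```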

Comparison of different querying methods
----------------------------------------

The datastore supports querying with the datastore_search and datastore_search_sql API endpoint. They are similar but support different features. The following list gives an overview on the different methods.
The DataStore supports querying with multiple API endpoints. They are similar but support different features. The following list gives an overview of the different methods.

============================== ======================= =========================== =============================
.. :ref:`datastore_search` :ref:`datastore_search_sql` :ref:`datastore_search_htsql`