Commit

Update docs with notices regarding DataPusher and scheming
The automatic uploads DataStore docs only mentioned DataPusher. We now
list all known options, recommending xloader as default, and mark
datapusher as unmaintained.

In the customizing the metadata fields tutorial, make the suggestion to
use ckanext-scheming much more prominent, and list its benefits.
amercader committed Aug 20, 2024
1 parent 07e89bf commit 4446e31
Showing 2 changed files with 31 additions and 17 deletions.
15 changes: 11 additions & 4 deletions doc/extensions/adding-custom-fields.rst
@@ -14,15 +14,22 @@ restrict the possible values to a defined list. By using CKAN's IDatasetForm
plugin interface, a CKAN plugin can add custom, first-class metadata fields to
CKAN datasets, and can do custom validation of these fields.

.. warning::

In most cases users should use `ckanext-scheming <https://github.com/ckan/ckanext-scheming>`_
rather than the low level interfaces described in this tutorial. The ckanext-scheming
extension allows:

* Metadata schema configuration using a YAML or JSON schema description
* Automatic conversion of custom fields to the internal representation used by CKAN
* Automatic use of relevant template snippets according to the field type for editing and display
* Use of many pre-configured presets for multiple choice fields, dates, repeating subfields, etc.

.. seealso::

In this tutorial we are assuming that you have read the
:doc:`/extensions/tutorial`.

You may also want to check the [ckanext-scheming](https://github.com/ckan/ckanext-scheming)
extension, as it will allow metadata schema configuration using a YAML or JSON
schema description, replete with custom validation and template snippets for
editing and display.

CKAN schemas and validation
---------------------------
33 changes: 20 additions & 13 deletions doc/maintaining/datastore.rst
@@ -18,10 +18,10 @@ When a resource is added to the DataStore, you get:
The DataStore is integrated into the :doc:`CKAN API </api/index>` and
authorization system.

The DataStore is generally used alongside the
`DataPusher <https://github.com/ckan/datapusher>`_, which will
The DataStore is generally used alongside other tools which will
automatically upload data to the DataStore from suitable files, whether
uploaded to CKAN's FileStore or externally linked.
uploaded to CKAN's FileStore or externally linked. See :ref:`automatic_uploads`
for more details.

.. contents::
:depth: 1
@@ -192,24 +192,31 @@ You can now delete the DataStore table with::

To find out more about the Data API, see `The Data API`_.

.. _automatic_uploads:

---------------------------------------------------
DataPusher: Automatically Add Data to the DataStore
---------------------------------------------------
------------------------------------------
Automatically Adding Data to the DataStore
------------------------------------------

Often, one wants data that is added to CKAN (whether it is linked to or
In most cases, you will want data that is added to CKAN (whether it is linked to or
uploaded to the :doc:`FileStore <filestore>`) to be automatically added to the
DataStore. This requires some processing, to extract the data from your files
and to add it to the DataStore in the format the DataStore can handle.

This task of automatically parsing and then adding data to the DataStore is
performed by the `DataPusher`_, a service that runs asynchronously and can be installed
alongside CKAN.
This task of automatically parsing and then adding data to the DataStore can be performed
by different tools; you can choose the one that best fits your requirements:

To install this please look at the docs here: https://github.com/ckan/datapusher
* `XLoader <https://github.com/ckan/ckanext-xloader>`_ is the officially supported extension for
automated uploads to the DataStore. It runs as a :doc:`background job <background-tasks>` and supports
type guessing and limiting the number of rows imported among other settings.
* `DataPusher+ (DataPusher Plus) <https://github.com/dathere/datapusher-plus>`_ is a next-generation replacement for the
DataPusher, maintained by `datHere <https://dathere.com/>`_. It focuses on increased performance and robustness and
includes data pre-processing capabilities to infer fields, transform data, etc.
* `AirCan <https://github.com/datopian/aircan>`_ is a tool built on top of Apache Airflow and maintained
by `Datopian <https://www.datopian.com/>`_ that, among other functionality, supports automated data uploads to the DataStore.
* `DataPusher <https://github.com/ckan/datapusher>`_ is a **legacy tool** that is no longer maintained.
It has significant limitations, so users are encouraged to migrate to one of the tools above.

.. note:: The DataPusher only imports the first worksheet of a spreadsheet. It also does
not support duplicate column headers. That includes blank column headings.
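
The header limitation in the note above can be checked before a file is uploaded. The following is a minimal sketch, not part of the DataPusher or of CKAN: ``find_header_problems`` is a hypothetical helper name, and it assumes the file is CSV, that the first row holds the column headings, and that headings are compared exactly after stripping surrounding whitespace. The note does not say whether a single blank heading is rejected on its own, so the sketch conservatively reports every blank heading.

```python
import csv
import io
from collections import Counter


def find_header_problems(csv_text):
    """Inspect the first row of CSV text for headings the note warns about.

    Hypothetical helper, not a CKAN or DataPusher API.
    Returns a tuple (blank_positions, duplicate_names):
      - blank_positions: zero-based indexes of empty headings
      - duplicate_names: sorted non-empty headings that occur more than once
    """
    reader = csv.reader(io.StringIO(csv_text))
    # An empty input has no header row; treat it as having no headings.
    header = next(reader, [])
    # Assumption: headings differing only by surrounding whitespace are equal.
    names = [name.strip() for name in header]
    blank_positions = [i for i, name in enumerate(names) if not name]
    counts = Counter(name for name in names if name)
    duplicate_names = sorted(name for name, n in counts.items() if n > 1)
    return blank_positions, duplicate_names


if __name__ == "__main__":
    sample = "id,name,,name\n1,a,b,c\n"
    blank, dupes = find_header_problems(sample)
    print(blank, dupes)  # prints: [2] ['name']
```

A file for which both returned lists are empty has no blank or repeated headings in its first row. This covers only the header limitation; the first-worksheet limitation applies to spreadsheet formats and is not addressed here.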

.. _data_dictionary:
