PyDAX

PyDAX is a Python API that enables data consumers and distributors to easily use and share datasets, and establishes a standard for exchanging data assets. It enables:

  • a data scientist to have a simpler and more unified way to begin working with a wide range of datasets, and
  • a data distributor to have a consistent, safe, and open source way to share datasets with interested communities.

Quick Example

>>> import pydax
>>> pydax.list_all_datasets()
{'claim_sentences_search': ('1.0.2',),
 ..., 'wikitext103': ('1.0.1',)}
>>> pydax.load_dataset('wikitext103')
{...}  # Content of the dataset

Install the Package & its Dependencies

To install the latest version of PyDAX, run

$ pip install pydax

Alternatively, if you have downloaded the source, switch to the source directory (the directory that contains this README file, e.g. cd /path/to/pydax-source) and run

$ pip install -U .

Quick Start

Import the package and load a dataset. PyDAX downloads the WikiText-103 dataset (version 1.0.1) if it is not already downloaded, and then loads it.

import pydax
wikitext103_data = pydax.load_dataset('wikitext103')

View available PyDAX datasets and their versions.

>>> pydax.list_all_datasets()
{'claim_sentences_search': ('1.0.2',), ..., 'wikitext103': ('1.0.1',)}
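The mapping shown above can be processed like any other Python dictionary. The following is a minimal sketch, not part of PyDAX: it assumes the return value maps each dataset name to a tuple of version strings (as the output above suggests) and that versions are dot-separated integers. The sample dictionary contains only the two entries visible above, and latest_version is a hypothetical helper.

```python
# Sample copied from the output shown above; the elided entries are left out.
# In real use this would be the return value of pydax.list_all_datasets().
datasets = {
    'claim_sentences_search': ('1.0.2',),
    'wikitext103': ('1.0.1',),
}


def latest_version(versions):
    """Return the highest version string (hypothetical helper).

    Assumes each version is made of dot-separated integers, so that
    '1.0.10' sorts after '1.0.2'.
    """
    return max(versions, key=lambda v: tuple(int(part) for part in v.split('.')))


# Map each dataset name to its most recent version.
latest = {name: latest_version(versions) for name, versions in datasets.items()}

for name, version in sorted(latest.items()):
    print(f'{name}: {version}')
```

Comparing the numeric components rather than the raw strings avoids ordering '1.0.10' before '1.0.2'.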

To view your global PyDAX configuration, such as the default data directory, use pydax.get_config.

>>> pydax.get_config()
Config(DATADIR=PosixPath('dir/to/download/load/from'), ..., DATASET_SCHEMA_FILE_URL='file/to/load/datasets/from')

By default, pydax.load_dataset downloads to and loads from ~/.pydax/data/<dataset-name>/<dataset-version>/. To change the default data directory, use pydax.init.

pydax.init(DATADIR='new/dir/to/download/load/from')

Load a previously downloaded dataset using pydax.load_dataset. With the new default data directory set, PyDAX now searches for the Groningen Meaning Bank dataset (version 1.0.2) in new/dir/to/download/load/from/gmb/1.0.2/.

gmb_data = pydax.load_dataset('gmb', version='1.0.2', download=False)  # assumes the GMB dataset was already downloaded
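The directory layout described above (<data-dir>/<dataset-name>/<dataset-version>/) can be sketched with the standard library alone. This is an illustration of the naming scheme, not PyDAX's own implementation: expected_dataset_dir and DEFAULT_DATADIR are hypothetical names, and the "~" in the default path is kept literal rather than expanded.

```python
from pathlib import PurePosixPath

# Default location stated above; "~" is left unexpanded in this sketch.
# Call os.path.expanduser on the result before touching the filesystem.
DEFAULT_DATADIR = PurePosixPath('~/.pydax/data')


def expected_dataset_dir(name, version, datadir=DEFAULT_DATADIR):
    """Build <datadir>/<name>/<version> (hypothetical helper, not a PyDAX API)."""
    return PurePosixPath(datadir) / name / version


# Default data directory.
print(expected_dataset_dir('wikitext103', '1.0.1'))

# Data directory changed as in the pydax.init example above.
print(expected_dataset_dir('gmb', '1.0.2', datadir='new/dir/to/download/load/from'))
```

A helper like this can be handy for checking whether a dataset directory already exists before calling pydax.load_dataset with download=False.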

To learn more about PyDAX, check out the documentation and the tutorial.