Quilt is a data package manager. Quilt consists of a client-side data compiler (this repository) and a server-side registry, where packages are stored.
A data package is an abstraction that encapsulates and automates data preparation. More concretely, a data package is a tree of serialized data wrapped in a Python module. Each data package has a unique handle, a revision history, and a web page. Packages are stored in a server-side registry that enforces access control.
- build to create a package from files
- push a package to store it in the registry
- install a package to download it locally
- import packages to use them in code
- Install Conda
conda install -c conda-forge pyarrow=0.3
pip install quilt
- Install the OpenSSL headers:
- Ubuntu:
sudo apt-get install libssl-dev
- Fedora:
sudo dnf install openssl-devel
- Install Quilt:
pip install quilt
Quilt currently supports Python. Spark and R support are in the works.
Chat with us on quiltdata.com.
You can use Quilt on the command line or directly in Python. Both interfaces have the same signature, so $ quilt install foo/bar build.yml is equivalent to quilt.install("foo/bar", "build.yml").
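To make the command-line/Python equivalence concrete, here is a minimal sketch that renders a simple quilt command as the matching Python call. The helper name cli_to_python is hypothetical and not part of Quilt; it only handles single-word commands with positional arguments and assumes the mapping is as direct as the example above suggests.

```python
import shlex


def cli_to_python(command):
    """Render a positional-only `quilt CMD ARGS...` line as a Python call string.

    Hypothetical helper for illustration; it does not call Quilt itself.
    """
    parts = shlex.split(command)
    if parts and parts[0] == "$":  # tolerate a leading shell prompt
        parts = parts[1:]
    if len(parts) < 2 or parts[0] != "quilt":
        raise ValueError("expected a command of the form: quilt CMD [ARGS...]")
    if any(part.startswith("-") for part in parts[1:]):
        raise ValueError("flags are not handled by this sketch")
    cmd, args = parts[1], parts[2:]
    rendered = ", ".join('"{}"'.format(arg) for arg in args)
    return "quilt.{}({})".format(cmd, rendered)


print(cli_to_python("$ quilt install foo/bar build.yml"))
```

Flags such as -x, -v, and -t, and two-word commands such as quilt access list, are deliberately left out because their Python spellings are not stated here.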
- quilt -h: for a list of commands
- quilt CMD -h: for info about a command
- quilt login
- quilt build USER/PACKAGE [SOURCE DIRECTORY or FILE.YML]
- quilt push USER/PACKAGE: stores the package in the registry
- quilt install [-x HASH | -v VERSION | -t TAG] USER/PACKAGE: installs a package
- quilt access list USER/PACKAGE: to see who has access to a package
- quilt access {add, remove} USER/PACKAGE ANOTHER_USER: to set access
- quilt access add public: makes a package visible to the world
- quilt log USER/PACKAGE: to see all changes to a package
- quilt version list USER/PACKAGE: to see versions of a package
- quilt version add USER/PACKAGE VERSION HASH: to create a new version
- quilt tag list USER/PACKAGE: to see tags of a package
- quilt tag add USER/PACKAGE TAG HASH: to create a new tag
  - The tag "latest" is automatically added to the most recent push
- quilt tag remove USER/PACKAGE TAG: to delete a tag
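The install syntax [-x HASH | -v VERSION | -t TAG] allows at most one selector. As a rough sketch of that rule, the hypothetical helper below (install_argv is not part of Quilt) assembles the argument list for quilt install and rejects conflicting selectors. It only builds the list; running it is left to the caller.

```python
def install_argv(package, pkg_hash=None, version=None, tag=None):
    """Build the argv for `quilt install`, allowing at most one selector.

    Illustrative only: the flag names come from the command list above,
    the function and parameter names are assumptions.
    """
    selectors = [("-x", pkg_hash), ("-v", version), ("-t", tag)]
    chosen = [(flag, value) for flag, value in selectors if value is not None]
    if len(chosen) > 1:
        raise ValueError("pass at most one of pkg_hash, version, or tag")
    argv = ["quilt", "install"]
    for flag, value in chosen:
        argv += [flag, value]
    argv.append(package)
    return argv


print(install_argv("USER/PACKAGE", tag="latest"))
```

The resulting list can be handed to subprocess.run if you want to drive the CLI from a script.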
- 2.7
- ~~3.2~~
- ~~3.3~~
- 3.4
- 3.5
- 3.6
See the Tutorial for details on build.yml.
contents:
  GROUP_NAME:
    DATA_NAME:
      file: PATH_TO_FILE
      transform: {id, csv, tsv, ssv, xls, xlsx}
      sep: "\t" # tab separated values
      # or any key-word argument to pandas.read_csv (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)
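As an illustration of the structure above, the following sketch writes out a one-entry build.yml as text. The helper render_build_yml and the names sales, q1, and data/q1.tsv are made up for the example, and it assumes that reader options such as sep sit at the same level as file and transform.

```python
import json

# Transform names as listed in the build.yml structure.
TRANSFORMS = ("id", "csv", "tsv", "ssv", "xls", "xlsx")


def render_build_yml(group, name, path, transform, **reader_kwargs):
    """Return build.yml text for a single data node (hypothetical helper)."""
    if transform not in TRANSFORMS:
        raise ValueError("unknown transform: {!r}".format(transform))
    lines = [
        "contents:",
        "  {}:".format(group),
        "    {}:".format(name),
        "      file: {}".format(path),
        "      transform: {}".format(transform),
    ]
    # Extra keys are emitted as JSON scalars, which are also valid YAML scalars.
    for key, value in sorted(reader_kwargs.items()):
        lines.append("      {}: {}".format(key, json.dumps(value)))
    return "\n".join(lines) + "\n"


print(render_build_yml("sales", "q1", "data/q1.tsv", "csv", sep="\t"))
```

Writing the returned text to build.yml and passing that file to quilt build would be the next step.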
Supported Pandas column types (via dtype):
- int
- bool
- float
- complex
- str
- unicode
- buffer
Everything else becomes type object. See dtypes.
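The fallback rule can be written down in a few lines. This is only a restatement of the list above as code; stored_dtype is a hypothetical name and the function does not touch pandas.

```python
# dtype names listed as supported above.
SUPPORTED_DTYPES = {"int", "bool", "float", "complex", "str", "unicode", "buffer"}


def stored_dtype(requested):
    """Return the requested dtype name if it is listed, otherwise "object"."""
    return requested if requested in SUPPORTED_DTYPES else "object"


for name in ("float", "decimal"):
    print(name, "->", stored_dtype(name))
```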
pip install pylint pytest pytest-cov
- pytest will run any test_* files in any subdirectory
- All new modules, files, and functions should have a corresponding test
- Track test code coverage by running:
python -m pytest --cov=quilt/tools/ --cov-report html:cov_html quilt/test -v
- View coverage results by opening cov_html/index.html
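For reference, a test file that pytest would pick up might look like the sketch below. The file name test_handle.py and the helper split_handle are hypothetical and are not part of the Quilt code base; the point is only the test_ prefix on the file and on the functions.

```python
# test_handle.py (hypothetical file name)


def split_handle(handle):
    """Split a USER/PACKAGE handle into its two parts (illustrative helper)."""
    user, sep, package = handle.partition("/")
    if not sep or not user or not package or "/" in package:
        raise ValueError("expected USER/PACKAGE, got {!r}".format(handle))
    return user, package


def test_split_handle_returns_user_and_package():
    assert split_handle("foo/bar") == ("foo", "bar")


def test_split_handle_rejects_missing_package():
    try:
        split_handle("foo")
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

In a real module the helper would live in the package and the test file would import it; both are kept in one file here so the example stands alone.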
pip install git+https://github.com/quiltdata/quilt.git
git clone https://github.com/quiltdata/quilt.git
cd quilt
- From the repository root:
pip install -e .