manzo1991/quilt

Quilt is a data package manager. Quilt consists of a client-side data compiler (this repository) and a server-side registry, where packages are stored.

Data packages

A data package is an abstraction that encapsulates and automates data preparation. More concretely, a data package is a tree of serialized data wrapped in a Python module. Each data package has a unique handle, a revision history, and a web page. Packages are stored in a server-side registry that enforces access control.

Package lifecycle

  • build to create a package from files
  • push a package to store it in the registry
  • install a package to download it locally
  • import packages to use them in code
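The first three lifecycle steps map onto CLI commands listed in the command summary. The sketch below only assembles those commands as argument lists and prints them. The helper name `lifecycle_commands` and the `foo/bar` handle are hypothetical, and nothing here contacts a registry.

```python
def lifecycle_commands(handle, source):
    """Return the build, push, and install commands for a package, in order.

    `handle` is a USER/PACKAGE string; `source` is a source directory or a
    build.yml path. This helper is illustrative and not part of Quilt.
    """
    user, sep, package = handle.partition("/")
    if not (user and sep and package):
        raise ValueError("handle must look like USER/PACKAGE")
    return [
        ["quilt", "build", handle, source],  # create the package from files
        ["quilt", "push", handle],           # store it in the registry
        ["quilt", "install", handle],        # download it locally
    ]


if __name__ == "__main__":
    # Print the commands instead of running them. To execute one for real,
    # pass the list to subprocess.run(cmd, check=True).
    for cmd in lifecycle_commands("foo/bar", "build.yml"):
        print(" ".join(cmd))
```

The fourth step, importing the package, happens inside your own Python code and is therefore not part of this command list.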

Installation

Mac / Windows

  1. Install Conda
  2. conda install -c conda-forge pyarrow=0.3
  3. pip install quilt

Linux

  1. Install the OpenSSL headers:
    • Ubuntu: sudo apt-get install libssl-dev
    • Fedora: sudo dnf install openssl-devel
  2. pip install quilt

Learn

Future

Quilt currently supports Python. Spark and R support are in the works.

Questions?

Chat with us on quiltdata.com.

Command summary

You can use Quilt on the command line or directly in Python. Both interfaces have the same signature, so $ quilt build foo/bar build.yml is equivalent to quilt.build("foo/bar", "build.yml").

  • quilt -h for a list of commands
  • quilt CMD -h for info about a command
  • quilt login
  • quilt build USER/PACKAGE [SOURCE DIRECTORY or FILE.YML]
  • quilt push USER/PACKAGE stores the package in the registry
  • quilt install [-x HASH | -v VERSION | -t TAG] USER/PACKAGE installs a package
  • quilt access list USER/PACKAGE to see who has access to a package
  • quilt access {add, remove} USER/PACKAGE ANOTHER_USER to set access
    • quilt access add USER/PACKAGE public makes a package visible to the world
  • quilt log USER/PACKAGE to see all changes to a package
  • quilt version list USER/PACKAGE to see versions of a package
  • quilt version add USER/PACKAGE VERSION HASH to create a new version
  • quilt tag list USER/PACKAGE to see tags of a package
  • quilt tag add USER/PACKAGE TAG HASH to create a new tag
    • The tag "latest" is automatically added to the most recent push
  • quilt tag remove USER/PACKAGE TAG to delete a tag
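To make the "same signature" rule concrete, the sketch below rewrites a simple command line as the text of the corresponding Python call. It is a string transformation only and does not import or call Quilt. The function name `cli_to_python` is hypothetical, and the sketch deliberately covers only single-word commands with positional arguments; how option flags and two-word commands such as `access list` map to Python is not assumed here.

```python
import shlex

# Single-word commands taken from the summary above. Two-word commands
# (access, version, tag) and option flags are out of scope for this sketch.
SIMPLE_COMMANDS = {"build", "push", "install", "log"}


def cli_to_python(command_line):
    """Render 'quilt CMD ARG...' as the text of the equivalent Python call."""
    parts = shlex.split(command_line)
    if parts and parts[0] == "$":  # tolerate a leading shell prompt
        parts = parts[1:]
    if len(parts) < 2 or parts[0] != "quilt":
        raise ValueError("expected: quilt CMD [ARGS...]")
    command, args = parts[1], parts[2:]
    if command not in SIMPLE_COMMANDS:
        raise ValueError("unsupported command in this sketch: " + command)
    if any(arg.startswith("-") for arg in args):
        raise ValueError("option flags are not handled in this sketch")
    rendered = ", ".join('"{}"'.format(arg) for arg in args)
    return "quilt.{}({})".format(command, rendered)


print(cli_to_python("quilt build foo/bar build.yml"))
# prints: quilt.build("foo/bar", "build.yml")
```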

Supported Python versions

  • 2.7
  • 3.2
  • 3.3
  • 3.4
  • 3.5
  • 3.6

Build recipes

build.yml structure and options

See the Tutorial for details on build.yml.

contents:
  GROUP_NAME:
    DATA_NAME:
      file: PATH_TO_FILE
      transform: {id, csv, tsv, ssv, xls, xlsx}
      sep: "\t" # tab separated values
      # or any keyword argument to pandas.read_csv (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)
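As an illustration of the template above, the following sketch generates build.yml text for a list of files using only the standard library. Several things are assumptions rather than documented behavior: the extension-to-transform mapping, the use of `id` for every other file type, and naming each data node after the file stem. The function `build_yml` and the example paths are hypothetical. Paths are written unquoted, so names containing YAML special characters would need extra care.

```python
import os

# Assumed mapping from file extension to a `transform` value. The template
# lists the allowed values but not how they correspond to extensions.
TRANSFORMS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".ssv": "ssv",
    ".xls": "xls",
    ".xlsx": "xlsx",
}


def build_yml(group, paths):
    """Return build.yml text that places every path under one group."""
    lines = ["contents:", "  {}:".format(group)]
    for path in paths:
        stem, ext = os.path.splitext(os.path.basename(path))
        # Assumption: fall back to `id` for extensions not listed above.
        transform = TRANSFORMS.get(ext.lower(), "id")
        lines.append("    {}:".format(stem))
        lines.append("      file: {}".format(path))
        lines.append("      transform: {}".format(transform))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_yml("sales", ["data/orders.csv", "data/regions.tsv", "docs/notes.txt"]))
```

The returned text can be saved as build.yml and passed to quilt build USER/PACKAGE build.yml.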

Column types

Supported Pandas column types (via dtype):

  • int
  • bool
  • float
  • complex
  • str
  • unicode
  • buffer

Everything else becomes type object. See dtypes.

Developer

  • pip install pylint pytest pytest-cov
  • pytest will run any test_* files in any subdirectory
  • All new modules, files, and functions should have a corresponding test
  • Track test code coverage by running: python -m pytest --cov=quilt/tools/ --cov-report html:cov_html quilt/test -v
  • View coverage results by opening cov_html/index.html

Install latest from master

  • pip install git+https://github.com/quiltdata/quilt.git

Local installation

  1. git clone https://github.com/quiltdata/quilt.git
  2. cd quilt
  3. From the repository root: pip install -e .
