OS	`master`	Python support
		2.7, 3.5, 3.6
		2.7, 3.5, 3.6
		3.5, 3.6

Docs

Visit docs.quiltdata.com, or browse the docs on GitHub.

Quilt is a data registry

Quilt provides versioned, reusable building blocks for analysis in the form of data packages. A data package may contain data of any type or size. In spirit, Quilt does for data what package managers and Docker registries do for code: provide a centralized, collaborative store of record.

Benefits

Reproducibility - Imagine source code without versions. Ouch. Why live with un-versioned data? Versioned data makes analysis reproducible by creating unambiguous references to potentially complex data dependencies.
Collaboration and transparency - Data likes to be shared. Quilt offers a centralized data warehouse for finding and sharing data.
Auditing - The registry tracks all reads and writes so that admins know when data are accessed or changed.
Less data prep - The registry abstracts away network, storage, and file format so that users can focus on what they wish to do with the data.
Deduplication - Data fragments are hashed with SHA256. A duplicate fragment is written to disk only once, globally, per user. As a result, large, repeated data fragments consume less disk space and network bandwidth.
Faster analysis - Serialized data loads 5 to 20 times faster than files. Moreover, specialized storage formats like Apache Parquet minimize I/O bottlenecks so that tools like Presto DB and Hive run faster.
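
The deduplication idea above can be illustrated with a small content-addressed store. This is only a minimal sketch of hash-based deduplication, not Quilt's actual implementation: the `FragmentStore` class and its `put`/`get` methods are hypothetical names invented for this example, and an in-memory dictionary stands in for disk or blob storage.

```python
import hashlib


class FragmentStore:
    """Toy in-memory content-addressed store (hypothetical, not Quilt's API)."""

    def __init__(self):
        # Maps a SHA-256 hex digest to the fragment's bytes.
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store a fragment and return its SHA-256 hex digest."""
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only if this digest has not been stored before.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        """Return the fragment stored under the given digest."""
        return self._blobs[digest]

    def __len__(self) -> int:
        # Number of distinct fragments actually stored.
        return len(self._blobs)


store = FragmentStore()
first = store.put(b"col_a,col_b\n1,2\n")
second = store.put(b"col_a,col_b\n1,2\n")  # identical fragment, not stored again
third = store.put(b"col_a,col_b\n3,4\n")   # different content, stored separately
```

Because a fragment is addressed by the hash of its content, identical fragments map to the same key, so the store holds two blobs here even though `put` was called three times.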

Commands

Here are the basic Quilt commands:

Service

Quilt is offered as a managed service at quiltdata.com.

Architecture

Quilt consists of three source-level components:

A data catalog
- Displays package metadata in HTML
- Implemented in JavaScript with Redux and sagas
A data registry
- Controls permissions
- Stores package fragments in blob storage
- Stores package metadata
- De-duplicates repeated data fragments
- Implemented in Python with Flask and PostgreSQL
A data compiler
- Serializes tabular data to Apache Parquet
- Transforms and parses files
- Builds packages locally
- Pushes packages to the registry
- Pulls packages from the registry
- Implemented in Python with pandas and PyArrow
