Below is the documentation for Quilt 3. See here and here for Quilt 2.
- open.quiltdata.com is a petabyte-scale open data portal that runs on Quilt
- quiltdata.com includes case studies, use cases, videos, and information on how you can run a private Quilt instance
Quilt is for data-driven teams of both technical and non-technical members (executives, data scientists, data engineers, sales, product, etc.).
Quilt adds search, visual content preview, and versioning to every file in S3.
Quilt consists of a Python client, a web catalog, and Lambda functions (all of which are open source), plus a suite of backend services and Docker containers orchestrated by CloudFormation. The latter are available under a paid license for private use on quiltdata.com.
Quilt addresses five key use cases:
- Share data at scale. Quilt wraps AWS S3 to add simple URLs, web preview for large files, and sharing via email address (no need to create an IAM role).
- Understand data better through inline documentation (Jupyter notebooks, markdown) and visualizations (Vega, Vega-Lite)
- Discover related data by indexing objects in Elasticsearch
- Model data by providing a home for large data and models that don't fit in git, and by providing immutable versions for objects and data sets (a.k.a. "Quilt Packages")
- Decide by broadening data access within the organization and supporting the documentation of decision processes through auditable versioning and inline documentation
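
The "immutable versions" idea in the Model use case can be illustrated with a small content-addressed manifest. This is a minimal sketch using only the Python standard library. It is not Quilt's actual manifest format or client API, and the helper names `sha256_hex` and `build_manifest` are hypothetical. It only shows why hashing every object, and then hashing the resulting manifest, gives each version of a data set a stable identifier.

```python
import hashlib
import json
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: Dict[str, bytes]) -> dict:
    """Build an illustrative manifest (hypothetical format, not Quilt's).

    Each logical key maps to the size and hash of its contents. The
    top hash is computed over a canonical JSON serialization of those
    entries, so it changes whenever any key or any content changes.
    """
    entries = {
        key: {"size": len(data), "sha256": sha256_hex(data)}
        for key, data in files.items()
    }
    # sort_keys and fixed separators make the serialization deterministic
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    return {"entries": entries, "top_hash": sha256_hex(canonical.encode("utf-8"))}


# Two versions of the same data set: only data.csv differs.
v1 = build_manifest({"data.csv": b"a,b\n1,2\n", "README.md": b"# demo\n"})
v2 = build_manifest({"data.csv": b"a,b\n1,3\n", "README.md": b"# demo\n"})

print(v1["top_hash"] == v2["top_hash"])  # the versions get different identifiers
```

Because the identifier is derived from the contents, an identical set of files always yields the same top hash, and unchanged objects (here `README.md`) keep the same per-object hash across versions.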
- Address performance issues with push (e.g. re-hash)
- Investigate and implement more efficient manifest formats (e.g. Parquet) that scale to 10M keys; consider abbreviated "fast manifests" for lazy browsing
- Refactor `s3://bucket/.quilt` for improved listing and delete performance
- Provide Presto-DB-powered services for filtering package repos with SQL
- Ability to fork/merge packages
- Data quality monitoring
- Evaluate min.io and ceph.io as shims
- Evaluate feasibility of on-prem local storage as a repo
- Evaluate K8s and Terraform to replace CloudFormation
- Shim lambdas (consider serverless.com)
- Shim Elasticsearch (consider Solr)
- Shim IAM via RBAC