Rev README and include basic roadmap (quiltdata#1154)

sampathweb · Sep 19, 2019 · 079c387
1 parent 3c979f7

Showing 2 changed files with 100 additions and 54 deletions.
diff --git a/README.md b/README.md
@@ -6,40 +6,63 @@
 [![docs on_gitbook](https://img.shields.io/badge/docs-on_gitbook-blue.svg?style=flat-square)](https://docs.quiltdata.com/)
 [![chat on_slack](https://img.shields.io/badge/chat-on_slack-blue.svg?style=flat-square)](https://slack.quiltdata.com/)
 [![codecov](https://codecov.io/gh/quiltdata/quilt/branch/master/graph/badge.svg)](https://codecov.io/gh/quiltdata/quilt)
-[![pypi](https://img.shields.io/pypi/v/quilt.svg?style=flat-square)](https://pypi.org/project/quilt3/)
+[![pypi](https://img.shields.io/pypi/v/quilt3.svg?style=flat-square)](https://pypi.org/project/quilt3/)
 
-*Note: this is the documentation for [Quilt 3](https://blog.quiltdata.com/rethinking-s3-announcing-t4-a-team-data-hub-8e63ce7ec988). For Quilt 2 see [here](https://docs.quiltdata.com/v/quilt-2-master/) and [here](https://github.com/quiltdata/quilt/tree/quilt-2-master).*
+> This is the documentation for [Quilt 3](https://quiltdata.com/). For Quilt 2, see the [Quilt 2 docs](https://docs.quiltdata.com/v/quilt-2-master/) and the [quilt-2-master branch](https://github.com/quiltdata/quilt/tree/quilt-2-master).
 
-## Overview
+# Quilt is a versioned data portal for AWS
 
-Quilt is a collaboration tool for creating, managing, and sharing
-datasets in S3. Quilt users transform raw, messy data in S3 buckets
-into immutable datasets--reusable, trusted building blocks that are
-easy to version, test, share and catalog. Working with datasets in
-Quilt speeds up model creation, accelerates experimentation, reduces
-downtime, and increases the productivity of data science teams.
+## Who is Quilt for?
+Quilt is for data-driven teams that include both technical
+and non-technical members (executives, data scientists,
+data engineers, sales, product, and others).
 
-## Collaborate in S3
+## What does Quilt do?
+Quilt adds search, visual content preview, and
+versioning to every file in your S3 buckets.
 
-* Quilt adds search, content preview, versioning, and a Python API to any S3 bucket
-* Every file in Quilt is versioned and searchable
-* Quilt is for data scientists, data engineers, and data-driven teams
+## How does Quilt work?
+Quilt consists of open-source components (a Python client,
+a web catalog, and AWS Lambda functions) plus
+a suite of backend services and Docker containers
+orchestrated by AWS CloudFormation.
+The backend services and containers are available under a paid
+license for private use on [quiltdata.com](https://quiltdata.com).
 
-![](https://github.com/quiltdata/quilt/blob/master/docs/imgs/quilt.gif?raw=true)
 
-### Use cases
-* Collaborate - get everyone on the same page by pointing them all to the same immutable data version
-* Experiment faster - blob storage is schemaless and scalable, so iterations are quick
-* Recover, rollback, and reproduce with immutable packages
-* Understand what's in S3 - plaintext and faceted search over S3
+## Use cases
 
-### Key features
-* Browse, search any S3 bucket
-* Preview images, Jupyter notebooks, [Vega visualizations](https://vega.github.io/) - without downloading
-* Read/write Python objects to and from S3
-* Immutable versions for objects, immutable packages for collections of objects
+Quilt addresses five key use cases:
+* **Share** data at scale. Quilt wraps AWS S3 to add simple URLs, web preview for large files, and sharing via email address (no need to
+create an IAM role).
+* **Understand** data better through inline documentation
+(Jupyter notebooks, Markdown) and visualizations (Vega,
+Vega-Lite).
+* **Discover** related data by indexing objects in
+Elasticsearch.
+* **Model** data with a home for large datasets and models that don't fit in Git, and with immutable
+versions for objects and datasets (a.k.a. "Quilt Packages").
+* **Decide** by broadening data access within the organization
+and supporting the documentation of decision
+processes through auditable versioning and inline
+documentation.
 
-## Components
+## Roadmap
 
-* `/catalog` (JavaScript) - Search, browse, and preview your data in S3
-* `/api/python` - Read, write, and annotate Python objects in S3
+### I - Performance and core services
+* [ ] Address performance issues with push (e.g., re-hashing)
+* [ ] Refactor `bucket/.quilt` for improved listing
+and delete performance
+
+### II - CI/CD for data
+* [ ] Ability to fork/merge packages (via manifests in Git)
+* [ ] Automated data quality monitoring
+
+### III - Storage agnostic (support Azure, GCP buckets)
+* [ ] Evaluate MinIO and Ceph
+* [ ] Evaluate the feasibility of local storage (e.g., NAS)
+
+### IV - Cloud agnostic
+* [ ] Kubernetes (K8s) deployment for Azure and GCP
+* [ ] Shim Lambda functions via the Serverless Framework (serverless.com)?
+* [ ] Shim Elasticsearch via Apache Solr?
diff --git a/docs/README.md b/docs/README.md
@@ -6,40 +6,63 @@
 [![docs on_gitbook](https://img.shields.io/badge/docs-on_gitbook-blue.svg?style=flat-square)](https://docs.quiltdata.com/)
 [![chat on_slack](https://img.shields.io/badge/chat-on_slack-blue.svg?style=flat-square)](https://slack.quiltdata.com/)
 [![codecov](https://codecov.io/gh/quiltdata/quilt/branch/master/graph/badge.svg)](https://codecov.io/gh/quiltdata/quilt)
-[![pypi](https://img.shields.io/pypi/v/quilt.svg?style=flat-square)](https://pypi.org/project/quilt3/)
+[![pypi](https://img.shields.io/pypi/v/quilt3.svg?style=flat-square)](https://pypi.org/project/quilt3/)
 
-*Note: this is the documentation for [Quilt 3](https://blog.quiltdata.com/rethinking-s3-announcing-t4-a-team-data-hub-8e63ce7ec988). For Quilt 2 see [here](https://docs.quiltdata.com/v/quilt-2-master/) and [here](https://github.com/quiltdata/quilt/tree/quilt-2-master).*
+> This is the documentation for [Quilt 3](https://quiltdata.com/). For Quilt 2, see the [Quilt 2 docs](https://docs.quiltdata.com/v/quilt-2-master/) and the [quilt-2-master branch](https://github.com/quiltdata/quilt/tree/quilt-2-master).
 
-## Overview
+# Quilt is a versioned data portal for AWS
 
-Quilt is a collaboration tool for creating, managing, and sharing
-datasets in S3. Quilt users transform raw, messy data in S3 buckets
-into immutable datasets--reusable, trusted building blocks that are
-easy to version, test, share and catalog. Working with datasets in
-Quilt speeds up model creation, accelerates experimentation, reduces
-downtime, and increases the productivity of data science teams.
+## Who is Quilt for?
+Quilt is for data-driven teams that include both technical
+and non-technical members (executives, data scientists,
+data engineers, sales, product, and others).
 
-## Collaborate in S3
+## What does Quilt do?
+Quilt adds search, visual content preview, and
+versioning to every file in your S3 buckets.
 
-* Quilt adds search, content preview, versioning, and a Python API to any S3 bucket
-* Every file in Quilt is versioned and searchable
-* Quilt is for data scientists, data engineers, and data-driven teams
+## How does Quilt work?
+Quilt consists of open-source components (a Python client,
+a web catalog, and AWS Lambda functions) plus
+a suite of backend services and Docker containers
+orchestrated by AWS CloudFormation.
+The backend services and containers are available under a paid
+license for private use on [quiltdata.com](https://quiltdata.com).
 
-![](https://github.com/quiltdata/quilt/blob/master/docs/imgs/quilt.gif?raw=true)
 
-### Use cases
-* Collaborate - get everyone on the same page by pointing them all to the same immutable data version
-* Experiment faster - blob storage is schemaless and scalable, so iterations are quick
-* Recover, rollback, and reproduce with immutable packages
-* Understand what's in S3 - plaintext and faceted search over S3
+## Use cases
 
-### Key features
-* Browse, search any S3 bucket
-* Preview images, Jupyter notebooks, [Vega visualizations](https://vega.github.io/) - without downloading
-* Read/write Python objects to and from S3
-* Immutable versions for objects, immutable packages for collections of objects
+Quilt addresses five key use cases:
+* **Share** data at scale. Quilt wraps AWS S3 to add simple URLs, web preview for large files, and sharing via email address (no need to
+create an IAM role).
+* **Understand** data better through inline documentation
+(Jupyter notebooks, Markdown) and visualizations (Vega,
+Vega-Lite).
+* **Discover** related data by indexing objects in
+Elasticsearch.
+* **Model** data with a home for large datasets and models that don't fit in Git, and with immutable
+versions for objects and datasets (a.k.a. "Quilt Packages").
+* **Decide** by broadening data access within the organization
+and supporting the documentation of decision
+processes through auditable versioning and inline
+documentation.
 
-## Components
+## Roadmap
 
-* `/catalog` (JavaScript) - Search, browse, and preview your data in S3
-* `/api/python` - Read, write, and annotate Python objects in S3
+### I - Performance and core services
+* [ ] Address performance issues with push (e.g., re-hashing)
+* [ ] Refactor `bucket/.quilt` for improved listing
+and delete performance
+
+### II - CI/CD for data
+* [ ] Ability to fork/merge packages (via manifests in Git)
+* [ ] Automated data quality monitoring
+
+### III - Storage agnostic (support Azure, GCP buckets)
+* [ ] Evaluate MinIO and Ceph
+* [ ] Evaluate the feasibility of local storage (e.g., NAS)
+
+### IV - Cloud agnostic
+* [ ] Kubernetes (K8s) deployment for Azure and GCP
+* [ ] Shim Lambda functions via the Serverless Framework (serverless.com)?
+* [ ] Shim Elasticsearch via Apache Solr?
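The **Model** use case in the new README promises immutable versions for objects and datasets ("Quilt Packages"). One common way to get that property is content addressing: hash each file, then hash the resulting manifest so that the manifest hash identifies the version. The Python sketch below illustrates that idea only. The manifest layout, the `push` helper, and the in-memory `REGISTRY` are hypothetical and are not Quilt's actual manifest format, hashing scheme, or `quilt3` API.

```python
import hashlib
import json

# Hypothetical in-memory registry: version id (top hash) -> manifest.
# This is an illustration, not how Quilt stores packages.
REGISTRY: dict[str, dict[str, str]] = {}


def hash_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each logical key to the SHA-256 of its contents (illustrative format)."""
    return {key: hash_bytes(data) for key, data in sorted(files.items())}


def top_hash(manifest: dict[str, str]) -> str:
    """Hash a canonical JSON encoding of the manifest; changing any entry changes the id."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hash_bytes(canonical.encode("utf-8"))


def push(files: dict[str, bytes]) -> str:
    """Record a version and return its id; existing versions are never overwritten."""
    manifest = build_manifest(files)
    version = top_hash(manifest)
    REGISTRY.setdefault(version, manifest)
    return version


v1 = push({"data/train.csv": b"a,b\n1,2\n", "README.md": b"# demo\n"})
v2 = push({"data/train.csv": b"a,b\n1,3\n", "README.md": b"# demo\n"})

print(v1 != v2)       # True: editing one file yields a new version id
print(len(REGISTRY))  # 2: the earlier version remains available for rollback
# Same contents in a different insertion order map to the same id.
print(push({"README.md": b"# demo\n", "data/train.csv": b"a,b\n1,2\n"}) == v1)  # True
```

Because the version id is derived from content rather than assigned, two collaborators who reference the same id are guaranteed (up to hash collisions) to be looking at the same bytes, which is what makes rollback and reproduction straightforward.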