Update links for new docs structure (towhee-io#1215)
Signed-off-by: Frank Liu <[email protected]>
fzliu authored May 10, 2022
1 parent 51f9bfe commit d54adb6
Showing 7 changed files with 13 additions and 11 deletions.
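
The link changes in this commit follow a fixed mapping from old doc paths to new ones. As a rough illustration only, the rewrite could be scripted along the lines below. The path pairs are copied from the hunks in this commit; the names `LINK_MAP` and `rewrite_links` are hypothetical and are not part of the Towhee repository, and nothing here indicates the commit was produced this way.

```python
from typing import Tuple

# Old -> new doc paths, copied from the hunks in this commit.
# LINK_MAP and rewrite_links are illustrative names, not Towhee code.
LINK_MAP = {
    "/02-Getting%20Started/03-data-collection.md":
        "/03-User%20Guides/01-DataCollection/01-data-collection.md",
    "/03-Tutorials/01-reverse-image-search.md":
        "/05-Tutorials/02-Computer%20Vision/01-reverse-image-search.md",
    "/03-Tutorials/03-image-deduplication.md":
        "/05-Tutorials/02-Computer%20Vision/02-image-deduplication.md",
    "/03-Tutorials/02-music-recognition-system.md":
        "/05-Tutorials/03-Audio/01-music-recognition-system.md",
}


def rewrite_links(markdown: str) -> Tuple[str, int]:
    """Replace each old path with its new path.

    Returns the rewritten text and the number of links replaced.
    """
    replaced = 0
    for old, new in LINK_MAP.items():
        replaced += markdown.count(old)
        markdown = markdown.replace(old, new)
    return markdown, replaced


if __name__ == "__main__":
    sample = "- [Reverse image search](/03-Tutorials/01-reverse-image-search.md)"
    text, count = rewrite_links(sample)
    print(count, text)
```

Paths that are not in the mapping, such as the quick start link, pass through unchanged, which matches the hunks where that link appears only as context.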
12 changes: 6 additions & 6 deletions docs/01-Overview.md
@@ -11,23 +11,23 @@ Towhee is a framework that provides [ETL](https://databricks.com/glossary/extrac

Unstructured data refers to data that cannot be stored in a tabular or key-value format. Nearly all human-generated data (images, video, text, etc...) is unstructured - some market analysts estimate that over 80% of data generated by 2024 will be unstructured data. Towhee is the first open-source project that's meant to process a variety of unstructured data using ETL pipelines.

To accomplish this, we built Towhee atop popular machine learning and unstructured data processing libraries, i.e. `torch`, `timm`, `transformers`, etc. Models or functions from different libraries are wrapped as standard Towhee operators, and can be freely integrated into application-oriented pipelines using a [Pythonic API](/02-Getting%20Started/03-data-collection.md). To ensure user-friendliness, pre-built pipelines can also be called in just a single line of code, without the need to understand the underlying models or modules used to build them.
To accomplish this, we built Towhee atop popular machine learning and unstructured data processing libraries, i.e. `torch`, `timm`, `transformers`, etc. Models or functions from different libraries are wrapped as standard Towhee operators, and can be freely integrated into application-oriented pipelines using a [Pythonic API](/03-User%20Guides/01-DataCollection/01-data-collection.md). To ensure user-friendliness, pre-built pipelines can also be called in just a single line of code, without the need to understand the underlying models or modules used to build them.

For more information, take a look at our [quick start](/02-Getting%20Started/01-quick-start.mdx) page.

## Problems Towhee solves

- **Modern ML applications require far more than a single neural network.** Running a modern ML application in production requires a combination of online pre-processing, data transformation, the models themselves, and other ML-related tools. Building an application that recognizes objects within a video, for example, involves decompression, key-frame extraction, image deduplication, object detection, etc. This necessitates a platform that offers a fast and robust method for developing end-to-end application pipelines that use ML models in addition to supporting data parallelism and resource management.

Towhee solves this problem by reintroducing the concept of `Pipeline` as being _application-centric_ instead of _model-centric_. Where model-centric pipelines are composed of a single model followed by auxiliary code, application-centric pipelines treat every single data processing step as a first-class citizen. Towhee also exposes a [Pythonic API](/02-Getting%20Started/03-data-collection.md) for developing more complex applications in just a couple lines of code.
Towhee solves this problem by reintroducing the concept of `Pipeline` as being _application-centric_ instead of _model-centric_. Where model-centric pipelines are composed of a single model followed by auxiliary code, application-centric pipelines treat every single data processing step as a first-class citizen. Towhee also exposes a [Pythonic API](/03-User%20Guides/01-DataCollection/01-data-collection.md) for developing more complex applications in just a couple lines of code.

- **Too many model implementations exist without any interface standard.** Machine learning models (NN-based and traditional) are ubiquitous. Different implementations of machine learning models require different auxiliary code to support testing and fine-tuning, making model evaluation and productionization a tedious task.

Towhee solves this by providing a universal `Operator` wrapper for dataset loading, basic data transformations, ML models, and other miscellaneous scripts. Operators have a pre-defined API and glue logic to make Towhee work with a number of machine learning and data processing libraries. Operators can be chained together in a DAG to form entire ML applications.

- **ETL pipelines for unstructured data are nearly nonexistent.** ETL, short for _extract, transform, and load_, is a framework used by data scientists, ML application developers, and other engineers to extract data from various sources, transform the data into a format that can be understood by computers, and load the data into downstream platforms for recommendation, analytics, and other business intelligence tasks.

Towhee solves this by providing an open-source vision for ETL in the era of unstructured data. We provide: 1) over 300 pre-built pipelines across a multitude of different data transformation tasks (including but not limited to image embedding, audio embedding, text summarization), and 2) a way to build pipelines of arbitrary complexity through an intuitive Python API called [`DataCollection`](/02-Getting%20Started/03-data-collection.md).
Towhee solves this by providing an open-source vision for ETL in the era of unstructured data. We provide: 1) over 300 pre-built pipelines across a multitude of different data transformation tasks (including but not limited to image embedding, audio embedding, text summarization), and 2) a way to build pipelines of arbitrary complexity through an intuitive Python API called [`DataCollection`](/03-User%20Guides/01-DataCollection/01-data-collection.md).

## Design philosophy

@@ -46,9 +46,9 @@ For more information, take a look at our [quick start](/02-Getting%20Started/01-

### Tutorials:

- [Reverse image search](/03-Tutorials/01-reverse-image-search.md): search for similar or related images.
- [Image deduplication](/03-Tutorials/03-image-deduplication.md): detect and remove identical or near-identical photos.
- [Music recognition](/03-Tutorials/02-music-recognition-system.md): music identification with full-length song or a snippet.
- [Reverse image search](/05-Tutorials/02-Computer%20Vision/01-reverse-image-search.md): search for similar or related images.
- [Image deduplication](/05-Tutorials/02-Computer%20Vision/02-image-deduplication.md): detect and remove identical or near-identical photos.
- [Music recognition](/05-Tutorials/03-Audio/01-music-recognition-system.md): music identification with full-length song or a snippet.

### Community:

@@ -4,7 +4,7 @@ Audio is defined as any human-hearable sound; audio embedding is the process of

## Popular scenarios

- [Identify music with an audio snippet](/03-Tutorials/02-music-recognition-system.md)
- [Identify music with an audio snippet](/05-Tutorials/03-Audio/01-music-recognition-system.md)
- Recognize audio events or scenes
- Tag music for genres, artists, emotion
- Music copyright infringement
4 changes: 2 additions & 2 deletions docs/03-User Guides/03-Built-in Pipelines/image-embedding.md
@@ -4,8 +4,8 @@ Image embedding pipelines are used for reduction the dimensionality of the input

## Popular Scenarios

- [Reverse image search](/03-Tutorials/01-reverse-image-search.md)
- [Image deduplication](/03-Tutorials/03-image-deduplication.md)
- [Reverse image search](/05-Tutorials/02-Computer%20Vision/01-reverse-image-search.md)
- [Image deduplication](/05-Tutorials/02-Computer%20Vision/02-image-deduplication.md)
- Copyright infringement detection
- Item tagging
- Celebrity tagging
@@ -1,3 +1,5 @@
# Towhee v0.4.0 Release Notes

#### Highlights
- The Towhee website has a new look and feel! The new Towhee website includes several important docs, including in-depth Towhee tutorials, pipeline and operator summaries, development and contributing guides, and more. See https://docs.towhee.io. If you have any feedback for the website design or encounter any bugs, please open an issue through Github.
- Towhee now offers pre-built embedding pipelines that use transformer-based models: SwinTransformer and ViT.
@@ -1,4 +1,4 @@
# v0.5.0 Release Notes
# Towhee v0.5.0 Release Notes

#### Highlights
- training/fine-tuning is supported! For the cases when an operator is a wrapper of neural network model, users can train or fine-tune the model via `train()` method of the operator.
@@ -1,4 +1,4 @@
# v0.6.0 Release Notes
# Towhee v0.6.0 Release Notes

#### Highlights
