Improve the readme
Signed-off-by: SimFG <[email protected]>
SimFG committed Mar 31, 2023
1 parent 9f4662d commit 61ebe06
Showing 7 changed files with 124 additions and 133 deletions.
6 changes: 6 additions & 0 deletions Makefile
@@ -8,6 +8,12 @@ pip_upgrade:
package:
@python setup.py sdist bdist_wheel

upload:
@python -m twine upload dist/*

upload_test:
@python -m twine upload --repository-url https://test.pypi.org/legacy/ dist/*

remove_example_cache.sh:
@bash ./script/remove_example_cache.sh

75 changes: 25 additions & 50 deletions README-CN.md
@@ -9,28 +9,22 @@ GPT Cache is mainly used to cache the question-answer data of users interacting with ChatGPT. The system

If this idea 💡 is helpful to you, please give the project a star 🌟; it is much appreciated!

## 🤔 Is a cache necessary?

I believe it is necessary, for the following reasons:

- In domain-specific services built on ChatGPT, many question-answer pairs are fairly similar.
- For an individual user, the series of questions asked through ChatGPT follows certain patterns tied to their occupation, lifestyle, personality, and so on. For example, how a programmer uses a ChatGPT service is largely related to their work.
- If your ChatGPT service targets a large user base, grouping users into categories means related questions from users in the same category have a good chance of hitting the cache, reducing service costs.

## 😊 Quick start

### alpha test package install

Note:
- You can quickly try the cache with the commands below; note that it may not be very stable yet.
- By default, almost no third-party libraries are installed; when a feature needs one, the related library is downloaded automatically.
- If a third-party library fails to install because of an old pip version, run: `python -m pip install --upgrade pip`

### pip install

```bash
# create conda new environment
conda create --name gpt-cache python=3.8
conda activate gpt-cache
pip install gpt_cache
```

### dev install

```bash
# clone gpt cache repo
git clone https://github.com/zilliztech/gpt-cache
cd gpt-cache
@@ -40,6 +34,8 @@ pip install -r requirements.txt
python setup.py install
```

### quick usage

If you only want exact-match caching of requests (that is, two identical requests), you can integrate this cache in just **two steps**!

1. Initialize the cache
@@ -63,55 +59,34 @@ answer = openai.ChatCompletion.create(
)
```

When running locally, if you want better results, you can use the [Sqlite + Faiss + Towhee](example/sf_towhee/sf_manager.py) example, where Sqlite + Faiss handle cache data management and Towhee performs the embedding.

In production, or with a sizeable user base, the vector search part deserves more attention; take a look at [Milvus](https://github.com/milvus-io/milvus), or [Zilliz Cloud](https://cloud.zilliz.com/) to quickly experience Milvus vector retrieval.
If you want to quickly try vector similarity search caching locally, see the example: [Sqlite + Faiss + Towhee](example/sqlite_faiss_towhee/sqlite_faiss_towhee.py)

More reference docs:

- [More examples](example/example.md)
- [System design](doc/system-cn.md)
- [System design, how the system is built](doc/system-cn.md)
- [Features, everything the cache currently supports](doc/feature_cn.md)
- [Examples, learn how to better customize the cache](example/example.md)

## 🥳 Features

- Supports OpenAI normal and streaming chat requests
- Supports top_k search, configurable when the DataManager is created
- Supports multi-level caching, see: `Cache#next_cache`

```python
bak_cache = Cache()
bak_cache.init()
cache.init(next_cache=bak_cache)
```

- Whether to skip the current cache entirely, that is, neither search the cache nor save the results returned by ChatGPT, see: `Cache#cache_enable_func`
- During the cache warm-up phase, skip the cache search but still save the results returned by ChatGPT, see: set `cache_skip=True` when calling `create`

## 🤔 Is a cache necessary?

```python
openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=mock_messages,
cache_skip=True,
)
```
I believe it is necessary, for the following reasons:

- Like Lego bricks, every module can be customized, including:
  - pre-embedding, extracts feature information from the original request, such as the last message or the prompt
  - embedding, converts the feature information into vector data
  - data manager, cache data management, mainly searching and saving data
  - cache similarity evaluation, can use the similarity search distance or another model better suited to the use case
  - post-process, handles the list of cached answers, e.g. the most similar one, a random one, or a custom choice
- In domain-specific services built on ChatGPT, many question-answer pairs are fairly similar.
- For an individual user, the series of questions asked through ChatGPT follows certain patterns tied to their occupation, lifestyle, personality, and so on. For example, how a programmer uses a ChatGPT service is largely related to their work.
- If your ChatGPT service targets a large user base, grouping users into categories means related questions from users in the same category have a good chance of hitting the cache, reducing service costs.

## 🤗 All modules

- Pre-embedding
![GPTCache Struct](doc/GPTCacheStructure.png)

- Pre-embedding, extracts the key information from the request
  - get the last message of the request, see: `pre_embedding.py#last_content`
- Embedding
- Embedding, converts the text to a vector for subsequent similarity search
  - [x] [towhee](https://towhee.io/), English model: paraphrase-albert-small-v2, Chinese model: uer/albert-base-chinese-cluecorpussmall
  - [x] openai embedding api
  - [x] string, no processing at all
  - [ ] [cohere](https://docs.cohere.ai/reference/embed) embedding api
- Data Manager
- Cache, cache data management, including search, storage, and eviction
  - scalar store
    - [x] [sqlite](https://sqlite.org/docs.html)
    - [ ] [postgresql](https://www.postgresql.org/)
@@ -120,12 +95,12 @@ openai.ChatCompletion.create(
    - [x] [milvus](https://milvus.io/)
  - vector index
    - [x] [faiss](https://faiss.ai/)
- Similarity Evaluation
- Similarity Evaluation, evaluates the cached results
  - search distance, see: `simple.py#pair_evaluation`
  - [towhee](https://towhee.io/), the roberta_duplicate model, question-to-question relevance matching; supports only 512 tokens
  - string, character match between the cached question and the input question
  - np, computes the vector distance with `linalg.norm`
- Post Process
- Post Process, how multiple cached answers are returned to the user
  - choose the most similar answer
  - choose randomly

76 changes: 26 additions & 50 deletions README.md
@@ -9,27 +9,22 @@ The GPT Cache system is mainly used to cache the question-answer data of users i

If the idea 💡 is helpful to you, please feel free to give the project a star 🌟; it helps a lot.

## 🤔 Is a cache necessary?

I believe it is necessary for the following reasons:

- In certain domain services built on ChatGPT, many question-answer pairs are quite similar.
- For an individual user, the series of questions raised through ChatGPT follows certain patterns tied to their occupation, lifestyle, personality, etc. For example, how a programmer uses a ChatGPT service is largely related to their work.
- If your ChatGPT service targets a large user group, categorizing users can increase the probability of related questions hitting the cache, thus reducing service costs.

## 😊 Quick Start

**Note**:
- You can quickly try the cache, though it may not be very **stable** yet.
- By default, only **a few** libraries are installed. When a feature requires an additional library, it is **installed automatically**.
- If a library fails to install because of an old pip version, run: `python -m pip install --upgrade pip`

### alpha test package install
### pip install

```bash
# create conda new environment
conda create --name gpt-cache python=3.8
conda activate gpt-cache
pip install gpt_cache
```

### dev install

```bash
# clone gpt cache repo
git clone https://github.com/zilliztech/gpt-cache
cd gpt-cache
@@ -39,6 +34,8 @@ pip install -r requirements.txt
python setup.py install
```

### quick usage

If you just want exact-match caching of requests (that is, two identical requests), you **ONLY** need **TWO** steps to integrate this cache; a complete sketch follows the snippet below.

1. Cache init
@@ -65,69 +62,48 @@ answer = openai.ChatCompletion.create(
)
```
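For reference, a minimal end-to-end sketch of those two steps might look like the following. The import paths and helper names (`gptcache.core`, `gptcache.adapter`, `set_openai_key`) are assumptions for this version of the project, so check the repository if they differ.

```python
# a minimal sketch of the two steps; import paths are assumptions, not the confirmed API
from gptcache.core import cache          # step 1: init the cache
from gptcache.adapter import openai      # drop-in replacement for `import openai`

cache.init()
cache.set_openai_key()                   # assumed helper that reads OPENAI_API_KEY

# step 2: send the request through the adapter; an identical request hits the cache
answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, GPT Cache"}],
)
```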

When running locally, if you want better results, use the [Sqlite + Faiss + Towhee](example/sf_towhee/sf_manager.py) example, where Sqlite + Faiss handle cache data management and Towhee performs the embedding.

In production, or with a sizeable user base, the vector search part deserves more attention; take a look at [Milvus](https://github.com/milvus-io/milvus), or [Zilliz Cloud](https://cloud.zilliz.com/), which lets you quickly experience Milvus vector retrieval.
If you want to try vector similarity search caching locally, use the example [Sqlite + Faiss + Towhee](example/sqlite_faiss_towhee/sqlite_faiss_towhee.py).
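To make the division of labor concrete, here is a small self-contained sketch (not GPTCache's actual code) of how a scalar store and a vector index cooperate: Faiss answers "which cached question is closest?", and Sqlite returns the stored text for that hit.

```python
import sqlite3

import faiss                        # pip install faiss-cpu
import numpy as np

dim = 768                           # embedding dimension, e.g. paraphrase-albert-small-v2
index = faiss.IndexFlatL2(dim)      # Faiss: vector similarity search
db = sqlite3.connect(":memory:")    # Sqlite: scalar data (question/answer text)
db.execute("CREATE TABLE qa (id INTEGER PRIMARY KEY, question TEXT, answer TEXT)")

def save(question: str, answer: str, vec: np.ndarray) -> None:
    db.execute("INSERT INTO qa (question, answer) VALUES (?, ?)", (question, answer))
    index.add(vec.reshape(1, dim).astype("float32"))  # Faiss ids follow insertion order

def search(vec: np.ndarray, top_k: int = 1):
    distances, ids = index.search(vec.reshape(1, dim).astype("float32"), top_k)
    if ids[0][0] < 0:               # the index is still empty
        return None
    # Faiss ids start at 0, sqlite INTEGER PRIMARY KEY starts at 1
    row = db.execute("SELECT question, answer FROM qa WHERE id = ?",
                     (int(ids[0][0]) + 1,)).fetchone()
    return float(distances[0][0]), row
```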

More Docs:
- [examples](example/example.md)
- [system design](doc/system.md)


## 🥳 Feature
- [System Design, how it was constructed](doc/system.md)
- [Features, all features currently supported by the cache](doc/feature.md)
- [Examples, learn how to better customize the cache](example/example.md)

- Supports OpenAI chat completion normal and streaming requests
- Get top_k similar search results; this can be set when creating the data manager
- Supports the cache chain, see: `Cache#next_cache`

```python
bak_cache = Cache()
bak_cache.init()
cache.init(next_cache=bak_cache)
```

- Whether to skip the current cache entirely, that is, neither search the cache nor save the ChatGPT results, see: `Cache#cache_enable_func`
- During the cache warm-up phase, skip the cache search but still save the result returned by ChatGPT to the cache, see: `cache_skip=True` in the `create` request

## 🤔 Is a cache necessary?

```python
openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=mock_messages,
cache_skip=True,
)
```
I believe it is necessary for the following reasons:

- Like Lego bricks, all modules can be custom-assembled, including:
  - pre-embedding function, extracts feature information from the original request, such as the prompt or the last message
  - embedding function, converts the feature information into a vector for cache search; choose a model that fits your use case
  - data manager, cache data management, mainly the search and storage of cache data
  - cache similarity evaluation function, can use the similarity search distance or an additional model to make the answer more accurate
  - post-process of the cached answer list, e.g. take the first, a random one, or a custom combination
- In certain domain services built on ChatGPT, many question-answer pairs are quite similar.
- For an individual user, the series of questions raised through ChatGPT follows certain patterns tied to their occupation, lifestyle, personality, etc. For example, how a programmer uses a ChatGPT service is largely related to their work.
- If your ChatGPT service targets a large user group, categorizing users can increase the probability of related questions hitting the cache, thus reducing service costs.

## 🤗 All Modules

- Pre-embedding
![GPTCache Struct](doc/GPTCacheStructure.png)

- Pre-embedding, gets the key information from the request
  - get the last message of the request, see: `pre_embedding.py#last_content`
- Embedding
- Embedding, converts the text to a vector for similarity search
  - [x] [towhee](https://towhee.io/), English model: paraphrase-albert-small-v2, Chinese model: uer/albert-base-chinese-cluecorpussmall
  - [x] openai embedding api
  - [x] string, no change at all
  - [ ] [cohere](https://docs.cohere.ai/reference/embed) embedding api
- Data Manager
- Cache, cache data management, including search, save, and evict
  - scalar store
    - [x] [sqlite](https://sqlite.org/docs.html)
    - [ ] [postgresql](https://www.postgresql.org/)
    - [ ] [mysql](https://www.mysql.com/)
  - vector store
    - [x] [milvus](https://milvus.io/)
    - [x] [zilliz cloud](https://cloud.zilliz.com/)
  - vector index
    - [x] [faiss](https://faiss.ai/)
- Similarity Evaluation
- Similarity Evaluation, judges the quality of cached answers
  - the search distance, see: `simple.py#pair_evaluation`
  - [towhee](https://towhee.io/), the roberta_duplicate model, precise question-to-question comparison; supports only 512 tokens
  - string, exact character match between the cached request and the incoming request
  - np, computes the vector distance with `linalg.norm` (see the sketch after this list)
- Post Process
- Post Process, how multiple cached answers are returned to the user
  - choose the most similar
  - choose randomly
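As an illustration of the `np` option above, a distance-based evaluation can be as small as the sketch below; the real signature of `simple.py#pair_evaluation` may differ.

```python
import numpy as np

# sketch of the "np" similarity evaluation: L2 distance between the two embeddings,
# where a smaller distance means the cached answer is a closer match
def pair_evaluation(src_embedding, cache_embedding, **_):
    return float(np.linalg.norm(np.asarray(src_embedding) - np.asarray(cache_embedding)))
```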

31 changes: 31 additions & 0 deletions doc/feature.md
@@ -0,0 +1,31 @@
# 🥳 Feature

English | [中文](feature_cn.md)

- Supports OpenAI chat completion normal and streaming requests
- Get top_k similar search results; this can be set when creating the data manager
- Supports the cache chain, see: `Cache#next_cache`

```python
bak_cache = Cache()
bak_cache.init()
cache.init(next_cache=bak_cache)
```

- Whether to skip the current cache entirely, that is, neither search the cache nor save the ChatGPT results, see: `Cache#cache_enable_func` (a sketch follows the code block below)
- During the cache warm-up phase, skip the cache search but still save the result returned by ChatGPT to the cache, see: `cache_skip=True` in the `create` request

```python
openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=mock_messages,
cache_skip=True,
)
```
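For example, a predicate that bypasses the cache for long prompts could look like the sketch below; the arguments actually passed to `cache_enable_func` are an assumption here, so adapt it to the real call signature.

```python
from gptcache.core import cache   # assumed import path

# hypothetical predicate: bypass the cache entirely for long prompts
def enable_cache_for_short_prompts(*args, **kwargs):
    messages = kwargs.get("messages") or []
    last = messages[-1]["content"] if messages else ""
    return len(last) < 500   # True: use the cache, False: skip search and save

cache.init(cache_enable_func=enable_cache_for_short_prompts)
```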

- Like Lego bricks, all modules can be custom-assembled (see the sketch after this list), including:
  - pre-embedding function, extracts feature information from the original request, such as the prompt or the last message
  - embedding function, converts the feature information into a vector for cache search; choose a model that fits your use case
  - data manager, cache data management, mainly the search and storage of cache data
  - cache similarity evaluation function, can use the similarity search distance or an additional model to make the answer more accurate
  - post-process of the cached answer list, e.g. take the first, a random one, or a custom combination
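Putting the bricks together, a fully customized init might look like the sketch below. The toy functions and the keyword names (`pre_embedding_func`, `embedding_func`, `similarity_evaluation_func`, `post_process_messages_func`) are illustrative assumptions, not the confirmed API; see the system design doc for the real wiring.

```python
from gptcache.core import cache   # assumed import path

def last_content(data, **_):
    # pre-embedding: use the last message as the feature, mirroring pre_embedding.py#last_content
    return data["messages"][-1]["content"]

def to_vector(text, **_):
    # embedding: toy stand-in for a real model such as towhee or the openai embedding api
    return [float(ord(c)) for c in text[:8]]

def exact_match(src, cached, **_):
    # similarity evaluation: plain string equality, like the "string" option
    return 1.0 if src == cached else 0.0

def first_answer(answers):
    # post-process: return the first (most similar) cached answer
    return answers[0]

# keyword names below are assumptions for illustration only
cache.init(
    pre_embedding_func=last_content,
    embedding_func=to_vector,
    similarity_evaluation_func=exact_match,
    post_process_messages_func=first_answer,
)
```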
31 changes: 31 additions & 0 deletions doc/feature_cn.md
@@ -0,0 +1,31 @@
# 🥳 Features

[English](feature.md) | 中文

- Supports OpenAI normal and streaming chat requests
- Supports top_k search, configurable when the DataManager is created
- Supports multi-level caching, see: `Cache#next_cache`

```python
bak_cache = Cache()
bak_cache.init()
cache.init(next_cache=bak_cache)
```

- Whether to skip the current cache entirely, that is, neither search the cache nor save the results returned by ChatGPT, see: `Cache#cache_enable_func`
- During the cache warm-up phase, skip the cache search but still save the results returned by ChatGPT, see: set `cache_skip=True` when calling `create`

```python
openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=mock_messages,
cache_skip=True,
)
```

- Like Lego bricks, every module can be customized, including:
  - pre-embedding, extracts feature information from the original request, such as the last message or the prompt
  - embedding, converts the feature information into vector data
  - data manager, cache data management, mainly searching and saving data
  - cache similarity evaluation, can use the similarity search distance or another model better suited to the use case
  - post-process, handles the list of cached answers, e.g. the most similar one, a random one, or a custom choice
36 changes: 4 additions & 32 deletions example/example.md
@@ -4,49 +4,21 @@

How to use the map to cache data.

## [Sqlite + Faiss manage cache data](sf_mock/sf_manager.py)

Before running this case, you should install `faiss-cpu`.

```bash
pip install faiss-cpu
```
## [Sqlite + Faiss manage cache data](sqlite_faiss_mock/sqlite_faiss_mock.py)

How to use [sqlite](https://www.sqlite.org/index.html) to store the scalar data and faiss to query the vector data.

## [Sqlite + Faiss + Towhee](sf_towhee/sf_manager.py)

Before running this case, you should install `faiss-cpu` and `towhee`.

```bash
pip install faiss-cpu
pip install towhee==0.9.0
```
## [Sqlite + Faiss + Towhee](sqlite_faiss_towhee/sqlite_faiss_towhee.py)

Building on the example above, use [towhee](https://towhee.io/) for the embedding operation.

Note: the default embedding model only supports **ENGLISH**. For Chinese, use the `uer/albert-base-chinese-cluecorpussmall` model; for other languages, use a corresponding model.

## [Sqlite + Milvus + Towhee](sqlite_milvus_mock/sqlite_milvus_mock.py)

Before running this case, you should install `faiss-cpu`, `towhee`, and `pymilvus`.

```bash
pip install faiss-cpu
pip install towhee==0.9.0
pip install pymilvus
```

How to use [sqlite](https://www.sqlite.org/index.html) to store the scalar data and [milvus](https://milvus.io/docs) to store the vector data.
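Before running it, you may also want to check that a Milvus instance is reachable. A minimal connectivity check with `pymilvus` (assuming a local deployment on the default port) looks like this:

```python
from pymilvus import connections

# assumes a local Milvus listening on the default port;
# Zilliz Cloud uses a uri/token pair instead of host/port
connections.connect(alias="default", host="localhost", port="19530")
print("connected:", connections.has_connection("default"))
```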

## [Benchmark](benchmark/benchmark_sf_towhee.py)

Before running this case, you should install `faiss-cpu` and `towhee`.
How to use [sqlite](https://www.sqlite.org/index.html) to store the scalar data and [Milvus](https://milvus.io/docs) or [Zilliz Cloud](https://cloud.zilliz.com/) to store the vector data.

```bash
pip install faiss-cpu
pip install towhee==0.9.0
```
## [Benchmark](benchmark/benchmark_sqlite_faiss_towhee.py)

The benchmark script for the `Sqlite + Faiss + Towhee` setup.
