Improve the readme
Signed-off-by: SimFG <[email protected]>
SimFG committed Mar 31, 2023
1 parent 9f4662d commit 61ebe06
Showing 7 changed files with 124 additions and 133 deletions.
6 changes: 6 additions & 0 deletions Makefile
@@ -8,6 +8,12 @@ pip_upgrade:
package:
@python setup.py sdist bdist_wheel

upload:
@python -m twine upload dist/*

upload_test:
@python -m twine upload --repository-url https://test.pypi.org/legacy/ dist/*

remove_example_cache.sh:
@bash ./script/remove_example_cache.sh

75 changes: 25 additions & 50 deletions README-CN.md
@@ -9,28 +9,22 @@ GPT Cache is mainly used to cache the question-answer data of users interacting with ChatGPT. The system

If this idea 💡 is helpful to you, please give the project a star 🌟; it is much appreciated!

## 🤔 Is a cache necessary?

I believe it is necessary, for the following reasons:

- In domain-specific services built on ChatGPT, many question-answer pairs are fairly similar.
- For an individual user, the series of questions asked through ChatGPT follows certain patterns tied to their occupation, lifestyle, personality, and so on. For example, how a programmer uses a ChatGPT service is largely related to their work.
- If your ChatGPT service targets a large user base, grouping users into categories means related questions from users in the same category have a good chance of hitting the cache, reducing service costs.

## 😊 Quick start

### alpha test package install

Note:
- You can quickly try the cache with the commands below; note that it may not be very stable yet.
- By default, almost no third-party libraries are installed; when a feature needs one, the related library is downloaded automatically.
- If a third-party library fails to install because of an old pip version, run: `python -m pip install --upgrade pip`

### pip install

```bash
# create conda new environment
conda create --name gpt-cache python=3.8
conda activate gpt-cache
pip install gpt_cache
```

### dev install

```bash
# clone gpt cache repo
git clone https://github.com/zilliztech/gpt-cache
cd gpt-cache
@@ -40,6 +34,8 @@ pip install -r requirements.txt
python setup.py install
```

### quick usage

If you only want exact-match caching of requests (that is, two identical requests), you can integrate this cache in just **two steps**!

1. Initialize the cache
@@ -63,55 +59,34 @@ answer = openai.ChatCompletion.create(
)
```

When running locally, if you want better results, you can use the [Sqlite + Faiss + Towhee](example/sf_towhee/sf_manager.py) example, where Sqlite + Faiss handle cache data management and Towhee performs the embedding.

In production, or with a sizeable user base, the vector search part deserves more attention; take a look at [Milvus](https://github.com/milvus-io/milvus), or [Zilliz Cloud](https://cloud.zilliz.com/) to quickly experience Milvus vector retrieval.
If you want to quickly try vector similarity search caching locally, see the example: [Sqlite + Faiss + Towhee](example/sqlite_faiss_towhee/sqlite_faiss_towhee.py)

More reference docs:

- [More examples](example/example.md)
- [System design](doc/system-cn.md)
- [System design, how the system is built](doc/system-cn.md)
- [Features, everything the cache currently supports](doc/feature_cn.md)
- [Examples, learn how to better customize the cache](example/example.md)

## 🥳 Features

- Supports OpenAI normal and streaming chat requests
- Supports top_k search, configurable when the DataManager is created
- Supports multi-level caching, see: `Cache#next_cache`

```python
bak_cache = Cache()
bak_cache.init()
cache.init(next_cache=bak_cache)
```

- Whether to skip the current cache entirely, that is, neither search the cache nor save the results returned by ChatGPT, see: `Cache#cache_enable_func`
- During the cache warm-up phase, skip the cache search but still save the results returned by ChatGPT, see: set `cache_skip=True` when calling `create`

## 🤔 Is a cache necessary?

```python
openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=mock_messages,
cache_skip=True,
)
```
I believe it is necessary, for the following reasons:

- Like Lego bricks, every module can be customized, including:
  - pre-embedding, extracts feature information from the original request, such as the last message or the prompt
  - embedding, converts the feature information into vector data
  - data manager, cache data management, mainly searching and saving data
  - cache similarity evaluation, can use the similarity search distance or another model better suited to the use case
  - post-process, handles the list of cached answers, e.g. the most similar one, a random one, or a custom choice
- In domain-specific services built on ChatGPT, many question-answer pairs are fairly similar.
- For an individual user, the series of questions asked through ChatGPT follows certain patterns tied to their occupation, lifestyle, personality, and so on. For example, how a programmer uses a ChatGPT service is largely related to their work.
- If your ChatGPT service targets a large user base, grouping users into categories means related questions from users in the same category have a good chance of hitting the cache, reducing service costs.

## 🤗 All modules

- Pre-embedding
![GPTCache Struct](doc/GPTCacheStructure.png)

- Pre-embedding, extracts the key information from the request
  - get the last message of the request, see: `pre_embedding.py#last_content`
- Embedding
- Embedding, converts the text to a vector for subsequent similarity search
  - [x] [towhee](https://towhee.io/), English model: paraphrase-albert-small-v2, Chinese model: uer/albert-base-chinese-cluecorpussmall
  - [x] openai embedding api
  - [x] string, no processing at all
  - [ ] [cohere](https://docs.cohere.ai/reference/embed) embedding api
- Data Manager
- Cache, cache data management, including search, storage, and eviction
  - scalar store
    - [x] [sqlite](https://sqlite.org/docs.html)
    - [ ] [postgresql](https://www.postgresql.org/)
@@ -120,12 +95,12 @@ openai.ChatCompletion.create(
    - [x] [milvus](https://milvus.io/)
  - vector index
    - [x] [faiss](https://faiss.ai/)
- Similarity Evaluation
- Similarity Evaluation, evaluates the cached results
  - search distance, see: `simple.py#pair_evaluation`
  - [towhee](https://towhee.io/), the roberta_duplicate model, question-to-question relevance matching; supports only 512 tokens
  - string, character match between the cached question and the input question
  - np, computes the vector distance with `linalg.norm`
- Post Process
- Post Process, how multiple cached answers are returned to the user
  - choose the most similar answer
  - choose randomly

76 changes: 26 additions & 50 deletions README.md
@@ -9,27 +9,22 @@ The GPT Cache system is mainly used to cache the question-answer data of users i

If the idea 💡 is helpful to you, please feel free to give the project a star 🌟; it helps a lot.

## 🤔 Is a cache necessary?

I believe it is necessary for the following reasons:

- In certain domain services built on ChatGPT, many question-answer pairs are quite similar.
- For an individual user, the series of questions raised through ChatGPT follows certain patterns tied to their occupation, lifestyle, personality, etc. For example, how a programmer uses a ChatGPT service is largely related to their work.
- If your ChatGPT service targets a large user group, categorizing users can increase the probability of related questions hitting the cache, thus reducing service costs.

## 😊 Quick Start

**Note**:
- You can quickly try the cache, though it may not be very **stable** yet.
- By default, only **a few** libraries are installed. When a feature requires an additional library, it is **installed automatically**.
- If a library fails to install because of an old pip version, run: `python -m pip install --upgrade pip`

### alpha test package install
### pip install

```bash
# create conda new environment
conda create --name gpt-cache python=3.8
conda activate gpt-cache
pip install gpt_cache
```

### dev install

```bash
# clone gpt cache repo
git clone https://github.com/zilliztech/gpt-cache
cd gpt-cache
@@ -39,6 +34,8 @@ pip install -r requirements.txt
python setup.py install
```

### quick usage

If you just want exact-match caching of requests (that is, two identical requests), you **ONLY** need **TWO** steps to integrate this cache; a complete sketch follows the snippet below.

1. Cache init
@@ -65,69 +62,48 @@ answer = openai.ChatCompletion.create(
)
```
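For reference, a minimal end-to-end sketch of those two steps might look like the following. The import paths and helper names (`gptcache.core`, `gptcache.adapter`, `set_openai_key`) are assumptions for this version of the project, so check the repository if they differ.

```python
# a minimal sketch of the two steps; import paths are assumptions, not the confirmed API
from gptcache.core import cache          # step 1: init the cache
from gptcache.adapter import openai      # drop-in replacement for `import openai`

cache.init()
cache.set_openai_key()                   # assumed helper that reads OPENAI_API_KEY

# step 2: send the request through the adapter; an identical request hits the cache
answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, GPT Cache"}],
)
```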

When running locally, if you want better results, use the [Sqlite + Faiss + Towhee](example/sf_towhee/sf_manager.py) example, where Sqlite + Faiss handle cache data management and Towhee performs the embedding.

In production, or with a sizeable user base, the vector search part deserves more attention; take a look at [Milvus](https://github.com/milvus-io/milvus), or [Zilliz Cloud](https://cloud.zilliz.com/), which lets you quickly experience Milvus vector retrieval.
If you want to try vector similarity search caching locally, use the example [Sqlite + Faiss + Towhee](example/sqlite_faiss_towhee/sqlite_faiss_towhee.py).
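To make the division of labor concrete, here is a small self-contained sketch (not GPTCache's actual code) of how a scalar store and a vector index cooperate: Faiss answers "which cached question is closest?", and Sqlite returns the stored text for that hit.

```python
import sqlite3

import faiss                        # pip install faiss-cpu
import numpy as np

dim = 768                           # embedding dimension, e.g. paraphrase-albert-small-v2
index = faiss.IndexFlatL2(dim)      # Faiss: vector similarity search
db = sqlite3.connect(":memory:")    # Sqlite: scalar data (question/answer text)
db.execute("CREATE TABLE qa (id INTEGER PRIMARY KEY, question TEXT, answer TEXT)")

def save(question: str, answer: str, vec: np.ndarray) -> None:
    db.execute("INSERT INTO qa (question, answer) VALUES (?, ?)", (question, answer))
    index.add(vec.reshape(1, dim).astype("float32"))  # Faiss ids follow insertion order

def search(vec: np.ndarray, top_k: int = 1):
    distances, ids = index.search(vec.reshape(1, dim).astype("float32"), top_k)
    if ids[0][0] < 0:               # the index is still empty
        return None
    # Faiss ids start at 0, sqlite INTEGER PRIMARY KEY starts at 1
    row = db.execute("SELECT question, answer FROM qa WHERE id = ?",
                     (int(ids[0][0]) + 1,)).fetchone()
    return float(distances[0][0]), row
```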

More Docs:
- [examples](example/example.md)
- [system design](doc/system.md)


## 🥳 Feature
- [System Design, how it was constructed](doc/system.md)
- [Features, all features currently supported by the cache](doc/feature.md)
- [Examples, learn how to better customize the cache](example/example.md)

- Supports OpenAI chat completion normal and streaming requests
- Get top_k similar search results; this can be set when creating the data manager
- Supports the cache chain, see: `Cache#next_cache`

```python
bak_cache = Cache()
bak_cache.init()
cache.init(next_cache=bak_cache)
```

- Whether to skip the current cache entirely, that is, neither search the cache nor save the ChatGPT results, see: `Cache#cache_enable_func`
- During the cache warm-up phase, skip the cache search but still save the result returned by ChatGPT to the cache, see: `cache_skip=True` in the `create` request

## 🤔 Is a cache necessary?

```python
openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=mock_messages,
cache_skip=True,
)
```
I believe it is necessary for the following reasons:

- Like Lego bricks, all modules can be custom-assembled, including:
  - pre-embedding function, extracts feature information from the original request, such as the prompt or the last message
  - embedding function, converts the feature information into a vector for cache search; choose a model that fits your use case
  - data manager, cache data management, mainly the search and storage of cache data
  - cache similarity evaluation function, can use the similarity search distance or an additional model to make the answer more accurate
  - post-process of the cached answer list, e.g. take the first, a random one, or a custom combination
- In certain domain services built on ChatGPT, many question-answer pairs are quite similar.
- For an individual user, the series of questions raised through ChatGPT follows certain patterns tied to their occupation, lifestyle, personality, etc. For example, how a programmer uses a ChatGPT service is largely related to their work.
- If your ChatGPT service targets a large user group, categorizing users can increase the probability of related questions hitting the cache, thus reducing service costs.

## 🤗 All Modules

- Pre-embedding
![GPTCache Struct](doc/GPTCacheStructure.png)

- Pre-embedding, gets the key information from the request
  - get the last message of the request, see: `pre_embedding.py#last_content`
- Embedding
- Embedding, converts the text to a vector for similarity search
  - [x] [towhee](https://towhee.io/), English model: paraphrase-albert-small-v2, Chinese model: uer/albert-base-chinese-cluecorpussmall
  - [x] openai embedding api
  - [x] string, no change at all
  - [ ] [cohere](https://docs.cohere.ai/reference/embed) embedding api
- Data Manager
- Cache, cache data management, including search, save, and evict
  - scalar store
    - [x] [sqlite](https://sqlite.org/docs.html)
    - [ ] [postgresql](https://www.postgresql.org/)
    - [ ] [mysql](https://www.mysql.com/)
  - vector store
    - [x] [milvus](https://milvus.io/)
    - [x] [zilliz cloud](https://cloud.zilliz.com/)
  - vector index
    - [x] [faiss](https://faiss.ai/)
- Similarity Evaluation
- Similarity Evaluation, judges the quality of cached answers
  - the search distance, see: `simple.py#pair_evaluation`
  - [towhee](https://towhee.io/), the roberta_duplicate model, precise question-to-question comparison; supports only 512 tokens
  - string, exact character match between the cached request and the incoming request
  - np, computes the vector distance with `linalg.norm` (see the sketch after this list)
- Post Process
- Post Process, how multiple cached answers are returned to the user
  - choose the most similar
  - choose randomly
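As an illustration of the `np` option above, a distance-based evaluation can be as small as the sketch below; the real signature of `simple.py#pair_evaluation` may differ.

```python
import numpy as np

# sketch of the "np" similarity evaluation: L2 distance between the two embeddings,
# where a smaller distance means the cached answer is a closer match
def pair_evaluation(src_embedding, cache_embedding, **_):
    return float(np.linalg.norm(np.asarray(src_embedding) - np.asarray(cache_embedding)))
```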

31 changes: 31 additions & 0 deletions doc/feature.md
@@ -0,0 +1,31 @@
# 🥳 Feature

English | [中文](feature_cn.md)

- Supports OpenAI chat completion normal and streaming requests
- Get top_k similar search results; this can be set when creating the data manager
- Supports the cache chain, see: `Cache#next_cache`

```python
bak_cache = Cache()
bak_cache.init()
cache.init(next_cache=bak_cache)
```

- Whether to skip the current cache entirely, that is, neither search the cache nor save the ChatGPT results, see: `Cache#cache_enable_func` (a sketch follows the code block below)
- During the cache warm-up phase, skip the cache search but still save the result returned by ChatGPT to the cache, see: `cache_skip=True` in the `create` request

```python
openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=mock_messages,
cache_skip=True,
)
```
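For example, a predicate that bypasses the cache for long prompts could look like the sketch below; the arguments actually passed to `cache_enable_func` are an assumption here, so adapt it to the real call signature.

```python
from gptcache.core import cache   # assumed import path

# hypothetical predicate: bypass the cache entirely for long prompts
def enable_cache_for_short_prompts(*args, **kwargs):
    messages = kwargs.get("messages") or []
    last = messages[-1]["content"] if messages else ""
    return len(last) < 500   # True: use the cache, False: skip search and save

cache.init(cache_enable_func=enable_cache_for_short_prompts)
```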

- Like Lego bricks, all modules can be custom-assembled (see the sketch after this list), including:
  - pre-embedding function, extracts feature information from the original request, such as the prompt or the last message
  - embedding function, converts the feature information into a vector for cache search; choose a model that fits your use case
  - data manager, cache data management, mainly the search and storage of cache data
  - cache similarity evaluation function, can use the similarity search distance or an additional model to make the answer more accurate
  - post-process of the cached answer list, e.g. take the first, a random one, or a custom combination
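Putting the bricks together, a fully customized init might look like the sketch below. The toy functions and the keyword names (`pre_embedding_func`, `embedding_func`, `similarity_evaluation_func`, `post_process_messages_func`) are illustrative assumptions, not the confirmed API; see the system design doc for the real wiring.

```python
from gptcache.core import cache   # assumed import path

def last_content(data, **_):
    # pre-embedding: use the last message as the feature, mirroring pre_embedding.py#last_content
    return data["messages"][-1]["content"]

def to_vector(text, **_):
    # embedding: toy stand-in for a real model such as towhee or the openai embedding api
    return [float(ord(c)) for c in text[:8]]

def exact_match(src, cached, **_):
    # similarity evaluation: plain string equality, like the "string" option
    return 1.0 if src == cached else 0.0

def first_answer(answers):
    # post-process: return the first (most similar) cached answer
    return answers[0]

# keyword names below are assumptions for illustration only
cache.init(
    pre_embedding_func=last_content,
    embedding_func=to_vector,
    similarity_evaluation_func=exact_match,
    post_process_messages_func=first_answer,
)
```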
31 changes: 31 additions & 0 deletions doc/feature_cn.md
@@ -0,0 +1,31 @@
# 🥳 Features

[English](feature.md) | 中文

- Supports OpenAI normal and streaming chat requests
- Supports top_k search, configurable when the DataManager is created
- Supports multi-level caching, see: `Cache#next_cache`

```python
bak_cache = Cache()
bak_cache.init()
cache.init(next_cache=bak_cache)
```

- Whether to skip the current cache entirely, that is, neither search the cache nor save the results returned by ChatGPT, see: `Cache#cache_enable_func`
- During the cache warm-up phase, skip the cache search but still save the results returned by ChatGPT, see: set `cache_skip=True` when calling `create`

```python
openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=mock_messages,
cache_skip=True,
)
```

- Like Lego bricks, every module can be customized, including:
  - pre-embedding, extracts feature information from the original request, such as the last message or the prompt
  - embedding, converts the feature information into vector data
  - data manager, cache data management, mainly searching and saving data
  - cache similarity evaluation, can use the similarity search distance or another model better suited to the use case
  - post-process, handles the list of cached answers, e.g. the most similar one, a random one, or a custom choice
36 changes: 4 additions & 32 deletions example/example.md
@@ -4,49 +4,21 @@

How to use the map to cache data.

## [Sqlite + Faiss manage cache data](sf_mock/sf_manager.py)

Before running this case, you should install `faiss-cpu`.

```bash
pip install faiss-cpu
```
## [Sqlite + Faiss manage cache data](sqlite_faiss_mock/sqlite_faiss_mock.py)

How to use [sqlite](https://www.sqlite.org/index.html) to store the scalar data and faiss to query the vector data.

## [Sqlite + Faiss + Towhee](sf_towhee/sf_manager.py)

Before running this case, you should install `faiss-cpu` and `towhee`.

```bash
pip install faiss-cpu
pip install towhee==0.9.0
```
## [Sqlite + Faiss + Towhee](sqlite_faiss_towhee/sqlite_faiss_towhee.py)

Building on the example above, use [towhee](https://towhee.io/) for the embedding operation.

Note: the default embedding model only supports **ENGLISH**. For Chinese, use the `uer/albert-base-chinese-cluecorpussmall` model; for other languages, use a corresponding model.

## [Sqlite + Milvus + Towhee](sqlite_milvus_mock/sqlite_milvus_mock.py)

Before running this case, you should install `faiss-cpu`, `towhee`, and `pymilvus`.

```bash
pip install faiss-cpu
pip install towhee==0.9.0
pip install pymilvus
```

How to use [sqlite](https://www.sqlite.org/index.html) to store the scalar data and [milvus](https://milvus.io/docs) to store the vector data.
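Before running it, you may also want to check that a Milvus instance is reachable. A minimal connectivity check with `pymilvus` (assuming a local deployment on the default port) looks like this:

```python
from pymilvus import connections

# assumes a local Milvus listening on the default port;
# Zilliz Cloud uses a uri/token pair instead of host/port
connections.connect(alias="default", host="localhost", port="19530")
print("connected:", connections.has_connection("default"))
```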

## [Benchmark](benchmark/benchmark_sf_towhee.py)

Before running this case, you should install `faiss-cpu` and `towhee`.
How to use [sqlite](https://www.sqlite.org/index.html) to store the scalar data and [Milvus](https://milvus.io/docs) or [Zilliz Cloud](https://cloud.zilliz.com/) to store the vector data.

```bash
pip install faiss-cpu
pip install towhee==0.9.0
```
## [Benchmark](benchmark/benchmark_sqlite_faiss_towhee.py)

The benchmark script for the `Sqlite + Faiss + Towhee` setup.
