Improve core docstring (#99)
Signed-off-by: Jael Gu <[email protected]>
jaelgu authored Apr 6, 2023
1 parent 1edeba1 commit cd779d7
Showing 8 changed files with 22 additions and 28 deletions.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -18,7 +18,7 @@

# -- Project information -----------------------------------------------------

-project = 'GPT Cache'
+project = 'GPTCache'
copyright = '2023, Zilliz'
author = 'Zilliz'

4 changes: 2 additions & 2 deletions docs/contributing.md
@@ -1,6 +1,6 @@
-# 😆 Contributing to GPT Cache
+# 😆 Contributing to GPTCache

-Before contributing to GPT Cache, it is recommended to read the [system design article](./system.md).
+Before contributing to GPTCache, it is recommended to read the [system design article](./system.md).

In the process of contributing, pay attention to **the parameter type**, because there is currently no type restriction added.

2 changes: 1 addition & 1 deletion docs/feature.md
@@ -22,7 +22,7 @@ openai.ChatCompletion.create(
```

- Like Lego bricks, custom assemble all modules, including:
-  - Adapter: The user interface to adapt different LLM model requests to the GPT cache protocol
+  - Adapter: The user interface to adapt different LLM model requests to the GPTCache protocol
  - Pre-processor: Extracts the key information from the request and preprocess
  - Context Buffer: Maintains session context
  - Encoder: Embed the text into a dense vector for similarity search
11 changes: 5 additions & 6 deletions docs/index.rst
@@ -1,9 +1,9 @@
-.. GPT Cache documentation master file, created by
+.. GPTCache documentation master file, created by
   sphinx-quickstart on Tue Apr 4 12:07:10 2023.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.
-Welcome to GPT Cache!
+Welcome to GPTCache!
=====================================

----
@@ -26,7 +26,7 @@ Welcome to GPT Cache!
Large Language Models (LLMs) are a promising and transformative technology that has rapidly advanced in recent years. These models are capable of generating natural language text and have numerous applications, including chatbots, language translation, and creative writing. However, as the size of these models increases, so do the costs and performance requirements needed to utilize them effectively. This has led to significant challenges in developing on top of large models such as ChatGPT.

-To address this issue, we have developed **GPT Cache**, a project that focuses on caching responses from language models, also known as a semantic cache. The system offers two major benefits:
+To address this issue, we have developed **GPTCache**, a project that focuses on caching responses from language models, also known as a semantic cache. The system offers two major benefits:

- **Quick response to user requests:** the caching system provides faster response times compared to large model inference, resulting in lower latency and faster response to user requests.
- **Reduced service costs:** most LLM services are currently charged based on the number of tokens. If user requests hit the cache, it can reduce the number of requests and lower service costs.
@@ -59,15 +59,14 @@ A good analogy for GptCache is to think of it as a more semantic version of Redis

We provide `benchmark <https://github.com/zilliztech/GPTCache/blob/main/examples/benchmark/benchmark_sqlite_faiss_onnx.py>`_ to illustrate the concept. In semantic caching, there are three key measurement dimensions: false positives, false negatives, and hit latency. With the plugin-style implementation, users can easily tradeoff these three measurements according to their needs.

-You can take a look at `system architecture <./system.html>`_ and `modules <./module.html>`_ to learn about GPTCache design and architecture.
+You can take a look at `modules <./module.html>`_ to learn about system design and architecture.

.. toctree::
   :maxdepth: 2
   :caption: Overview
   :name: overview
   :hidden:

-   system
   module


@@ -92,7 +91,7 @@ For more information about API and examples, you can checkout `API References <.
Contributing
---------------

-Would you like to contribute to the development of GPT Cache? Take a look at `our contribution guidelines <./contributing.html>`_.
+Would you like to contribute to the development of GPTCache? Take a look at `our contribution guidelines <./contributing.html>`_.

.. toctree::
   :maxdepth: 1
12 changes: 3 additions & 9 deletions docs/module.md
@@ -9,8 +9,6 @@ The LLM Adapter is designed to integrate different LLM models by unifying their
- [x] Support OpenAI chatGPT API.
- [ ] Support other LLMs, such as Hugging Face Hub, Bard, Anthropic, and self-hosted models like LLaMa.

-**API Reference**: You can find more about APIs and examples [here](./references/adapter.html).

## Embedding Generator

This **Embedding Generator** module is created to extract embeddings from requests for similarity search. GPTCache offers a generic interface that supports multiple embedding APIs, and presents a range of solutions to choose from.
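As a hedged illustration of what this module produces, here is a minimal sketch using SentenceTransformers, one of the backends listed below; the model name is an arbitrary example, and the GPTCache wrapper API itself is not shown in this diff:

```python
# Minimal sketch: turn a request into a dense vector for similarity search.
# SentenceTransformers is one of the supported backends listed below; the
# model name here is an illustrative example, not mandated by GPTCache.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
embedding = model.encode('what is github')  # 1-D numpy array
print(embedding.shape)  # e.g. (384,) for this model
```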
@@ -23,8 +21,6 @@ This **Embedding Generator** module is created to extract embeddings from requests
- [x] Support [SentenceTransformers](https://www.sbert.net) embedding API.
- [x] Support [fastText](https://fasttext.cc) embedding API.

-**API Reference**: You can find more about APIs and examples [here](./references/embedding.html).

## Cache Storage

**Cache Storage** is where the response from LLMs, such as ChatGPT, is stored. Cached responses are retrieved to assist in evaluating similarity and are returned to the requester if there is a good semantic match. At present, GPTCache supports SQLite and offers a universally accessible interface for extension of this module.
@@ -43,8 +39,6 @@ This **Embedding Generator** module is created to extract embeddings from requests
- [ ] Support [zincsearch](https://zinc.dev/)
- [ ] Support other storages

-**API Reference**: You can find more about APIs and examples [here](./references/cache.html).

## Vector Store

The **Vector Store** module helps find the K most similar requests from the input request's extracted embedding. The results can help assess similarity. GPTCache provides a user-friendly interface that supports various vector stores, including Milvus, Zilliz Cloud, and FAISS. More options will be available in the future.
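As a hedged sketch of the top-K lookup described above, using FAISS, one of the named stores; the index type and dimension are illustrative choices, not GPTCache defaults:

```python
# Minimal sketch of the Vector Store's job: given a request embedding,
# return the K most similar cached request embeddings. IndexFlatL2 and
# dim=128 are arbitrary choices for the example.
import numpy as np
import faiss

dim = 128
index = faiss.IndexFlatL2(dim)                           # exact L2 search
index.add(np.random.rand(1000, dim).astype('float32'))   # cached embeddings

query = np.random.rand(1, dim).astype('float32')         # new request's embedding
distances, ids = index.search(query, 3)                  # K=3 nearest requests
```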
@@ -77,9 +71,9 @@ This module collects data from both the **Cache Storage** and **Vector Store**,
- [x] Distance represented by applying linalg.norm from numpy to the embeddings.
- [ ] BM25 and other similarity measurements
- [ ] Support other models

-**API Reference**: You can find more about APIs and examples [here](./references/similarity_evaluation.html).



<br>

#### Note:
Not all combinations of different modules may be compatible with each other. For instance, if we disable the **Embedding Extractor**, the **Vector Store** may not function as intended. We are currently working on implementing a combination sanity check for **GPTCache**.
10 changes: 5 additions & 5 deletions docs/quick-start.md
@@ -2,7 +2,7 @@

**Note**:

-- You can quickly try GPT cache and put it into a production environment without heavy development. However, please note that the repository is still under heavy development.
+- You can quickly try GPTCache and put it into a production environment without heavy development. However, please note that the repository is still under heavy development.
- By default, only a limited number of libraries are installed to support the basic cache functionalities. When you need to use additional features, the related libraries will be **automatically installed**.
- Make sure that the Python version is **3.8.1 or higher**, check: `python --version`
- If you encounter issues installing a library due to a low pip version, run: `python -m pip install --upgrade pip`.
@@ -18,7 +18,7 @@ pip install gptcache
### dev install

```bash
-# clone gpt cache repo
+# clone GPTCache repo
git clone https://github.com/zilliztech/GPTCache.git
cd GPTCache

@@ -74,7 +74,7 @@ print(f'Answer: {response_text(response)}\n')

```

-### OpenAI API + GPT Cache, exact match cache
+### OpenAI API + GPTCache, exact match cache

> If you ask ChatGPT the exact same two questions, the answer to the second question will be obtained from the cache without requesting ChatGPT again.
@@ -87,7 +87,7 @@ def response_text(openai_resp):

print("Cache loading.....")

-# To use GPT cache, that's all you need
+# To use GPTCache, that's all you need
# -------------------------------------------------
from gptcache.core import cache
from gptcache.adapter import openai
@@ -113,7 +113,7 @@ for _ in range(2):
print(f'Answer: {response_text(response)}\n')
```
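Because the hunks above elide most of this example, here is a hedged, self-contained sketch of the exact-match flow; the `cache.init()` / `cache.set_openai_key()` calls are our best-guess reading of the elided lines, and the question is a placeholder:

```python
# Hedged reconstruction of the exact-match example (the diff above only
# shows fragments): ask the same question twice; the second answer should
# come from the cache instead of a new OpenAI request.
def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

print("Cache loading.....")

# To use GPTCache, that's all you need
# -------------------------------------------------
from gptcache.core import cache
from gptcache.adapter import openai  # drop-in replacement for `openai`

cache.init()             # exact-match cache by default (assumed)
cache.set_openai_key()   # assumed helper that reads OPENAI_API_KEY

question = 'what is github'  # placeholder question
for _ in range(2):
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user', 'content': question}],
    )
    print(f'Answer: {response_text(response)}\n')
```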

-### OpenAI API + GPT Cache, similar search cache
+### OpenAI API + GPTCache, similar search cache

> After obtaining an answer from ChatGPT in response to several similar questions, the answers to subsequent questions can be retrieved from the cache without the need to request ChatGPT again.
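The similarity example itself is elided from this diff; as a hedged sketch under stated assumptions (the `Onnx` embedding class and the `embedding_func` / `config` keywords are drawn from the module docs and `gptcache/core.py`, not from the elided lines):

```python
# Hedged sketch of initializing a similarity (not exact-match) cache.
# Assumptions: the Onnx embedding class and the `embedding_func` /
# `config` keywords; a vector-capable data manager (e.g. SQLite + FAISS)
# would also be configured here but is omitted for brevity.
from gptcache.core import cache, Config
from gptcache.embedding import Onnx

onnx = Onnx()  # GPTCache/paraphrase-albert-onnx English encoder (see system.md)

cache.init(
    embedding_func=onnx.to_embeddings,
    config=Config(similarity_threshold=0.7),  # see gptcache/core.py below
)
```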
6 changes: 3 additions & 3 deletions docs/system.md
@@ -2,7 +2,7 @@

## 🧐 System flow

-![GPT Cache Flow](GPTCache.png)
+![GPTCache Flow](GPTCache.png)

The core process of the system is shown in the diagram above:

@@ -25,9 +25,9 @@ After obtaining the corresponding result list from the cache, the model needs to

## 🤩 System Structure

-![GPT Cache Structure](GPTCacheStructure.png)
+![GPTCache Structure](GPTCacheStructure.png)

-- **Adapter**: The user interface to adapt different LLM model requests to the GPT cache protocol, like: OpenAI chatGPT API or Hugging Face Hub, Anthropic, and self-hosted models like LLaMa.
+- **Adapter**: The user interface to adapt different LLM model requests to the GPTCache protocol, like: OpenAI chatGPT API or Hugging Face Hub, Anthropic, and self-hosted models like LLaMa.
- **Pre-processor**: Extracts the key information from the request and preprocess, like: basic analysis and parse of the request.
- **Encoder**: Embed the text into a dense vector for similarity search, like: Use [ONNX](https://onnx.ai/) with the GPTCache/paraphrase-albert-onnx model for English text embedding.
- **Cache manager**: which includes searching, saving, or evicting data. It includes a database of scalar data and a database of vector data. The scalar data is historical question and answer data of the user, that is, historical question data and historical answer data, which is readable. The vector data is an array or others obtained by the process of the Encoder module, which is used for similarity search and is not commonly readable. The Cache Manage module performs a similarity search in the database of the vector data according to the vector data of the user's request, and obtains the result of the similarity search.
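To make the flow concrete, here is a toy, self-contained sketch of the hit/miss/write-back loop the diagram describes; every piece is a stand-in, not a GPTCache internal:

```python
# Toy illustration of the flow above: encode -> similarity search ->
# threshold check -> hit, or miss -> call LLM -> write back.
import math

store = []  # [(vector, response)] stands in for the vector + scalar stores

def encode(text):
    # Stand-in encoder: normalized letter counts (real system: ONNX model).
    v = [0.0] * 26
    for c in text.lower():
        if c.isalpha():
            v[ord(c) - ord('a')] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def answer(question, llm, threshold=0.9):
    vec = encode(question)  # Encoder
    # Cache manager: cosine similarity search over cached vectors
    hits = [(sum(a * b for a, b in zip(vec, v)), resp) for v, resp in store]
    if hits and max(hits)[0] >= threshold:  # Ranker + threshold check
        return max(hits)[1]                 # cache hit
    response = llm(question)                # cache miss: request the model
    store.append((vec, response))           # write back for future hits
    return response

# Usage: the second, similarly worded question is served from the cache.
llm = lambda q: f"(model answer to: {q})"
print(answer("what is github", llm))
print(answer("what is github?", llm))
```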
3 changes: 2 additions & 1 deletion gptcache/core.py
@@ -35,7 +35,8 @@ class Config:
"""Pass configuration.
:param log_time_func: optional, customized log time function
-:similarity_threshold: threshold to determine where embeddings are similar to each other
+:param similarity_threshold: a threshold ranged from 0 to 1 to filter search results with similarity score higher than the threshold.
+    When it is 0, there is no hits. When it is 1, all search results will be returned as hits.
:type similarity_threshold: float
Example:
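The docstring's own example is truncated in this diff; as a hedged usage sketch (passing `Config` through `cache.init(config=...)` is an assumption):

```python
# Hedged sketch of the documented knob: per the docstring, the threshold
# ranges over [0, 1]; 0 yields no hits and 1 treats every search result
# as a hit. The `cache.init(config=...)` wiring is an assumption.
from gptcache.core import cache, Config

cache.init(config=Config(similarity_threshold=0.8))
```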
