- How to set the embedding function
- How to set the data manager class
- How to set the similarity evaluation interface
- Other cache init params
- Benchmark
Please note that not all data managers are compatible with every embedding function.
By default, no embedding is performed (there is nothing to set), and only the map data manager can be configured for use. The default embedding function simply returns the input data unchanged:
def to_embeddings(data, **kwargs):
    return data
When an embedding object is created, the underlying model is loaded. Remember to pass the embedding dimension to the data manager so that the vector store matches the model's output.
from gptcache.core import cache, Config
from gptcache.manager import get_data_manager, CacheBase, VectorBase
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
from gptcache.embedding import Onnx
onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
cache.init(embedding_func=onnx.to_embeddings,
           data_manager=data_manager,
           similarity_evaluation=SearchDistanceEvaluation(),
           )
cache.set_openai_key()
The usage of the following models is similar to the above; a short example plugging one of them into the initialization appears after the list.
OpenAI
from gptcache.embedding import OpenAI
openai = OpenAI()
# openai.dimension
# openai.to_embeddings
Huggingface
from gptcache.embedding import Huggingface
huggingface = Huggingface()
# huggingface.dimension
# huggingface.to_embeddings
Cohere
from gptcache.embedding import Cohere
cohere = Cohere()
# cohere.dimension
# cohere.to_embeddings
SentenceTransformer
from gptcache.embedding import SBERT
sbert = SBERT()
# sbert.dimension
# sbert.to_embeddings
FastText
from gptcache.embedding import FastText
fast_text = FastText()
# fast_text.dimension
# fast_text.to_embeddings
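Any of these embeddings can be swapped into the initialization shown earlier. A minimal sketch using SBERT, following the same pattern as the ONNX example above (only the embedding object and dimension change):
from gptcache import cache
from gptcache.manager import get_data_manager, CacheBase, VectorBase
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
from gptcache.embedding import SBERT

sbert = SBERT()
# The vector store dimension must match the embedding model's output dimension.
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=sbert.dimension))
cache.init(embedding_func=sbert.to_embeddings,
           data_manager=data_manager,
           similarity_evaluation=SearchDistanceEvaluation(),
           )
cache.set_openai_key()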
A custom embedding function takes two parameters: the preprocessed string and keyword arguments reserved for user customization. These user-defined parameters are obtained in the same way as above, via kwargs.get("embedding_func", {}).
Example code
def to_embeddings(data, **kwargs):
    return data
import cohere
import numpy as np


class Cohere:
    def __init__(self, model: str = "large", api_key: str = None, **kwargs):
        self.co = cohere.Client(api_key)
        self.model = model
        if model in self.dim_dict():
            self.__dimension = self.dim_dict()[model]
        else:
            self.__dimension = None

    def to_embeddings(self, data):
        if not isinstance(data, list):
            data = [data]
        response = self.co.embed(texts=data, model=self.model)
        embeddings = response.embeddings
        return np.array(embeddings).astype('float32').squeeze(0)

    @property
    def dimension(self):
        if not self.__dimension:
            foo_emb = self.to_embeddings("foo")
            self.__dimension = len(foo_emb)
        return self.__dimension

    @staticmethod
    def dim_dict():
        return {
            "large": 4096,
            "small": 1024
        }
Note that if you intend to use a model, it should be wrapped in a class. The model is then loaded once, when the object is created, which avoids loading it when it is not needed and ensures it is not loaded multiple times during program execution.
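A custom embedding class such as the Cohere example above can then be wired into the cache the same way as the built-in models. A minimal sketch, assuming a valid Cohere API key (the key below is a placeholder) and the Cohere class defined above:
from gptcache import cache
from gptcache.manager import get_data_manager, CacheBase, VectorBase
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

# Uses the custom Cohere class defined above; the API key is a placeholder.
cohere_embedding = Cohere(model="small", api_key="YOUR_API_KEY")
data_manager = get_data_manager(CacheBase("sqlite"),
                                VectorBase("faiss", dimension=cohere_embedding.dimension))
cache.init(embedding_func=cohere_embedding.to_embeddings,
           data_manager=data_manager,
           similarity_evaluation=SearchDistanceEvaluation(),
           )
cache.set_openai_key()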
MapDataManager, default
Store all data in a map data structure, using the question as the key.
from gptcache.manager import get_data_manager
from gptcache import cache
data_manager = get_data_manager()
cache.init(data_manager=data_manager)
cache.set_openai_key()
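The map data manager persists its contents to a local file, and the storage path can be customized. A minimal sketch reusing the data_path parameter that also appears in the next_cache example later on this page (the file name below is just an illustration):
from gptcache import cache
from gptcache.manager import get_data_manager

# data_path controls where the map data manager stores its data file.
data_manager = get_data_manager(data_path="my_data_map.txt")
cache.init(data_manager=data_manager)
cache.set_openai_key()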
Cached storage and Vector store
The user's question and answer data can be stored in a general database such as SQLite or MySQL, while the vector obtained by embedding the question text is stored in a separate vector database. A sketch of switching these storage backends appears after the lists of supported databases below.
from gptcache import cache
from gptcache.manager import get_data_manager, CacheBase, VectorBase
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
import numpy as np
d = 8
def mock_embeddings(data, **kwargs):
    return np.random.random((d, )).astype('float32')
cache_base = CacheBase('sqlite')
vector_base = VectorBase('faiss', dimension=d)
data_manager = get_data_manager(cache_base, vector_base)
cache.init(embedding_func=mock_embeddings,
           data_manager=data_manager,
           similarity_evaluation=SearchDistanceEvaluation(),
           )
cache.set_openai_key()
Support general database
- SQLite.
- PostgreSQL.
- MySQL.
- MariaDB.
- SQL Server.
- Oracle.
Support vector database
- Milvus
- Zilliz Cloud
- FAISS
- ChromaDB
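Switching backends only changes the CacheBase / VectorBase arguments; the rest of the initialization stays the same. A minimal sketch, assuming locally running MySQL and Milvus instances; the keyword arguments shown (sql_url, host, port) are assumptions about the factory parameters, so check the API reference for the exact names:
from gptcache import cache
from gptcache.manager import get_data_manager, CacheBase, VectorBase
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

d = 8  # must match the dimension produced by embedding_func

# Assumed parameter names; adjust them to the API reference for your version.
cache_base = CacheBase("mysql", sql_url="mysql+pymysql://root:123456@127.0.0.1:3306/gptcache")
vector_base = VectorBase("milvus", host="127.0.0.1", port="19530", dimension=d)
data_manager = get_data_manager(cache_base, vector_base)
cache.init(embedding_func=mock_embeddings,  # defined in the example above
           data_manager=data_manager,
           similarity_evaluation=SearchDistanceEvaluation(),
           )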
Custom Store
First, you need to implement two interfaces, namely CacheStorage and VectorBase, and then create the corresponding data manager through the get_data_manager method.
Reference implementations: CacheStorage (sqlalchemy), VectorBase (faiss)
from gptcache import cache
from gptcache.manager import get_data_manager
data_manager = get_data_manager(cache_base=CustomCacheStore(), vector_base=CustomVectorStore())
cache.init(data_manager=data_manager)
ExactMatchEvaluation
Exact match between two questions; currently only available for the map data manager.
from gptcache import cache
from gptcache.similarity_evaluation.exact_match import ExactMatchEvaluation
cache.init(
    similarity_evaluation=ExactMatchEvaluation(),
)
cache.set_openai_key()
SearchDistanceEvaluation
Uses the search distance to evaluate the similarity of a sentence pair.
from gptcache import cache
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
cache.init(
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()
OnnxModelEvaluation
Uses an ONNX model to evaluate the similarity of a sentence pair.
from gptcache import cache
from gptcache.similarity_evaluation.onnx import OnnxModelEvaluation
cache.init(
    similarity_evaluation=OnnxModelEvaluation(),
)
cache.set_openai_key()
NumpyNormEvaluation
Uses the Numpy norm to evaluate the similarity of a sentence pair.
from gptcache import cache
from gptcache.similarity_evaluation.np import NumpyNormEvaluation
cache.init(
    similarity_evaluation=NumpyNormEvaluation(),
)
cache.set_openai_key()
Custom similarity evaluation
To meet your requirements, you will need to implement the SimilarityEvaluation interface, which consists of two methods: evaluation and range (a minimal sketch follows the reference below).
- evaluation: takes three inputs, namely the user request data, the cache data, and user-defined data. The last parameter, obtained via kwargs.get("evaluation_func", {}), is reserved for users.
- range: returns two values, the minimum and maximum possible evaluation scores.
Reference: similarity evaluation dir
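As an illustration, here is a minimal sketch of a custom evaluation based on word overlap. It assumes the SimilarityEvaluation base class can be imported from gptcache.similarity_evaluation and that both the request data and the cache data carry a "question" field; both are assumptions, so check the reference above for the exact interface.
from gptcache.similarity_evaluation import SimilarityEvaluation  # assumed import path


class KeywordOverlapEvaluation(SimilarityEvaluation):
    """Toy evaluation: the score is the word overlap between the incoming
    question and the cached question (the "question" keys are assumptions)."""

    def evaluation(self, src_dict, cache_dict, **kwargs):
        _ = kwargs.get("evaluation_func", {})  # user-defined parameters, if any
        src_words = set(str(src_dict.get("question", "")).lower().split())
        cache_words = set(str(cache_dict.get("question", "")).lower().split())
        if not src_words or not cache_words:
            return 0.0
        return len(src_words & cache_words) / len(src_words | cache_words)

    def range(self):
        # Minimum and maximum possible evaluation scores.
        return 0.0, 1.0

It can then be passed to cache.init(similarity_evaluation=KeywordOverlapEvaluation()) just like the built-in evaluations.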
Other cache init params
- cache_enable_func: determines whether to use the cache. Here, args and kwargs represent the original request parameters; if the function returns True, the cache is enabled. You can use this, for example, to keep the cache disabled when the question is very long, since such requests are unlikely to hit the cache (a sketch of such a check follows the default implementation below).
def cache_all(*args, **kwargs):
    return True
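A sketch of a length-based check, assuming the original request keyword arguments include the OpenAI messages list; the function name and the 200-character limit are arbitrary illustrations:
def enable_cache_for_short_questions(*args, **kwargs):
    # Only enable the cache when the last message is reasonably short;
    # very long prompts are unlikely to produce a cache hit.
    messages = kwargs.get("messages") or []
    question = messages[-1].get("content", "") if messages else ""
    return len(question) <= 200

It is passed at init time, e.g. cache.init(cache_enable_func=enable_cache_for_short_questions).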
- pre_embedding_func: extracts key information from the request and preprocesses it so that the input to the encoder module's embedding function is simple and accurate. The data parameter represents the original request dictionary object, while the kwargs parameter is used to obtain user-defined parameters via kwargs.get("pre_embedding_func", {}); its main purpose is to allow users to pass additional parameters at this stage of the process. For example, the function may extract only the last message in the messages array of the OpenAI request body, or both the first and last messages.
def last_content(data, **kwargs):
    return data.get("messages")[-1]["content"]

def all_content(data, **kwargs):
    s = ""
    messages = data.get("messages")
    for i, message in enumerate(messages):
        if i == len(messages) - 1:
            s += message["content"]
        else:
            s += message["content"] + "\n"
    return s
- config: includes cache-related configuration, which currently consists of the following: log_time_func, similarity_threshold, and similarity_positive (see the sketch after this list).
  - log_time_func: a function for logging time-consuming operations; it currently times the embedding and search functions.
  - similarity_threshold
  - similarity_positive: when set to True, a higher value indicates a higher degree of similarity; when set to False, a lower value indicates a higher degree of similarity.
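A minimal sketch of passing these options at init time; the log_time_func callback signature (operation name plus elapsed time) is an assumption, so verify it against the API reference:
from gptcache import cache, Config

def log_time(func_name, delta_time):
    # Assumed callback signature: the timed operation's name and its duration.
    print(f"{func_name} took {delta_time:.4f}s")

cache.init(
    config=Config(
        log_time_func=log_time,
        similarity_threshold=0.9,
        similarity_positive=False,
    ),
)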
- next_cache: points to the next cache object, which is useful for implementing multi-level caches.
from gptcache import cache, Cache
from gptcache.manager import get_data_manager

bak_cache = Cache()
bak_data_file = "data_map_bak.txt"
bak_cache.init(data_manager=get_data_manager(data_path=bak_data_file))
cache.init(data_manager=get_data_manager(), next_cache=bak_cache)
- cache_obj: customize the cache object used by the request; the global cache object is used by default.
from gptcache import Cache, Config
from gptcache.adapter import openai
from gptcache.embedding import Onnx
from gptcache.manager import get_data_manager, CacheBase, VectorBase

onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
one_cache = Cache()
one_cache.init(embedding_func=onnx.to_embeddings,
               data_manager=data_manager,
               evaluation_func=pair_evaluation,  # user-provided evaluation function (see the benchmark section)
               config=Config(
                   similarity_threshold=1,
                   similarity_positive=False,
               ),
               )

question = "what do you think about chatgpt"

openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": question}
    ],
    cache_obj=one_cache
)
- cache_context: Custom cache parameters can be passed separately for each step in the caching process.
question = "what do you think about chatgpt"
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": question}
    ],
    cache_context={
        "pre_embedding_func": {},
        "embedding_func": {},
        "search_func": {},
        "evaluation_func": {},
        "save_func": {},
    }
)
- cache_skip: skip the cache search, but still store the results returned by the LLM model. These stored results can be used to retry when the cached result is unsatisfactory. Additionally, during the startup phase of the cache system, you can avoid performing a cache search altogether and directly save the data, which can then be used for data accumulation.
question = "what do you think about chatgpt"
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": question}
    ],
    cache_skip=True
)
For more details, please refer to: API Reference
Benchmark
The benchmark script covers the SQLite + FAISS + ONNX combination.
Test data source: some information is randomly scraped from web pages (origin), and ChatGPT is then asked to produce corresponding similar data (similar).
- threshold: the answer evaluation threshold. A smaller value requires higher consistency with the cached content, which lowers the cache hit rate but also reduces wrong hits; a larger value is more tolerant, which raises the cache hit rate but also increases the number of wrong hits.
- positive: an effective cache hit, i.e. searching with similar returns the same result as origin
- negative: a cache hit with a wrong result, i.e. searching with similar returns a different result from origin
- fail count: cache miss
Data file: mock_data.json; similarity evaluation func: pair_evaluation (search distance)
| threshold | average time | positive | negative | fail count |
|-----------|--------------|----------|----------|------------|
| 0.95      | 0.12s        | 425      | 25       | 549        |
| 0.9       | 0.23s        | 804      | 77       | 118        |
| 0.8       | 0.26s        | 904      | 92       | 3          |