forked from langchain-ai/langchain
Generative Characters (langchain-ai#2859)
Add a time-weighted memory retriever and a notebook that approximates a Generative Agent from https://arxiv.org/pdf/2304.03442.pdf. The "daily plan" components are removed for now since they are less useful without a virtual world, but the memory is an interesting component to build on. --------- Co-authored-by: Harrison Chase <[email protected]>
1 parent
a9310a3
commit 99c0382
Showing 7 changed files with 1,771 additions and 3 deletions.
213 changes: 213 additions & 0 deletions
docs/modules/indexes/retrievers/examples/time_weighted_vectorstore.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,213 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "a90b7557", | ||
"metadata": {}, | ||
"source": [ | ||
"# Time Weighted VectorStore Retriever\n", | ||
"\n", | ||
"This retriever uses a combination of semantic similarity and recency.\n", | ||
"\n", | ||
"The algorithm for scoring documents is:\n", | ||
"\n", | ||
"```\n", | ||
"semantic_similarity + (1.0 - decay_rate) ** hours_passed\n", | ||
"```\n", | ||
"\n", | ||
"Notably, hours_passed refers to the hours passed since the object in the retriever **was last accessed**, not since it was created. This means that frequently accessed objects remain \"fresh.\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"id": "f22cc96b", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import faiss\n", | ||
"\n", | ||
"from datetime import datetime, timedelta\n", | ||
"from langchain.docstore import InMemoryDocstore\n", | ||
"from langchain.embeddings import OpenAIEmbeddings\n", | ||
"from langchain.retrievers import TimeWeightedVectorStoreRetriever\n", | ||
"from langchain.schema import Document\n", | ||
"from langchain.vectorstores import FAISS\n" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "6af7ea6b", | ||
"metadata": {}, | ||
"source": [ | ||
"## Low Decay Rate\n", | ||
"\n", | ||
"A low decay rate (in this example, to be extreme, we set it close to 0) means memories will be \"remembered\" for longer. A decay rate of 0 means memories are never forgotten, making this retriever equivalent to the vector lookup." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"id": "c10e7696", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Define your embedding model\n", | ||
"embeddings_model = OpenAIEmbeddings()\n", | ||
"# Initialize the vectorstore as empty\n", | ||
"embedding_size = 1536\n", | ||
"index = faiss.IndexFlatL2(embedding_size)\n", | ||
"vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})\n", | ||
"retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_rate=.0000000000000000000000001, k=1) " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"id": "86dbadb9", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"['129ba56b-7e7f-480b-83b3-8138a7f5db4a']" | ||
] | ||
}, | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"yesterday = datetime.now() - timedelta(days=1)\n", | ||
"retriever.add_documents([Document(page_content=\"hello world\", metadata={\"last_accessed_at\": yesterday})])\n", | ||
"retriever.add_documents([Document(page_content=\"hello foo\")])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 9, | ||
"id": "a580be32", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[Document(page_content='hello foo', metadata={'last_accessed_at': datetime.datetime(2023, 4, 16, 15, 46, 43, 860748), 'created_at': datetime.datetime(2023, 4, 16, 15, 46, 14, 469670), 'buffer_idx': 1})]" | ||
] | ||
}, | ||
"execution_count": 9, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"# \"Hello World\" is returned first because it is the most salient, and the decay rate is close to 0, meaning it is still recent enough\n", | ||
"retriever.get_relevant_documents(\"hello world\")" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "ca056896", | ||
"metadata": {}, | ||
"source": [ | ||
"## High Decay Rate\n", | ||
"\n", | ||
"With a high decay rate (e.g., several 9's), the recency score quickly goes to 0! If you set this all the way to 1, the recency score is 0 for all objects, once again making this equivalent to a vector lookup.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"id": "dc37669b", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Define your embedding model\n", | ||
"embeddings_model = OpenAIEmbeddings()\n", | ||
"# Initialize the vectorstore as empty\n", | ||
"embedding_size = 1536\n", | ||
"index = faiss.IndexFlatL2(embedding_size)\n", | ||
"vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})\n", | ||
"retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_rate=.999, k=1) " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"id": "fa284384", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"['8fff7ef8-3a30-40f3-b42e-b8d5c7850863']" | ||
] | ||
}, | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"yesterday = datetime.now() - timedelta(days=1)\n", | ||
"retriever.add_documents([Document(page_content=\"hello world\", metadata={\"last_accessed_at\": yesterday})])\n", | ||
"retriever.add_documents([Document(page_content=\"hello foo\")])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"id": "7558f94d", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[Document(page_content='hello foo', metadata={'last_accessed_at': datetime.datetime(2023, 4, 16, 15, 46, 17, 646927), 'created_at': datetime.datetime(2023, 4, 16, 15, 46, 14, 469670), 'buffer_idx': 1})]" | ||
] | ||
}, | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"# \"Hello Foo\" is returned first because \"hello world\" is mostly forgotten\n", | ||
"retriever.get_relevant_documents(\"hello world\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "bf6d8c90", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.2" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
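The scoring rule described in the notebook's first markdown cell, `semantic_similarity + (1.0 - decay_rate) ** hours_passed`, can be tried out without FAISS or an embedding model. The following is a minimal sketch of that formula only, not the retriever's actual implementation: the function names `recency_score` and `combined_score` are hypothetical, and the similarity value is passed in as a plain number rather than computed from embeddings.

```python
from datetime import datetime, timedelta


def recency_score(decay_rate: float, hours_passed: float) -> float:
    """Recency term from the notebook: (1.0 - decay_rate) ** hours_passed."""
    return (1.0 - decay_rate) ** hours_passed


def combined_score(
    semantic_similarity: float,
    decay_rate: float,
    last_accessed_at: datetime,
    now: datetime,
) -> float:
    """Hypothetical helper: similarity plus recency, per the notebook's formula."""
    # Hours since the document was last accessed (not since it was created).
    hours_passed = (now - last_accessed_at).total_seconds() / 3600
    return semantic_similarity + recency_score(decay_rate, hours_passed)


if __name__ == "__main__":
    now = datetime(2023, 4, 16, 15, 46)
    yesterday = now - timedelta(days=1)
    # Compare the two decay rates used in the notebook for a day-old document.
    for rate in (1e-25, 0.999):
        print(rate, recency_score(rate, 24.0))
    # The similarity value 0.5 is an arbitrary illustration.
    print(combined_score(0.5, 0.999, yesterday, now))
```

With a decay rate near 0, the recency term for a day-old document stays at essentially 1, so the ranking is driven by semantic similarity. With a decay rate of 0.999, the recency term for the same document is vanishingly small, so a freshly added document gains almost a full point over it.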