Advanced RAG, Video by Donato Capitella on Self Query, Parent Document Retrieval and HyDE #10
-
Ah yes, on parent document retrieval: if you save the records in the database "hierarchically", you can make it so that when it finds a chunk, it returns a larger section of the document, e.g. the whole section, or even the whole document. There is also the concept of chunk overlap, where neighbouring chunks share some text so each one carries more context. Here is the bit of LangChain on "parent document retrieval": https://python.langchain.com/docs/modules/data_connection/retrievers/parent_document_retriever
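For anyone who hasn't seen chunk overlap before, here is a rough sketch of the idea in plain Python. It is not LangChain's splitter: `chunk_with_overlap` is a made-up helper, and it splits on whitespace-separated words rather than tokens or characters.

```python
def chunk_with_overlap(words, size, overlap):
    """Split a list of words into chunks of `size` that share `overlap` words."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    # Stop once the remaining words are already covered by the previous chunk.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]

words = "a b c d e f g h i j".split()
chunks = chunk_with_overlap(words, size=4, overlap=2)
# chunks == ["a b c d", "c d e f", "e f g h", "g h i j"]
```

Each chunk repeats the last two words of the one before it, so a sentence that straddles a boundary still appears whole in at least one chunk.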
-
This video is really nice - the earlier part about the limitations of semantic search made me think about some stuff I'd not considered before!
-
https://www.youtube.com/watch?v=dCEMod64dko&ab_channel=DonatoCapitella
Often technologies are strong in some contexts and weak in others.
I like to frame these techniques by the strength that they are utilising, and also by the weakness they are mitigating.
RAG -> LLMs are strong when they can use the context window
Example: ChatGPT gives stronger answers when you ask it follow-up questions.
The weakness of context is its window limit.
Solution -> Put the best things in the context.
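A minimal sketch of "put the best things in the context", assuming each chunk already has a relevance score and that counting words is a good enough stand-in for counting tokens. `pack_context` and the example chunks are made up for illustration.

```python
def pack_context(scored_chunks, budget_words):
    """Greedily keep the highest-scoring chunks that still fit in the budget."""
    chosen = []
    used = 0
    for score, text in sorted(scored_chunks, key=lambda pair: pair[0], reverse=True):
        cost = len(text.split())  # crude proxy for a token count
        if used + cost <= budget_words:
            chosen.append(text)
            used += cost
    return chosen

chunks = [
    (0.91, "The 1980 vintage was unusually dry."),
    (0.15, "Our shop opens at nine every weekday morning."),
    (0.78, "Dry years tend to concentrate the fruit."),
]
context = pack_context(chunks, budget_words=14)
```

With a 14-word budget the two relevant chunks fit (13 words) and the low-scoring one about opening hours is left out.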
Self Query -> Semantic search is strong for natural language
Semantic search is weak for discrete or structured data
Keyword search is strong in those contexts
Example: "Wine from 1980" is semantically similar to "Wine from 1940", but that's no good!
Solution -> Get an LLM to scan the query and determine whether keyword filters or limits should be used
Have it generate a structured filter alongside the semantic query whenever one would be useful.
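Here is a toy version of the self-query flow. Two big assumptions: `extract_filter` uses a regex where a real system would call an LLM, and `embed` is a bag-of-words counter standing in for an embedding model. All names and documents are hypothetical.

```python
import re
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def extract_filter(query):
    # Stand-in for the LLM step: pull out a four-digit year if one is present.
    match = re.search(r"\b(19|20)\d{2}\b", query)
    return {"year": int(match.group())} if match else {}

def self_query(query, docs):
    flt = extract_filter(query)
    # Exact-match filtering on metadata first, semantic ranking second.
    candidates = [d for d in docs if all(d["meta"].get(k) == v for k, v in flt.items())]
    q = embed(query)
    return sorted(candidates, key=lambda d: cosine(q, embed(d["text"])), reverse=True)

docs = [
    {"text": "A bold red wine with dark fruit", "meta": {"year": 1980}},
    {"text": "A bold red wine with soft tannins", "meta": {"year": 1940}},
    {"text": "A crisp white wine", "meta": {"year": 1980}},
]
results = self_query("bold red wine from 1980", docs)
```

The 1940 wine reads almost the same as the top 1980 one, but the year filter removes it before similarity ranking ever sees it.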
Parent document retrieval -> Semantic search and vectorisation are strong when done on small chunks
Context gets weak when a document is cut into small chunks.
Example: A counterexample is given within a document that is arguing the opposite point; retrieved on its own, that chunk misrepresents the document.
Solution -> Have parent and child chunks. Search on the children, retrieve the parents.
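A small sketch of "search on the children, retrieve the parents", again with a bag-of-words `embed` in place of a real embedding model. `build_index`, `retrieve_parent` and the sample sections are hypothetical, and this is not how LangChain implements it.

```python
import re
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(parents, child_words=8):
    """Split each parent into small child chunks that remember their parent id."""
    index = []
    for parent_id, text in parents.items():
        words = text.split()
        for i in range(0, len(words), child_words):
            child = " ".join(words[i:i + child_words])
            index.append((embed(child), parent_id))
    return index

def retrieve_parent(query, index, parents):
    q = embed(query)
    # Match against the small child chunks, then hand back the whole parent.
    _, parent_id = max(index, key=lambda item: cosine(q, item[0]))
    return parents[parent_id]

parents = {
    "tannins": "Some critics say tannins ruin young wine. This section argues "
               "the opposite: tannins give young wine structure and let it age.",
    "storage": "Store bottles on their side in a cool dark room. "
               "Keep the temperature steady all year.",
}
index = build_index(parents)
result = retrieve_parent("do tannins ruin young wine", index, parents)
```

The best-matching child is the "critics say" chunk, which on its own states the view the section is arguing against. Returning the parent brings the rebuttal along with it.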
HyDE -> Semantic search is weak when the question is not in the same form as the documents in the database
Example: Queries are in the form of questions, but documents are in the form of reviews.
Solution -> Have an LLM generate hypothetical documents from the query, then run vectorised semantic search with these hypothetical documents to retrieve real ones.
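A rough HyDE sketch under the same toy assumptions: `embed` is a bag-of-words counter, and `fake_llm` returns a hard-coded review so the example runs offline. In practice `generate` would be a real LLM call. All names and reviews here are made up.

```python
import re
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hyde_search(query, docs, generate):
    hypothetical = generate(query)  # the LLM writes a plausible-looking document
    h = embed(hypothetical)         # search with the fake document, not the query
    return max(docs, key=lambda d: cosine(h, embed(d)))

def fake_llm(query):
    # Hard-coded stand-in for a real LLM call.
    return "Great battery life, it easily lasts two full days on one charge."

reviews = [
    "The battery lasts two days on a single charge, very happy.",
    "Screen scratches easily and the case feels cheap.",
]
query = "Which phone should I buy if I hate charging?"
best = hyde_search(query, reviews, fake_llm)
direct_score = cosine(embed(query), embed(reviews[0]))
```

In this toy setup the question shares no words with the battery review, so `direct_score` is zero, while the hypothetical review is written in the same style as the real ones and matches it easily.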