Merge pull request #33 from nhsengland/advanced_rag_discussion
Advanced rag discussion
SamHollings authored Mar 26, 2024
2 parents 1f810ae + d28de30 commit 430c862
48 changes: 45 additions & 3 deletions WP3_Advanced_RAG_discussion.md
This technique requires an LLM to produce a potential document from which the answer to the question could be drawn, and then uses that document as the semantic search query.
Other techniques have the LLM attempt to answer the question without context, and then use that draft answer to improve generation or retrieval.
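As a rough sketch of the first idea, assuming a toy bag-of-words embedder and a stubbed LLM in place of real models (all names here are illustrative):

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding': a word-count vector, standing in for a real embedder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_retrieve(question, generate, docs):
    """Embed a hypothetical answer (not the raw question) and rank docs against it."""
    hypothetical = generate(question)      # would be an LLM call in a real pipeline
    query_vec = embed(hypothetical)
    return max(docs, key=lambda d: cosine(query_vec, embed(d)))

# Stub 'LLM' that drafts a plausible document for the question.
fake_llm = lambda q: "aspirin is a common painkiller used to treat headaches"

docs = [
    "aspirin is a painkiller often taken for headaches and fever",
    "the gp appointment booking system opens at 8am",
]
print(hyde_retrieve("what can I take for a headache?", fake_llm, docs))
```

The hypothetical document shares vocabulary with the relevant source material even when the bare question does not, which is what makes it a useful semantic tool.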

Retrieval can also be improved through fine tuning of the embedder models.
A specialised embedder, fine tuned on the kind of documents or questions expected in the task, can produce a much more precise vector space and improve retrieval results.
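A toy illustration of the principle, using a learned bilinear similarity as a stand-in for real embedder fine tuning (which would typically train a neural embedder with a contrastive loss over query–document pairs):

```python
import numpy as np

def finetune_similarity(triples, dim, epochs=50, lr=0.1):
    """Learn a bilinear similarity s(q, d) = q @ W @ d from (query, positive, negative)
    triples — a toy stand-in for fine tuning an embedder on task-specific pairs."""
    W = np.eye(dim)
    for _ in range(epochs):
        for q, pos, neg in triples:
            # Contrastive-style update: pull the positive doc closer, push the negative away.
            W += lr * (np.outer(q, pos) - np.outer(q, neg))
    return W

rng = np.random.default_rng(0)
dim = 4
q = rng.normal(size=dim)
pos = q + 0.1 * rng.normal(size=dim)   # document that should match the query
neg = rng.normal(size=dim)             # unrelated document
W = finetune_similarity([(q, pos, neg)], dim)
print(q @ W @ pos > q @ W @ neg)  # tuned similarity now prefers the positive doc
```

After tuning, the similarity function separates relevant from irrelevant documents more sharply than the untrained identity did, which is the effect a task-specific embedder aims for.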


### Augmentation
Once the materials have been retrieved, the original query can be augmented with extra context.
LLMs perform quite differently depending on how the prompt is worded. Increasingly, the task of prompt engineering is being automated and tackled by LLMs themselves.
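A minimal sketch of query augmentation; the template wording here is illustrative, not prescriptive, and real pipelines tune it per model:

```python
def augment(query, retrieved):
    """Wrap the user's query with retrieved context in a simple instruction template."""
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = augment("When do appointments open?", ["The booking system opens at 8am."])
print(prompt)
```

Numbering the retrieved passages also makes it easy to ask the model to cite which source it drew each claim from.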

Re-Rank
We know that LLMs are better at retrieving facts when those facts are provided in context. More specifically, they have been found to retrieve facts best from the start and end of the context.
One of the simpler re-rank ideas is to place the most relevant retrieved materials at the start and end of the augmented prompt.
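A sketch of this placement strategy, assuming the retrieved documents arrive sorted most-relevant first:

```python
def rerank_for_context(docs_by_relevance):
    """Interleave documents (given most-relevant first) so the strongest material
    sits at the start and end of the prompt, and the weakest in the middle."""
    start, end = [], []
    for i, doc in enumerate(docs_by_relevance):
        (start if i % 2 == 0 else end).append(doc)
    return start + end[::-1]

docs = ["best", "second", "third", "fourth", "weakest"]
print(rerank_for_context(docs))  # ['best', 'third', 'weakest', 'fourth', 'second']
```

The most relevant document lands first, the second most relevant lands last, and the weakest material ends up in the middle, where models attend to it least.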


### Generation
Getting the generation right is the ultimate goal of the RAG pipeline, and all other parts can be evaluated by the performance of the generated content.
Once the query has been augmented, an LLM will attempt to generate an answer. This is another area where fine tuning can be utilised.

Fine Tuning
Often RAG is pitted against fine tuning, but this comparison concerns the model's ability to retrieve facts, rather than its ability to generate good answers.
Both techniques can be used together, with a vectorised database being the main, updatable knowledge store, and the generator being fine tuned for the task. The fine tuning is then not so much about tuning the model on the right knowledge, but about making it familiar with the form of language expected in the output.


## Modular RAG
The modular RAG paradigm is slowly becoming the norm in the RAG domain due to its versatility and flexibility, allowing:
- the adaptation of modules within the RAG process to suit your specific problem,
- for a serialized pipeline or an end-to-end training approach across multiple modules.
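A minimal sketch of the modular idea, with each stage as a swappable callable and stub components (all names hypothetical) standing in for real retrievers, augmenters, and generators:

```python
from typing import Callable, List

def make_pipeline(retrieve: Callable[[str], List[str]],
                  augment: Callable[[str, List[str]], str],
                  generate: Callable[[str], str]) -> Callable[[str], str]:
    """Compose a serialized RAG pipeline from interchangeable modules, so any
    stage can be swapped or tuned independently for the problem at hand."""
    def pipeline(query: str) -> str:
        docs = retrieve(query)          # retrieval module
        prompt = augment(query, docs)   # augmentation module
        return generate(prompt)         # generation module
    return pipeline

# Stub modules standing in for real components.
retrieve = lambda q: ["The booking system opens at 8am."]
augment = lambda q, docs: f"Context: {' '.join(docs)}\nQuestion: {q}"
generate = lambda prompt: "stub answer based on: " + prompt.splitlines()[0]

rag = make_pipeline(retrieve, augment, generate)
print(rag("When do appointments open?"))
```

Because each stage shares only a narrow interface, a re-ranker or a fine-tuned generator can be dropped in without touching the rest of the pipeline.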

## The Future
Looking forward, there are a few directions RAG could take.
Each brings options for where our focus lies as a project looking into RAG; we have a particular interest in models that are open and runnable on small compute.

### Contextual Language Models
Recently, the original authors of the RAG paper released an approach they call RAG 2.0, or Contextual Language Models.
The basic idea is a move away from off-the-shelf embedders and frozen LLMs to a fully tuneable, fully contextualised pipeline.
https://contextual.ai/introducing-rag2/

### Agents
Some of the language around modular RAG and self-tuning systems starts to sound like Agents. RAG's specialism is grounding generated answers in the truth, along with the incorporation of the vectorised data store, but these concepts are absorbed into Agent-based LLMs without issue.

### Massive context windows
LLMs are improving, and one of the measures that is increasing rapidly is the size of the context window.
RAG concepts can still play a part here in two ways:
1. No matter the size of the context window, retrieval is still important for efficiency and quality of output.
2. How the huge context window is used is an augmentation task.

### Making the best Open model
Open source models are improving but still lag behind the best corporate LLMs.
There would be significant utility in a production-ready Open model + RAG solution.

### Evaluation
No matter which direction LLMs or RAG pipelines go, getting evaluation right remains crucial.
But we should be careful not to assume that one evaluation metric always transfers to other use cases.

## Agents
