Guidance makes it easy to write prompts and programs that control language models with rich output structure. Even simple output structure, such as Chain of Thought and its many variants (e.g., ART), has been shown to improve LLM performance. The advent of more powerful LLMs like GPT-4 allows for even richer output structure, and guidance makes that structure easier and cheaper to produce.
```bash
pip install guidance
```
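To see the core idea before the full task, here is a minimal sketch of a guidance program, assuming the same legacy guidance API used in the example below (the prompt text and variable name here are ours, for illustration only):

```python
import guidance

guidance.llm = guidance.llms.OpenAI("text-davinci-003")

# a guidance program is a Handlebars-like template; {{gen}} asks the model to
# fill in text and stores it under the given variable name
program = guidance('''The capital of France is{{gen "capital" max_tokens=5}}''')
out = program()
print(out["capital"])  # generated variables are accessible on the executed program
```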
Let's take a simple task from BigBench, where the goal is to identify whether a given sentence contains an anachronism.
Here is a two-shot prompt for it, with a human-crafted chain-of-thought sequence:
```python
import guidance

guidance.llm = guidance.llms.OpenAI("text-davinci-003")

instruction = 'Given a sentence tell me whether it contains an anachronism (i.e. whether it could have happened or not based on the time periods associated with the entities).'

examples = [
    {'input': 'I wrote about shakespeare',
     'entities': [{'entity': 'I', 'time': 'present'}, {'entity': 'Shakespeare', 'time': '16th century'}],
     'reasoning': 'I can write about Shakespeare because he lived in the past with respect to me.',
     'answer': 'No'},
    {'input': 'Shakespeare wrote about me',
     'entities': [{'entity': 'Shakespeare', 'time': '16th century'}, {'entity': 'I', 'time': 'present'}],
     'reasoning': 'Shakespeare cannot have written about me, because he died before I was born',
     'answer': 'Yes'}
]

structure_prompt = guidance(
'''{{instruction}}
----
{{~! Few shot examples here ~}}
{{~#each examples}}
Sentence: {{this.input}}
Entities and dates:{{#each this.entities}}
{{this.entity}}: {{this.time}}{{/each}}
Reasoning: {{this.reasoning}}
Anachronism: {{this.answer}}
---
{{~/each}}
{{~! Input example here}}
Sentence: {{input}}
Entities and dates:
{{gen "entities"}}
Reasoning:{{gen "reasoning"}}
Anachronism:{{#select "answer"}} Yes{{or}} No{{/select}}''')

structure_prompt(examples=examples, input='The T-rex bit my dog', instruction=instruction)
```
The executed program produces the following output (the model fills in the entities, reasoning, and answer for the new sentence):

```
Given a sentence tell me whether it contains an anachronism (i.e. whether it could have happened or not based on the time periods associated with the entities).
----
Sentence: I wrote about shakespeare
Entities and dates:
I: present
Shakespeare: 16th century
Reasoning: I can write about Shakespeare because he lived in the past with respect to me.
Anachronism: No
---
Sentence: Shakespeare wrote about me
Entities and dates:
Shakespeare: 16th century
I: present
Reasoning: Shakespeare cannot have written about me, because he died before I was born
Anachronism: Yes
---
Sentence: The T-rex bit my dog
Entities and dates:
T-rex: 65 million years ago
My dog: present
Reasoning: The T-rex lived millions of years before my dog, so it cannot have bitten my dog.
Anachronism: Yes
```
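The executed program also exposes each generated variable, so the structured pieces can be consumed programmatically. A minimal sketch, assuming the legacy guidance API used above, where an executed program supports dictionary-style access to the variables named in `gen` and `select` commands:

```python
out = structure_prompt(examples=examples, input='The T-rex bit my dog', instruction=instruction)

# each {{gen}}/{{#select}} command stored its result under the given name
print(out["entities"])   # the generated entity/date lines
print(out["reasoning"])  # the generated chain-of-thought text
print(out["answer"])     # the selected option (may include the leading space, e.g. ' Yes')
```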
We compute accuracy on the validation set, and compare it to using the same two-shot examples above without the output structure, as well as to the best previously reported result (PaLM, 3-shot). The results below agree with the existing literature: even a very simple output structure drastically improves performance, even compared against much larger models. (A sketch of the accuracy computation follows the table.)
| Model | Accuracy |
| --- | --- |
| Few-shot learning with guidance examples, no CoT output structure | 63.04% |
| PaLM (3-shot) | Around 69% |
| Guidance | 76.01% |
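For reference, here is how the accuracy number for the structured program might be computed. This is a minimal sketch under assumptions not shown above: `valid_data` is a hypothetical list of validation items with `input` and `label` fields (the real BigBench validation split is loaded separately), and the selected answer is compared to the label after stripping whitespace:

```python
# hypothetical validation split; each item pairs a sentence with its gold label
valid_data = [
    {'input': 'The T-rex bit my dog', 'label': 'Yes'},  # illustrative item only
]

correct = 0
for item in valid_data:
    out = structure_prompt(examples=examples, input=item['input'], instruction=instruction)
    # {{#select "answer"}} stored ' Yes' or ' No'; strip whitespace before comparing
    if out['answer'].strip() == item['label']:
        correct += 1

print(f"Accuracy: {correct / len(valid_data):.2%}")
```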