dayyyyyyyyyy/tau-bench

 
 

τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Paper: https://arxiv.org/abs/2406.12045

Setup

  1. Clone this repository:
git clone https://github.com/sierra-research/tau-bench && cd ./tau-bench
  2. Install from source (which also installs required packages):
pip install -e .
  3. Set up your OpenAI / Anthropic / Google / Mistral / AnyScale API keys as environment variables.
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GEMINI_API_KEY=...
MISTRAL_API_KEY=...
ANYSCALE_API_KEY=...
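
Before launching a run, it can help to confirm which of these variables are actually visible to Python. The following is a minimal sketch, not part of the repository: the helper name `missing_api_keys` is hypothetical, and only the variable names are taken from the list above. Presumably only the key for the provider you actually use needs to be set.

```python
import os

# Variable names copied from the setup step above.
API_KEY_VARS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GEMINI_API_KEY",
    "MISTRAL_API_KEY",
    "ANYSCALE_API_KEY",
]


def missing_api_keys(env=None):
    """Return the listed variable names that are unset or empty.

    `env` defaults to the process environment; pass a dict to check
    another mapping. (Hypothetical helper, not provided by tau-bench.)
    """
    env = os.environ if env is None else env
    return [name for name in API_KEY_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All listed API keys are set.")
```
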

Run

Run a function-calling agent on the τ-retail environment:

python run.py --env retail --model gpt-4o --max_concurrency 10

Set --max_concurrency according to your API rate limit.
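
To illustrate what a concurrency cap means in practice, here is a small self-contained sketch of bounded parallelism using a thread pool. It is an assumption-laden illustration and not necessarily how run.py is implemented: `run_with_limit` and `fake_api_call` are hypothetical names, and the sleep stands in for a real API request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(tasks, max_concurrency):
    """Run zero-argument callables with at most `max_concurrency` in flight.

    Returns (results in input order, peak number of simultaneous tasks).
    Hypothetical helper for illustration only.
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def tracked(task):
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        try:
            return task()
        finally:
            with lock:
                in_flight -= 1

    # The pool size is the cap: no more than max_concurrency tasks run at once.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        results = list(pool.map(tracked, tasks))
    return results, peak


def fake_api_call(i):
    """Stand-in for a rate-limited API request."""
    time.sleep(0.01)
    return i * 2


if __name__ == "__main__":
    tasks = [lambda i=i: fake_api_call(i) for i in range(20)]
    results, peak = run_with_limit(tasks, max_concurrency=4)
    print(len(results), "tasks done; peak concurrency:", peak)
```

A lower cap sends fewer simultaneous requests, which trades throughput for a smaller chance of hitting provider rate limits.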

About

Code and Data for Tau-Bench
