τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Setup

git clone https://github.com/sierra-research/tau-bench && cd ./tau-bench

pip install -e .

Set up your OpenAI / Anthropic / Google / Mistral / AnyScale API keys as environment variables.

export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export GEMINI_API_KEY=...
export MISTRAL_API_KEY=...
export ANYSCALE_API_KEY=...
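
Before launching a run, it can help to confirm which of these variables are visible to Python. The following is a minimal sketch, not part of tau-bench: the helper name `missing_api_keys` is hypothetical, and it assumes you want to check all five variables even though a given run probably only uses the key for the provider you select.

```python
import os

# Variable names copied from the list above.
API_KEY_VARS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GEMINI_API_KEY",
    "MISTRAL_API_KEY",
    "ANYSCALE_API_KEY",
]


def missing_api_keys(environ=None):
    """Return the names from API_KEY_VARS that are unset or empty.

    Hypothetical helper for illustration; `environ` defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in API_KEY_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All listed API keys are set.")
```

A variable that is assigned in the shell but not exported will show up as missing here, because child processes only inherit exported variables.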

Run a function-calling agent on the τ-retail environment:

python run.py --env retail --model gpt-4o --max_concurrency 10

Set --max_concurrency according to your API rate limit.
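
To make the effect of a concurrency cap concrete, here is a small sketch of how a fixed-size thread pool bounds the number of simultaneous calls. It is an illustration under assumptions, not how run.py is necessarily implemented: `run_tasks` and `InFlightCounter` are hypothetical names, and the `time.sleep` call stands in for a model API request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_tasks(task_ids, run_one, max_concurrency):
    """Call run_one(task_id) for every id, with at most max_concurrency in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(run_one, task_ids))


class InFlightCounter:
    """Hypothetical stand-in for a task runner that records peak parallelism."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = 0
        self.peak = 0

    def __call__(self, task_id):
        with self._lock:
            self._current += 1
            self.peak = max(self.peak, self._current)
        time.sleep(0.01)  # placeholder for a model API call
        with self._lock:
            self._current -= 1
        return task_id * 2


if __name__ == "__main__":
    counter = InFlightCounter()
    results = run_tasks(range(20), counter, max_concurrency=4)
    print("first results:", results[:3])
    print("peak calls in flight:", counter.peak)
```

The peak number of calls in flight never exceeds the cap, so a lower value trades throughput for staying under a provider's rate limit.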
