τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Paper: https://arxiv.org/abs/2406.12045

Install the Dialogflow CX (DF CX) SDK

gsutil cp gs://agent-evals/v3alpha1_dialogflow-v3alpha1-py.tar /content/v3alpha1_dialogflow-v3alpha1-py.tar
tar -xvf /content/v3alpha1_dialogflow-v3alpha1-py.tar
venv/bin/python3 dialogflow-v3alpha1-py/setup.py sdist
venv/bin/pip install /content/v3alpha1_dialogflow-v3alpha1-py.tar

venv/bin/pip install google-cloud-dialogflow-cx

Setup

Clone this repository:

git clone https://github.com/sierra-research/tau-bench && cd ./tau-bench

Install from source (which also installs required packages):

pip install -e .

Set up your OpenAI / Anthropic / Google / Mistral / AnyScale API keys as environment variables.

OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
GEMINI_API_KEY=...
MISTRAL_API_KEY=...
ANYSCALE_API_KEY=...
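
Before launching a run, it can help to confirm which of these variables are visible to Python. The following is a minimal sketch, not part of tau-bench: the helper name `missing_api_keys` is hypothetical, and only the variable names come from the list above. You only need the keys for the providers you actually use.

```python
import os

# Variable names taken from the list above.
API_KEY_VARS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GEMINI_API_KEY",
    "MISTRAL_API_KEY",
    "ANYSCALE_API_KEY",
]


def missing_api_keys(env=None):
    """Return the names of API key variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in API_KEY_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All API keys are set.")
```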

Run

Run a function-calling agent on the τ-retail environment:

python run.py --env retail --model gpt-4o --max_concurrency 10

Set --max_concurrency according to your API rate limit.
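
To illustrate what a concurrency cap does, here is a small sketch using a thread pool from the Python standard library. It is an assumption-laden illustration, not tau-bench's own implementation: `run_with_cap` and `fake_api_call` are hypothetical names, and the sleep stands in for a rate-limited API request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_cap(tasks, max_concurrency):
    """Run callables with at most `max_concurrency` in flight.

    Returns (results in input order, peak number of simultaneous tasks).
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def tracked(task):
        # Record how many tasks are running at once.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return task()
        finally:
            with lock:
                state["active"] -= 1

    # The pool size is the cap: no more than max_concurrency workers exist.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        results = list(pool.map(tracked, tasks))
    return results, state["peak"]


def fake_api_call(i):
    # Hypothetical stand-in for a rate-limited model API request.
    time.sleep(0.01)
    return i * 2


tasks = [lambda i=i: fake_api_call(i) for i in range(20)]
results, peak = run_with_cap(tasks, max_concurrency=4)
print(f"completed {len(results)} tasks, peak concurrency {peak}")
```

A lower cap makes the run slower but keeps the number of simultaneous requests within what your API plan allows.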

Run a decibel agent with Gemini as the user model:

venv/bin/python3 tau-bench/run.py --env retail --agent_strategy decibel --agent_id 429da584-b933-4372-822c-52d124ba5a26 --project_id df-decibel2-dev-test --start_index 0 --end_index -1 --user_model gemini-1.5-pro
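
The --start_index and --end_index flags select a range of tasks. The sketch below shows one plausible reading of them, under the assumption that the range is half-open and that -1 means "through the last task"; check run.py for the actual behavior. The function name `task_indices` is hypothetical.

```python
def task_indices(num_tasks, start_index=0, end_index=-1):
    """Return task indices in [start_index, end_index).

    Assumption for illustration: end_index == -1 means "up to the last task",
    and an end_index past the end is clamped to num_tasks.
    """
    end = num_tasks if end_index == -1 else min(end_index, num_tasks)
    return list(range(start_index, end))


# With 5 tasks, the flags from the command above would select every task.
print(task_indices(5, start_index=0, end_index=-1))
```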
