Code for the Paper "TravelPlanner: A Benchmark for Real-World Planning with Language Agents".
[Website] • [Paper] • [Dataset] • [Leaderboard] • [Environment] • [Twitter]
TravelPlanner is a benchmark crafted for evaluating language agents in tool-use and complex planning within multiple constraints.
For a given query, language agents are expected to formulate a comprehensive plan that includes transportation, daily meals, attractions, and accommodation for each day.
From the perspective of real-world applications, TravelPlanner includes three types of constraints: Environment Constraints, Commonsense Constraints, and Hard Constraints.
- Create a conda environment and install the dependencies:
conda create -n travelplanner python=3.9
conda activate travelplanner
pip install -r requirements.txt
- Download the database and unzip it to the TravelPlanner directory (i.e., your/path/TravelPlanner).
In the two-stage mode, language agents are tasked with employing various search tools to gather information. Based on the collected information, they are expected to deliver a plan that not only meets the user’s needs specified in the query but also adheres to commonsense constraints.
export OUTPUT_DIR=path/to/your/output/file
# We support MODEL in ['gpt-3.5-turbo-X','gpt-4-1106-preview','gemini','mistral-7B-32K','mixtral']
export MODEL_NAME=MODEL_NAME
export OPENAI_API_KEY=YOUR_OPENAI_KEY
# If you do not want to test Google models such as Gemini, just set this to "1".
export GOOGLE_API_KEY=YOUR_GOOGLE_KEY
# SET_TYPE in ['validation', 'test']
export SET_TYPE=validation
cd agents
python tool_agents.py --set_type $SET_TYPE --output_dir $OUTPUT_DIR --model_name $MODEL_NAME
The generated plan will be stored in OUTPUT_DIR/SET_TYPE.
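To check what a run produced, the output directory can be inspected from Python. This is a minimal sketch, not part of the repository: the helper name `list_generated_files` is hypothetical, and it relies only on the statement above that results are written under OUTPUT_DIR/SET_TYPE. It makes no assumption about file names or formats inside that directory.

```python
from pathlib import Path


def list_generated_files(output_dir, set_type):
    """Return all files found under <output_dir>/<set_type>, sorted by path.

    Hypothetical helper: it does not assume any file naming scheme.
    """
    root = Path(output_dir) / set_type
    if not root.is_dir():
        return []  # nothing generated yet, or the path is wrong
    # rglob("*") walks subdirectories too; keep regular files only.
    return sorted(p for p in root.rglob("*") if p.is_file())
```

For example, `list_generated_files(os.environ["OUTPUT_DIR"], os.environ["SET_TYPE"])` (after `import os`) would list the files for the run configured above. Note that a relative OUTPUT_DIR is resolved against the current working directory, so it may need to be interpreted relative to the directory the script was started from.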
TravelPlanner also provides an easier mode focused solely on testing agents' planning ability. The sole-planning mode ensures that no crucial information is missed, thereby enabling agents to focus on planning itself.
Please refer to the paper for more details.
export OUTPUT_DIR=path/to/your/output/file
# We support MODEL in ['gpt-3.5-turbo-X','gpt-4-1106-preview','gemini','mistral-7B-32K','mixtral']
export MODEL_NAME=MODEL_NAME
export OPENAI_API_KEY=YOUR_OPENAI_KEY
# If you do not want to test Google models such as Gemini, just set this to "1".
export GOOGLE_API_KEY=YOUR_GOOGLE_KEY
# SET_TYPE in ['validation', 'test']
export SET_TYPE=validation
# STRATEGY in ['direct','cot','react','reflexion']
export STRATEGY=direct
cd tools/planner
python sole_planning.py --set_type $SET_TYPE --output_dir $OUTPUT_DIR --model_name $MODEL_NAME --strategy $STRATEGY
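To compare the planning strategies listed above, the same command can be assembled once per strategy. The sketch below is an illustration, not part of the repository: `build_sole_planning_command` is a hypothetical helper, and it uses only the script name and flags shown in the command above. The model name and output path are placeholders. It builds the argument lists without running anything.

```python
import sys

# Strategies listed in the STRATEGY comment above.
STRATEGIES = ["direct", "cot", "react", "reflexion"]


def build_sole_planning_command(set_type, output_dir, model_name, strategy):
    """Build the argument list for one sole_planning.py run (hypothetical helper)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [
        sys.executable, "sole_planning.py",
        "--set_type", set_type,
        "--output_dir", output_dir,
        "--model_name", model_name,
        "--strategy", strategy,
    ]


# One command per strategy, with placeholder values for the other arguments.
commands = [
    build_sole_planning_command(
        "validation", "path/to/your/output/file", "gpt-4-1106-preview", s
    )
    for s in STRATEGIES
]
```

Each list could then be passed to `subprocess.run(cmd, cwd="tools/planner", check=True)` once the API keys are exported, which mirrors the `cd tools/planner` step above.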
from datasets import load_dataset
# 'test' can be replaced with 'train' or 'validation' (in both places).
data = load_dataset('osunlp/TravelPlanner','test')['test']
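Once loaded, a split can be iterated as a sequence of dictionary-like records, so simple summaries can be computed with plain Python. The sketch below is an illustration under stated assumptions: `count_by_field` is a hypothetical helper, the sample records are stand-ins rather than real data, and the field name `level` is an assumption about the dataset schema that should be checked against `data.column_names`.

```python
from collections import Counter


def count_by_field(records, field):
    """Count how often each value of `field` occurs across dict-like records.

    Records that lack the field are counted under None.
    """
    return Counter(record.get(field) for record in records)


# Stand-in records for illustration only; real records come from load_dataset.
sample = [{"level": "easy"}, {"level": "hard"}, {"level": "easy"}, {}]
counts = count_by_field(sample, "level")
```

With the dataset loaded as above, `print(data.column_names)` lists the available fields, and `count_by_field(data, "level")` would tally the queries per value of that field, assuming such a column exists.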
- Baseline Code
- Query Construction Code
- Evaluation Code
- Plan Parsing and Element Extraction Code
- Release Environment Database
- Database Field Introduction
If you have any problems, please contact Jian Xie, Kai Zhang, or Yu Su.
If our paper or related resources prove valuable to your research, we kindly ask that you cite the paper.
@article{Xie2024TravelPlanner,
author = {Jian Xie and Kai Zhang and Jiangjie Chen and Tinghui Zhu and Renze Lou and Yuandong Tian and Yanghua Xiao and Yu Su},
title = {TravelPlanner: A Benchmark for Real-World Planning with Language Agents},
journal = {arXiv preprint arXiv:2402.01622},
year = {2024}
}