- 2025-02-18: Added a data viewer and dataset for better visualization of Text2World.
```bash
conda create -n text2world python=3.8 -y
conda activate text2world
pip install -r requirements.txt
```
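As an optional sanity check (not part of the repository's scripts, it only confirms the right interpreter is active):

```bash
# Confirm the text2world environment is active and running Python 3.8
python -c "import sys; assert sys.version_info[:2] == (3, 8); print(sys.version)"
```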
Running the following command will generate files with `PROMPT_TYPE="desc2domain_zeroshot_cot"` and `DESCRIPTION_TYPE="corrected_description"` in the `_generated_pddl/_all_gen` directory.
```bash
bash generate.sh ${MODEL} ${CORRECTION_TIME}
```
To try different values for `$PROMPT_TYPE` and `$DESCRIPTION_TYPE`, you can manually modify them in the `generate.sh` script.
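For example, a hypothetical invocation (assuming `${CORRECTION_TIME}` denotes the number of self-correction rounds, and using one of the model names listed below) might look like:

```bash
# Generate PDDL with gpt-4o, allowing 3 correction rounds
bash generate.sh gpt-4o 3
```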
Set your API credentials before running generation:

```bash
OPENAI_API_TYPE="open_ai"
OPENAI_API_BASE=...
OPENAI_API_KEY=...
```
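Whether these live in a `.env` file or are exported in the shell depends on how you run the scripts; a minimal shell-export sketch with placeholder values:

```bash
export OPENAI_API_TYPE="open_ai"
export OPENAI_API_BASE="https://api.openai.com/v1"  # or your own endpoint
export OPENAI_API_KEY="sk-..."                      # placeholder key
```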
The following model names are currently supported as `${MODEL}`:

```
# OpenAI O-series
o1-mini
o1-preview
o3-mini
# OpenAI GPT-4
gpt-4o
gpt-4o-mini
gpt-4-turbo
chatgpt-4o-latest
# OpenAI GPT-3.5
gpt-3.5-turbo-0125
gpt-3.5-turbo-1106
# Anthropic Claude
claude-3.5-sonnet
# Meta Llama-2
llama2-7b
llama2-13b
llama2-70b
# Meta Llama-3.1
llama3.1-8b
llama3.1-70b
# DeepSeek
deepseek-reasoner
deepseek-v3
# Meta CodeLlama
codellama-7b
codellama-13b
codellama-34b
codellama-70b
```
If you need to configure your own LLM, you can modify `utils/text2world.yaml` to define a custom model. For example:
```yaml
${NAME}:
  name: gpt # API type
  engine: ${ENGINE}
  context_length: 128000
  use_azure: False
  temperature: 0.
  top_p: 1
  retry_delays: 20
  max_retry_iters: 100
  stop:
  max_tokens: 4000
  use_parser: False
```
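As a sketch (the entry name, engine string, and context length below are hypothetical; the field semantics follow the template above), an OpenAI-compatible self-hosted model might be registered as:

```yaml
my-local-llama:            # hypothetical entry name, used as ${MODEL}
  name: gpt                # reuse the OpenAI-style API client
  engine: llama3.1-8b-instruct
  context_length: 8192
  use_azure: False
  temperature: 0.
  top_p: 1
  retry_delays: 20
  max_retry_iters: 100
  stop:
  max_tokens: 4000
  use_parser: False
```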
First, create a project using the following command. It will create a folder with the same name under `_generated_pddl`. Note that `$PROJECT_NAME` cannot be `_all_gen`.
```bash
bash create_project.sh $PROJECT_NAME
```
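For instance (the project name is arbitrary here, as long as it is not `_all_gen`):

```bash
bash create_project.sh my_eval   # creates _generated_pddl/my_eval
```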
Next, please manually copy the generated content of the models you are interested in evaluating from `_generated_pddl/_all_gen` to `_generated_pddl/$PROJECT_NAME`.
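For example, assuming generations are stored per model under `_all_gen` (the directory names are illustrative):

```bash
# Copy gpt-4o and claude-3.5-sonnet outputs into the project "my_eval"
cp -r _generated_pddl/_all_gen/gpt-4o            _generated_pddl/my_eval/
cp -r _generated_pddl/_all_gen/claude-3.5-sonnet _generated_pddl/my_eval/
```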
Finally, run the following command to evaluate all models in the project. The evaluation results will be generated in `_generated_pddl/_eval_result/$PROJECT_NAME`, including detailed scores for all PDDL files generated by each model and an overall leaderboard in the `_result_board.txt` file.
```bash
bash evaluate.sh $PROJECT_NAME
```
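Continuing the example above (the exact per-model layout under `_eval_result` is assumed; only `_result_board.txt` is documented):

```bash
bash evaluate.sh my_eval
# Expected outputs:
#   _generated_pddl/_eval_result/my_eval/_result_board.txt   # overall leaderboard
#   _generated_pddl/_eval_result/my_eval/...                 # per-model, per-file scores
```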
If you find this work useful, please consider citing the following paper:
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation:
```bibtex
@misc{hu2025text2worldbenchmarkinglargelanguage,
      title={Text2World: Benchmarking Large Language Models for Symbolic World Model Generation},
      author={Mengkang Hu and Tianxing Chen and Yude Zou and Yuheng Lei and Qiguang Chen and Ming Li and Hongyuan Zhang and Wenqi Shao and Ping Luo},
      year={2025},
      eprint={2502.13092},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.13092},
}
```