rSDE-Bench: Requirement-Oriented Software Development Benchmark

Code and data for the paper "Self-Evolving Multi-Agent Collaboration Networks for Software Development".

👋 Overview

rSDE-Bench is a requirement-oriented benchmark designed to evaluate the ability of models to handle software-level coding tasks. Unlike instruction-based approaches, rSDE-Bench uses detailed software requirements as input, specifying each functionality and constraint of the software. The benchmark includes automatic evaluation through unit tests, providing a more realistic assessment aligned with real-world software development practices.

🚀 Set Up

Make sure to use Python 3.8 or later:

conda create -n rsde_bench python=3.8
conda activate rsde_bench

Check out and install this repository:

git clone https://github.com/yuzhu-cai/rSDE-Bench.git
cd rSDE-Bench
pip install -r requirement.txt

💽 Usage

Warning

Operating System: Ensure that you are running this project on an operating system with a graphical user interface. Currently, Windows and macOS are supported.

Dependencies: Make sure all dependencies are correctly installed and the appropriate Python environment is activated.
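The two warnings above can be checked before running anything. The following is a minimal sketch, not part of the repository: the function name `check_environment` and the constants are hypothetical, and the check assumes that "Windows and macOS" corresponds to the values `Windows` and `Darwin` reported by `platform.system()`. It does not verify that a graphical session is actually available.

```python
import platform
import sys

# Assumption: these are the platform.system() values for the supported systems
# (Darwin is what Python reports on macOS).
SUPPORTED_SYSTEMS = {"Windows", "Darwin"}
MIN_PYTHON = (3, 8)  # taken from the Set Up section


def check_environment(system=None, version=None):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    system = platform.system() if system is None else system
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    problems = []
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"unsupported operating system: {system}")
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} is older than 3.8")
    return problems


# Example use:
#   for problem in check_environment():
#       print("WARNING:", problem)
```

Passing `system` and `version` explicitly makes the check easy to exercise on any machine; calling it with no arguments inspects the current interpreter.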

Use the following command to generate the software included in rSDE-Bench using the GPT, Claude, or Gemini APIs. The generated code will be stored in the codes directory.

python run_infer.py

Evaluate the software code generated in the codes directory with the following command:

python run_eval.py

To aggregate the performance and differences of the software code generated under various settings, run:

python update_result.py
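The three commands above can also be chained so that a failure in one step stops the rest. This is a hedged sketch rather than a script shipped with the repository: `build_commands` and `run_pipeline` are hypothetical helper names, and it assumes each script runs without required command-line arguments, as the commands above suggest, and is launched from the repository root.

```python
import subprocess
import sys

# Script names as listed in this README, in the order they are meant to run.
STEPS = ["run_infer.py", "run_eval.py", "update_result.py"]


def build_commands(python=sys.executable, steps=STEPS):
    """Build one argument list per step, using the given Python interpreter."""
    return [[python, step] for step in steps]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order; check=True raises CalledProcessError on a non-zero exit."""
    for command in commands:
        runner(command, check=True)


# Example use (from the repository root, with the environment activated):
#   run_pipeline(build_commands())
```

Using `sys.executable` keeps every step on the interpreter of the activated environment, and the `runner` parameter allows a dry run by passing a function that only records or prints the commands.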

✍️ Citation

If you find our work helpful, please cite it as follows:

@misc{hu2024selfevolvingmultiagentcollaborationnetworks,
      title={Self-Evolving Multi-Agent Collaboration Networks for Software Development}, 
      author={Yue Hu and Yuzhu Cai and Yaxin Du and Xinyu Zhu and Xiangrui Liu and Zijie Yu and Yuchen Hou and Shuo Tang and Siheng Chen},
      year={2024},
      eprint={2410.16946},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2410.16946}, 
}
