AutoSWE is a system for producing entire code repositories from a PRD.md file. It is designed to produce the artifacts required by the DevBench benchmark, which was established to evaluate the effectiveness of LLM-based code generation. DevBench differs from typical evaluations of LLM coding systems in that it evaluates the entire software engineering process, not just bug fixes or code completion.
DevBench has five evaluation tasks:
- Software Design
- Environment Setup
- Implementation
- Acceptance Testing
- Unit Testing
We have implemented a system that automatically generates the artifacts for these tasks. The system uses LangGraph to orchestrate its control flow, and the artifacts are accumulated in a state object. We use Pydantic to validate the structured outputs of the LLM for each task, such as requests for dictionaries with specific keys. The system also checks for, and installs, the dependencies needed to run the code it generates.
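The project uses Pydantic for this validation. The sketch below is a stdlib-only stand-in (a dataclass plus `json`) that shows the same idea: parse the model's JSON reply and reject it unless it has exactly the expected keys with the expected types. The `DesignDoc` schema and its field names are hypothetical and are not taken from the repository.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class DesignDoc:
    """Hypothetical schema for one LLM output; field names are illustrative."""
    uml_class: str
    uml_sequence: str
    architecture_design: str


def parse_structured_output(raw: str) -> DesignDoc:
    """Parse an LLM reply as JSON and check it against DesignDoc.

    Raises ValueError if the reply is not a JSON object, has missing or
    unexpected keys, or has a non-string value.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    expected = {f.name for f in fields(DesignDoc)}
    keys = set(data)
    missing = expected - keys
    extra = keys - expected
    if missing or extra:
        raise ValueError(f"missing keys: {sorted(missing)}, unexpected keys: {sorted(extra)}")

    # Every field in this hypothetical schema is a string.
    for name in expected:
        if not isinstance(data[name], str):
            raise ValueError(f"field {name!r} must be a string")
    return DesignDoc(**data)
```

When validation fails, the error message can be sent back to the LLM so that it can retry with a corrected reply.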
We use LangChain, LangGraph, and LangSmith; LangSmith traces the OpenAI API calls and the state of the system. We use GPT-4o as the LLM.
Within the LangGraph graph, nodes prefixed with "approve_" review the documents or code produced earlier and either approve them or circle back with a message describing what is incorrect. These nodes use conditional edges to route the flow of the system.
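A minimal sketch of this review loop, written without LangGraph so it runs standalone. The node names `write_code` and `approve_code`, the state fields, and the retry cap are hypothetical; the reviewer is a trivial stand-in for an LLM call. In LangGraph, a router such as `route_after_approval` would be registered on the approve node with `add_conditional_edges`.

```python
from typing import Callable, Dict, TypedDict


class State(TypedDict, total=False):
    code: str
    feedback: str
    approved: bool
    attempts: int


def write_code(state: State) -> State:
    """Hypothetical generator node: (re)writes code, using feedback if present."""
    attempts = state.get("attempts", 0) + 1
    if state.get("feedback"):
        code = "def add(a, b):\n    return a + b\n"
    else:
        code = "def add(a, b): pass\n"
    return {**state, "code": code, "attempts": attempts}


def approve_code(state: State) -> State:
    """Hypothetical reviewer node; in the real system an LLM judges the code."""
    if "return" in state["code"]:
        return {**state, "approved": True, "feedback": ""}
    return {**state, "approved": False, "feedback": "add() must return a value"}


def route_after_approval(state: State) -> str:
    """Conditional edge: stop when approved (or after 3 tries), else circle back."""
    if state["approved"] or state["attempts"] >= 3:
        return "end"
    return "write_code"


def run(state: State) -> State:
    nodes: Dict[str, Callable[[State], State]] = {
        "write_code": write_code,
        "approve_code": approve_code,
    }
    current = "write_code"
    while current != "end":
        state = nodes[current](state)
        if current == "write_code":
            current = "approve_code"  # fixed edge: writer -> reviewer
        else:
            current = route_after_approval(state)  # conditional edge
    return state
```

Starting from an empty state, the first draft is rejected, the feedback is routed back to the writer, and the second draft is approved. The attempt cap keeps a reviewer that never approves from looping forever.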
In the graph state, we accumulate the "documents" produced by each node in the graph; these are the final artifacts (outputs) of the system.
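LangGraph lets a state field declare a reducer through `typing.Annotated`, so each node's partial update is merged into that field instead of overwriting it. The sketch below shows this pattern for a documents field, applying the merge by hand so it runs without LangGraph. The names `merge_documents`, `GraphState`, and `apply_update` are hypothetical.

```python
from typing import Annotated, Dict, TypedDict, get_type_hints


def merge_documents(existing: Dict[str, str], update: Dict[str, str]) -> Dict[str, str]:
    """Reducer: keep earlier documents and add (or replace) the ones in the update."""
    return {**existing, **update}


class GraphState(TypedDict):
    # The second Annotated argument names the reducer for this field.
    documents: Annotated[Dict[str, str], merge_documents]


def apply_update(state: GraphState, update: dict) -> GraphState:
    """Merge a node's partial update into the state using each field's reducer.

    Every field in this sketch has a reducer; a field without one would
    need a plain overwrite instead.
    """
    hints = get_type_hints(GraphState, include_extras=True)
    new_state = dict(state)
    for key, value in update.items():
        reducer = hints[key].__metadata__[0]
        new_state[key] = reducer(state[key], value)
    return new_state


state: GraphState = {"documents": {}}
state = apply_update(state, {"documents": {"PRD": "..."}})
state = apply_update(state, {"documents": {"architecture_design": "..."}})
```

Each node returns only its new documents, and the reducer preserves everything produced earlier, so the final state holds every artifact.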
A .env file is required (see .env.example). It should contain the following:
OPENAI_API_KEY="your key"
LANGCHAIN_API_KEY="your key"
LANGCHAIN_TRACING_V2="true"
LANGCHAIN_PROJECT="autoSWE-1"
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
You will need to create a LangChain account (for LangSmith) to obtain LANGCHAIN_API_KEY.
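How the repository loads this file is not stated here; a library such as python-dotenv is one common option, but that is an assumption. The stdlib-only sketch below parses simple `KEY="value"` lines like the ones above and reports which required keys are missing, so a misconfigured .env fails fast. `REQUIRED_KEYS`, `parse_env`, and `load_env_file` are hypothetical names.

```python
import os
from pathlib import Path
from typing import Dict, List

# Keys listed in the .env example above.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_PROJECT",
    "LANGCHAIN_ENDPOINT",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding double or single quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> List[str]:
    """Load the file into os.environ without overriding existing values.

    Returns the required keys that are still missing or empty.
    """
    p = Path(path)
    if p.exists():
        for key, value in parse_env(p.read_text(encoding="utf-8")).items():
            os.environ.setdefault(key, value)
    return [k for k in REQUIRED_KEYS if not os.environ.get(k)]
```

A caller could stop with a clear error if `load_env_file()` returns a non-empty list, before any API call is made.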
To run the full program on a specific PRD.md file:
python -m system.main --prd_path system/benchmark_data/python/particle-swarm-optimization/docs/PRD.md
Alternatively, to run it on the example PRD.md file included in the repo:
python -m system.main
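The sketch below shows one way an entry point with an optional `--prd_path` flag could be declared with `argparse`. It is not the repository's actual `system.main`. `DEFAULT_PRD_PATH` is a placeholder, since the location of the bundled example PRD.md is not given here, and `build_parser`, `read_prd`, and `main` are hypothetical names.

```python
import argparse
from pathlib import Path
from typing import List, Optional

# Placeholder: the actual location of the bundled example PRD.md is not stated.
DEFAULT_PRD_PATH = "path/to/example/PRD.md"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Generate a code repository from a PRD.md file."
    )
    parser.add_argument(
        "--prd_path",
        default=DEFAULT_PRD_PATH,
        help="Path to the PRD.md file to build from.",
    )
    return parser


def read_prd(path: str) -> str:
    """Read the PRD text, failing early with a clear message if it is missing."""
    prd = Path(path)
    if not prd.is_file():
        raise FileNotFoundError(f"PRD file not found: {prd}")
    return prd.read_text(encoding="utf-8")


def main(argv: Optional[List[str]] = None) -> str:
    # argv=None makes argparse read sys.argv, as when run via `python -m`.
    args = build_parser().parse_args(argv)
    return read_prd(args.prd_path)
```

With no flag, the default path is used; passing `--prd_path` overrides it, which matches the two invocations shown above.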