
Commit

readme
smith-nathanh committed Nov 25, 2024
1 parent 30ada94 commit 6e53db6
Showing 2 changed files with 4 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
# autoSWE

- autoSWE is a software engineering automation tool designed to produce a fully-functioning code repository based on input Produce Requirements Document (PRD). It produces the artifacts measured in [DevBench ](https://github.com/open-compass/DevBench), a software engineering benchmark designed to test the efficacy of LLM-based code generation systems.
+ autoSWE is a software engineering automation tool designed to produce a fully-functioning code repository based on a Product Requirements Document (PRD). It produces the artifacts measured in [DevBench](https://github.com/open-compass/DevBench), a software engineering benchmark designed to test the efficacy of LLM-based code generation systems.

For more detailed information on the features, tools, and processes used in autoSWE, please refer to the [system/README.md](system/README.md) document.

4 changes: 3 additions & 1 deletion system/README.md
@@ -1,6 +1,6 @@
# autoSWE

- AutoSWE is a system for producing entire code repositories from a PRD.md file. It is designed to produce the artifacts necessary for the [DevBench ](https://github.com/open-compass/DevBench) benchmark designed to evaluate the effectiveness of LLM-based code generation. It differs from typical evaluations of LLM-coding systems in that it is designed to evaluate the entire software engineering process, not just bug fixes or code completion.
+ autoSWE is a system for producing entire code repositories from a PRD.md file. It is designed to produce the artifacts necessary for the [DevBench](https://github.com/open-compass/DevBench) benchmark, which was established to evaluate the effectiveness of LLM-based code generation. It differs from typical evaluations of LLM-coding systems in that it evaluates the entire software engineering process, not just bug fixes or code completion.

DevBench has five evaluation tasks:

@@ -13,6 +13,8 @@ DevBench has five evaluation tasks:

We have implemented a system that can automatically generate the artifacts for these tasks. The system uses LangGraph to orchestrate control flow, and the artifacts are accumulated in a `state` object. We use Pydantic to validate the structured outputs of the LLMs for each task, such as requesting dictionaries with specific keys. The system also checks for and installs any dependencies needed to run the code it generates.

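As a rough illustration of this pattern (the `RepoPlan` model and its fields are hypothetical stand-ins, not autoSWE's actual schema), a Pydantic model can be bound to a LangChain chat model so the LLM's response is parsed and validated into a dictionary with the expected keys:

```python
# Illustrative sketch only; the schema below is hypothetical, not autoSWE's.
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI


class RepoPlan(BaseModel):
    """Structured output the LLM must return: a mapping of file paths to descriptions."""
    files: dict[str, str] = Field(
        description="Relative file paths mapped to one-line descriptions of their contents"
    )


llm = ChatOpenAI(model="gpt-4o", temperature=0)
structured_llm = llm.with_structured_output(RepoPlan)  # responses are validated against RepoPlan

plan = structured_llm.invoke("Propose a file layout for the repository described in PRD.md: ...")
print(plan.files)  # dict with the specific keys/values the schema requires
```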
+ We use LangChain, LangGraph, and LangSmith to trace the OpenAI API calls and the `state` of the system; GPT-4o is the underlying LLM.

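A minimal sketch of how that wiring might look (the project name is an assumption, and API keys are expected in the environment):

```python
# Sketch of enabling LangSmith tracing for OpenAI calls; the project name is hypothetical.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"   # send runs to LangSmith
os.environ["LANGCHAIN_PROJECT"] = "autoSWE"   # assumed project name
# LANGCHAIN_API_KEY and OPENAI_API_KEY must already be set in the environment.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")  # calls made through this client are traced in LangSmith
response = llm.invoke("Summarize the PRD in one sentence: ...")
```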
### Control flow

We use LangGraph to manage the control flow of the system. Nodes prefixed with "approve_" evaluate the documents/code and either approve them or circle back with a message describing what is incorrect; conditional edges route the flow of the system accordingly.

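The sketch below shows the general shape of that pattern with LangGraph's conditional edges; the node names, state fields, and approval logic are illustrative assumptions, not the system's actual graph:

```python
# Illustrative sketch of an "approve_" node with conditional edges; names are assumptions.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    document: str
    feedback: str
    approved: bool


def write_document(state: State) -> dict:
    # In the real system an LLM drafts or revises the artifact using state["feedback"].
    return {"document": "draft of the artifact", "feedback": ""}


def approve_document(state: State) -> dict:
    # In the real system an LLM reviews the artifact and explains what is incorrect.
    ok = len(state["document"]) > 0
    return {"approved": ok, "feedback": "" if ok else "The document is empty; please regenerate."}


def route_on_approval(state: State) -> str:
    # Conditional edge: finish when approved, otherwise circle back with feedback.
    return "done" if state["approved"] else "revise"


graph = StateGraph(State)
graph.add_node("write_document", write_document)
graph.add_node("approve_document", approve_document)
graph.add_edge(START, "write_document")
graph.add_edge("write_document", "approve_document")
graph.add_conditional_edges("approve_document", route_on_approval, {"done": END, "revise": "write_document"})

app = graph.compile()
final_state = app.invoke({"document": "", "feedback": "", "approved": False})
```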