Evaluation README: Add TheAgentCompany (All-Hands-AI#5777)
li-boxuan authored Dec 24, 2024
1 parent 725e71a, commit ecff5c6
Showing 1 changed file with 5 additions and 1 deletion.
evaluation/README.md: 6 changes (5 additions & 1 deletion)
@@ -42,7 +42,7 @@ temperature = 0.0

## Supported Benchmarks

-The OpenHands evaluation harness supports a wide variety of benchmarks across [software engineering](#software-engineering), [web browsing](#web-browsing), and [miscellaneous assistance](#misc-assistance) tasks.
+The OpenHands evaluation harness supports a wide variety of benchmarks across [software engineering](#software-engineering), [web browsing](#web-browsing), [miscellaneous assistance](#misc-assistance), and [real-world](#real-world) tasks.

### Software Engineering

@@ -73,6 +73,10 @@ The OpenHands evaluation harness supports a wide variety of benchmarks across [s
- ProofWriter: [`evaluation/benchmarks/logic_reasoning`](./benchmarks/logic_reasoning)
- ScienceAgentBench: [`evaluation/benchmarks/scienceagentbench`](./benchmarks/scienceagentbench)

+### Real World
+
+- TheAgentCompany: [`evaluation/benchmarks/the_agent_company`](./benchmarks/the_agent_company)
+
## Result Visualization

Check [this huggingface space](https://huggingface.co/spaces/OpenHands/evaluation) for visualization of existing experimental results.
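The updated intro sentence links to `#real-world`, which only resolves if GitHub derives that anchor from the new `### Real World` heading. The snippet below is a minimal sketch for checking in-page anchors against headings, assuming GitHub's usual slug rule (lowercase, drop punctuation other than hyphens and underscores, turn spaces into hyphens). The helper names `github_slug` and `missing_anchors` are hypothetical, and GitHub's `-1`, `-2` suffixes for duplicate headings are not modeled.

```python
import re


def github_slug(heading: str) -> str:
    """Approximate GitHub's heading-to-anchor rule (assumption, not the
    official implementation; duplicate-heading suffixes are ignored)."""
    slug = heading.strip().lower()
    # Drop punctuation; keep word characters, hyphens, and spaces.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Each space becomes a hyphen.
    return slug.replace(" ", "-")


def missing_anchors(markdown: str) -> list[str]:
    """Return in-page anchors (#...) that match no heading in `markdown`."""
    headings = re.findall(r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    slugs = {github_slug(h) for h in headings}
    # Only links of the form [text](#anchor) are checked; relative paths are skipped.
    anchors = re.findall(r"\]\(#([^)]+)\)", markdown)
    return [a for a in anchors if a not in slugs]


# New side of the hunks shown in this commit. The web-browsing and
# misc-assistance sections sit outside these hunks, so their anchors
# are reported as missing for this excerpt only.
README_EXCERPT = """\
## Supported Benchmarks

The OpenHands evaluation harness supports a wide variety of benchmarks across [software engineering](#software-engineering), [web browsing](#web-browsing), [miscellaneous assistance](#misc-assistance), and [real-world](#real-world) tasks.

### Software Engineering

### Real World

- TheAgentCompany: [`evaluation/benchmarks/the_agent_company`](./benchmarks/the_agent_company)

## Result Visualization
"""

if __name__ == "__main__":
    print(missing_anchors(README_EXCERPT))  # ['web-browsing', 'misc-assistance']
```

Run against the full `evaluation/README.md`, the check should return an empty list if every section link, including the new `#real-world`, resolves under this approximate rule.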
