DiscoveryBench with OpenHands

DiscoveryBench (Paper) contains 264 tasks collected across 6 diverse domains, such as biology, economics, and sociology. It incorporates discovery workflows from published papers to approximate the real-world challenges faced by researchers.

Setup Environment and LLM Configuration

1. Please follow the instructions mentioned here to set up the OpenHands development environment and LLMs locally.
2. Execute the bash script to start the DiscoveryBench evaluation:

./evaluation/discoverybench/scripts/run_infer.sh [YOUR MODEL CONFIG]

Replace [YOUR MODEL CONFIG] with any model that you have set up in config.toml.

Run Inference on DiscoveryBench Instances

When the run_infer.sh script starts, it automatically pulls the latest DiscoveryBench instances and sets up the agent environment. The OpenHands agent is then invoked to process each task within this environment, producing a hypothesis. We then evaluate it against the “gold” hypothesis provided by DiscoveryBench. The evaluation result, along with the agent chat history, is logged to output.jsonl under evaluation_outputs.
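To check what a run produced, here is a minimal sketch that finds each output.jsonl under a results directory and counts its records (JSON Lines stores one JSON object per line). The subdirectory layout beneath evaluation_outputs is not described here, so the sketch searches recursively rather than assuming a fixed path. The function name count_output_records is hypothetical, and the sketch does not parse the records or rely on any particular field names.

```shell
#!/bin/sh
# Sketch: report how many records each output.jsonl under a results
# directory contains. The function name is hypothetical; the recursive
# search avoids assuming any particular subdirectory layout.
set -eu

count_output_records() {
  local root="$1"
  # Find every regular file named output.jsonl, in a stable (sorted) order.
  find "$root" -type f -name output.jsonl | sort | while IFS= read -r f; do
    # JSON Lines: one record per non-empty line. grep -c exits non-zero
    # when it counts 0 matches, so "|| true" keeps set -e from aborting.
    n=$(grep -c . "$f" || true)
    printf '%s\t%s\n' "$n" "$f"
  done
}

# Only scan the default location if it exists in the current directory.
if [ -d evaluation_outputs ]; then
  count_output_records evaluation_outputs
fi
```

Each output line is a record count followed by a tab and the file path, which makes it easy to confirm that the number of records matches the EVAL_LIMIT you passed.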

./evaluation/discoverybench/scripts/run_infer.sh [MODEL_CONFIG] [GIT_COMMIT] [AGENT] [EVAL_LIMIT] [NUM_WORKERS]

MODEL_CONFIG: Name of the model configuration (set up in config.toml) to evaluate with.
GIT_COMMIT: This should be the git commit hash or release tag for OpenHands, e.g., HEAD or a specific tag like 0.6.2.
AGENT: Use CodeActAgent; it is currently the only supported agent.
EVAL_LIMIT: Number of samples to evaluate.
NUM_WORKERS: Number of workers to parallelize the evaluation process.
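Because all five arguments are positional, their order matters. The following is a minimal sketch, assuming a POSIX-like shell, that prints the full invocation instead of running it so you can check the order first. The helper name build_discoverybench_cmd and the values my_llm_config, 10, and 2 are hypothetical placeholders. The positive-integer check is an assumption about sensible inputs, not a rule known to be enforced by run_infer.sh.

```shell
#!/bin/sh
# Sketch: print (rather than run) a run_infer.sh invocation so the positional
# order MODEL_CONFIG GIT_COMMIT AGENT EVAL_LIMIT NUM_WORKERS is explicit.
# The helper name and the example values are hypothetical.
set -eu

build_discoverybench_cmd() {
  local model_config="$1"   # LLM config name set up in config.toml
  local git_commit="$2"     # OpenHands commit hash or release tag, e.g. HEAD
  local agent="$3"          # CodeActAgent is the only supported agent
  local eval_limit="$4"     # number of samples to evaluate
  local num_workers="$5"    # number of parallel workers

  # Assumed sanity check: reject anything that is not a positive integer
  # without leading zeros (empty, non-digit characters, or starting with 0).
  for n in "$eval_limit" "$num_workers"; do
    case "$n" in
      ''|*[!0-9]*|0*)
        echo "EVAL_LIMIT and NUM_WORKERS must be positive integers, got '$n'" >&2
        return 1
        ;;
    esac
  done

  echo "./evaluation/discoverybench/scripts/run_infer.sh $model_config $git_commit $agent $eval_limit $num_workers"
}

# Dry run: print the command for 10 samples on 2 workers.
build_discoverybench_cmd my_llm_config HEAD CodeActAgent 10 2
```

Once the printed command looks right, you can run it from the OpenHands repository root.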
