```
mkdir rundir
sbatch find_executions.sbatch PROMPTS_ROOT
```

`PROMPTS_ROOT` may have subdirectories.

```
sbatch do_executions.sbatch PROMPTS_ROOT
```
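The two batch jobs above take the prompts root as their argument. As a sketch of the kind of layout this allows, with `PROMPTS_ROOT` containing subdirectories (the directory names here are hypothetical):

```shell
# Build a toy PROMPTS_ROOT with subdirectories (names are hypothetical),
# plus the rundir the jobs expect.
mkdir -p PROMPTS_ROOT/group_a PROMPTS_ROOT/group_b
mkdir -p rundir

# On the cluster, submit as above:
#   sbatch find_executions.sbatch PROMPTS_ROOT
#   sbatch do_executions.sbatch PROMPTS_ROOT
ls PROMPTS_ROOT
```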
These instructions will run inference and evaluation on the Northeastern Discovery cluster. It should be possible to easily adapt the scripts for other Slurm clusters.
On a compute node, run

```
singularity pull docker://ghcr.io/nuprl/multipl-e-evaluation
```

This will create the file `multipl-e-evaluation_latest.sif`, which is the container. The file `cluster/discovery_evaluation.sh` assumes that this file is saved as `/work/arjunguha-research-group/arjun/containers/multipl-e-evaluation_latest.sif`.
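As a side note on that filename: `singularity pull` derives it from the image name and tag, and with no tag specified the tag defaults to `latest`, which is why the result is `multipl-e-evaluation_latest.sif`. A small sketch of the naming rule:

```shell
# singularity pull saves docker images as "<name>_<tag>.sif"; with no tag
# given, the tag defaults to "latest".
img="docker://ghcr.io/nuprl/multipl-e-evaluation"
echo "$(basename "$img")_latest.sif"
# → multipl-e-evaluation_latest.sif
```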
You also need an environment that has the MultiPL-E dependencies. On Discovery, you can use `source ~a.guha/bin/gpuenv`, which activates an appropriate Conda environment.
You can do this on the login node or a compute node with limited resources.
- Activate an appropriate environment: `source ~a.guha/bin/gpuenv`
- Enter the root of the MultiPL-E repository: `cd /work/arjunguha-research-group/arjun/repos/MultiPL-E`
- Create a directory for experiment results: `mkdir experiments`. You can re-use this directory to incrementally add new experiments.
- Create a file called `experiments/inference.sh`. Each line of the file should run inference. For example:

  ```
  python -m inference --model-name inference.bigcode_mha --root-dataset humaneval --lang py --temperature 0.2 --batch-size 50
  ```

  We will not run this shell script directly. Instead, we will run each line on a separate GPU node. Therefore, ensure that no command spans multiple lines (i.e., do not use a trailing `\`) and do not include a `#!` line at the top.
- Run `./cluster/pipeline.sh experiments`. You will receive an email at your `@northeastern.edu` address when it completes. The script puts all log files in `experiments/logs`.
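For illustration, here is a hypothetical multi-line `experiments/inference.sh` that obeys the constraints above, together with a quick sanity check. The first command is the example from the text; the second line, with its `js` language code, is an assumption about other supported languages.

```shell
mkdir -p experiments
# One complete inference command per line: no shebang, no trailing backslashes.
# The js line is an assumption; adjust model names and languages to your runs.
cat > experiments/inference.sh <<'EOF'
python -m inference --model-name inference.bigcode_mha --root-dataset humaneval --lang py --temperature 0.2 --batch-size 50
python -m inference --model-name inference.bigcode_mha --root-dataset humaneval --lang js --temperature 0.2 --batch-size 50
EOF

# Sanity checks: no line continuations, and no '#!' interpreter line at the top.
! grep -q '\\$' experiments/inference.sh && echo "ok: no line continuations"
head -n 1 experiments/inference.sh | grep -qv '^#!' && echo "ok: no shebang"
```

Each line here is scheduled on its own GPU node by the pipeline, which is why a single long command per line matters.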