```
mkdir rundir
sbatch find_executions.sbatch PROMPTS_ROOT
```

`PROMPTS_ROOT` may have subdirectories.

```
sbatch do_executions.sbatch PROMPTS_ROOT
```
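The two batch jobs above take the prompts root as their argument. As a sketch of the kind of layout this allows, with `PROMPTS_ROOT` containing subdirectories (the directory names here are hypothetical):

```shell
# Build a toy PROMPTS_ROOT with subdirectories (names are hypothetical),
# plus the rundir the jobs expect.
mkdir -p PROMPTS_ROOT/group_a PROMPTS_ROOT/group_b
mkdir -p rundir

# On the cluster, submit as above:
#   sbatch find_executions.sbatch PROMPTS_ROOT
#   sbatch do_executions.sbatch PROMPTS_ROOT
ls PROMPTS_ROOT
```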
These instructions will run inference and evaluation on the Northeastern Discovery cluster. It should be possible to easily adapt the scripts for other Slurm clusters.
On a compute node, run

```
singularity pull docker://ghcr.io/nuprl/multipl-e-evaluation
```

This will create the file `multipl-e-evaluation_latest.sif`, which is the container. The file `cluster/discovery_evaluation.sh` assumes that this file is saved as `/work/arjunguha-research-group/arjun/containers/multipl-e-evaluation_latest.sif`.
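As a side note on that filename: `singularity pull` derives it from the image name and tag, and with no tag specified the tag defaults to `latest`, which is why the result is `multipl-e-evaluation_latest.sif`. A small sketch of the naming rule:

```shell
# singularity pull saves docker images as "<name>_<tag>.sif"; with no tag
# given, the tag defaults to "latest".
img="docker://ghcr.io/nuprl/multipl-e-evaluation"
echo "$(basename "$img")_latest.sif"
# → multipl-e-evaluation_latest.sif
```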
You also need an environment that has the MultiPL-E dependencies. On Discovery, you can use `source ~a.guha/bin/gpuenv`, which activates an appropriate Conda environment.
You can do this on the login node or a compute node with limited resources.
- Activate an appropriate environment: `source ~a.guha/bin/gpuenv`
- Enter the root of the MultiPL-E repository: `cd /work/arjunguha-research-group/arjun/repos/MultiPL-E`
- Create a directory for experiment results: `mkdir experiments`. You can re-use this directory to incrementally add new experiments.
- Create a file called `experiments/inference.sh`. Each line of the file should run inference. For example:

  ```
  python -m inference --model-name inference.bigcode_mha --root-dataset humaneval --lang py --temperature 0.2 --batch-size 50
  ```

  We will not run this shell script directly. Instead, we will run each line on a separate GPU node. Therefore, ensure that no command spans multiple lines (i.e., do not use a trailing `\`) and do not include a `#!` line at the top.
- Run `./cluster/pipeline.sh experiments`. You will receive an email at your `@northeastern.edu` address when it completes. The script puts all log files in `experiments/logs`.
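For illustration, here is a hypothetical multi-line `experiments/inference.sh` that obeys the constraints above, together with a quick sanity check. The first command is the example from the text; the second line, with its `js` language code, is an assumption about other supported languages.

```shell
mkdir -p experiments
# One complete inference command per line: no shebang, no trailing backslashes.
# The js line is an assumption; adjust model names and languages to your runs.
cat > experiments/inference.sh <<'EOF'
python -m inference --model-name inference.bigcode_mha --root-dataset humaneval --lang py --temperature 0.2 --batch-size 50
python -m inference --model-name inference.bigcode_mha --root-dataset humaneval --lang js --temperature 0.2 --batch-size 50
EOF

# Sanity checks: no line continuations, and no '#!' interpreter line at the top.
! grep -q '\\$' experiments/inference.sh && echo "ok: no line continuations"
head -n 1 experiments/inference.sh | grep -qv '^#!' && echo "ok: no shebang"
```

Each line here is scheduled on its own GPU node by the pipeline, which is why a single long command per line matters.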