Reload modules on a multi-arch cluster #796

Open
pfermi opened this issue Oct 21, 2024 · 0 comments

pfermi commented Oct 21, 2024

When the head node and the compute nodes have different microarchitectures, modules loaded on the head node must be reloaded on the compute node to pick up the right binaries. The command to force this reload is module update, but when run only once, the previously loaded modules are merely marked as inactive:

EESSI/2023.06 loaded successfully

Inactive Modules:
  1) FFTW.MPI/3.3.10-gompi-2023b
  2) FFTW/3.3.10-GCC-13.2.0
  3) FlexiBLAS/3.3.1-GCC-13.2.0
  4) GCC/13.2.0
  5) GCCcore/13.2.0
  6) GROMACS
  7) OpenBLAS/0.3.24-GCC-13.2.0
  8) OpenMPI/4.1.6-GCC-13.2.0
  9) OpenSSL/1.1
 10) PMIx/4.2.6-GCCcore-13.2.0
 11) Python-bundle-PyPI/2023.10-GCCcore-13.2.0
 12) Python/3.11.5-GCCcore-13.2.0
 13) SQLite/3.43.1-GCCcore-13.2.0
 14) ScaLAPACK/2.2.0-gompi-2023b-fb
 15) SciPy-bundle/2023.11-gfbf-2023b
 16) Tcl/8.6.13-GCCcore-13.2.0
 17) UCC/1.2.0-GCCcore-13.2.0
 18) UCX/1.15.0-GCCcore-13.2.0
 19) cffi/1.15.1-GCCcore-13.2.0
 20) cryptography/41.0.5-GCCcore-13.2.0
 21) foss/2023b
 22) gfbf/2023b
 23) gompi/2023b
 24) hwloc/2.9.2-GCCcore-13.2.0
 25) libevent/2.1.12-GCCcore-13.2.0
 26) libfabric/1.19.0-GCCcore-13.2.0
 27) libffi/3.4.4-GCCcore-13.2.0
 28) libpciaccess/0.17-GCCcore-13.2.0
 29) libxml2/2.11.5-GCCcore-13.2.0
 30) mpi4py/3.1.5-gompi-2023b
 31) networkx/3.2.1-gfbf-2023b
 32) numactl/2.0.16-GCCcore-13.2.0
 33) pybind11/2.11.1-GCCcore-13.2.0
 34) virtualenv/20.24.6-GCCcore-13.2.0

Instead, we have to run module update twice; only the second invocation reactivates the modules:

EESSI/2023.06 loaded successfully

Activating Modules:
  1) FFTW.MPI/3.3.10-gompi-2023b
  2) FFTW/3.3.10-GCC-13.2.0
  3) FlexiBLAS/3.3.1-GCC-13.2.0
  4) GCC/13.2.0
  5) GCCcore/13.2.0
  6) GROMACS/2024.3-foss-2023b
  7) OpenBLAS/0.3.24-GCC-13.2.0
  8) OpenMPI/4.1.6-GCC-13.2.0
  9) OpenSSL/1.1
 10) PMIx/4.2.6-GCCcore-13.2.0
 11) Python-bundle-PyPI/2023.10-GCCcore-13.2.0
 12) Python/3.11.5-GCCcore-13.2.0
 13) SQLite/3.43.1-GCCcore-13.2.0
 14) ScaLAPACK/2.2.0-gompi-2023b-fb
 15) SciPy-bundle/2023.11-gfbf-2023b
 16) Tcl/8.6.13-GCCcore-13.2.0
 17) UCC/1.2.0-GCCcore-13.2.0
 18) UCX/1.15.0-GCCcore-13.2.0
 19) cffi/1.15.1-GCCcore-13.2.0
 20) cryptography/41.0.5-GCCcore-13.2.0
 21) foss/2023b
 22) gfbf/2023b
 23) gompi/2023b
 24) hwloc/2.9.2-GCCcore-13.2.0
 25) libevent/2.1.12-GCCcore-13.2.0
 26) libfabric/1.19.0-GCCcore-13.2.0
 27) libffi/3.4.4-GCCcore-13.2.0
 28) libpciaccess/0.17-GCCcore-13.2.0
 29) libxml2/2.11.5-GCCcore-13.2.0
 30) mpi4py/3.1.5-gompi-2023b
 31) networkx/3.2.1-gfbf-2023b
 32) numactl/2.0.16-GCCcore-13.2.0
 33) pybind11/2.11.1-GCCcore-13.2.0
 34) virtualenv/20.24.6-GCCcore-13.2.0
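As an interactive sketch of the workaround described above (assuming Lmod's module command as initialized by the EESSI environment):

```shell
# On the compute node, starting from an environment inherited from the head node:
module update   # first run: modules are reported under "Inactive Modules"
module update   # second run: the same modules are reactivated for this arch
module list     # verify everything is loaded for the compute node's architecture
```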

Finally, to have this reload happen automatically when a job is submitted to Slurm, write the following code to a file and point the BASH_ENV variable at it. Note that reloading the EESSI module causes an infinite loop if BASH_ENV is still set, which is why the script unsets it first.

#!/bin/bash
#
# This script is sourced by Slurm when launching a job with SBATCH or SRUN.

# BASH_ENV must be unset when loading/reloading the EESSI module to avoid infinite loop
unset BASH_ENV
original_MODULEPATH="${MODULEPATH}"
# First update: detects the architecture change; modules become inactive
module -q update
if [[ "${MODULEPATH}" != "${original_MODULEPATH}" ]]; then
  echo "Reloading for architecture ${EESSI_SOFTWARE_SUBDIR}"
  # Second update: reactivates the inactive modules for the new architecture
  module update
fi
unset original_MODULEPATH
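For example, assuming the script above is saved as /shared/reload_modules.sh (a hypothetical path), exporting BASH_ENV in the submitting shell is enough, since Slurm propagates the submission environment to the job by default:

```shell
# In the submitting shell on the head node (or in ~/.bashrc):
export BASH_ENV=/shared/reload_modules.sh   # hypothetical path

# Slurm exports the environment to the job, so the batch script's shell and
# any non-interactive bash shells started by srun source the reload script
# on startup, reloading the modules for the compute node's architecture.
sbatch job.sh
```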
@EESSI EESSI deleted a comment Oct 21, 2024