Teething issue with EESSI module file #694
Comments
That's maybe as expected, but it makes finding a particular version of a particular package a bit more complicated |
I think that's the consequence of https://github.com/EESSI/software-layer/blob/2023.06-software.eessi.io/init/modules/EESSI/2023.06.lua#L62 (but I agree that it's annoying...). |
Hmm, this is tricky to solve. We want dynamic cache support but we don't want Lmod to try to update the existing cache. Is there a way to do this with the time limits on cache files? |
Maybe dynamic cache support "just works", I'll give it a try |
No, I tried |
What if we create a spider cache for all the But then Lmod has to be configured to know where to find it, I guess... |
Not sure that is what we want, wouldn't Lmod report the possibilities for every architecture then? We may have to craft an overall solution with the help of the Lmod BDFL |
Let's tag him then: @rtmclay |
With our modulefiles sitting on SSD, we are moving away from having system spider cache files. I don't know how well the caching works with CVMFS. So maybe you don't need system spider caches at all. This is something you guys need to check. If you do need spider cache files, then I would recommend having one spider cache file per architecture. You might also be able to use the ideas in https://lmod.readthedocs.io/en/latest/350_community.html to handle the various architectures. |
We do indeed have a spider cache per architecture, and we protect these paths with
if ( mode() ~= "spider" ) then
prepend_path("MODULEPATH", eessi_module_path)
end
-- add our spider cache
prepend_path("LMOD_RC", pathJoin(eessi_software_path, "/.lmod/lmodrc.lua"))
The problem is that if you do a
EDIT
ocaisa@LAPTOP-O6HF2IKC:~$ module purge
ocaisa@LAPTOP-O6HF2IKC:~$ module spider GROMACS
Lmod has detected the following error: Unable to find: "GROMACS".
ocaisa@LAPTOP-O6HF2IKC:~$ export LMOD_RC=/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/skylake_avx512/.lmod/lmodrc.lua
ocaisa@LAPTOP-O6HF2IKC:~$ module spider GROMACS
Lmod has detected the following error: Unable to find: "GROMACS".
ocaisa@LAPTOP-O6HF2IKC:~$ module load EESSI
EESSI/2023.06 loaded successfully
ocaisa@LAPTOP-O6HF2IKC:~$ module spider GROMACS
------------------------------------
GROMACS: GROMACS/2024.1-foss-2023b
------------------------------------
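A minimal sketch of how the per-architecture Lmod configuration path used in the session above could be assembled and checked before relying on it. The function names are hypothetical and not part of EESSI or Lmod; the path layout is copied from the `export LMOD_RC=...` line above. As that session shows, pointing `LMOD_RC` at this file was not sufficient on its own for `module spider` to find GROMACS, so this is only a diagnostic aid.

```shell
# Hypothetical helpers for illustration; not part of EESSI or Lmod.

# Print the lmodrc.lua path for an EESSI version ($1) and a CPU
# subdirectory ($2), e.g. x86_64/intel/skylake_avx512.
eessi_lmodrc_path() {
    printf '%s\n' "/cvmfs/software.eessi.io/versions/$1/software/linux/$2/.lmod/lmodrc.lua"
}

# Succeed only if that file exists and is readable on this machine.
eessi_lmodrc_readable() {
    [ -r "$(eessi_lmodrc_path "$1" "$2")" ]
}
```

For example, `eessi_lmodrc_readable 2023.06 x86_64/intel/skylake_avx512 || echo "no spider cache config for this architecture"` would flag a missing or unmounted file before any `module spider` call.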
|
Is it that, in our setup with a variable EDIT: That didn't work for me either |
Lmod only reports modules that are in the modulepath or can be found via walking the tree. This code:
prevents Lmod from knowing about It seems to me that you hide modules or you show them. Can you explain exactly what you want with a simple module tree? If you are going to compute spider caches, why not provide them all the time? |
Ok, I think I see the issue now. In our case the spider cache and the module path are architecture dependent (so both could be considered to need To test this, I split the module file in two, the first does everything except add to the We could make a big fat spider cache which would remove the need for the
EDIT: This is not entirely correct as my session was messed up a little by me tweaking the module file on the fly. The EESSI module file is now:
-- Load all the EESSI environment settings from the matching base
-- module file (which is hidden), including identifying the
-- architecture and setting the appropriate Lmod spider cache to use
always_load(pathJoin('base', '.' .. myModuleVersion()))
-- Add the modulepaths we want
prepend_path("MODULEPATH", os.getenv("EESSI_MODULEPATH"))
prepend_path("MODULEPATH", os.getenv("EESSI_SITE_MODULEPATH"))
haveDynamicMPATH()
if mode() == "load" then
LmodMessage("EESSI/" .. myModuleVersion() .. " loaded successfully")
end
with the general setup being
ocaisa@LAPTOP-O6HF2IKC:~$ module --show-hidden avail
--------------- /home/ocaisa/software-layer/init/modules ---------------
base/.2023.06 (H) EESSI/2023.06
Where:
H: Hidden Module
Trying to search within this context I get:
ocaisa@LAPTOP-O6HF2IKC:~$ module spider GROMACS
Lmod has detected the following error: Unable to find:
"GROMACS".
ocaisa@LAPTOP-O6HF2IKC:~$ module load base/.2023.06
ocaisa@LAPTOP-O6HF2IKC:~$ module spider GROMACS
--------------------------------------------------------------------
GROMACS: GROMACS/2024.1-foss-2023b
--------------------------------------------------------------------
Description:
...
so it seems the cache file must be visible to Lmod when the |
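The working order seen in the session above (load the hidden base module first, then search) can be wrapped in a small helper. This is only a sketch under the assumptions of that experimental layout: the wrapper name `eessi_spider` is hypothetical, and it assumes `base/.<version>` is on `MODULEPATH` and sets the architecture-specific spider cache configuration.

```shell
# Hypothetical wrapper; assumes the split base/EESSI module layout shown above.
eessi_spider() {
    local version="${EESSI_VERSION:-2023.06}"
    # Load the hidden base module so the spider cache settings are in place.
    module load "base/.${version}" || return 1
    # Only then search, passing all arguments through unchanged.
    module spider "$@"
}
```

Calling `eessi_spider GROMACS` then mirrors the two manual steps above, and stops early if the base module cannot be loaded.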
I was on a system where TCL modules were available, but I needed Lmod, so I tried to initialise it. This led to issues when running scripts (as their module tool was initialising itself on top of Lmod in the subshell), so I tried to create a bash function that can be called within scripts to check that EESSI and Lmod are available. When testing this on my local machine (which has Lmod), I saw:
alanc@~$ type module
module is a function
module ()
{
local __lmod_my_status;
local __lmod_sh_dbg;
if [ -z "${LMOD_SH_DBG_ON+x}" ]; then
case "$-" in
*v*x*)
__lmod_sh_dbg='vx'
;;
*v*)
__lmod_sh_dbg='v'
;;
*x*)
__lmod_sh_dbg='x'
;;
esac;
fi;
if [ -n "${__lmod_sh_dbg:-}" ]; then
set +$__lmod_sh_dbg;
echo "Shell debugging temporarily silenced: export LMOD_SH_DBG_ON=1 for Lmod's output" 1>&2;
fi;
eval "$($LMOD_CMD bash "$@")" && eval $(${LMOD_SETTARG_CMD:-:} -s sh);
__lmod_my_status=$?;
if [ -n "${__lmod_sh_dbg:-}" ]; then
echo "Shell debugging restarted" 1>&2;
set -$__lmod_sh_dbg;
fi;
return $__lmod_my_status
}
alanc@~$ echo $LMOD_CMD
/usr/share/lmod/lmod/libexec/lmod
alanc@~$ source /cvmfs/software.eessi.io/versions/2023.06/init/lmod/bash
Lmod has detected the following error: The following module(s) are unknown: "EESSI/2023.06"
Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore_cache load "EESSI/2023.06"
Also make sure that all modulefiles written in TCL start with the string #%Module
alanc@~$ module av
---------------------------------------------------------- /cvmfs/software.eessi.io/versions/2023.06/init/modules ----------------------------------------------------------
EESSI/2023.06
If the avail list is too long consider trying:
"module --default avail" or "ml -d av" to just list the default modules.
"module overview" or "ml ov" to display the number of modules for each name.
Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
and doing a
alanc@~$ module reset
Running "module reset". Resetting modules to system default. The following $MODULEPATH directories have been removed: /cvmfs/software.eessi.io/versions/2023.06/init/modules
Lmod has detected the following error: The following module(s) are unknown: "EESSI/2023.06"
Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore_cache load "EESSI/2023.06"
Also make sure that all modulefiles written in TCL start with the string #%Module
but I don't understand why this is happening. We must be missing something in the configuration of Lmod? @MaKaNu ? |
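A possible shape for the check function mentioned in the comment above, as a sketch rather than the function that was actually written. It relies on two things visible in the output above: Lmod's `module` function calls `$LMOD_CMD`, and `LMOD_CMD` is set in an Lmod-initialised shell. The function names are hypothetical, and the default repository root is taken from the paths used in this thread.

```shell
# Hypothetical helpers; not part of EESSI or Lmod.

# Succeed only if the current shell looks like it is using Lmod.
lmod_is_active() {
    # LMOD_CMD must be set and point at an executable.
    [ -n "${LMOD_CMD:-}" ] && [ -x "${LMOD_CMD}" ] || return 1
    # A `module` command (a shell function under bash) must be defined.
    command -v module >/dev/null 2>&1 || return 1
    return 0
}

# Succeed only if the EESSI repository appears to be available.
# $1 optionally overrides the repository root (useful for testing).
eessi_is_mounted() {
    [ -d "${1:-/cvmfs/software.eessi.io}/versions" ]
}
```

A script could then start with `lmod_is_active && eessi_is_mounted || { echo "Lmod and EESSI are required" >&2; exit 1; }`. This only detects the situation; it does not stop another module tool from re-initialising itself in a subshell.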
My local test VM does not have Lmod installed locally, but I observed the following behavior:
$ source /cvmfs/software.eessi.io/versions/2023.06/init/lmod/bash
EESSI/2023.06 loaded successfully
Okay, so far so good.
------------------------------------------------------------ /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/haswell/modules/all -------------------------------------------------------------
Abseil/20230125.2-GCCcore-12.2.0 HMMER/3.4-gompi-2023a OpenEXR/3.2.0-GCCcore-13.2.0 (D)
Abseil/20230125.3-GCCcore-12.3.0 (D) HPL/2.3-foss-2023b OpenFOAM/v2312-foss-2023a
.
.
.
------------------------------------------------------------------------------ /cvmfs/software.eessi.io/versions/2023.06/init/modules ------------------------------------------------------------------------------
EESSI/2023.06 (L)
If I now try to reset:
$ ml reset
Running "module reset". Resetting modules to system default. The following $MODULEPATH directories have been removed: None
EESSI/2023.06 loaded successfully
So here it seems that our EESSI module was loaded again and is not removed from $MODULEPATH. I am not sure if this is the behavior we intended. On the other hand @ocaisa Is it enough to install Lmod locally or do I also need TCL modules to reproduce your behavior? |
I believe a local Lmod installation is enough to test this. I do have something in my
but I don't see how that would get triggered |
I think the key is |
This might be the reason why module EESSI/2023.06 is already available? In my scenario I just sourced the Lmod init, the same as you did in your So I checked what happens if I now load the EESSI module and repeat your steps again: same result as before. I come to the same conclusion, 'Resetting modules to system default' is the difficult part. What might be a solution:
While writing I realized we set our MODULEPATH and do append. Maybe this is also an issue |
Do we really need a default set I wonder if it is going to be problematic? In the initialisation scripts we can just explicitly load the module |
So you mean instead we should skip the LMOD init? Yes, I think so. |
I mocked something up for
# Purge any modules before we start
if type module &> /dev/null; then
module purge
fi
# Choose an EESSI version
EESSI_VERSION="${EESSI_VERSION:-2023.06}"
# Initialise Lmod for the shell
. /cvmfs/software.eessi.io/versions/"$EESSI_VERSION"/compat/linux/$(uname -m)/usr/share/Lmod/init/bash
# If an environment exists, let's not mix
if [ ! -z "${MODULEPATH}" ]; then
module unuse "${MODULEPATH}"
fi
# Path to top-level module tree
module use /cvmfs/software.eessi.io/versions/"$EESSI_VERSION"/init/modules
module load EESSI/$EESSI_VERSION |
Seems fine for unloading all existing loaded modules.
Not mixing sounds good, but this does not allow for resetting to whatever the user had active before. If this is fine for the moment, we could address it first and find a solution for resetting to the default later. Further, we might want to extend our test cases. |
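One way the "reset to what the user had before" concern might be approached is to record the original `MODULEPATH` before the mock-up script unuses it. The following is a sketch only: the variable `EESSI_SAVED_MODULEPATH` and both function names are hypothetical and do not exist in EESSI or Lmod, and the sketch only records and lists directories without calling Lmod itself.

```shell
# Hypothetical helpers; EESSI_SAVED_MODULEPATH is an invented variable name.

# Remember the pre-EESSI MODULEPATH (possibly empty), but only the first
# time this is called, so re-sourcing the init script does not overwrite it.
eessi_save_modulepath() {
    if [ -z "${EESSI_SAVED_MODULEPATH+x}" ]; then
        export EESSI_SAVED_MODULEPATH="${MODULEPATH:-}"
    fi
}

# Print the saved directories one per line, skipping empty entries.
eessi_saved_modulepath_dirs() {
    local rest="${EESSI_SAVED_MODULEPATH:-}"
    local dir
    while [ -n "$rest" ]; do
        dir="${rest%%:*}"
        [ -n "$dir" ] && printf '%s\n' "$dir"
        case "$rest" in
            *:*) rest="${rest#*:}" ;;
            *)   rest="" ;;
        esac
    done
    return 0
}

# Possible restore step (requires Lmod; breaks on paths containing spaces):
#   for dir in $(eessi_saved_modulepath_dirs); do module use --append "$dir"; done
```

In the mock-up above, `eessi_save_modulepath` would be called just before the `module unuse "${MODULEPATH}"` step. The restore loop uses `--append` so that the original directory order is preserved, since a plain `module use` prepends.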
Also, somewhat logical, but it surprised me:
$ module reset
The system default contains no modules
(env var: LMOD_SYSTEM_DEFAULT_MODULES is empty)
No changes in loaded modules
This also means EESSI/2023.06 will not be unloaded. |
So, it seems that Lmod can't search inside EESSI until the module is loaded: