Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The auto_install option on the mulled_singularity option is ignored #15717

Open
natefoo opened this issue Mar 6, 2023 · 14 comments
Open

The auto_install option on the mulled_singularity option is ignored #15717

natefoo opened this issue Mar 6, 2023 · 14 comments

Comments

@natefoo
Copy link
Member

natefoo commented Mar 6, 2023

Describe the bug
Or I misunderstand its purpose. My understanding would be that it prevents image pulling and conversion at tool runtime, meaning that it only occurs when initiated by an admin.

Galaxy Version and/or server at which you observed the bug
Galaxy Version: 23.0.rc1
Commit: 4f52b74

To Reproduce
Steps to reproduce the behavior:

  1. Set container resolvers:
    galaxy:
      container_resolvers:
        - type: mulled_singularity
          auto_install: false
  2. Enable a Singularity destination in the job conf
  3. Run a tool with an existing biocontainer matching its requirements

Expected behavior
Images are not pulled/built at tool runtime.

Additional context
I verified that auto_install is indeed set to False in the resolver init, I just don't think it's actually used as a conditional for the pull.

@mvdbeek
Copy link
Member

mvdbeek commented Mar 6, 2023

Images are not pulled/built at tool runtime.

I don't think that's possible unless you turn off the (non-cached) mulled resolver. auto_install determines if the resolver pulls in the image whenever it is used.

@natefoo
Copy link
Member Author

natefoo commented Mar 6, 2023

I don't think that's possible unless you turn off the (non-cached) mulled resolver.

The cached mulled resolver can't pull, though, so without a non-cached mulled resolver in the list, you would have to manually (as in on the command line, not clicking around in Manage Dependencies) install images.

auto_install determines if the resolver pulls in the image whenever it is used.

Maybe I am still misunderstanding, but this is what I want it to do, except it doesn't.

@mvdbeek
Copy link
Member

mvdbeek commented Mar 6, 2023

I don't think you want to use that auto_install="true" in production, it will pull in the job handler process.

My understanding would be that it prevents image pulling and conversion at tool runtime, meaning that it only occurs when initiated by an admin.

maybe we need to be clear on that, when you say tool runtime, do you mean in the job handler, or on the compute node ?

@natefoo
Copy link
Member Author

natefoo commented Mar 6, 2023

I don't think you want to use that auto_install="true" in production, it will pull in the job handler process.

Agreed, disabling it is exactly what I'm trying to do.

maybe we need to be clear on that, when you say tool runtime, do you mean in the job handler, or on the compute node ?

I mean in the job handler. In my testing, neither value of auto_install has any effect: the image is always pulled in the job handler regardless. I don't want it pulled in the handler or on the node, but as far as I can tell, it would never get to the point of pulling on the node anyway, since it's always pulled in the handler.

@bernt-matthias
Copy link
Contributor

xref #11675 (most likely not fix and wrong at all)

will try to drop a few lines tomorrow

Most important question to me is what the intention of auto_install was. I could not find out after digging 2 weeks in the sources of the container resolvers.

@natefoo
Copy link
Member Author

natefoo commented Mar 7, 2023

Most important question to me is what the intention of auto_install was. I could not find out after digging 2 weeks in the sources of the container resolvers.

Per the commit message on 8a34101:

Introduce auto_install ContainerResolver flag

If set to True (thats's the default), the container
resolvers that can pull or build images will do that,
if they are enabled.
You can set this to False and instead pass through
the DependencyManager to use these resolvers to pull
or build images.

Seems like it means that if set to false and the image is not already present, that resolver should be skipped and eventually if all the configured container resolvers are misses, resolution falls through to the dependency (non-container) resolvers, but @mvdbeek maybe I am still missing the intent here?

@mvdbeek
Copy link
Member

mvdbeek commented Mar 7, 2023

In my testing, neither value of auto_install has any effect: the image is always pulled in the job handler regardless.

With the caveat that it's been a while since I last used singularity: that was the whole point of adding auto_install: false and it did work for me.

Where we prevent the pulling on auto_install="false" is here: https://github.com/mvdbeek/galaxy/blob/6f945c1ccb659a85d453332a67b0746afe4d73a5/lib/galaxy/tool_util/deps/container_resolvers/mulled.py#L574

@mvdbeek
Copy link
Member

mvdbeek commented Mar 7, 2023

I did some archeology and this used to work:

<containers_resolvers>
  <explicit_singularity />
  <cached_mulled_singularity />
  <mulled_singularity auto_install="False"/>
 <build_mulled_singularity auto_install="False"/>
</containers_resolvers>

it did not pull in the job handler, and the only way to install images was through the UI or API.

@mvdbeek
Copy link
Member

mvdbeek commented Mar 7, 2023

Here's some confirmation on how it used to work: #7125 (comment)

@mvdbeek
Copy link
Member

mvdbeek commented Mar 7, 2023

https://github.com/mvdbeek/galaxy/blob/6f945c1ccb659a85d453332a67b0746afe4d73a5/lib/galaxy/jobs/rule_helper.py#L54 seems problematic, as does this: https://github.com/mvdbeek/galaxy/blob/6f945c1ccb659a85d453332a67b0746afe4d73a5/lib/galaxy/tool_util/deps/containers.py#L149 since the default is install=True (which we kept for legacy reasons).

These were retrofitted and not part of the original concept. If you do change this you want to be really careful because I think a few sites rely on this rather bad default.

@bernt-matthias
Copy link
Contributor

https://github.com/mvdbeek/galaxy/blob/6f945c1ccb659a85d453332a67b0746afe4d73a5/lib/galaxy/jobs/rule_helper.py#L54 seems problematic, as does this: https://github.com/mvdbeek/galaxy/blob/6f945c1ccb659a85d453332a67b0746afe4d73a5/lib/galaxy/tool_util/deps/containers.py#L149 since the default is install=True (which we kept for legacy reasons).

That was also my guess. But I did not want to suggest this before having better test coverage. Maybe call find_best_container_description with install set to a config property analogous to conda_auto_install? .. But also consider that docker run will cache the images implicitly during the tool execution (if execution happens locally).

These were retrofitted and not part of the original concept. If you do change this you want to be really careful because I think a few sites rely on this rather bad default.

Indeed. But on first thought a config variable seems to be a good idea.

Where we prevent the pulling on auto_install="false" is here:

That's still a point that I do not get. The if on the line that you mention only depends on install and cached_container_description (i.e. if the image has been pulled before). The actual pull happens in this very if branch on line 592.

This is, the image will be pulled regardless of the value of auto_install.

All that auto_install is doing at the moment is to decide if the location of the cached image is returned (auto_install="true") or the quay.io "URI" (auto_install="false"). Note that this makes no difference for docker, but only for singularity.

I have verified this in the "unit tests" (which should probably be implemented as integration test?) added here.

At the moment there seem to be three variables influencing pulling/building of the container

  • the class "property" builds_on_resolution which seems to be true for the build_* resolvers
  • auto_install (which apparently does something else in the mulled resolvers) and is used differently in the build* resolvers which is probably fine due to builds_on_resolution
  • install which seems to be always true when called during job preparation and when building the image via the admin UI

In the end I think that auto_install can not work as long as the code can differentiate if it is currently called during job preparation or if building/pulling is triggered via the admin UI / API. Currently we only have the boolean install which does not allow for this.

But (obviously) I have not fully understood the interplay of these three variables and how they are set in the different situations. I'm currently working on a better tests coverage (primarily to understand what is going on).

Also note: I think due to the differences between docker and singularity there will still be an inconsistency between mulled and mulled_singularity since for docker the container will be cached anyway (because the docker run executed in the tool/job script will add it to dockers list of cached images (on the compute node) anyway). Except if we make the docker container resolvers store/restore tar files via docker save/load.

I did some archeology and this used to work:

This config indeed "works", i.e. it will allow tools to execute. But it will also pull the images (regardless of auto_install). During the first resolving mulled_singularity will be successful (it will pull the container and return a container description pointing to the image file in the cache directory). In subsequent calls the cached_mulled_singularity resolver will be successful and return the path to the image in the cache folder.

The only difference that would happen if auto_install="false" would be used is that during the first call the returned container description would not point to the cached image but to the docker:// URI (but the image would still be pulled). For subsequent executions also the cached_mulled_singularity would be successful.

Btw. I would appreciate a review of the documentation (and typing) part in #15614. I'm already quite happy with it and currently work on the tests (input also very welcome).

@bernt-matthias
Copy link
Contributor

Hi @natefoo.

I have an implementation ready that adds a config variable container_auto_install (analogous to conda_auto_install) that triggers setting the install parameter of the function mentioned by @mvdbeek. This would totally ignore the symptom that you described here (that the resolvers auto_install is non / dysfunctional .. I'm not sure if it can be solved at all .. at the moment I think it would be the best to remove it completely) but it would solve the problem for the singularity containers.

It's tested and "works". The problem is that it can only work for singularity, since for docker we always have a docker pull in the job script ... Somehow I do not like the idea to have a config that only works for some of the resolvers...

Now I had an idea that might work. I would define globally

- type: cached_mulled_singularity
- type: mulled_singularity

and for my compute destinations/environments only

- type: cached_mulled_singularity

Thus during job processing only already cached images can be used.

Unfortunately the explicit resolvers work completely different than the mulled resolvers:

  • cached_mulled only looks up the cache and mulled creates the cache entry.
  • cached_explicit does both.

But we could easily add one that only looks up the cache .. or try to make it more consistent.

@bernt-matthias
Copy link
Contributor

I added the possible fix as c691d23 temporarily to #15614

@bernt-matthias
Copy link
Contributor

Now in topic/container-auto-install-fix in my fork.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants