Process prolog and epilog #540

pditommaso · 2017-11-30T17:02:06Z

The goal of this enhancement is to add two new process directives, prolog and epilog, that would make it possible to add a common script prefix and suffix to process tasks.

These differ from the beforeScript and afterScript directives, which are designed mainly for custom task configuration and, above all, are executed in the task wrapper context, which can be a separate environment from the task execution one (in particular when using containers).

Instead, the prolog content should be prepended to the user script just after the shebang declaration, and the epilog should be appended to the user task script.
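As a rough illustration of the composition described above, here is a minimal sketch. `compose_task_script` is a hypothetical helper, not a Nextflow function: it keeps the shebang on the first line, places the prolog right after it, and appends the epilog at the end.

```shell
# Hypothetical helper (not part of Nextflow): build a task script from a
# user script plus optional prolog and epilog snippets.
compose_task_script() {
    local script="$1"
    local prolog="$2"
    local epilog="$3"
    local first_line body

    first_line=$(printf '%s\n' "$script" | head -n 1)
    case $first_line in
        '#!'*)
            # Keep the shebang as line 1; the rest becomes the body.
            printf '%s\n' "$first_line"
            body=$(printf '%s\n' "$script" | tail -n +2)
            ;;
        *)
            body=$script
            ;;
    esac

    # Empty snippets are skipped so they do not add blank lines.
    if [ -n "$prolog" ]; then printf '%s\n' "$prolog"; fi
    if [ -n "$body" ]; then printf '%s\n' "$body"; fi
    if [ -n "$epilog" ]; then printf '%s\n' "$epilog"; fi
}
```

For example, `compose_task_script "$user_script" 'export A=1' 'echo done'` would emit the shebang, then `export A=1`, then the original body, then `echo done`.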


pditommaso · 2017-12-07T08:53:59Z

This can be a little harder than expected. The basic idea was to prepend/append the prolog and epilog snippets to the user script when defined. However, given that the prolog and epilog would be BASH code, that would corrupt non-BASH user commands.

Nor can it be included in the command launcher, i.e. .command.run, because then it would not be executed when the user command runs in a container image.
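To make the first concern concrete (Bash snippets corrupting non-Bash scripts), the sketch below shows one possible guard. It is purely an assumption for illustration, not something Nextflow does: the hypothetical `is_shell_script` helper inspects the shebang and reports whether Bash prolog/epilog code could be injected safely. Scripts without a shebang are assumed to run under the default shell.

```shell
# Hypothetical guard (not Nextflow behaviour): succeed only when the
# script would be interpreted by sh or bash.
is_shell_script() {
    local first_line interp
    first_line=$(printf '%s\n' "$1" | head -n 1)

    case $first_line in
        '#!'*) ;;            # has a shebang, inspect it below
        *) return 0 ;;       # no shebang: assumed to run under the default shell
    esac

    # Drop the leading "#!" and split the rest into words.
    set -- ${first_line#??}
    interp=${1##*/}
    # Handle the "#!/usr/bin/env <interpreter>" form.
    if [ "$interp" = env ]; then
        interp=${2##*/}
    fi

    case $interp in
        sh|bash) return 0 ;;
        *) return 1 ;;       # python, Rscript, perl, ...
    esac
}
```

A caller could use this to skip injection, or fail early with a clear message, when the user script starts with something like `#!/usr/bin/env python`.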

ODiogoSilva · 2018-02-16T14:55:48Z

Nor can it be included in the command launcher, i.e. .command.run, because then it would not be executed when the user command runs in a container image.

Is this due to the issue of the bin directory not being added to the path when using a container image? Is there any particular reason to prevent both from being used in a process?

In my case, I need to execute some bash scripts (which are in the bin directory) in the work directory before (and after) some processes (I originally thought that the epilog and prolog were meant to allow the execution of such scripts, which wouldn't have an impact on the user commands/templates). This is regardless of whether the processes run in a container image or not. In fact, I made a small tweak in nextflow to allow the usage of both bin and container images in the same process and it has been working fine. Maybe I'm missing some use cases where they would conflict somehow?

pditommaso · 2018-02-16T15:17:07Z

No, this happens because it's the .command.run that launches the container, therefore it cannot execute the prolog and the epilog.

One solution, which I would like to avoid, is to use an intermediate wrapper that contains the prolog, runs the command and finally executes the epilog.

A better alternative could be to embed the user prolog and epilog in a couple of variables in the .command.run via heredoc declarations, then append them to the container execution command line, but there could be tricky side effects on special character expansion or on the max length of the command line.
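A minimal sketch of what such an intermediate wrapper could look like, under stated assumptions: `run_with_wrapper` and the temporary file are hypothetical, and plain `bash` stands in for whatever would actually run the wrapper inside the container.

```shell
# Illustrative only: write a wrapper that runs prolog, command and epilog
# in the same shell, then execute it. Names here are hypothetical.
run_with_wrapper() {
    local prolog="$1"
    local user_command="$2"
    local epilog="$3"
    local wrapper status

    wrapper=$(mktemp) || return 1
    {
        printf '%s\n' '#!/bin/bash'
        printf '%s\n' "$prolog"
        printf '%s\n' "$user_command"
        printf '%s\n' 'status=$?'      # remember the command's exit status
        printf '%s\n' "$epilog"
        printf '%s\n' 'exit $status'   # report the command's status, not the epilog's
    } > "$wrapper"

    status=0
    bash "$wrapper" || status=$?
    rm -f "$wrapper"
    return "$status"
}
```

Note one limitation of this simple form: if the user command itself calls `exit`, the epilog never runs. It also adds one more generated file per task, which may be part of why this option is unattractive.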
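The heredoc idea could look roughly like the sketch below. This is an assumption about the shape of the launcher, not actual .command.run content: the variable names are made up, and `bash -c` stands in for the container execution command.

```shell
# Illustrative only. A quoted delimiter ('EOF') keeps each snippet literal,
# so variable references are expanded later, inside the target shell.
task_prolog=$(cat <<'EOF'
export TOOL_HOME=/opt/tool
EOF
)
task_epilog=$(cat <<'EOF'
echo "finished with TOOL_HOME=$TOOL_HOME"
EOF
)
task_command='echo "running with TOOL_HOME=$TOOL_HOME"'

run_task() {
    # In a real launcher this would be the container command line;
    # the three snippets are passed as one multi-line argument.
    bash -c "$task_prolog
$task_command
$task_epilog"
}
```

Both side effects mentioned above show up here. With an unquoted `EOF`, `$TOOL_HOME` would be expanded by the launcher before the prolog ever runs. And because everything travels as a single command-line argument, long snippets are subject to the operating system's argument length limit.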

stale · 2020-04-27T05:50:13Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ewels · 2020-04-27T08:50:06Z

Bump

abhi18av · 2021-05-17T11:47:01Z

With the usage of this functionality coming up in some nf-core workflows (see discussion here) as well as among independent users, perhaps this might be a useful feature to implement in NF.

Building upon the idea shared earlier in the thread, I'm adding relevant doc links for reference https://tldp.org/LDP/abs/html/here-docs.html

ewels · 2021-05-17T12:00:03Z

Regarding the nf-core tool version calls, see this related issue: #879

Thinking about it now, it might be good to have both prolog/epilog and versionCmd (presumably working in the same way). This is because in nf-core pipelines we will presumably set epilog for every single process, meaning that if an end user wants to run something custom then they will not be able to without overwriting that and losing the version information.

Also need to think about how staging files in and out will work. For example, with the above scenario, overwriting epilog would break everything as the processes will be expecting the version call output file as a channel output.

If prolog and epilog are not meant for use by the end user and only the pipeline developer then neither of these are particularly problematic. But in that case, maybe they don't add so much functionality either (as they can be part of the script block as we currently do).

Final point: I would personally spell it epilogue 😉

pditommaso · 2021-05-17T12:01:10Z

It's a programming lang not Shakespeare 😆

ewels · 2021-05-17T12:08:45Z

It's a programming lang not Shakespeare 😆

haha 😆 Ok if it's a well known programming term that's fine. I mis-typed it about 4 times in that past issue before googling it to check I wasn't going insane. I haven't come across it in code before. epilog is good though 👍🏻

abhi18av · 2021-05-17T12:34:34Z

Also need to think about how staging files in and out will work. For example, with the above scenario, overwriting epilog would break everything as the processes will be expecting the version call output file as a channel output.

Perhaps the issue regarding the version info file could be solved on the Tower side (Tower report specific channels have been discussed elsewhere), since it'd be essentially a file containing the version string and then this could be displayed/gathered into a report via Tower.

Thoughts?

NOTE: Just wanted to confirm that we all agree that the epilog and prolog aren't supposed to be used for heavy processing, i.e. they don't have the same use case as script/shell/exec. Otherwise this might create confusing usage patterns.

ewels · 2021-05-17T12:57:52Z

Tower is nice, but we want the version numbers even when people are running without Tower 😉 I view it as a fairly essential output from the pipeline.

pditommaso · 2021-05-17T13:00:36Z

Yeah, we discussed ages ago .. 😬 #879

kemin711 · 2023-08-18T05:11:42Z

Maybe this is related to the prolog/epilog concept. I have been thinking about one case: fileA -> fileB; fileB -> fileC (fileC will be used by downstream processes); fileB --compress--> fileB.gz, which will not be used by any process, but which I want to publish for later use. To make my workflow faster, I don't want to wait for the compression to finish before letting fileC be used by downstream processes. My question: does the current workflow language take advantage of this? Does a downstream process wait for the upstream process to complete entirely before using its output? Or can a result be used as soon as the file is finished (this may not be possible, because you have to wait for all commands in the script section to finish)? It might be beneficial to let a process split the script section into the current script and another section, background_scripts.

bentsherman · 2023-08-18T12:38:57Z

In that case, you should define two separate processes, one to produce fileC and another to compress fileC. Any downstream processes can depend only on the first process while the second process would run "in the background".

kemin711 · 2023-08-18T18:22:17Z

That's how my workflow is designed, except in special cases where my files are large and the processing is done on scratch. To avoid copying files, I did the opposite of what we normally do and merged several processes into one. This is very rare, and it is only for performance. Essentially, we have large gzipped fastq files and do some operations on them. One way is to use the compressed files as input and decompress on the fly, which saves space. Another way is to inflate the .gz files first and then operate on them, which can be faster but costs more storage. If we move the operation to /scratch, the IO problem can be resolved. We want the main logic to process the inflated files; at the end of this main process, downstream processes can start working with the result files, and in the meantime we can keep compressing these files at a high compression level and store the compressed files for future use (not in this pipeline). There might be a possibility to enable the script section to branch. Currently the script section is an atomic operation.

rollf · 2024-05-30T10:03:40Z

I used the following approach for now:

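process {
  ...
  script:
  // Optional prolog taken from task.ext; empty when not configured
  def prolog = task.ext.prolog ?: ''
  ...
  """
  $prolog
  <actual script>
  """
}

    // In the configuration, set the prolog for the process in question
    withName: MY_PROCESS {
        ext.prolog = "export PATH=/some/other/location:\$PATH"
    }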

And then I use nf-core modules patch for nf-core modules as necessary. My use case: Run a custom docker image but use predefined nf-core modules. The custom docker image needed some setup to be run upfront before the nf-core-based module would work (hence the PATH adaptation in ext.prolog above).
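One detail of the snippet above is the escaped `\$PATH`: the dollar sign is escaped so that the variable is not interpolated when the configuration string is evaluated, and `$PATH` is only expanded by the shell when the task runs. A rough shell analogue of that deferred expansion, with the hypothetical helper `render_task_script` standing in for the script block (this is not Nextflow code):

```shell
# Shell analogue of the escaped "\$PATH": single quotes store the snippet
# literally, so $PATH is expanded only when the rendered script is run.
prolog='export PATH=/some/other/location:$PATH'

render_task_script() {
    # $1: the actual script body. The prolog defaults to empty when unset,
    # mirroring `task.ext.prolog ?: ''`.
    printf '%s\n%s\n' "${prolog:-}" "$1"
}
```

Piping the rendered text to a shell, for instance `render_task_script 'echo "$PATH"' | bash`, expands `$PATH` at that point, with the extra location in front.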

I'm in favor of a Nextflow-based solution.

kemin711 · 2024-05-30T22:47:37Z

thanks for the feedback.

ewels · 2024-06-19T14:42:23Z

@rollf note that 24.04 release included a new directive eval (see docs) which I think is very similar to the prolog suggested in this issue 🤔

That might be a cleaner solution for your use case?

rollf · 2024-06-20T04:15:21Z

@ewels Thanks for the hint. I do not understand the suggestion, though. eval would allow me to add further output (channels) to the process, however, my use case is that I want to arbitrarily modify the existing script that runs in the container. Possibly I'm completely wrong here but I don't see the connection between both. 🤷

(As a side remark, eval seems to be missing in the overview here.)

ewels · 2024-06-20T04:46:57Z

Yeah you're right. I was thinking that eval adds snippets of code to the start of the process, outside of the script block - which is kind of what you want. But I hadn't really thought it through - you're right that it doesn't make sense here as you'd still need to edit the process. Apologies.

pditommaso · 2024-06-20T06:46:37Z

Added the missing reference to eval in the output overview. Thanks for reporting it. 9709067

pditommaso mentioned this issue Dec 6, 2017

beforeScript executed before setting the environment file #546

Closed

pditommaso added the feature request label Apr 9, 2018

pditommaso mentioned this issue Oct 2, 2018

Add a directive that allows the fetching of tool version meta information #879

Closed

pditommaso added the priority label Jan 7, 2019

pditommaso mentioned this issue Mar 1, 2020

Execute beforeScript directive after setting the task environment. #1511

Closed

stale bot added the wontfix label Apr 27, 2020

pditommaso added stale and removed wontfix labels Apr 27, 2020

stale bot closed this as completed Jun 26, 2020

pditommaso reopened this May 17, 2021

stale bot removed the stale label May 17, 2021

pditommaso added the pinned label May 17, 2021

pditommaso mentioned this issue May 17, 2021

[Enhancement]: Allow the beforeScript and afterScript directives to be executed from inside the container #1814

Closed

Kibubu mentioned this issue Aug 31, 2022

Add before-afterScript warning regarding containers #3167

Merged

pditommaso removed the priority label Sep 22, 2023

pditommaso mentioned this issue Oct 31, 2023

Allow to add custom traces and use them as metadata #4425

Closed

bentsherman removed the feature label Oct 31, 2023
