This example shows a pipeline made of two processes. The first process receives a FASTA-formatted file and splits it into file chunks whose names start with the prefix seq_.
The process that follows receives these files and simply reverses their content using the rev command-line tool.
In more detail:
- line 1: The script starts with a shebang declaration. This allows you to launch your pipeline like any other Bash script.
- line 3: Declares a pipeline parameter named params.in that is initialized with the value $HOME/sample.fa. This value can be overridden when launching the pipeline by adding the option --in <value> to the script command line.
- line 5: Defines a variable sequences holding a reference to the file whose name is specified by the params.in parameter.
- line 6: Defines a variable SPLIT whose value is gcsplit when the script is executed on Mac OS X, or csplit when it runs on Linux. This is the name of the tool used to split the file.
- lines 8-20: The process that splits the provided file.
- line 10: Opens the input declaration block. The lines following this clause are interpreted as input definitions.
- line 11: Defines the process input file. This file is received from the variable sequences and will be named input.fa.
- line 13: Opens the output declaration block. Lines following this clause are interpreted as output definitions.
- line 14: Defines that the process outputs files whose names match the pattern seq_*. These files are sent over the channel records.
- lines 16-18: The actual script executed by the process to split the provided file.
- lines 22-33: Defines the second process, which receives the splits produced by the previous process and reverses their content.
- line 24: Opens the input declaration block. Lines following this clause are interpreted as input definitions.
- line 25: Defines the process input file. This file is received through the channel records.
- line 27: Opens the output declaration block. Lines following this clause are interpreted as output definitions.
- line 28: The standard output of the executed script is declared as the process output. This output is sent over the channel result.
- lines 30-32: The actual script executed by the process to reverse the content of the received files.
- line 35: Prints a result each time a new item is received on the result channel.
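To make the data flow concrete, the work done by the two processes can be approximated in a plain shell session, without Nextflow. This is only a sketch under stated assumptions: the sample records are made up, the scratch directory is arbitrary, and the csplit arguments are one plausible way to split on FASTA header lines, not necessarily the exact command used in the script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so the seq_* chunks do not clutter the current one.
workdir=$(mktemp -d)
cd "$workdir"

# A tiny two-record FASTA file standing in for $HOME/sample.fa (made-up data).
printf '>seq1\nACGT\n>seq2\nTTGA\n' > input.fa

# Same tool selection the walkthrough describes: gcsplit on Mac OS X, csplit on Linux.
if [ "$(uname)" = "Darwin" ]; then SPLIT=gcsplit; else SPLIT=csplit; fi

# Step 1: split at every line starting with '>' into files prefixed with seq_.
# '%^>%' moves to the first header without writing a chunk (avoids an empty first file);
# '/^>/' '{*}' then starts a new chunk at each later header.
"$SPLIT" -s -f seq_ input.fa '%^>%' '/^>/' '{*}'

# Step 2: reverse every line of each chunk, as the second process does with rev.
for chunk in seq_*; do
    rev "$chunk"
done
```

With the sample data above, this produces two chunks, seq_00 and seq_01, and prints the lines 1qes>, TGCA, 2qes> and AGTT. In the pipeline, the same two steps are connected through the records channel instead of a shell loop.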
Tip
The above example can handle only a single file at a time. To process two or more files, you need to launch it once per file.
It can be modified to handle any number of input files, as shown below.
To make the script handle any number of files, replace line 5, which defines the sequences variable, with the following line:
sequences = Channel.fromPath(params.in)
By doing this, the sequences variable is assigned the channel created by the Channel.fromPath method. This channel emits all the files that match the pattern specified by the parameter params.in.
Given that you saved the script to a file named example.nf and you have a list of FASTA files in a folder named dataset/, you can execute it by entering this command:
nextflow example.nf --in 'dataset/*.fa'
Warning
Make sure you enclose the dataset/*.fa parameter value in single quotes; otherwise the Bash environment will expand the * symbol to the actual file names and the example won't work.
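The effect of the quotes can be checked directly in Bash. The sketch below uses a hypothetical helper, count_args, and two placeholder files to show how many arguments a command receives in each case; it does not invoke Nextflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch folder with two empty placeholder FASTA files (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
mkdir dataset
touch dataset/a.fa dataset/b.fa

# Hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: Bash expands the glob before the command runs, so two paths arrive.
unquoted=$(count_args dataset/*.fa)

# Single-quoted: the pattern arrives untouched as a single argument.
quoted=$(count_args 'dataset/*.fa')

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

In the unquoted case, the --in option would receive only the first expanded file name, with the remaining names passed as extra arguments. The quoted form hands the whole pattern to the pipeline so that it can be matched there.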
A collection of examples introducing Nextflow scripting is also available.
See also Awesome Nextflow for a list of pipelines developed by the Nextflow community.