This section contains a number of smaller topics with links and examples meant to provide relatively concrete answers for specific tool development scenarios.
Galaxy's data tables are meant to give tools access to reference datasets or index data not tied to particular histories or users. A common example would be FASTA files for various genomes or mapper-specific indices of those files (e.g. a BWA index for the hg19 genome).
Galaxy data managers are specialized tools designed to populate tool data tables.
In the absence of an obvious DOI, tools may contain embedded BibTeX directly.
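As a rough sketch of what this might look like, a citations block can mix a DOI citation with an embedded BibTeX entry. The DOI and the BibTeX entry below are placeholders invented for illustration, not real references:

```xml
<citations>
    <!-- Placeholder DOI: replace with the DOI of the publication to cite. -->
    <citation type="doi">10.0000/placeholder.doi</citation>
    <!-- Embedded BibTeX for a resource without a DOI (hypothetical entry). -->
    <citation type="bibtex">
@misc{exampleTool,
    author = {Doe, Jane},
    title = {example-tool: a hypothetical command-line utility},
    year = {2015},
    url = {https://example.org/example-tool}
}
    </citation>
</citations>
```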
Further reading:
- bibtex.xml (test tool with a bunch of random examples)
- bwa-mem.xml (BWA-MEM tool by Anton Nekrutenko demonstrating citation of an arXiv article)
- macros.xml (Macros for vcflib tool demonstrating citing a github repository)
Galaxy tools can be decorated with container tags indicating the Docker container IDs that the tools can run inside of.
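A minimal sketch of such an annotation is shown here. The container tag sits inside the tool's requirements block; the image ID is a hypothetical placeholder and should be replaced with an image that bundles the tool's dependencies:

```xml
<requirements>
    <!-- Hypothetical image ID, used for illustration only. -->
    <container type="docker">example/mytool:1.0</container>
</requirements>
```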
The longer term plan for the Tool Shed ecosystem is to be able to automatically build Docker containers for tool dependency descriptions and thereby obtain this Docker functionality for free and in a way that is completely backward compatible with non-Docker deployments.
Further reading:
- Complete tutorial on Github by Aaron Petkau. Covers installing Docker, building a Dockerfile, publishing to Docker Hub, annotating tools and configuring Galaxy.
- Another tutorial from the Galaxy User Group Grand Ouest.
- Landing page on the Galaxy Wiki
- Implementation details on Pull Request #401
Tool parameters support a validator element (syntax) to perform validation of a single parameter. More complex validation across parameters can be performed with arbitrary Python functions using the code file syntax, but this feature should be used sparingly.
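For single-parameter validation, something along the following lines may be a useful starting point. The parameter names, ranges, and messages are assumptions made up for this sketch:

```xml
<!-- Reject integers outside a fixed range. -->
<param name="min_length" type="integer" value="10" label="Minimum read length">
    <validator type="in_range" min="1" max="1000" message="Minimum length must be between 1 and 1000." />
</param>
<!-- Reject text that does not match a regular expression. -->
<param name="sample_name" type="text" label="Sample name">
    <validator type="regex" message="Use only letters, digits and underscores.">^[A-Za-z0-9_]+$</validator>
</param>
```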
Further reading:
- validator XML tag syntax on the Galaxy wiki.
- fastq_filter.xml (a FASTQ filtering tool demonstrating validator constructs)
- gffread.xml (a tool by Jim Johnson demonstrating using regular expressions with validator tags)
- code_file.xml, code_file.py (test files demonstrating defining a simple constraint in Python across two parameters)
- deseq2 tool by Björn Grüning demonstrating advanced code file validation.
Input data parameters may specify multiple formats. For example:
<param name="input" type="data" format="fastq,fasta" label="Input" />
If the command line under construction doesn't require changes based on the input type, the input may just be referenced as $input. However, if the command line uses different argument names depending on the type, for instance, it becomes important to dispatch on the underlying type.
In this example, $input.ext would return the short code for the actual datatype of the input supplied. For instance, the strings fasta or fastqsanger would be valid responses for inputs to this parameter for the above definition.
While .ext may sometimes be useful, there are many cases where it is inappropriate because of subtypes: checking if .ext is equal to fastq in the above example would not catch fastqsanger inputs, for instance. To check if an input matches a type or any subtype thereof, the is_of_type method can be used. For instance
$input.is_of_type('fastq')
would check if the input is of type fastq or any derivative type such as fastqsanger.
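To illustrate dispatching on the input type, a command block might branch on is_of_type as sketched below. The mytool executable, its --fastq and --fasta flags, and the output parameter name are hypothetical:

```xml
<command><![CDATA[
## Choose the argument name based on the input's datatype (or any subtype).
#if $input.is_of_type('fastq')
    mytool --fastq '$input' --output '$output'
#else
    mytool --fasta '$input' --output '$output'
#end if
]]></command>
```

Because is_of_type matches subtypes, the first branch would also be taken for fastqsanger inputs.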
If the output format of a tool's output cannot be known ahead of time,
Galaxy can be instructed to "sniff" the output and determine the data type
using the same method used for uploads. Adding the auto_format="true"
attribute to a tool's output enables this.
<output name="out1" auto_format="true" label="Auto Output" />
The variable $__user_email__ (as well as $__user_name__ and $__user_id__) is available when building up your command in the tool's <command> block. The following tool demonstrates the use of this and a few other special parameters available to all tools.
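As a minimal sketch of how these variables might appear in a command block (the output parameter name and the use of echo are illustrative assumptions, not taken from the tool referenced above):

```xml
<command><![CDATA[
## Write the identity of the user running the job to the output dataset.
echo "email: $__user_email__" > '$output' &&
echo "name: $__user_name__" >> '$output' &&
echo "id: $__user_id__" >> '$output'
]]></command>
```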
To write tests that supply multiple values to a multiple="true" select or data parameter, simply specify the multiple values as a comma-separated list. Here are examples of each:
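A hedged sketch of such a test follows. The parameter names, option values, and file names are invented for illustration and assume both parameters are declared with multiple="true":

```xml
<test>
    <!-- select parameter: two option values, comma separated -->
    <param name="columns" value="c1,c3" />
    <!-- data parameter: two files from the test-data directory -->
    <param name="inputs" value="1.bed,2.bed" />
    <output name="out_file" file="expected.txt" />
</test>
```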
Here are some examples of testing tools that consume collections with type="data_collection" parameters.
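As one possible shape for such a test, a collection element can be nested inside the test's param element. The parameter name, output name, and file names below are assumptions for this sketch:

```xml
<test>
    <param name="f1">
        <!-- Build a paired collection from two test-data files. -->
        <collection type="paired">
            <element name="forward" value="forward.fastq" />
            <element name="reverse" value="reverse.fastq" />
        </collection>
    </param>
    <output name="out1" file="expected_output.txt" />
</test>
```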
Here are some examples of testing tools that produce collections with output_collection elements.
- collection_creates_list.xml
- collection_creates_list_2.xml
- collection_creates_pair.xml
- collection_creates_pair_from_type.xml
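A test for a tool that produces a paired collection might look roughly like the sketch below. The names and file contents are hypothetical and are not copied from the test tools listed above:

```xml
<test>
    <param name="input1" value="input.txt" />
    <output_collection name="paired_output" type="paired">
        <!-- Check one element with content assertions... -->
        <element name="forward">
            <assert_contents>
                <has_line line="forward contents" />
            </assert_contents>
        </element>
        <!-- ...and the other against an expected file. -->
        <element name="reverse" file="expected_reverse.txt" />
    </output_collection>
</test>
```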
Tools which dynamically discover datasets after the job is complete, either using the <discovered_datasets> element, the older default pattern approach (e.g. finding files with names like primary_DATASET_ID_sample1_true_bam_hg18), or the undocumented galaxy.json approach, can be tested by placing discovered_dataset elements beneath the corresponding output element with the designation corresponding to the file to test.
<test>
    <param name="input" value="7" />
    <output name="report" file="example_output.html">
        <discovered_dataset designation="world1" file="world1.txt" />
        <discovered_dataset designation="world2">
            <assert_contents>
                <has_line line="World Contents" />
            </assert_contents>
        </discovered_dataset>
    </output>
</test>
The test examples distributed with Galaxy demonstrating dynamic discovery and the testing thereof include:
Tools which consume Galaxy composite datatypes can generate test inputs using the composite_data element demonstrated by the following tool.
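A sketch of a composite test input is given here. The datatype, parameter name, and file paths are illustrative assumptions; each composite_data element supplies one component file of the composite dataset:

```xml
<test>
    <param name="input" value="composite/output.html" ftype="velvet">
        <!-- Component files of the composite dataset, relative to test-data/. -->
        <composite_data value="composite/Sequences" />
        <composite_data value="composite/Roadmaps" />
        <composite_data value="composite/Log" />
    </param>
    <output name="out_file1" file="expected_output.txt" />
</test>
```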
Tools which produce Galaxy composite datatypes can specify tests for the individual output files using the extra_files element demonstrated by the following tool.
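For a composite output, a test might check the primary file and its component files as sketched below. The output name and file paths are hypothetical:

```xml
<output name="out_file1" file="composite/output.html">
    <!-- Each extra_files element checks one component file of the composite output. -->
    <extra_files type="file" name="Sequences" value="composite/Sequences" />
    <extra_files type="file" name="Roadmaps" value="composite/Roadmaps" />
</output>
```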
There is an idiom for supplying test index data during tests using Planemo. To create this kind of test, provide a tool_data_table_conf.xml.test beside your tool's tool_data_table_conf.xml.sample file that specifies paths to test .loc files, which in turn define paths to the test index data. Both the .loc files and the tool_data_table_conf.xml.test can use the value ${__HERE__}, which will be replaced with the path to the directory the file lives in. This allows using relative-like paths in these files, which is needed for portable tests.
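A tool_data_table_conf.xml.test file following this idiom could look something like the sketch below. The table name, column list, and .loc file name are assumptions chosen for illustration:

```xml
<tables>
    <!-- Hypothetical table; the name and columns must match what the tool expects. -->
    <table name="example_indexes" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <!-- ${__HERE__} resolves to the directory containing this file. -->
        <file path="${__HERE__}/test-data/example_index.loc" />
    </table>
</tables>
```

The referenced .loc file can use ${__HERE__} in its path column in the same way to point at index files stored next to it.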
An example commit demonstrating the application of this approach to a Picard tool can be found here.
These tests can then be run with the Planemo test command.
Warning
This idiom does not work with the Tool Shed's automated test framework at this time, so these tests will largely only pass with Planemo.
A test element can check the exit code of the underlying job using the expect_exit_code="n" attribute.
Normally, all tool test cases described by a test element are expected to pass, but one can assert that a job should fail by adding expect_failure="true" to the test element.
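A minimal sketch of a test that asserts failure is shown here; the parameter name and input file are hypothetical:

```xml
<test expect_failure="true">
    <!-- Hypothetical input that the tool is expected to reject. -->
    <param name="input" value="malformed.txt" />
</test>
```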
If your tool contains filter elements, you can't verify properties of outputs that are filtered out and do not exist. The test element may contain an expect_num_outputs attribute to specify the expected number of outputs; this can be used to verify that outputs not listed are expected to be filtered out during tool execution.
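The interaction between filter and expect_num_outputs can be sketched as follows. This assumes a hypothetical select parameter named write_log with the values yes and no; the output names and files are likewise invented:

```xml
<outputs>
    <data name="report" format="txt" />
    <!-- Only created when the user asks for a log. -->
    <data name="log" format="txt">
        <filter>write_log == "yes"</filter>
    </data>
</outputs>
<tests>
    <!-- With write_log set to "no", only the report output should exist. -->
    <test expect_num_outputs="1">
        <param name="write_log" value="no" />
        <output name="report" file="expected_report.txt" />
    </test>
</tests>
```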
Output metadata can be checked using metadata elements in the XML description of the output.
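As a small sketch, a metadata check nested in an output might look like this. The output name, expected file, and dbkey value are assumptions:

```xml
<output name="out_file" file="expected.bed">
    <!-- Assert that the output's dbkey metadata was set to hg19. -->
    <metadata name="dbkey" value="hg19" />
</output>
```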
Do not use Planemo for this; Galaxy itself should be used to test its tools directly. The following two commands can be used to test Galaxy tools in an existing instance.
$ sh run_tests.sh --report_file tool_tests_shed.html --installed
The above command specifies the --installed flag when calling run_tests.sh; this tells the test framework to test Tool Shed installed tools and only those tools.
$ GALAXY_TEST_TOOL_CONF=config/tool_conf.xml sh run_tests.sh --report_file tool_tests_tool_conf.html functional.test_toolbox
The second command sets the GALAXY_TEST_TOOL_CONF environment variable, which will restrict the testing framework to considering a single tool conf file (such as the default tools that ship with Galaxy in config/tool_conf.xml.sample, which must have their dependencies set up manually). The last argument to run_tests.sh, functional.test_toolbox, tells the test framework to run all the tool tests in the configured tool conf file.
Note
Tip: To speed up tests you can use a pre-migrated database file the way Planemo does by setting the following environment variable before running run_tests.sh.
$ export GALAXY_TEST_DB_TEMPLATE="https://github.com/jmchilton/galaxy-downloads/raw/master/db_gx_rev_0127.sqlite"