blastp (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download)
diamond (https://github.com/bbuchfink/diamond)
hmmscan (http://hmmer.org/)
Perl (https://www.perl.org/get.html) (v5.30.0)
The PAP pipeline is written in Snakemake and Perl. To make PAP easier to install, we use the Apptainer/Singularity container platform and provide an image with the complete environment (scripts and dependencies) needed to run PAP.
You only need to download the PAP Singularity image and have Apptainer/Singularity installed. If it is not installed yet, you can install it as follows:
with Conda
conda install -c conda-forge singularity
Alternatively, x86_64 RPMs are available on GitHub immediately after each Apptainer release and they can be installed directly from there:
with RPMs
sudo yum install -y https://github.com/apptainer/apptainer/releases/download/v1.1.3/apptainer-1.1.3-1.x86_64.rpm
with DEB
wget https://github.com/apptainer/apptainer/releases/download/v1.1.3/apptainer_1.1.3_amd64.deb
sudo apt-get install -y ./apptainer_1.1.3_amd64.deb
For more details on the Apptainer installation process, see the official Apptainer installation documentation.
Make sure all dependencies and databases are properly installed. You also need to download the 'bin' scripts and add them to your PATH.
For more details on Snakemake, see the Snakemake website.
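Before running the pipeline from a manual installation, a quick check that the tools are reachable can save a failed run. The following is a minimal sketch, not part of PAP itself: the helper name `missing_tools` is hypothetical, and the executable names (`blastp`, `diamond`, `hmmscan`, `perl`, `snakemake`) are assumed from the dependency list above.

```shell
# Print every tool from the argument list that cannot be found in PATH.
# 'missing_tools' is a hypothetical helper, not shipped with PAP.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Executable names are assumed from the dependency list.
missing=$(missing_tools blastp diamond hmmscan perl snakemake)
if [ -n "$missing" ]; then
    echo "Missing from PATH:" $missing
else
    echo "All listed tools were found in PATH."
fi
```

This only confirms that the executables exist; it does not check versions or whether the databases are installed.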
PAP <protein.fasta>
notes:
1- "PAP" must be in your PATH; otherwise, you must give its full path so that it can be found.
2- The input [fasta](https://en.wikipedia.org/wiki/FASTA_format) file must be located under your $HOME; otherwise, you need to set the environment variable SINGULARITY_BIND to bind the path where your sequences are located.
ex: export SINGULARITY_BIND="../path/for/the/input/fasta"
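The $HOME check described in note 2 can be automated. This is a minimal sketch under stated assumptions: the helper `bind_dir_for` and the path `/data/project/protein.fasta` are hypothetical, and the sketch binds the directory that contains the FASTA file, resolved to an absolute path.

```shell
# Print the directory that would need binding, or nothing if the
# FASTA file already sits under $HOME.
# 'bind_dir_for' is a hypothetical helper, not part of PAP.
bind_dir_for() {
    # Resolve the containing directory to an absolute path.
    fasta_dir=$(cd "$(dirname "$1")" && pwd) || return 1
    case "$fasta_dir/" in
        "$HOME"/*) ;;                        # under $HOME: no bind needed
        *) printf '%s\n' "$fasta_dir" ;;     # outside $HOME: bind this directory
    esac
}

fasta="/data/project/protein.fasta"   # hypothetical input path
if [ -f "$fasta" ]; then
    dir=$(bind_dir_for "$fasta")
    if [ -n "$dir" ]; then
        export SINGULARITY_BIND="$dir"
    fi
    PAP "$fasta"
fi
```

If SINGULARITY_BIND is already set for other paths, you may prefer to append to it rather than overwrite it.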
If your protein fasta file is named "protein.faa", run:
snakemake --cores <thread_numbers> -s /path/of/Snakefile
If your protein fasta file has a different name, run:
snakemake --cores <thread_numbers> --config PROTREF="current_protein_fasta_filename" -s /path/of/Snakefile
PROTREF= "protein.faa" # Fasta file of the reference proteins that we want to transfer or annotate in our genome. Default: "protein.faa"
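The choice between the two invocations depends only on the file name, so it can be wrapped in a small helper. The sketch below is illustrative: `pap_snakemake_cmd` is a hypothetical function that prints the command instead of running it, and it uses only the options shown above (`--cores`, `--config PROTREF=...`, `-s`).

```shell
# Print the snakemake command for a given fasta file name.
# 'pap_snakemake_cmd' is a hypothetical helper, not part of PAP.
# Arguments: <fasta_name> <cores> <snakefile_path>
pap_snakemake_cmd() {
    fasta_name=$1
    cores=$2
    snakefile=$3
    if [ "$fasta_name" = "protein.faa" ]; then
        # Default name: PROTREF does not need to be overridden.
        printf 'snakemake --cores %s -s %s\n' "$cores" "$snakefile"
    else
        # Any other name: pass it through --config PROTREF.
        printf 'snakemake --cores %s --config PROTREF="%s" -s %s\n' \
            "$cores" "$fasta_name" "$snakefile"
    fi
}

pap_snakemake_cmd my_proteins.faa 8 /path/of/Snakefile
# prints: snakemake --cores 8 --config PROTREF="my_proteins.faa" -s /path/of/Snakefile
```

Printing the command first makes it easy to review before running it.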
A file in TSV format containing the annotation of the proteins.
Estrada K, Verleyen J. PAP: Parallel Annotation Pipeline. 2021. [Computer software] https://doi.org/10.5281/zenodo.7958138
Dr. Karel Estrada; M.C Jerome Verleyen
Twitter: @kjestradag
PAP wouldn't be the same without advice and suggestions from Alejandro Sánchez.
PAP uses Snakemake for pipeline development, Blastp and HMMER to perform alignments and SignalP for signal peptide prediction. Additionally, PAP takes information from other databases such as GO and KEGG and incorporates it into the final annotation report.