Phylogenetic Assignment of Named Global Outbreak LINeages
Pangolin runs on MacOS and Linux. The conda environment recipe may not build on Windows (I haven't tested it) but can be run using the Windows subsystem for Linux.
- Some version of conda, we use Miniconda3. Can be downloaded from here
- Your query fasta file
- Clone this repository and
cd pangolin
conda env create -f environment.yml
python setup.py install
orpip install .
- That's it
- Activate the environment
conda activate pangolin
- Run
pangolin <query>
pangolin: Phylogenetic Assignment of Named Global Outbreak LINeages
usage: pangolin <query> [options]
optional arguments:
-h, --help show this help message and exit
-o OUTDIR, --outdir OUTDIR
-n, --dry-run
-f, --force
-t THREADS, --threads THREADS
-u, --unlock
Your output will be a csv file with taxon name and lineage assigned, one line corresponding to each sequence in the fasta file provided
Example:
Taxon | Lineage | UFbootstrap |
---|---|---|
Virus1 | B.1 | 82 |
Virus2 | A.1 | 95 |
Virus3 | A.3 | 100 |
Virus4 | B.1.4 | 73 |
Pangolin was created by Áine O'Toole and JT McCrone. It uses lineages from Rambaut, Holmes & Pybus.
The following external software is run as part of pangolin:
L.-T. Nguyen, H.A. Schmidt, A. von Haeseler, B.Q. Minh (2015) IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies.. Mol. Biol. Evol., 32:268-274. https://doi.org/10.1093/molbev/msu300
D.T. Hoang, O. Chernomor, A. von Haeseler, B.Q. Minh, L.S. Vinh (2018) UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol., 35:518–522. https://doi.org/10.1093/molbev/msx281
Katoh, Standley 2013 (Molecular Biology and Evolution 30:772-780) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. (outlines version 7)
Köster, Johannes and Rahmann, Sven. “Snakemake - A scalable bioinformatics workflow engine”. Bioinformatics 2012.