Skip to content

Software package for assigning SARS-CoV-2 genome sequences to global lineages.

License

Notifications You must be signed in to change notification settings

theboocock/pangolin

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pangolin

Phylogenetic Assignment of Named Global Outbreak LINeages

Requirements

Pangolin runs on MacOS and Linux. The conda environment recipe may not build on Windows (I haven't tested it) but can be run using the Windows subsystem for Linux.

  1. Some version of conda, we use Miniconda3. Can be downloaded from here
  2. Your query fasta file

Install pangolin

  1. Clone this repository and cd pangolin
  2. conda env create -f environment.yml
  3. python setup.py install or pip install .
  4. That's it

Usage

  1. Activate the environment conda activate pangolin
  2. Run pangolin <query>
pangolin: Phylogenetic Assignment of Named Global Outbreak LINeages

positional arguments:
  query

optional arguments:
  -h, --help            show this help message and exit
  -o OUTDIR, --outdir OUTDIR
                        Output directory
  -d DATA, --data DATA  Data directory minimally containing a fasta alignment
                        and guide tree
  -n, --dry-run         Go through the motions but don't actually run
  -f, --force           Overwrite all output
  -t THREADS, --threads THREADS
                        Number of threads
  -v, --version         show program's version number and exit

Output

Your output will be a csv file with taxon name and lineage assigned, one line corresponding to each sequence in the fasta file provided

Example:

Taxon Lineage aLRT UFbootstrap
Virus1 B.1 80 82
Virus2 A.1 65 95
Virus3 A.3 100 100
Virus4 B.1.4 82 73

Resources for interpreting the aLRT and UFbootstrap output can be found here and here.

Authors

Pangolin was created by Áine O'Toole and JT McCrone. It uses lineages from Rambaut et al..

References

The following external software is run as part of pangolin:

iqtree

L.-T. Nguyen, H.A. Schmidt, A. von Haeseler, B.Q. Minh (2015) IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies.. Mol. Biol. Evol., 32:268-274. https://doi.org/10.1093/molbev/msu300

D.T. Hoang, O. Chernomor, A. von Haeseler, B.Q. Minh, L.S. Vinh (2018) UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol., 35:518–522. https://doi.org/10.1093/molbev/msx281

mafft

Katoh, Standley 2013 (Molecular Biology and Evolution 30:772-780) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. (outlines version 7)

snakemake

Köster, Johannes and Rahmann, Sven. “Snakemake - A scalable bioinformatics workflow engine”. Bioinformatics 2012.

About

Software package for assigning SARS-CoV-2 genome sequences to global lineages.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%