Phylogenetic Assignment of Named Global Outbreak LINeages
Full pangolin documentation can be found at cov-lineages.org
Find the pangolin web application here, thanks to the Centre for Genomic Pathogen Surveillance!
- Full documentation
- Requirements
- Install pangolin
- Check the install worked
- Updating pangolin
- Updating from pangolin v1.0 to pangolin v2.0
- Basic usage
- Output
- pangoLEARN description
- Citing pangolin
- References
Pangolin runs on macOS and Linux. The conda environment recipe may not build on Windows (I haven't tested it), but pangolin can be run using the Windows Subsystem for Linux.
- Some version of conda (we use Miniconda3), which can be downloaded from here
- Your query fasta file
- Clone this repository and
cd pangolin
conda env create -f environment.yml
conda activate pangolin
python setup.py install
- That's it
For help troubleshooting the install, see the pangolin wiki.
Note: we recommend using pangolin in the conda environment specified in the environment.yml file as per the instructions above. If you can't use conda for some reason, bear in mind the data files are hosted in two separate repositories at
- cov-lineages/lineages
- cov-lineages/pangoLEARN
You will need to pip install them alongside the other dependencies for pangolin (details found in environment.yml).
Type (in the pangolin environment):
pangolin -v
pangolin -pv
and you should see the version of pangolin and the version of the pangoLEARN data release printed, respectively.
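The same check can be scripted. The following is a minimal sketch that relies only on the `pangolin -v` and `pangolin -pv` flags described above. The helper name `pangolin_versions` is hypothetical, and the exact text pangolin prints is not assumed: the function simply returns whatever each command writes, or `None` when pangolin is not on the PATH (for example, when the conda environment has not been activated).

```python
import shutil
import subprocess


def pangolin_versions(executable="pangolin"):
    """Return (pangolin -v output, pangolin -pv output), or None if not installed."""
    if shutil.which(executable) is None:
        return None
    outputs = []
    for flag in ("-v", "-pv"):
        result = subprocess.run(
            [executable, flag],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=False,
        )
        # Some tools print version text to stderr, so fall back to it.
        outputs.append((result.stdout or result.stderr).strip())
    return tuple(outputs)


if __name__ == "__main__":
    versions = pangolin_versions()
    if versions is None:
        print("pangolin not found; is the conda environment active?")
    else:
        print("pangolin:", versions[0])
        print("pangoLEARN:", versions[1])
```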
Note: Even if you have previously installed pangolin, as it is being worked on intensively, we recommend you check for updates before running.
To update pangolin and pangoLEARN automatically to the latest stable release:
conda activate pangolin
pangolin --update
If extra dependencies are introduced (for major releases), the full environment will need to be updated using the manual steps below. These steps can also be used in place of pangolin --update:
conda activate pangolin
git pull
pulls the latest changes from GitHub
python setup.py install
re-installs pangolin
conda env update -f environment.yml
updates the conda environment (you're unlikely to need to do this, but just in case!)
pip install git+https://github.com/cov-lineages/pangoLEARN.git --upgrade
updates pangoLEARN if there is a new data release
- If you specify the data path (-d), it should now point to pangoLEARN instead of lineages, for example:
-d /home/vix/miniconda3/envs/pangolin/lib/python3.6/site-packages/pangoLEARN/data
- The columns in the output file have also changed, unless running --legacy
- No longer present: UFBootstrap, aLRT or lineages_version
- New fields: probability and pangoLEARN_version
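If downstream scripts need to handle both old and new output files, the column changes listed above can be used to tell them apart. This is a minimal sketch, assuming the header names are spelled exactly as in the list above. The function names and the two sample header rows are hypothetical illustrations, not copied from real pangolin output.

```python
import csv
import io

# Column names taken from the upgrade notes above.
LEGACY_ONLY = {"UFBootstrap", "aLRT", "lineages_version"}
V2_ONLY = {"probability", "pangoLEARN_version"}


def detect_output_format(header):
    """Classify a list of column names as 'legacy', 'v2', or 'unknown'."""
    columns = {name.strip() for name in header}
    if columns & LEGACY_ONLY:
        return "legacy"
    if columns & V2_ONLY:
        return "v2"
    return "unknown"


def header_of(csv_text):
    """Return the first row of CSV text as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)), [])


# Hypothetical header rows, for illustration only.
new_style = "taxon,lineage,probability,pangoLEARN_version,status,note\n"
old_style = "taxon,lineage,aLRT,UFBootstrap,lineages_version,status,note\n"

print(detect_output_format(header_of(new_style)))  # v2
print(detect_output_format(header_of(old_style)))  # legacy
```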
- Activate the environment
conda activate pangolin
- Run pangolin <query>, where <query> is the name of your input file.
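When pangolin is called from a script rather than typed by hand, the command can be assembled programmatically. The sketch below uses only the options mentioned in this README (`-d` and `--legacy`). The function names and the file name `my_sequences.fasta` are hypothetical, and `run_pangolin` returns `None` instead of failing if pangolin is not on the PATH.

```python
import shutil
import subprocess


def build_pangolin_command(query, data_dir=None, legacy=False):
    """Assemble the argument list for a pangolin run on the given query file."""
    command = ["pangolin", query]
    if data_dir is not None:
        command += ["-d", data_dir]
    if legacy:
        command.append("--legacy")
    return command


def run_pangolin(query, **options):
    """Run pangolin if available; return its exit code, or None if it is missing."""
    command = build_pangolin_command(query, **options)
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=False).returncode


print(build_pangolin_command("my_sequences.fasta"))
```

Building the command as a list and passing it to `subprocess.run` without a shell avoids quoting problems when file names contain spaces.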
Your output will be a csv file with the taxon name and the lineage assigned, with one line for each sequence in the fasta file provided.
Example:
Taxon | Lineage | support | pangoLEARN_version | status | note |
---|---|---|---|---|---|
Virus1 | B.1 | 80 | 2020-04-27 | passed_qc | |
Virus2 | A.1 | 65 | 2020-04-27 | passed_qc | |
Virus3 | A.3 | 100 | 2020-04-27 | passed_qc | |
Virus4 | B.1.4 | 82 | 2020-04-27 | passed_qc | |
Virus5 | None | 0 | 2020-04-27 | fail | N_content:0.80 |
Virus6 | None | 0 | 2020-04-27 | fail | seq_len:0 |
Virus7 | None | 0 | 2020-04-27 | fail | failed to map |
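A report like the one above is easy to summarise with the standard library. The following is a minimal sketch, assuming the csv header uses the column names shown in the example table (matched case-insensitively, since the real file may differ in capitalisation). The embedded rows simply mirror the example table, and the function name `summarise` is hypothetical.

```python
import csv
import io
from collections import Counter

# Rows mirroring the example table above; a real report would be read from disk.
EXAMPLE_OUTPUT = """\
Taxon,Lineage,support,pangoLEARN_version,status,note
Virus1,B.1,80,2020-04-27,passed_qc,
Virus2,A.1,65,2020-04-27,passed_qc,
Virus3,A.3,100,2020-04-27,passed_qc,
Virus4,B.1.4,82,2020-04-27,passed_qc,
Virus5,None,0,2020-04-27,fail,N_content:0.80
Virus6,None,0,2020-04-27,fail,seq_len:0
Virus7,None,0,2020-04-27,fail,failed to map
"""


def summarise(csv_text):
    """Count lineages among passing sequences and collect QC failures with notes."""
    lineage_counts = Counter()
    failures = {}
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Normalise header capitalisation so "Taxon" and "taxon" both work.
        row = {key.strip().lower(): value for key, value in raw.items()}
        if row["status"] == "passed_qc":
            lineage_counts[row["lineage"]] += 1
        else:
            failures[row["taxon"]] = row["note"]
    return lineage_counts, failures


lineage_counts, failures = summarise(EXAMPLE_OUTPUT)
for lineage, count in sorted(lineage_counts.items()):
    print(lineage, count)
for taxon, note in failures.items():
    print(taxon, "failed:", note)
```

To use it on a real report, read the file contents into a string and pass that to `summarise` in place of `EXAMPLE_OUTPUT`.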
There is a publication in prep for pangolin, but in the meantime please link to this GitHub repository, github.com/cov-lineages/pangolin, if you have used pangolin in your research.
The following external software is run as part of pangolin:
Heng Li, Minimap2: pairwise alignment for nucleotide sequences, Bioinformatics, Volume 34, Issue 18, 15 September 2018, Pages 3094–3100, https://doi.org/10.1093/bioinformatics/bty191
Köster, Johannes and Rahmann, Sven. “Snakemake - A scalable bioinformatics workflow engine”. Bioinformatics 2012.