stRainy is a graph-based phasing algorithm that takes a de novo assembly graph (in GFA format) and simplifies it by combining phasing information with the graph structure.
The recommended way to install stRainy is through conda:
git clone https://github.com/katerinakazantseva/stRainy
cd stRainy
git submodule update --init
make -C submodules/Flye
conda env create -f environment.yml -n strainy
Note that if you use an M1 conda installation, you should run conda config --add subdirs osx-64 before installation.
Once installed, you will need to activate the conda environment prior to running:
conda activate strainy
./strainy.py
After successful installation, you should be able to run:
conda activate strainy
./strainy.py phase -o out_strainy -b test_set/toy.bam -g test_set/toy.gfa -t 4 -m hifi
./strainy.py transform -o out_strainy -b test_set/toy.bam -g test_set/toy.gfa -t 4 -m hifi
stRainy is under active development! The current version is optimized for relatively simple bacterial communities (one or a few bacterial species, 2-5 strains each). Extending stRainy to larger metagenomes is a work in progress.
stRainy supports PacBio HiFi and Nanopore (Guppy5+) sequencing.
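The alignment example further down assumes ONT reads. As a minimal sketch, the hypothetical helper below (not part of stRainy) maps the stRainy -m value to a matching minimap2 preset: map-ont for Nanopore and map-hifi for PacBio HiFi (map-hifi is available in minimap2 2.19 and later).

```shell
# Sketch: pick a minimap2 preset that matches the stRainy --mode value.
# minimap2_preset is a hypothetical helper, not part of stRainy.
minimap2_preset() {
  case "$1" in
    nano) echo map-ont ;;   # Oxford Nanopore reads
    hifi) echo map-hifi ;;  # PacBio HiFi reads (minimap2 >= 2.19)
    *) echo "unknown mode: $1 (expected hifi or nano)" >&2; return 1 ;;
  esac
}
```

For example: minimap2 -ax "$(minimap2_preset hifi)" -t 8 assembly_graph.fasta reads.fastq | samtools sort -@ 4 > assembly_graph.bam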
The inputs to stRainy are:
- GFA file (can be produced with metaFlye or minigraph) and
- BAM file (reads aligned to the fasta reference generated from the GFA file).
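Because the BAM must be aligned to sequences derived from this particular GFA, a quick sanity check is that every reference name in the BAM header is also a segment name in the graph. The sketch below does this with awk; the function name check_bam_refs_in_gfa is hypothetical, and it reads the header from a text file so the check itself does not depend on samtools.

```shell
# Sketch: report any @SQ reference name (SN: tag) in a SAM/BAM header that is
# not a segment (S record) of the GFA. Returns non-zero if any are missing.
# Usage: check_bam_refs_in_gfa GRAPH.gfa HEADER.txt
check_bam_refs_in_gfa() {
  local gfa="$1" header="$2"
  awk -F'\t' '
    BEGIN { bad = 0 }
    # First file (the GFA): remember segment names from column 2 of S lines.
    FILENAME == ARGV[1] { if ($1 == "S") seg[$2] = 1; next }
    # Second file (the header): check the SN: tag of every @SQ line.
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) {
          name = substr($i, 4)
          if (!(name in seg)) { print "missing from GFA: " name; bad = 1 }
        }
    }
    END { exit bad }
  ' "$gfa" "$header"
}
```

Usage: samtools view -H assembly_graph.bam > header.txt, then check_bam_refs_in_gfa assembly_graph.gfa header.txt && echo OK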
How to get fasta from gfa and perform alignment (assuming ONT reads):
awk '/^S/{print ">"$2"\n"$3}' assembly_graph.gfa > assembly_graph.fasta
minimap2 -ax map-ont -t 8 assembly_graph.fasta reads.fastq | samtools sort -@ 4 > assembly_graph.bam
samtools index assembly_graph.bam
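The awk one-liner works because each GFA segment record has the form S<TAB>name<TAB>sequence, optionally followed by tags. A slightly more defensive version, sketched below as a hypothetical gfa_to_fasta function, splits on tabs and skips segments whose sequence is stored as "*" (the GFA placeholder for an omitted sequence), since those cannot serve as alignment references.

```shell
# Sketch: convert GFA segments to FASTA (column 2 = name, column 3 = sequence).
# Segments without an inline sequence ("*") are reported on stderr and skipped.
# Usage: gfa_to_fasta GRAPH.gfa > GRAPH.fasta
gfa_to_fasta() {
  awk -F'\t' '
    $1 == "S" {
      if ($3 == "*") { print "skipping " $2 ": no sequence" > "/dev/stderr"; next }
      print ">" $2
      print $3
    }
  ' "$1"
}
```

Usage: gfa_to_fasta assembly_graph.gfa > assembly_graph.fasta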
strainy.py phase - clusters reads according to SNP positions using a community detection approach
strainy.py transform - transforms the assembly graph
./strainy.py phase -o output_dir -b bam_file -g gfa_graph -m mode -t threads
The phase stage clusters reads and produces CSV files listing read names with their corresponding cluster names, as well as a BAM file that visualises the read clustering.
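To get a quick feel for the phasing result, the sketch below counts reads per cluster in one of the CSV files. The exact layout of these files is not described here, so this assumes a plain comma-separated file with a header row and no quoted commas, and selects the cluster column by its header name; inspect the header (for example with head -n 1) before using it. The function name reads_per_cluster is hypothetical.

```shell
# Sketch: count reads per cluster in a CSV written by the phase stage.
# Assumptions: comma-separated, one header row, no quoted commas.
# Output: one "cluster<TAB>count" line per cluster, largest clusters first.
# Usage: reads_per_cluster FILE.csv COLUMN_NAME
reads_per_cluster() {
  awk -F',' -v col="$2" '
    NR == 1 {
      # Locate the cluster column by its header name.
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    { n[$c]++ }
    END { for (k in n) print k "\t" n[k] }
  ' "$1" | sort -k2,2nr -k1,1
}
```

Usage: reads_per_cluster path/to/file.csv Cluster, where both the path and the column name Cluster are placeholders for whatever the actual file and header use.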
./strainy.py transform -o output_dir -b bam_file -g gfa_graph -m mode -t threads
The transform stage transforms and simplifies the initial assembly graph and produces the final GFA file: transformed_after_simplification_merged.gfa
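A simple way to see the effect of the transform stage is to compare basic statistics of the input graph and the final graph. The hypothetical gfa_summary function below counts segment (S) and link (L) records and sums segment lengths, falling back to the optional LN:i: tag when a sequence is stored as "*".

```shell
# Sketch: summarise a GFA file (segment count, link count, total length).
# Usage: gfa_summary GRAPH.gfa
gfa_summary() {
  awk -F'\t' '
    $1 == "S" {
      segs++
      # Use the inline sequence length, or the LN:i: tag if the sequence is "*".
      if ($3 != "*") len += length($3)
      else for (i = 4; i <= NF; i++) if ($i ~ /^LN:i:/) len += substr($i, 6)
    }
    $1 == "L" { links++ }
    # %.0f avoids integer overflow in awk implementations with 32-bit %d.
    END { printf "segments\t%.0f\nlinks\t%.0f\ntotal_length\t%.0f\n", segs, links, len }
  ' "$1"
}
```

Usage: run gfa_summary test_set/toy.gfa, then gfa_summary on the final graph; if its exact location inside the output directory is unclear, find out_strainy -name transformed_after_simplification_merged.gfa will locate it.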
strainy.py [-h] [-s SNP] [-t THREADS] [-f FASTA] -o OUTPUT -b BAM -g GFA -m {hifi,nano} stage
positional arguments:
stage stage to run: either phase or transform
optional arguments:
-h, --help show this help message and exit
-s SNP, --snp SNP vcf file
-t THREADS, --threads THREADS
number of threads
-f FASTA, --fasta FASTA
fasta file
required named arguments:
-o OUTPUT, --output OUTPUT
output dir
-b BAM, --bam BAM bam file
-g GFA, --gfa GFA gfa file
-m {hifi,nano}, --mode {hifi,nano}
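The sketch below wraps the command line shown above in a hypothetical run_strainy shell function. Before calling stRainy it checks the stage and -m values against the choices in the help text and verifies that the BAM and GFA files are readable. STRAINY is a hypothetical variable that defaults to ./strainy.py, and -t is passed only when a thread count is given.

```shell
# Sketch: validate arguments against the choices in the help text, then run
# one stRainy stage. STRAINY is a hypothetical override for the script path.
# Usage: run_strainy STAGE OUTPUT_DIR BAM GFA MODE [THREADS]
run_strainy() {
  local stage="$1" out="$2" bam="$3" gfa="$4" mode="$5" threads="${6:-}" f
  case "$stage" in
    phase|transform) ;;
    *) echo "stage must be phase or transform, got: $stage" >&2; return 2 ;;
  esac
  case "$mode" in
    hifi|nano) ;;
    *) echo "mode must be hifi or nano, got: $mode" >&2; return 2 ;;
  esac
  # Fail early if an input file is missing or unreadable.
  for f in "$bam" "$gfa"; do
    [ -r "$f" ] || { echo "cannot read input file: $f" >&2; return 2; }
  done
  # Rebuild the positional parameters; add -t only if a thread count was given.
  set -- "$stage" -o "$out" -b "$bam" -g "$gfa" -m "$mode"
  if [ -n "$threads" ]; then set -- "$@" -t "$threads"; fi
  "${STRAINY:-./strainy.py}" "$@"
}
```

Usage: run_strainy phase out_strainy test_set/toy.bam test_set/toy.gfa hifi 4, followed by the same call with transform.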
The consensus function of stRainy is provided by Flye.
The community detection algorithm comes from Karate Club.
stRainy was originally developed in the Kolmogorov lab at NCI.
Code contributors:
- Ekaterina Kazantseva
- Ataberk Donmez
- Mikhail Kolmogorov
Ekaterina Kazantseva, Ataberk Donmez, Mihai Pop, Mikhail Kolmogorov. "stRainy: assembly-based metagenomic strain phasing using long reads" bioRxiv 2023, https://doi.org/10.1101/2023.01.31.526521
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.