A tool for detecting structural variations in human genomes by read extension, spliced alignment and local assembly.
Author: Peng Xu
Draft date: June 4, 2019
To cite: Xu, P., Chen, Y., Gao, M., and Chong, Z. (2021). ClipSV: improving structural variation detection by read extension, spliced alignment and tree-based decision rules. NAR Genom Bioinform 3, lqab003.
ClipSV was developed to detect structural variations by read extension, spliced alignment and local assembly. It relies primarily on clipped reads from short-read sequencing platforms. ClipSV is optimized to discover INDELs (5-50 bp) and structural variations (>=50 bp) in human genomes.
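Clipped reads are alignments whose ends are soft-clipped. In a SAM record, this appears as `S` operations at the ends of the CIGAR string. The following is a minimal illustration, not ClipSV's own code. It measures the soft clips in a CIGAR string and sorts a variant length into the size ranges above. Three details are assumptions made for illustration: the `min_clip` threshold of 5 bp, the choice to count a variant of exactly 50 bp as an SV, and the helper names.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clip lengths for a SAM CIGAR string."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can only sit outside soft clips, so drop them first.
    core = [(length, op) for length, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def is_clipped(cigar, min_clip=5):
    """True if either end has at least min_clip soft-clipped bases (hypothetical threshold)."""
    return max(soft_clips(cigar)) >= min_clip


def size_class(variant_length):
    """Bucket a variant by absolute length into the ranges ClipSV targets."""
    n = abs(variant_length)
    if n >= 50:
        return "SV"  # structural variation (>= 50 bp)
    if n >= 5:
        return "INDEL"  # small insertion/deletion (5-49 bp in this sketch)
    return "below range"
```

For example, `soft_clips("5H20S60M20S")` returns `(20, 20)`, and `size_class(-120)` returns `"SV"`. The sign of the length is ignored, so deletions reported as negative lengths are classified the same way as insertions.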
The program was tested on an x86_64 Linux system with 12 cores and 4 GB of physical memory per core (48 GB in total). On such a system, a 30x whole-genome sequencing sample is usually processed within 6 hours.
Dependencies: Python 3, samtools (https://github.com/samtools/samtools), minimap2 (https://github.com/lh3/minimap2), bedtools (https://bedtools.readthedocs.io/en/latest/) and velvet (https://www.ebi.ac.uk/~zerbino/velvet/) must be installed and available on your PATH.
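You can confirm that every dependency is on your PATH before starting a multi-hour run. The following is a minimal sketch. The executable names are assumptions: Velvet installs two programs, `velveth` and `velvetg`, and the sketch assumes ClipSV calls both. The helper names `missing_tools` and `report` are hypothetical.

```python
import shutil

# Executables assumed to be needed at run time. Velvet ships two programs,
# velveth and velvetg, so both are checked.
REQUIRED_TOOLS = ("python3", "samtools", "minimap2", "bedtools", "velveth", "velvetg")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if shutil.which(tool) is None]


def report(tools=REQUIRED_TOOLS):
    """Human-readable summary of the PATH check."""
    missing = missing_tools(tools)
    if not missing:
        return "all dependencies found"
    return "missing from PATH: " + ", ".join(missing)
```

Calling `print(report())` before launching ClipSV lists any tools that `shutil.which` cannot resolve.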
Installation:
git clone https://github.com/penguab/ClipSV.git
Then add the cloned directory to your PATH:
export PATH=$PWD/ClipSV/:$PATH
ClipSV needs two input files. The first is an indexed BAM/CRAM file from whole-genome sequencing (for example, indexed with "samtools index"). The second is the reference genome FASTA, indexed by minimap2 (to generate the index file, run "minimap2 -d genome.mmi genome.fa").
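Before a run, you can check which of the two inputs still needs indexing. The following is a minimal sketch under stated assumptions. It assumes the minimap2 index sits next to the FASTA with the same stem (genome.fa -> genome.mmi), as the example command suggests. It recognizes the BAI, CSI and CRAI file names commonly used by samtools and htslib. The function names are hypothetical, and the commands are returned, not executed.

```python
from pathlib import Path


def alignment_index_candidates(alignment):
    """Index file names samtools/htslib commonly look for next to a BAM or CRAM."""
    path = Path(alignment)
    if path.suffix == ".bam":
        return [Path(f"{path}.bai"), path.with_suffix(".bai"), Path(f"{path}.csi")]
    if path.suffix == ".cram":
        return [Path(f"{path}.crai"), path.with_suffix(".crai")]
    raise ValueError(f"expected a .bam or .cram file, got {alignment!r}")


def minimap2_index_path(genome_fa):
    """Assumed location of the minimap2 index: genome.fa -> genome.mmi."""
    return Path(genome_fa).with_suffix(".mmi")


def indexing_commands(alignment, genome_fa):
    """List the indexing commands still needed; nothing is executed here."""
    commands = []
    if not any(p.exists() for p in alignment_index_candidates(alignment)):
        commands.append(["samtools", "index", str(alignment)])
    mmi = minimap2_index_path(genome_fa)
    if not mmi.exists():
        commands.append(["minimap2", "-d", str(mmi), str(genome_fa)])
    return commands
```

Each returned command can be passed to `subprocess.run(cmd, check=True)`. An empty list means both index files are already in place.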
For human studies, the GRCh38 reference genome with decoy and HLA contigs is recommended: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa.
Quick start:
source activate python3
export PATH=$PWD/ClipSV/:$PATH
clipsv.py -t 12 -b bam/cram -g genome.fa
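You can also run the quick-start steps from Python, for instance inside a larger pipeline. The following is a minimal sketch. It assumes `clipsv.py` is executable in the cloned `ClipSV` directory. The helpers `clipsv_env` and `run_quick_start` are hypothetical names. The sketch prepends the directory to `PATH` in a copied environment, which mirrors the PATH step above without modifying the parent shell.

```python
import os
import subprocess
from pathlib import Path


def clipsv_env(clipsv_dir):
    """Copy of the current environment with clipsv_dir prepended to PATH."""
    env = dict(os.environ)
    existing = env.get("PATH", "")
    parts = [str(Path(clipsv_dir).resolve())] + ([existing] if existing else [])
    env["PATH"] = os.pathsep.join(parts)
    return env


def run_quick_start(clipsv_dir, alignment, genome_fa, threads=12, dry_run=False):
    """Run the quick-start command; with dry_run=True, only return what would run."""
    cmd = ["clipsv.py", "-t", str(threads), "-b", str(alignment), "-g", str(genome_fa)]
    env = clipsv_env(clipsv_dir)
    if not dry_run:
        # On POSIX, subprocess resolves "clipsv.py" using the PATH in env.
        subprocess.run(cmd, env=env, check=True)
    return cmd, env
```

Calling `run_quick_start("ClipSV", "sample.bam", "genome.fa", dry_run=True)` returns the command and environment without launching anything, which is handy for logging.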
Parameters:
clipsv.py -b <bam/cram file> -g <genome.fa> [-dtphv]
-b Indexed BAM/CRAM file
-g FASTA file of the genome sequence (must be indexed by minimap2: "minimap2 -d genome.mmi genome.fa")
----Optional----
-t Threads (default: 12)
-d Sequencing depth (default: automatically determined)
-p Prefix (default: ClipSV_out)
-v Version
-h Help
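When you call ClipSV from scripts, you can assemble the documented options into an argument list. The following sketch only uses the flags listed above (-b, -g, -t, -d, -p). The function name `build_clipsv_args` is hypothetical. Its validation rules (a .bam/.cram suffix, at least one thread, a positive depth) are assumptions, not checks ClipSV is known to perform.

```python
def build_clipsv_args(alignment, genome_fa, threads=12, depth=None, prefix="ClipSV_out"):
    """Assemble a clipsv.py argument list from the documented options."""
    if not str(alignment).endswith((".bam", ".cram")):
        raise ValueError("alignment must be a .bam or .cram file")
    if int(threads) < 1:
        raise ValueError("threads must be at least 1")
    args = ["clipsv.py", "-b", str(alignment), "-g", str(genome_fa),
            "-t", str(threads), "-p", prefix]
    if depth is not None:  # omit -d to let ClipSV determine depth automatically
        if depth <= 0:
            raise ValueError("depth must be positive")
        args += ["-d", str(depth)]
    return args
```

Leaving `depth` as None omits -d, which matches the documented default of automatic depth detection. Passing -p with the default prefix is equivalent to omitting it.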
6/4/2019: First version released.