This is a multi-process parallel version of Pindel that accelerates structural variation detection. Pindel's implementation can be found at https://github.com/xjtu-omics/pindel
Yang, Y., Wang, X., Xu, Y., Yang, C., Jiang, B., & Peng, S. (2021, December). ParaPindel: a scalable coordinated parallel detection framework for human genome-wide structural variation. In 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 574-579). IEEE.
git clone https://github.com/pengsl-lab/ParaPindel.git
cd ParaPindel
./INSTALL ./htslib
./INSTALL /home/user/path/to/your/htslib
mkdir result
(2) Create a configuration file, test.config, that tells ParaPindel where to find the BAM file. Each line gives the location of the BAM file (preferably an absolute path), the insert size, and a label for the BAM file. For example:
/home/user/path/to/your/test.bam 500 test
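The config line above can also be generated from a script, which helps keep the path absolute. The following is a minimal sketch, not part of ParaPindel: make_config_line is a hypothetical helper name, and it assumes the three fields are separated by single spaces as in the example.

```shell
# Hypothetical helper (not shipped with ParaPindel): print one config line.
# Usage: make_config_line BAM_PATH INSERT_SIZE LABEL
make_config_line() {
    bam_path=$1
    insert_size=$2
    label=$3
    # Turn a relative path into an absolute one so the config file
    # works regardless of the directory ParaPindel is started from.
    case $bam_path in
        /*) abs_path=$bam_path ;;
        *)  abs_path="$(pwd)/$bam_path" ;;
    esac
    printf '%s %s %s\n' "$abs_path" "$insert_size" "$label"
}
```

For example, make_config_line /home/user/path/to/your/test.bam 500 test > test.config writes the line shown above.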
mpirun -np 4 ./paraPindel -f /path/to/your/reference.fasta -i /path/to/your/test.config -w 5 -W 5 -c ALL -T 8 -o ./result/test
The -np parameter sets the number of processes. Other parameters can be listed with the ./paraPindel -h command.
(4) On a multi-node cluster managed by Slurm, use the following command (the exact submission syntax depends on the cluster you are using):
srun -N 4 ./paraPindel -f /path/to/your/reference.fasta -i /path/to/your/test.config -w 5 -W 5 -c ALL -T 8 -o ./result/test
Or submit it as a batch job with the following command:
sbatch -N 4 ./paraPindel -f /path/to/your/reference.fasta -i /path/to/your/test.config -w 5 -W 5 -c ALL -T 8 -o ./result/test
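On many Slurm installations sbatch expects a batch script rather than a bare executable, so the job form may need a small wrapper. The following is a sketch under that assumption: the file name run_parapindel.sh is illustrative, and the paths are the same placeholders used above.

```shell
# Write a hypothetical batch script; the file name is illustrative.
cat > run_parapindel.sh <<'EOF'
#!/bin/bash
#SBATCH -N 4
# Launch ParaPindel across the allocated nodes.
srun ./paraPindel -f /path/to/your/reference.fasta -i /path/to/your/test.config -w 5 -W 5 -c ALL -T 8 -o ./result/test
EOF

# Submit it (commented out so the snippet can be run without Slurm):
# sbatch run_parapindel.sh
```

Additional #SBATCH directives (partition, time limit, and so on) depend on the cluster and are left out here.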
-np is an option of the mpirun command, not of ./paraPindel; it sets how many processes are used for parallel detection. Similarly, -N is an option of the srun and sbatch commands, where it sets the number of nodes.
The -T parameter is implemented in the current Pindel version (https://github.com/xjtu-omics/pindel) and sets the number of threads. -T 8 is specified here, which means that each process uses 8 threads.
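Because each process starts its own threads, the total thread count is the product of the two settings. The following sketch does that arithmetic for the mpirun example and, assuming all processes run on a single machine, compares the result with the available cores. NP, THREADS, TOTAL and CORES are illustrative shell variables, not ParaPindel options.

```shell
# Illustrative variables, matching -np 4 and -T 8 in the mpirun example.
NP=4        # processes started by mpirun (-np)
THREADS=8   # threads per process (-T)

# Total worker threads across all processes.
TOTAL=$((NP * THREADS))
echo "total threads: $TOTAL"

# nproc (GNU coreutils) reports the available cores; warn if oversubscribed.
if command -v nproc >/dev/null 2>&1; then
    CORES=$(nproc)
    if [ "$TOTAL" -gt "$CORES" ]; then
        echo "warning: $TOTAL threads requested, $CORES cores available" >&2
    fi
fi
```

With the values above, the total is 32 threads.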
cd ./result
cat test_D_* > test_D
rm test_D_*
../pindel2vcf -p test_D -r /path/to/your/reference.fasta -R hg19 -d 20210606 -G -v test_D.vcf
cat test_D.vcf | vcf-sort > test_D_sorted.vcf
The pindel2vcf tool, which is implemented in Pindel, converts the variation results into VCF format. For details on using pindel2vcf, please refer to the Pindel homepage (https://github.com/xjtu-omics/pindel).
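The merge step shown for the test_D_* files can be wrapped in a small function so that it fails loudly when no part files exist. This is a sketch only: merge_parts is a hypothetical helper, and applying it to result types other than D assumes their per-process files follow the same PREFIX_TYPE_* naming.

```shell
# Hypothetical helper: merge per-process part files for one result type.
# Usage: merge_parts PREFIX TYPE   (e.g. merge_parts test D)
merge_parts() {
    prefix=$1
    type=$2
    # Expand the part files first, before the merged file exists.
    set -- "${prefix}_${type}"_*
    # If the glob matched nothing, "$1" is still the literal pattern.
    if [ ! -e "$1" ]; then
        echo "no parts found for ${prefix}_${type}" >&2
        return 1
    fi
    # Concatenate the parts in glob order, then remove them.
    cat "$@" > "${prefix}_${type}" && rm "$@"
}
```

Run it inside the result directory before calling pindel2vcf, for example merge_parts test D. Note that any other file matching test_D_* (such as an earlier test_D_sorted.vcf) would also be picked up, so it is safest to merge in a directory that contains only fresh output.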