This code is a review of a variant calling pipeline built with GATK4 and Nextflow, including the download of whole exome sequencing (WXS) data for metastatic colorectal cancer.
- Anaconda or Colab Pro: run the Jupyter notebooks that automatically write the WXS analysis pipeline code.
- SRA Toolkit: download the whole exome sequencing data for metastatic colorectal cancer and convert it to FASTQ with fastq-dump.
- Cutadapt: trim the WXS sequence data.
- Docker (>=19.03) or Singularity (on HPC): automatically run the bioinformatics containers.
- Nextflow: automatically build the sequence analysis pipeline.
- GATK (=4.2.3): call variants.
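Before running anything, it can help to confirm that the tools listed above are actually installed. The following is a minimal sketch, not part of this repository: the helper name `missing_tools` is hypothetical, and the executable names (`prefetch`, `fastq-dump`, `cutadapt`, `gatk`) are assumed to match a standard installation of each tool.

```shell
# Print every command from the argument list that is not found on PATH.
# Prints nothing when all commands are available.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed executable names; adjust them to your own installation.
# Nextflow is not checked here because this README calls it as ./nextflow.
missing=$(missing_tools prefetch fastq-dump cutadapt gatk)
if [ -n "$missing" ]; then
    echo "Missing tools:"
    echo "$missing"
fi

# Only one container runtime is needed, so Docker and Singularity are checked together.
if [ "$(missing_tools docker singularity | wc -l)" -eq 2 ]; then
    echo "Neither docker nor singularity was found on PATH"
fi
```

On an HPC system the tools may only appear on PATH after the corresponding `module load` commands, so run the check after loading modules.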
Part 1: Download whole exome sequencing data for metastatic colorectal cancer (PRJNA726023) and trim it using the notebooks in the Jupyter notebook folder
Step 1: Download the accession list from the SRA Run Selector
Step 2: Download the sequencing data automatically with Automatic download sequence by sra toolkit.ipynb
Step 3: Look up the adapter sequences in Illumina Adapter Sequences
Step 4: Trim the WXS sequence data automatically with Use pair-end sequnce triming by cutapat.ipynb
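The four steps above can also be sketched directly in the shell. This is an illustration under stated assumptions, not the code from the notebooks: the accession list is assumed to be saved as `SRR_Acc_List.txt`, the directories `./fastq` and `./fastq_trimmed` are chosen to match the paths used later in this README, `--read-filter pass` is assumed because it produces the `_pass_1`/`_pass_2` file names seen below, and the adapters are assumed to be the common Illumina TruSeq sequences. Check the adapter for your own library against Illumina Adapter Sequences.

```shell
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumed adapters (Illumina TruSeq); replace them if your library kit differs.
ADAPTER_FWD="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
ADAPTER_REV="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Download one SRA run and convert it to gzipped paired-end FASTQ files.
download_run() {
    acc=$1
    outdir=$2
    run prefetch "$acc"
    run fastq-dump --gzip --split-3 --skip-technical --read-filter pass \
        --outdir "$outdir" "$acc"
}

# Trim adapters from both mates of one paired-end sample with Cutadapt.
trim_pair() {
    acc=$1
    indir=$2
    outdir=$3
    run cutadapt -a "$ADAPTER_FWD" -A "$ADAPTER_REV" \
        -o "$outdir/${acc}_pass_1_trimmed.fastq.gz" \
        -p "$outdir/${acc}_pass_2_trimmed.fastq.gz" \
        "$indir/${acc}_pass_1.fastq.gz" "$indir/${acc}_pass_2.fastq.gz"
}

# Download and trim every accession listed in a file (one accession per line).
# The list is read on file descriptor 3 so the tools cannot consume it via stdin.
process_accessions() {
    list=$1
    run mkdir -p ./fastq ./fastq_trimmed
    while IFS= read -r acc <&3; do
        [ -n "$acc" ] || continue
        download_run "$acc" ./fastq
        trim_pair "$acc" ./fastq ./fastq_trimmed
    done 3< "$list"
}

# Example (hypothetical file name): preview the commands without running them.
# DRY_RUN=1 process_accessions SRR_Acc_List.txt
```

Running with `DRY_RUN=1` first is a cheap way to review the generated commands before starting a download that may take hours.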
$ ./nextflow run main.nf --help
=========================================
neoflow => WXS anylsis
=========================================
Usage:
nextflow run main.nf
Arguments:
--reads Reads data in fastq.gz or fastq format. For example, "*_{1,2}.fastq.gz"
--ref_dir Reference sequence folder
--seqtype Read type, dna or rna. Default is dna.
--singleEnd Single end or not, default is false (pair end reads)
--cpu The number of CPUs, default is 4.
--vcf_dir Folder of variant file , default is "./"
--help Print help message
The output of main.nf is a text file containing the variant calls (VCF) for a sample, together with a zip file. This output is generated from the sample's VCF file.
./nextflow run main.nf \
--reads "./fastq_trimmed/SRR14463457_pass_{1,2}_trimmed.fastq.gz" \
--ref_dir ./reference_genome/GRCh37 \
--vcf_dir ./vcf \
--cpu 14
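The command above processes a single sample. When several trimmed samples are present, one possible approach is to derive the `--reads` pattern from each forward-read file and call the pipeline once per sample. This is only a sketch: the helper names `reads_pattern` and `run_all_samples` are hypothetical, and the file naming (`*_1_trimmed.fastq.gz`) is assumed from the example path above. The help text suggests that a wildcard such as `"*_{1,2}.fastq.gz"` may also work in a single call.

```shell
# Turn the path of a forward-read file into the paired-end pattern
# expected by --reads, e.g. X_pass_1_trimmed.fastq.gz -> X_pass_{1,2}_trimmed.fastq.gz
reads_pattern() {
    prefix=${1%_1_trimmed.fastq.gz}
    printf '%s_{1,2}_trimmed.fastq.gz\n' "$prefix"
}

# Run the pipeline once for every forward-read file in ./fastq_trimmed.
run_all_samples() {
    for r1 in ./fastq_trimmed/*_1_trimmed.fastq.gz; do
        # Skip the literal pattern when no file matches.
        [ -e "$r1" ] || continue
        ./nextflow run main.nf \
            --reads "$(reads_pattern "$r1")" \
            --ref_dir ./reference_genome/GRCh37 \
            --vcf_dir ./vcf \
            --cpu 14
    done
}

# Example: run_all_samples
```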
#!/usr/bin/sh
#SBATCH -A MST109178 # Account name/project number
#SBATCH -J Job_name # Job name
#SBATCH -p ngs186G # Partition name (equivalent to -q, the queue name, in PBS)
#SBATCH -c 28 # Number of CPU cores to use; see the queue resource settings
#SBATCH --mem=186g # Amount of memory to use; see the queue resource settings
#SBATCH -o out.log # Path to the standard output file
#SBATCH -e err.log # Path to the standard error output file
#SBATCH --mail-user=[email protected] # Email address for job notifications
#SBATCH --mail-type=BEGIN,END # When to send email; one of NONE, BEGIN, END, FAIL, REQUEUE, ALL
module load biology/Samtools/1.15.1
module load biology/OpenJDK/17.0.2+8
module load biology/BWA/0.7.17
module load biology/GATK/4.2.3.0
./nextflow run main.nf \
--reads "./fastq_trimmed/SRR14463457_pass_{1,2}_trimmed.fastq.gz" \
--ref_dir ./reference_genome/GRCh37 \
--vcf_dir ./vcf \
--cpu 28
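The batch script above is submitted to SLURM with `sbatch`. A minimal sketch of submitting it and following its progress is shown here, assuming the script has been saved as `run_wxs.sh` (a hypothetical file name) and that `sbatch` prints its usual `Submitted batch job <id>` message. The helper names are also hypothetical.

```shell
# Extract the job ID from the message printed by sbatch,
# which normally has the form "Submitted batch job <id>".
job_id_from_sbatch() {
    printf '%s\n' "${1##* }"
}

# Submit a batch script and print the resulting job ID.
submit_job() {
    msg=$(sbatch "$1") || return 1
    job_id_from_sbatch "$msg"
}

# Example (run_wxs.sh is an assumed file name for the script above):
# job=$(submit_job run_wxs.sh)
# squeue -j "$job"          # show the queue state of the job
# tail -f out.log err.log   # follow the log files named by -o and -e
```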
We thank TWCC for access to Taiwania 3, which was used for variant calling of the colon cancer data.