twobrassiere/Variant_callng_colon_cancer

Variant_callng_colon_cancer

This repository contains a variant calling pipeline built with GATK4 and Nextflow, together with Jupyter notebooks that download and trim whole exome sequencing (WXS) data from metastatic colorectal cancer.

Part 0: Setup

  1. Anaconda or Colab Pro: run the Jupyter notebooks that automate the WXS analysis steps.
  2. SRA Toolkit: download the whole exome sequencing data for metastatic colorectal cancer and convert it to FASTQ with fastq-dump.
  3. Cutadapt: trim the WXS sequence data.
  4. Docker (>=19.03) or Singularity (on HPC): run the bioinformatics containers automatically.
  5. Nextflow: build and run the sequence analysis pipeline automatically.
  6. GATK (4.2.3): call variants.
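Before starting, it can help to confirm that the tools above are installed. The sketch below is a hypothetical helper, not part of this repository; the command names you pass to it (for example `fastq-dump`, `cutadapt`, `nextflow`, `gatk`, `docker`) are assumptions about how the tools are installed on your system.

```shell
# Hypothetical helper (not part of this repository): report which of the
# Part 0 command-line tools are found on PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found:   %s\n' "$tool"
        else
            printf 'missing: %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (tool names are assumptions; adjust to your installation):
# check_tools fastq-dump cutadapt nextflow gatk docker
```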

Part 1: Download whole exome sequencing data for metastatic colorectal cancer (PRJNA726023) and trim it with the notebooks in the Jupyter notebook folder

Step 1: Download the accession list with the SRA Run Selector
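The Run Selector's accession list is a plain text file with one run ID per line (typically saved as `SRR_Acc_List.txt`). A minimal, hypothetical sketch for cleaning it before downloading, assuming that format:

```shell
# Hypothetical helper: print the run accessions from an SRA Run Selector
# accession list (one accession per line), dropping blank lines, Windows
# line endings and anything that is not an SRR/ERR/DRR run ID.
read_accessions() {
    tr -d '\r' < "$1" | grep -E '^[SED]RR[0-9]+$'
}

# Example:
# read_accessions SRR_Acc_List.txt
```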

Step 2: Download the sequencing data automatically with Automatic download sequence by sra toolkit.ipynb
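The notebook's exact commands are not shown in this README. As a hedged sketch, the helper below prints one `fastq-dump` command per run. The flag set is our assumption, chosen because `--read-filter pass` with `--split-3` appears to produce the `<run>_pass_1.fastq.gz` / `<run>_pass_2.fastq.gz` names that Part 2 expects.

```shell
# Hypothetical helper (not the notebook's actual code): print the fastq-dump
# command for one SRA run. The flags are an assumption: --read-filter pass
# with --split-3 names the paired outputs <run>_pass_1.fastq.gz and
# <run>_pass_2.fastq.gz.
sra_download_cmd() {
    acc=$1
    outdir=${2:-fastq}   # output folder, "fastq" if not given
    printf 'fastq-dump --outdir %s --gzip --skip-technical --read-filter pass --split-3 --clip %s\n' \
        "$outdir" "$acc"
}

# Print one command per accession; append "| sh" to actually run them:
# while read -r acc; do sra_download_cmd "$acc" fastq; done < SRR_Acc_List.txt
```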

Step 3: Look up the adapter sequences in Illumina Adapter Sequences
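Which adapter the PRJNA726023 libraries carry is not stated here. A quick, hypothetical way to check is to count reads that contain a candidate adapter. The default below, `AGATCGGAAGAGC`, is the 13-base prefix shared by the Illumina TruSeq read 1 and read 2 adapters; treating the data as TruSeq-style is an assumption.

```shell
# Hypothetical helper: count reads in a FASTQ file (plain or .gz) whose
# sequence line contains the given adapter. Assumes standard 4-line FASTQ
# records, so only every 4th line starting at line 2 is searched.
# Default adapter: AGATCGGAAGAGC (TruSeq common prefix, an assumption).
count_adapter_reads() {
    fq=$1
    adapter=${2:-AGATCGGAAGAGC}
    # gzip -cdf decompresses .gz input and passes plain text through unchanged
    gzip -cdf "$fq" |
        awk -v a="$adapter" 'NR % 4 == 2 && index($0, a) > 0 { n++ } END { print n + 0 }'
}

# Example:
# count_adapter_reads fastq/SRR14463457_pass_1.fastq.gz
```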

Step 4: Trim the paired-end WXS data automatically with Use pair-end sequnce triming by cutapat.ipynb
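A sketch of the paired-end trimming step, assuming raw files named `<sample>_pass_1.fastq.gz` / `<sample>_pass_2.fastq.gz` in a hypothetical `fastq/` folder and the `fastq_trimmed/<sample>_pass_{1,2}_trimmed.fastq.gz` names used in Part 2. The TruSeq adapters and the `-q 20` / `-m 20` cutoffs are illustrative assumptions, not the notebook's actual settings; the helper only prints the command.

```shell
# Hypothetical helper: print a paired-end cutadapt command for one sample.
#   -a / -A : 3' adapters for read 1 / read 2 (TruSeq defaults, an assumption;
#             override with ADAPTER_R1 / ADAPTER_R2)
#   -q 20   : quality-trim 3' ends below Q20 (illustrative)
#   -m 20   : discard pairs where a read is shorter than 20 bp (illustrative)
#   -o / -p : trimmed read 1 / read 2 outputs
cutadapt_cmd() {
    sample=$1
    indir=${2:-fastq}
    outdir=${3:-fastq_trimmed}
    a1=${ADAPTER_R1:-AGATCGGAAGAGCACACGTCTGAACTCCAGTCA}
    a2=${ADAPTER_R2:-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT}
    printf 'cutadapt -j 4 -q 20 -m 20 -a %s -A %s -o %s/%s_pass_1_trimmed.fastq.gz -p %s/%s_pass_2_trimmed.fastq.gz %s/%s_pass_1.fastq.gz %s/%s_pass_2.fastq.gz\n' \
        "$a1" "$a2" "$outdir" "$sample" "$outdir" "$sample" "$indir" "$sample" "$indir" "$sample"
}

# Example (append "| sh" to run it):
# cutadapt_cmd SRR14463457
```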

We built the variant calling workflow with Nextflow.
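`main.nf` itself is not reproduced in this README, so its exact processes are unknown. As a hedged illustration of what a GATK4 short-variant workflow of this kind typically chains for each sample, the hypothetical helper below prints BWA-MEM alignment with a read group, sorting, duplicate marking and `HaplotypeCaller`. Base quality recalibration is omitted, and for tumour samples GATK's somatic caller is `Mutect2` rather than `HaplotypeCaller`; which caller this pipeline uses is not stated.

```shell
# Hypothetical sketch: prints (does not run) a common GATK4 germline
# short-variant sequence for one paired-end sample. Not taken from main.nf.
gatk_steps() {
    s=$1        # sample name, also used for the read group
    ref=$2      # reference FASTA (needs .fai, .dict and BWA index files)
    r1=$3       # read 1 FASTQ
    r2=$4       # read 2 FASTQ
    cpu=${5:-4}
    printf '%s\n' \
        "bwa mem -t $cpu -R '@RG\\tID:$s\\tSM:$s\\tPL:ILLUMINA' $ref $r1 $r2 | samtools sort -@ $cpu -o $s.sorted.bam -" \
        "gatk MarkDuplicates -I $s.sorted.bam -O $s.dedup.bam -M $s.dup_metrics.txt" \
        "samtools index $s.dedup.bam" \
        "gatk HaplotypeCaller -R $ref -I $s.dedup.bam -O $s.vcf.gz"
}

# Example (ref.fa is a placeholder name):
# gatk_steps SRR14463457 ref.fa SRR14463457_pass_1_trimmed.fastq.gz SRR14463457_pass_2_trimmed.fastq.gz 14
```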

Part 2: Variant calling using Nextflow

 $ ./nextflow run main.nf --help
=========================================
neoflow => WXS anylsis
=========================================
Usage:
nextflow run main.nf
Arguments:
  --reads                     Reads data in fastq.gz or fastq format. For example, "*_{1,2}.fastq.gz"
  --ref_dir                   Reference  sequence folder
  --seqtype                   Read type, dna or rna. Default is dna.
  --singleEnd                 Single end or not, default is false (pair end reads)
  --cpu                       The number of CPUs, default is 4.
  --vcf_dir                   Folder of variant file , default is "./"
  --help                      Print help message
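The `--reads` pattern pairs files by the `{1,2}` part of their names. Before launching, a hypothetical helper like the one below can list which samples actually have both mates; it assumes the `<sample>_1<suffix>` / `<sample>_2<suffix>` convention shown in the help text.

```shell
# Hypothetical helper: print sample prefixes in a directory that have both
# <sample>_1<suffix> and <sample>_2<suffix>. Samples missing a mate are skipped.
list_pairs() {
    dir=$1
    suffix=${2:-.fastq.gz}
    for r1 in "$dir"/*_1"$suffix"; do
        [ -e "$r1" ] || continue          # no match: the glob stays literal
        sample=$(basename "$r1" "_1$suffix")
        if [ -e "$dir/${sample}_2$suffix" ]; then
            printf '%s\n' "$sample"
        fi
    done
}
```

For the trimmed files used in this README, `list_pairs ./fastq_trimmed _trimmed.fastq.gz` would print prefixes such as `SRR14463457_pass`.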

The output of main.nf is, for each sample, a text-format VCF file and a zip file generated from that VCF file.
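As a quick sanity check on a per-sample VCF, the hypothetical helper below counts records with a simple REF/ALT rule. It is a rough classification for eyeballing results, not a replacement for tools such as `bcftools stats`.

```shell
# Hypothetical helper: summarise a VCF (plain or .gz) as
#   records : all non-header lines
#   snv     : one-base REF and one-base ALT
#   indel   : single ALT allele whose length differs from REF
#   other   : everything else (multi-allelic, MNVs, ...)
vcf_summary() {
    gzip -cdf "$1" | awk -F '\t' '
        /^#/ { next }
        { total++ }
        $5 ~ /,/ { other++; next }
        length($4) == 1 && length($5) == 1 { snv++; next }
        length($4) != length($5) { indel++; next }
        { other++ }
        END { printf "records=%d snv=%d indel=%d other=%d\n", total, snv, indel, other }'
}

# Example:
# vcf_summary ./vcf/SRR14463457.vcf.gz   # file name is a placeholder
```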

Example on a server

 ./nextflow run main.nf \
 --reads "./fastq_trimmed/SRR14463457_pass_{1,2}_trimmed.fastq.gz" \
 --ref_dir ./reference_genome/GRCh37 \
 --vcf_dir ./vcf \
 --cpu 14
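The expected contents of `--ref_dir` are not documented here. Assuming it holds one FASTA file plus the indexes that BWA-MEM (`bwa index`: `.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) and GATK (`samtools faidx`: `.fai`; `gatk CreateSequenceDictionary`: `.dict`) normally require, this hypothetical check reports missing files for the first FASTA it finds.

```shell
# Hypothetical helper: check the first *.fa / *.fasta in a reference folder for
# the index files BWA-MEM and GATK usually need. Prints each missing file.
# Returns 0 if all are present, 1 otherwise (or if no FASTA is found).
check_ref_dir() {
    rc=0
    for fa in "$1"/*.fa "$1"/*.fasta; do
        [ -e "$fa" ] || continue
        base=${fa%.*}                     # ref.fa -> ref (for ref.dict)
        for f in "$fa.fai" "$base.dict" "$fa.amb" "$fa.ann" "$fa.bwt" "$fa.pac" "$fa.sa"; do
            if [ ! -e "$f" ]; then
                printf 'missing: %s\n' "$f"
                rc=1
            fi
        done
        return "$rc"
    done
    printf 'no FASTA found in %s\n' "$1"
    return 1
}

# Example:
# check_ref_dir ./reference_genome/GRCh37
```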

Example on Taiwania 3 (HPC)

#!/usr/bin/sh
#SBATCH -A MST109178        # Account name/project number
#SBATCH -J Job_name         # Job name
#SBATCH -p ngs186G          # Partition name (equivalent to the -q queue name in PBS)
#SBATCH -c 28               # Number of CPU cores to use; see the queue resource settings
#SBATCH --mem=186g          # Amount of memory to use; see the queue resource settings
#SBATCH -o out.log          # Path to the standard output file
#SBATCH -e err.log          # Path to the standard error output file
#SBATCH --mail-user=<your_email>    # email
#SBATCH --mail-type=BEGIN,END       # When to send email: NONE, BEGIN, END, FAIL, REQUEUE, or ALL

module load biology/Samtools/1.15.1
module load biology/OpenJDK/17.0.2+8
module load biology/BWA/0.7.17
module load biology/GATK/4.2.3.0
 ./nextflow run main.nf \
 --reads "./fastq_trimmed/SRR14463457_pass_{1,2}_trimmed.fastq.gz" \
 --ref_dir ./reference_genome/GRCh37 \
 --vcf_dir ./vcf \
 --cpu 28
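To run other samples on Taiwania 3, a job script can be generated per sample. The hypothetical helper below reuses the account, partition, resources and modules from the example above; the per-sample job and log names, the `#!/bin/bash` shebang and `--cpu "$SLURM_CPUS_PER_TASK"` (which SLURM sets from `-c`, keeping Nextflow's CPU count in step with the allocation) are our assumptions.

```shell
# Hypothetical helper: print a SLURM job script for one sample.
# Inside the unquoted heredoc, \\ emits a literal backslash and \$ a literal $.
make_slurm_job() {
    sample=$1
    cat <<EOF
#!/bin/bash
#SBATCH -A MST109178
#SBATCH -J wxs_${sample}
#SBATCH -p ngs186G
#SBATCH -c 28
#SBATCH --mem=186g
#SBATCH -o ${sample}.out.log
#SBATCH -e ${sample}.err.log

module load biology/Samtools/1.15.1
module load biology/OpenJDK/17.0.2+8
module load biology/BWA/0.7.17
module load biology/GATK/4.2.3.0

./nextflow run main.nf \\
    --reads "./fastq_trimmed/${sample}_pass_{1,2}_trimmed.fastq.gz" \\
    --ref_dir ./reference_genome/GRCh37 \\
    --vcf_dir ./vcf \\
    --cpu "\$SLURM_CPUS_PER_TASK"
EOF
}

# Example:
# make_slurm_job SRR14463457 > run_SRR14463457.sh && sbatch run_SRR14463457.sh
```

We have not checked whether several Nextflow runs can safely share one launch directory, so submitting these jobs one at a time is the cautious choice.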

Acknowledgements

We thank TWCC for providing Taiwania 3, which we used to call variants in the colon cancer data.
