54gene workflow: 54gene WGS germline

This workflow takes either FASTQs or gVCFs as input and emits a joint-called, multi-sample VCF. See the documentation on Read the Docs for details.

You can find a small test dataset and pre-configured files for this pipeline here.

Authors

Esha Joshi
Cameron Palmer
Bari Jane Ballew (@bballew)

Usage

If you use this workflow in a paper, please credit the authors by citing the URL of this (original) repository.

Step 1: Obtain a copy of this workflow

Clone this repository to your local system, into the place where you want to perform the data analysis.

    git clone git@<host>:data-analysis5/54gene-wgs-germline.git

Step 2: Configure workflow

The pipeline inputs include:

A configuration file
A manifest file
A list of intervals
A sex linker file
A MultiQC config file (provided)
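Before the dry-run, it can help to confirm that every input file listed above actually exists. The following is a minimal sketch: the function name check_inputs and the example paths are hypothetical, so substitute the paths set in your own configuration file.

```shell
# Hypothetical pre-flight check: confirm each pipeline input exists and is non-empty.
# The function name and the example paths below are illustrative, not part of the repo.
check_inputs() {
  local missing=0 f
  for f in "$@"; do
    # -s is true only for a file that exists and has a size greater than zero.
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only when every file was found and non-empty.
  [ "$missing" -eq 0 ]
}

# Example (placeholder paths; use the ones named in your config file):
# check_inputs config/config.yaml config/manifest.txt config/intervals.list config/sex_linker.txt
```

A non-zero exit status means at least one input needs attention; the offending paths are printed to stderr.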

Step 3: Install the run-time environment

If needed, install miniconda by following the steps here.

Create a conda environment with, minimally, the dependencies defined in environment.yaml.

# create the env
conda env create -f environment.yaml
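conda env create -f takes the environment name from the top-level name: key of the YAML file, and the activation command in Step 4 assumes that name is 54gene-wgs-germline. A minimal sketch for reading the name back out of the file, assuming a simple one-line name: entry (the helper env_name_from_yaml is hypothetical):

```shell
# Print the value of the top-level `name:` key from a conda environment file.
# Assumes a simple one-line `name: <value>` entry starting at column 0.
env_name_from_yaml() {
  # Strip the key, keep the first match, drop quotes/CR, then trim trailing spaces.
  sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1 | tr -d "\"'\r" | sed 's/[[:space:]]*$//'
}

# Example usage (requires conda; `conda activate` needs an initialized shell):
# conda env create -f environment.yaml
# conda activate "$(env_name_from_yaml environment.yaml)"
```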

Step 4: Execute workflow

Activate the conda environment:

conda activate 54gene-wgs-germline

Test your configuration with a dry run:

snakemake --use-conda -n

Run the workflow locally, replacing $N with the number of cores to use:

snakemake --use-conda --cores $N
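The dry run and the local run can be chained so that nothing executes if the dry run fails. This is a sketch under stated assumptions: run_local and default_cores are hypothetical helpers, and the default core count is simply every available CPU (nproc on Linux, with getconf _NPROCESSORS_ONLN as a fallback).

```shell
# Hypothetical helpers: dry-run first, then run locally with a chosen core count.
# nproc is GNU coreutils; getconf _NPROCESSORS_ONLN also works on macOS.
default_cores() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

run_local() {
  local n="${1:-$(default_cores)}"
  # Stop before doing any work if the dry run reports a problem.
  snakemake --use-conda -n || return 1
  snakemake --use-conda --cores "$n"
}

# Example: run_local 8
```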

To run the pipeline on a cluster, edit wrapper.sh as needed for your system, then run:

bash run.sh

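Alternatively, Snakemake can submit jobs to a cluster directly, for example through SLURM's sbatch (this --cluster syntax applies to Snakemake versions before 8, which replaced it with executor plugins):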

snakemake --use-conda --cluster sbatch --jobs 100
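With Snakemake versions before 8, the --cluster value is a submit-command template in which Snakemake substitutes per-rule values such as {threads}. A sketch for SLURM, assuming you want each job to request its rule's thread count and write its output under logs/; submit_slurm and the log file naming are hypothetical, while --cpus-per-task, --output, and the %j job-ID pattern are standard sbatch options.

```shell
# Hypothetical helper for Snakemake < 8 on a SLURM cluster.
# {threads} is filled in by Snakemake per rule; %j is expanded by SLURM to the job ID.
submit_slurm() {
  local jobs="${1:-100}"
  # sbatch does not create missing directories for --output, so make logs/ first.
  mkdir -p logs
  snakemake --use-conda \
    --cluster "sbatch --cpus-per-task={threads} --output=logs/slurm-%j.out" \
    --jobs "$jobs"
}

# Example: submit_slurm 50
```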

Step 5: Investigate results

Upon pipeline completion, verify that all steps completed without error by checking the top-level Snakemake log. The last few lines of the log should contain something like nnn of nnn steps (100%) done. Additional job logs (when run on a cluster) are stored in the logs/ directory.
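Snakemake also keeps a copy of each run's log under .snakemake/log/ in the directory it was launched from, with a timestamped file name. A minimal sketch that checks the newest one for the completion message, assuming the default log location (run.sh or wrapper.sh may redirect output elsewhere); check_last_run is a hypothetical helper.

```shell
# Check whether the most recent Snakemake log reports that all steps finished.
# Assumes the default log location, .snakemake/log/*.snakemake.log, under the
# directory Snakemake was launched from (the first argument, default ".").
check_last_run() {
  local dir="${1:-.}" latest
  # Timestamped names sort chronologically, so the last one listed is the newest.
  latest=$(ls "$dir"/.snakemake/log/*.snakemake.log 2>/dev/null | tail -n 1)
  if [ -z "$latest" ]; then
    echo "no Snakemake log found under $dir/.snakemake/log" >&2
    return 2
  fi
  if tail -n 5 "$latest" | grep -q 'steps (100%) done'; then
    echo "complete: $latest"
  else
    echo "incomplete or failed: $latest" >&2
    return 1
  fi
}
```

Exit status 0 means complete, 1 means the newest log lacks the completion line, and 2 means no log was found.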

All pipeline results are stored in the results/ directory.

The hard-filtered, joint-called VCF can be found in results/HaplotypeCaller/filtered/HC_variants.hardfiltered.vcf.gz.
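To take a quick look at the final VCF without extra tools, the sketch below lists its sample columns and counts its records. It relies on bgzip output being readable by gzip; bcftools query -l is an equivalent way to list samples if bcftools is installed. The helper names are hypothetical, and the count includes every record regardless of its FILTER value.

```shell
# List the sample columns (fields 10 onward of the #CHROM header line) of a bgzipped VCF.
vcf_samples() {
  gzip -dc "$1" | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Count non-header records, whether or not their FILTER column is PASS.
vcf_record_count() {
  gzip -dc "$1" | grep -vc '^#'
}

# Example:
# vcf_samples results/HaplotypeCaller/filtered/HC_variants.hardfiltered.vcf.gz
# vcf_record_count results/HaplotypeCaller/filtered/HC_variants.hardfiltered.vcf.gz
```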

For future joint-calling, the gVCFs are located at results/HaplotypeCaller/called/<sample>_all_chroms.g.vcf.gz.
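One way to reuse these gVCFs outside the pipeline is GATK GenomicsDBImport, whose --sample-name-map option accepts a tab-separated file of sample names and gVCF paths. A sketch that builds such a map, assuming sample names are exactly the file-name prefixes before _all_chroms.g.vcf.gz; make_sample_map, gvcf.sample_map and my_db are hypothetical names, and this is not necessarily how the pipeline itself joint-calls.

```shell
# Build a two-column, tab-separated sample map (sample<TAB>path) from per-sample gVCFs.
# Assumes files are named <sample>_all_chroms.g.vcf.gz as described above.
make_sample_map() {
  local dir="${1:-results/HaplotypeCaller/called}" f s
  for f in "$dir"/*_all_chroms.g.vcf.gz; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
    s=$(basename "$f" _all_chroms.g.vcf.gz)
    printf '%s\t%s\n' "$s" "$f"
  done
}

# Example:
# make_sample_map > gvcf.sample_map
# gatk GenomicsDBImport --sample-name-map gvcf.sample_map --genomicsdb-workspace-path my_db -L intervals.list
```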

Deduplicated, post-BQSR BAMs are located at results/bqsr/<sample>.bam.
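A quick way to confirm that every sample made it through both the BAM and gVCF stages is to compare the two sets of file names. A minimal sketch assuming the default results/ layout described above (check_sample_outputs is hypothetical); it checks presence only, so use a tool such as samtools quickcheck to test BAM integrity.

```shell
# Hypothetical consistency check: report samples that have a BAM but no gVCF, or vice versa.
# Assumes the default results/ layout; pass another root directory as $1 if needed.
check_sample_outputs() {
  local root="${1:-results}" tmp status f
  tmp=$(mktemp -d)
  for f in "$root"/bqsr/*.bam; do
    [ -e "$f" ] && basename "$f" .bam
  done | sort > "$tmp/bams"
  for f in "$root"/HaplotypeCaller/called/*_all_chroms.g.vcf.gz; do
    [ -e "$f" ] && basename "$f" _all_chroms.g.vcf.gz
  done | sort > "$tmp/gvcfs"
  # comm -3 prints names unique to either list (column 1: BAM only, column 2: gVCF only).
  comm -3 "$tmp/bams" "$tmp/gvcfs" > "$tmp/diff"
  if [ -s "$tmp/diff" ]; then cat "$tmp/diff"; status=1; else status=0; fi
  rm -rf "$tmp"
  return "$status"
}
```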
