GUIDE-Seq

The guideseq package implements our data preprocessing and analysis pipeline for GUIDE-Seq data. It takes raw sequencing reads (FASTQ) as input and produces a table of annotated off-target sites as output.

Features

The package implements a pipeline consisting of a read preprocessing module followed by an off-target identification module. The preprocessing module takes raw reads (FASTQ) from a pooled multi-sample sequencing run as input. Reads are demultiplexed into sample-specific FASTQs and PCR duplicates are removed using unique molecular index (UMI) barcode information.
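The idea behind UMI-based duplicate removal can be sketched in a few lines of Python. This is an illustration only, not the package's implementation: the `Read` record, the `deduplicate_by_umi` function, and the rule of keeping the highest-quality copy in each group are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical in-memory read record; the real pipeline works on FASTQ files.
Read = namedtuple("Read", ["name", "umi", "sequence", "qualities"])


def mean_quality(read):
    """Average Phred score of a read (0.0 for a read with no scores)."""
    if not read.qualities:
        return 0.0
    return sum(read.qualities) / float(len(read.qualities))


def deduplicate_by_umi(reads):
    """Keep one read per (UMI, sequence) group.

    Assumption for illustration: reads that share both UMI and sequence
    are PCR duplicates, and the copy with the highest mean quality wins.
    """
    best = {}
    for read in reads:
        key = (read.umi, read.sequence)
        if key not in best or mean_quality(read) > mean_quality(best[key]):
            best[key] = read
    # Sort by read name so the output order is deterministic.
    return sorted(best.values(), key=lambda r: r.name)


if __name__ == "__main__":
    reads = [
        Read("r1", "ACGTACGT", "GAGT", [30, 30, 30, 30]),
        Read("r2", "ACGTACGT", "GAGT", [38, 38, 38, 38]),  # same UMI and sequence as r1
        Read("r3", "TTGACCAA", "GAGT", [35, 35, 35, 35]),  # different UMI
    ]
    for read in deduplicate_by_umi(reads):
        print(read.name, read.umi)
```

Grouping on both UMI and sequence, rather than UMI alone, avoids merging distinct molecules that happen to carry the same barcode.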

This package also produces visualizations of detected off-target sites.

Dependencies

Python (2.6, 2.7, or PyPy)
bwa alignment tool
bedtools genome arithmetic utility
Reference genome fasta file (Example)

Getting Set Up

To use this software, install all of the dependencies and then grab a copy of this repository:

Download the bwa executable from its website. Extract the file and make sure you can run it by typing /path/to/bwa and getting the program's usage page.
Download the bedtools package by following the directions on its website. Make sure you can run it by typing /path/to/bedtools (or just bedtools) and getting the program's usage page.
Make sure you have a copy of a reference genome fasta file. (Example)
Download and extract the guideseq package. You can do this either by downloading the zip and extracting it manually, or by cloning the repository: git clone --recursive https://github.com/aryeelab/guideseq.git
Install the guideseq dependencies by entering the guideseq directory and running pip install -r requirements.txt.
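A quick way to confirm that the external tools from the steps above can be found is a small check script. This is a minimal sketch, not part of the guideseq package: `find_missing_tools` is a hypothetical helper, and because it relies on `shutil.which` it needs Python 3.3 or later, so run it with a separate interpreter from the Python 2 one listed under Dependencies.

```python
import shutil


def find_missing_tools(tools):
    """Return the entries in `tools` that do not resolve to an executable."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Replace these names with /path/to/bwa and /path/to/bedtools
    # if the tools are not on your PATH.
    missing = find_missing_tools(["bwa", "bedtools"])
    if missing:
        print("Not found: " + ", ".join(missing))
    else:
        print("bwa and bedtools were both found.")
```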

Usage

To use this tool, create a .yaml manifest file referencing the dependencies and sample .fastq.gz file paths. Then run python /path/to/guideseq.py all -m /path/to/manifest.yaml

Below is an example manifest.yaml file:

reference_genome: /Volumes/Media/hg38/hg38.fa
output_folder: ../test/output

bwa: bwa
bedtools: bedtools

undemultiplexed:
    forward: ../test/data/undemux.r1.fastq.gz
    reverse: ../test/data/undemux.r2.fastq.gz
    index1: ../test/data/undemux.i1.fastq.gz
    index2: ../test/data/undemux.i2.fastq.gz

samples:
    control:
        target:
        barcode1: CTCTCTAC
        barcode2: CTCTCTAT
        description: Control

    EMX1:
        target: GAGTCCGAGCAGAAGAAGAANGG
        barcode1: TAGGCATG
        barcode2: TAGATCGC
        description: Round 3 Adli

Absolute paths are recommended. Be sure to point the bwa and bedtools paths directly to their respective executables.
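Before launching a long run, it can help to check that a manifest has the same shape as the example above. The sketch below is a hypothetical helper, not part of guideseq. It takes an already-parsed manifest as a dictionary (for instance the result of loading the file with a YAML parser), and it assumes that every key in the example is required, which this README does not state.

```python
# Keys taken from the example manifest. Treating all of them as required
# is an assumption of this sketch, not documented behavior.
TOP_LEVEL_KEYS = ["reference_genome", "output_folder", "bwa", "bedtools",
                  "undemultiplexed", "samples"]
READ_FILE_KEYS = ["forward", "reverse", "index1", "index2"]
SAMPLE_KEYS = ["target", "barcode1", "barcode2", "description"]


def check_manifest(manifest):
    """Return a list of human-readable problems found in a parsed manifest."""
    problems = []
    for key in TOP_LEVEL_KEYS:
        if key not in manifest:
            problems.append("missing top-level key: " + key)

    # Only inspect nested sections that are present and are mappings.
    reads = manifest.get("undemultiplexed")
    if isinstance(reads, dict):
        for key in READ_FILE_KEYS:
            if key not in reads:
                problems.append("missing undemultiplexed key: " + key)

    samples = manifest.get("samples")
    if isinstance(samples, dict):
        for name in sorted(samples):
            sample = samples[name] or {}
            for key in SAMPLE_KEYS:
                if key not in sample:
                    problems.append("sample %s is missing key: %s" % (name, key))
    return problems


if __name__ == "__main__":
    manifest = {
        "reference_genome": "/Volumes/Media/hg38/hg38.fa",
        "output_folder": "../test/output",
        "bwa": "bwa",
        "bedtools": "bedtools",
        "undemultiplexed": {
            "forward": "../test/data/undemux.r1.fastq.gz",
            "reverse": "../test/data/undemux.r2.fastq.gz",
            "index1": "../test/data/undemux.i1.fastq.gz",
            "index2": "../test/data/undemux.i2.fastq.gz",
        },
        "samples": {
            # An empty YAML value such as "target:" is represented as None here.
            "control": {"target": None, "barcode1": "CTCTCTAC",
                        "barcode2": "CTCTCTAT", "description": "Control"},
        },
    }
    print(check_manifest(manifest))
```

The check looks only at key names; it does not verify that the referenced files exist or that the barcodes are valid.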

Once you have a manifest file created, you can simply execute python PATH/TO/guideseq.py all -m PATH/TO/MANIFEST.YAML to run the entire pipeline. All output files, including the results of each individual step, will be placed in the output_folder.

Running Pipeline Steps Individually

You can also run each step of the pipeline individually by running python PATH/TO/guideseq.py [STEP] [OPTIONS]. Supported commands are:

all: Run all pipeline steps (manifest required)
demultiplex: Demultiplex undemultiplexed files (manifest required)
umitag: UMI-tag demultiplexed files
consolidate: Consolidate UMI-tagged files
align: Align consolidated reads to a reference genome
identify: Identify off-target sites from aligned reads
filter: Filter identified background sites from identified treatment sites
visualize: Produce visualization of off-target sites from result of the identify step
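When scripting the pipeline, the commands listed above can be assembled programmatically. The following is a sketch under stated assumptions, not an official interface: `build_command` is a hypothetical helper, and passing the manifest to demultiplex with -m is inferred from the usage of the all step. Options for the other steps are not listed in this README, so the helper passes them through unchanged instead of guessing them.

```python
# Step names as listed in this README.
MANIFEST_STEPS = ("all", "demultiplex")  # marked "manifest required"
OTHER_STEPS = ("umitag", "consolidate", "align",
               "identify", "filter", "visualize")


def build_command(script_path, step, manifest=None, extra_args=()):
    """Build an argument list for `python guideseq.py [STEP] [OPTIONS]`."""
    if step not in MANIFEST_STEPS + OTHER_STEPS:
        raise ValueError("unknown step: %s" % step)
    command = ["python", script_path, step]
    if step in MANIFEST_STEPS:
        if manifest is None:
            raise ValueError("step %r requires a manifest" % step)
        # Assumption: -m is accepted by demultiplex the same way as by all.
        command += ["-m", manifest]
    # Options for the remaining steps are not documented here, so they
    # are forwarded verbatim.
    command += list(extra_args)
    return command


if __name__ == "__main__":
    command = build_command("/path/to/guideseq.py", "all",
                            manifest="/path/to/manifest.yaml")
    print(" ".join(command))
```

The returned list can be handed to `subprocess.check_call` to actually run the step.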

Testing

To run tests, you must first create a .genome text file in the guideseq root folder with a single line containing the absolute path to the hg38 reference genome .fasta file. Then, you can simply run tox to run the full test pipeline.
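The .genome file described above can be created by hand or with a short script. This is a minimal sketch: `write_genome_file` is a hypothetical helper, not part of the package, and it ends the single line with a newline on the assumption that the test setup tolerates one.

```python
import os


def write_genome_file(repo_root, fasta_path):
    """Write the absolute path of `fasta_path` as the only line of .genome."""
    genome_file = os.path.join(repo_root, ".genome")
    with open(genome_file, "w") as handle:
        handle.write(os.path.abspath(fasta_path) + "\n")
    return genome_file
```

For example, calling `write_genome_file("/path/to/guideseq", "/path/to/hg38.fa")` writes the file into the guideseq root folder, after which tox can be run from that folder.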

License

This software is licensed under the GNU AGPL license. For usage information about this license, see the GNU AGPL information page.
