forked from loosolab/TOBIAS
commit b1d54d3 (0 parents)
Showing 83 changed files with 1,075,301 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
*.bam filter=lfs diff=lfs merge=lfs -text | ||
*.bw filter=lfs diff=lfs merge=lfs -text | ||
*.gz filter=lfs diff=lfs merge=lfs -text |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
*.pyc | ||
*.c | ||
.snakemake/ | ||
build/ | ||
dist/ | ||
*.egg | ||
*.egg-info |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2017 MPI for Heart and Lung Research | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
include README.md | ||
include LICENSE |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
TOBIAS - Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal | ||
======================================= | ||
|
||
Introduction | ||
------------ | ||
|
||
ATAC-seq (Assay for Transposase-Accessible Chromatin using high-throughput sequencing) is a sequencing assay for investigating genome-wide chromatin accessibility. The assay applies a Tn5 Transposase to insert sequencing adapters into accessible chromatin, enabling mapping of regulatory regions across the genome. Additionally, the local distribution of Tn5 insertions contains information about transcription factor binding due to the visible depletion of insertions around sites bound by protein - known as _footprints_. | ||
|
||
**TOBIAS** is a collection of command-line bioinformatics tools for performing footprinting analysis on ATAC-seq data, and includes: | ||
|
||
<img align="right" width=150 src="/figures/tobias.png"> | ||
|
||
- Correction of Tn5 insertion bias | ||
- Calculation of footprint scores within regulatory regions | ||
- Estimation of bound/unbound transcription factor binding sites | ||
- Visualization of footprints within and across different conditions | ||
|
||
For information on each tool, please see the [wiki](https://github.molgen.mpg.de/loosolab/TOBIAS/wiki/). | ||
|
||
Installation | ||
------------ | ||
TOBIAS is written as a Python package and can be quickly installed within a conda environment using: | ||
```bash | ||
$ git clone https://github.molgen.mpg.de/loosolab/TOBIAS | ||
$ cd TOBIAS | ||
$ conda env create -f snakemake_pipeline/environments/tobias.yaml | ||
$ conda activate TOBIAS_ENV | ||
$ python setup.py install | ||
``` | ||
Please see the [installation](https://github.molgen.mpg.de/loosolab/TOBIAS/wiki/installation) page for more info. | ||
|
||
Usage | ||
------------ | ||
All tools are available through the command-line as ```TOBIAS <TOOLNAME>```, for example: | ||
``` | ||
$ TOBIAS ATACorrect | ||
__________________________________________________________________________________________ | ||
TOBIAS ~ ATACorrect | ||
__________________________________________________________________________________________ | ||
ATACorrect corrects the cutsite-signal from ATAC-seq with regard to the underlying | ||
sequence preference of Tn5 transposase. | ||
Usage: | ||
TOBIAS ATACorrect --bam <reads.bam> --genome <genome.fa> --peaks <peaks.bed> | ||
Output files: | ||
- <outdir>/<prefix>_uncorrected.bw | ||
- <outdir>/<prefix>_bias.bw | ||
- <outdir>/<prefix>_expected.bw | ||
- <outdir>/<prefix>_corrected.bw | ||
- <outdir>/<prefix>_atacorrect.pdf | ||
(...) | ||
``` | ||
|
||
Snakemake pipeline | ||
------------ | ||
|
||
You can run each TOBIAS tool independently or as part of a pipeline using the included snakemake workflow. Simply set the paths to required data within snakemake_pipeline/TOBIAS.config and run using: | ||
```bash | ||
$ cd snakemake_pipeline | ||
$ conda activate TOBIAS_ENV | ||
$ snakemake --snakefile TOBIAS.snake --configfile TOBIAS.config --cores [number of cores] --keep-going | ||
``` | ||
For further info on setup, configfile and output, please consult the [wiki](https://github.molgen.mpg.de/loosolab/TOBIAS/wiki/snakemake-pipeline). | ||
|
||
License | ||
------------ | ||
This project is licensed under the [MIT license](LICENSE). | ||
|
||
|
||
Contact | ||
------------ | ||
Mette Bentsen (mette.bentsen (at) mpi-bn.mpg.de) |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
from setuptools import setup, Extension | ||
import numpy as np | ||
|
||
def readme(): | ||
    with open('README.md') as f: | ||
        return f.read() | ||
|
||
ext_modules = [Extension("tobias.utils.ngs", ["tobias/utils/ngs.pyx"], include_dirs=[np.get_include()]), | ||
Extension("tobias.utils.sequences", ["tobias/utils/sequences.pyx"], include_dirs=[np.get_include()]), | ||
Extension("tobias.utils.signals", ["tobias/utils/signals.pyx"], include_dirs=[np.get_include()])] | ||
|
||
setup(name='tobias', | ||
version='1.0.0', | ||
description='Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal', | ||
long_description=readme(), | ||
url='https://github.molgen.mpg.de/loosolab/TOBIAS', | ||
author='Mette Bentsen', | ||
author_email='mette.bentsen@mpi-bn.mpg.de', | ||
license='MIT', | ||
packages=['tobias', 'tobias.footprinting', 'tobias.utils', 'tobias.plotting', 'tobias.motifs'], | ||
entry_points = { | ||
'console_scripts': ['TOBIAS=tobias.TOBIAS:main'] | ||
}, | ||
install_requires=[ | ||
'setuptools_cython', | ||
'numpy', | ||
'scipy', | ||
'pyBigWig', | ||
'pysam', | ||
'pybedtools', | ||
'matplotlib>=2', | ||
'scikit-learn', | ||
'pandas', | ||
'pypdf2', | ||
'xlsxwriter', | ||
'adjustText', | ||
], | ||
#dependency_links=['https://github.com/jhkorhonen/MOODS/tarball/master'], | ||
classifiers = [ | ||
'License :: OSI Approved :: MIT License', | ||
'Intended Audience :: Science/Research', | ||
'Topic :: Scientific/Engineering :: Bio-Informatics', | ||
'Programming Language :: Python :: 3' | ||
], | ||
zip_safe=False, | ||
include_package_data=True, | ||
ext_modules = ext_modules, | ||
scripts=["tobias/utils/peak_annotation.sh"] | ||
) |
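The `console_scripts` entry above maps the `TOBIAS` command to a `main` function in `tobias/TOBIAS.py`, which this diff view does not show. As a rough sketch of how such an entry point can dispatch `TOBIAS <TOOLNAME>` calls, the following uses `argparse` subcommands. The function names `build_parser` and `run_atacorrect` are hypothetical, and only the three `ATACorrect` options that appear in the README usage line are declared; the real tool may well accept more.

```python
import argparse


def run_atacorrect(args):
    """Hypothetical stand-in for the real tool; it only reports what it received."""
    return "ATACorrect: bam={0} genome={1} peaks={2}".format(args.bam, args.genome, args.peaks)


def build_parser():
    """Build a top-level parser with one subcommand per tool (illustrative only)."""
    parser = argparse.ArgumentParser(prog="TOBIAS")
    subparsers = parser.add_subparsers(dest="tool")
    subparsers.required = True  # a tool name must be given

    # Only the options shown in the README usage line are declared here (assumption).
    atacorrect = subparsers.add_parser("ATACorrect", help="Correct Tn5 insertion bias")
    atacorrect.add_argument("--bam", required=True)
    atacorrect.add_argument("--genome", required=True)
    atacorrect.add_argument("--peaks", required=True)
    atacorrect.set_defaults(func=run_atacorrect)
    return parser


def main(argv=None):
    """The kind of function an entry such as 'TOBIAS=tobias.TOBIAS:main' would call."""
    args = build_parser().parse_args(argv)
    return args.func(args)


# Passing an explicit argument list avoids reading sys.argv in this example.
print(main(["ATACorrect", "--bam", "reads.bam", "--genome", "genome.fa", "--peaks", "peaks.bed"]))
```

Storing the handler with `set_defaults(func=...)` keeps `main` free of per-tool branching, so adding another tool only means registering another subparser.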
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
#-------------------------------------------------------------------------# | ||
#-------------------------- TOBIAS input data ----------------------------# | ||
#-------------------------------------------------------------------------# | ||
|
||
data: | ||
  control: [test_data/control_s1.bam, test_data/control_s2.bam] #list of bam files | ||
  treatment: [test_data/treatment_s1.bam] #list of bam files | ||
|
||
run_info: | ||
  organism: human #mouse/human | ||
  fasta: test_data/genome.fa #.fasta-file containing organism genome | ||
  blacklist: test_data/blacklist.bed #.bed-file containing blacklisted regions | ||
  gtf: test_data/genes.gtf #.gtf-file for annotation of peaks | ||
  motifs: test_data/motifs/ #directory containing motifs (single files in meme or JASPAR pfm format) | ||
  output: test_output/ #output directory | ||
|
||
|
||
|
||
#-------------------------------------------------------------------------# | ||
#----------------------- Default module parameters -----------------------# | ||
#-------------------------------------------------------------------------# | ||
|
||
macs: "--nomodel --shift -100 --extsize 200 --broad" | ||
|
||
# for parameter description see uropa manual: http://uropa-manual.readthedocs.io/config.html | ||
# adjust filter attribute for given gtf: Ensembl gene_biotype / GENCODE gene_type | ||
# other optional parameters: --filter_attribute gene_biotype --attribute_value protein_coding | ||
uropa: "--feature gene --feature_anchor start --distance [10000,1000] --show_attribute gene_name,gene_id,gene_biotype" | ||
|
||
atacorrect: "" | ||
footprinting: "" | ||
bindetect: "" | ||
plotting: "" |
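The layout of this config (a `data` mapping from condition names to BAM lists, plus a `run_info` block) can be checked before a run. The sketch below is a plain-Python illustration, assuming the file has already been parsed into a dict. The function name `find_config_problems` is hypothetical. Unlike a check that prints and exits on the first problem, it collects every problem it finds and returns them as a list.

```python
REQUIRED_KEYS = [
    ("data",),
    ("run_info",),
    ("run_info", "organism"),
    ("run_info", "fasta"),
    ("run_info", "blacklist"),
    ("run_info", "gtf"),
    ("run_info", "motifs"),
    ("run_info", "output"),
]


def find_config_problems(config, required=REQUIRED_KEYS):
    """Return a list of problem descriptions; an empty list means the layout looks fine."""
    problems = []
    for key_path in required:
        node = config
        for key in key_path:
            if not isinstance(node, dict) or key not in node:
                problems.append("missing key " + ":".join(key_path))
                break
            node = node[key]
        else:
            # Reached only when every key on the path was found.
            if node is None:
                problems.append("empty value for " + ":".join(key_path))

    # Every condition under data: needs at least one BAM file.
    data = config.get("data")
    if not isinstance(data, dict) or not data:
        problems.append("no conditions under data")
        data = {}
    for condition, bams in data.items():
        if not bams:
            problems.append("no bamfiles in data:" + condition)
    return problems


# Example dict mirroring the test configuration shown in this file.
example = {
    "data": {
        "control": ["test_data/control_s1.bam", "test_data/control_s2.bam"],
        "treatment": ["test_data/treatment_s1.bam"],
    },
    "run_info": {
        "organism": "human",
        "fasta": "test_data/genome.fa",
        "blacklist": "test_data/blacklist.bed",
        "gtf": "test_data/genes.gtf",
        "motifs": "test_data/motifs/",
        "output": "test_output/",
    },
}
print(find_config_problems(example))
```

Returning a list rather than exiting makes the check easy to reuse in tests or to report several mistakes at once.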
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,195 @@ | ||
""" | ||
Upper level TOBIAS snake | ||
""" | ||
|
||
import os | ||
import sys | ||
import subprocess | ||
import itertools | ||
|
||
#Set config | ||
if workflow.overwrite_configfile is not None: | ||
    CONFIGFILE = str(workflow.overwrite_configfile) | ||
else: | ||
    CONFIGFILE = 'TOBIAS.config' #fall back to the default config so error messages name a real file | ||
configfile: CONFIGFILE | ||
|
||
include: "snakefiles/helper.snake" | ||
#shell.prefix("") | ||
|
||
#-------------------------------------------------------------------------------# | ||
#------------------------- CHECK FORMAT OF CONFIG FILE -------------------------# | ||
#-------------------------------------------------------------------------------# | ||
|
||
required = [("data",), | ||
("run_info",), | ||
("run_info", "organism"), | ||
("run_info", "fasta"), | ||
("run_info", "blacklist"), | ||
("run_info", "gtf"), | ||
("run_info", "motifs"), | ||
("run_info", "output"), | ||
] | ||
|
||
#Check that all required keys exist and contain information | ||
for key_list in required: | ||
    lookup_dict = config | ||
    for key in key_list: | ||
        try: | ||
            lookup_dict = lookup_dict[key] | ||
            if lookup_dict is None: | ||
                print("ERROR: Missing input for key {0}".format(key_list)) | ||
        except (KeyError, TypeError): | ||
            print("ERROR: Could not find key(s) \"{0}\" in configfile {1}. Please check that your configfile has right format for TOBIAS.".format(":".join(key_list), CONFIGFILE)) | ||
            sys.exit(1) | ||
|
||
#Check if there is at least one condition with bamfiles | ||
if len(config["data"]) > 0: | ||
    for condition in config["data"]: | ||
        if len(config["data"][condition]) == 0: | ||
            print("ERROR: Could not find any bamfiles in \"{0}\" in configfile {1}".format(":".join(("data", condition)), CONFIGFILE)) | ||
else: | ||
    print("ERROR: Could not find any conditions (\"data:{{condition}}\") in configfile {0}".format(CONFIGFILE)) | ||
    sys.exit(1) | ||
|
||
|
||
#-------------------------------------------------------------------------------# | ||
#------------------------- WHICH FILES/INFO WERE INPUT? ------------------------# | ||
#-------------------------------------------------------------------------------# | ||
|
||
input_files = [] | ||
|
||
#Files related to experimental data (bam) | ||
CONDITION_IDS = list(config["data"].keys()) | ||
for condition in CONDITION_IDS: | ||
    if not isinstance(config["data"][condition], list): | ||
        config['data'][condition] = [config['data'][condition]] | ||
    input_files.extend(config['data'][condition]) | ||
|
||
|
||
#Flatfiles independent from experimental data (run_info) | ||
FASTA = config['run_info']['fasta'] | ||
BLACKLIST = config['run_info']['blacklist'] | ||
GTF = config['run_info']['gtf'] | ||
OUTPUTDIR = config['run_info']["output"] | ||
MOTIFDIR = config['run_info']['motifs'] | ||
|
||
input_files.extend([FASTA, BLACKLIST, GTF]) | ||
|
||
|
||
#---------- Test that input files exist -----------# | ||
for file in input_files: | ||
    if file is not None: | ||
        full_path = os.path.abspath(file) | ||
        if not os.path.exists(full_path): | ||
            exit("ERROR: The following file given in config does not exist: {0}".format(full_path)) | ||
|
||
|
||
|
||
#-------------------------------------------------------------------------------# | ||
#------------------------ WHICH FILES SHOULD BE CREATED? -----------------------# | ||
#-------------------------------------------------------------------------------# | ||
|
||
output_files = [] | ||
|
||
#--------------------------------- MOTIFS --------------------------------------# | ||
#Identify IDS of motifs | ||
files = os.listdir(MOTIFDIR) | ||
MOTIF_FILES = {} | ||
for file in files: | ||
    full_file = os.path.join(MOTIFDIR, file) | ||
    with open(full_file) as f: | ||
        for line in f: | ||
            if line.startswith("MOTIF"): | ||
                columns = line.rstrip().split() | ||
                ID = columns[2] + "_" + columns[1] | ||
                ID = filafy(ID) | ||
            elif line.startswith(">"): | ||
                columns = line.replace(">", "").rstrip().split() | ||
                ID = columns[1] + "_" + columns[0] | ||
                ID = filafy(ID) | ||
    MOTIF_FILES[ID] = full_file | ||
|
||
TF_IDS = list(MOTIF_FILES.keys()) | ||
|
||
|
||
#---------------------------- OUTPUT PER CONDITION -----------------------------# | ||
|
||
id2bam = {condition:{} for condition in CONDITION_IDS} | ||
|
||
for condition in CONDITION_IDS: | ||
|
||
    config_bams = config['data'][condition] | ||
    sampleids = [os.path.splitext(os.path.basename(bam))[0] for bam in config_bams] | ||
    id2bam[condition] = {sampleids[i]:config_bams[i] for i in range(len(sampleids))} # Link sample ids to bams | ||
|
||
|
||
PLOTNAMES = expand("{condition}_{plotname}", condition=CONDITION_IDS, plotname=["heatmap", "aggregate"]) | ||
if len(CONDITION_IDS) > 1: | ||
    PLOTNAMES.extend(["heatmap_comparison", "aggregate_comparison"]) | ||
|
||
output_files.extend(expand(os.path.join(OUTPUTDIR, "footprinting", "{condition}_footprints.bw"), condition=CONDITION_IDS)) | ||
|
||
#output_files.append(os.path.join(OUTPUTDIR, "overview", "TFBS_distance.txt")) | ||
output_files.append(os.path.join(OUTPUTDIR, "TFBS", "bindetect_results.txt")) | ||
output_files.append(os.path.join(OUTPUTDIR, "overview", "bindetect_results.txt")) | ||
|
||
#Visualization | ||
output_files.extend(expand(os.path.join(OUTPUTDIR, "TFBS", "{TF}", "plots", "{TF}_{plotname}.pdf"), TF=TF_IDS, plotname=PLOTNAMES)) | ||
output_files.extend(expand(os.path.join(OUTPUTDIR, "overview", "all_{plotname}.pdf"), plotname=PLOTNAMES)) | ||
|
||
|
||
#-------------------------- OUTPUT ACROSS CONDITIONS ---------------------------# | ||
|
||
""" | ||
COMPARE_COND = 0 | ||
if len(CONDITION_IDS) > 1: | ||
COMPARE_COND = 1 # flag | ||
output_files.extend(expand(os.path.join(OUTPUTDIR, "TFBS", "{TF}", "plots", "{TF}_heatmap_comparison.pdf"), TF=TF_IDS)) | ||
output_files.extend(expand(os.path.join(OUTPUTDIR, "TFBS", "{TF}", "plots", "{TF}_aggregate_comparison.pdf"), TF=TF_IDS)) | ||
#output_files.extend([os.path.join(OUTPUTDIR, "overview", "diff_bind_plot.pdf")]) | ||
|
||
""" | ||
#-------------------------------- OTHER OUTPUT ---------------------------------# | ||
|
||
|
||
|
||
|
||
#-------------------------------------------------------------------------------# | ||
#--------------------- WHICH SNAKE MODULES SHOULD BE USED? ---------------------# | ||
#-------------------------------------------------------------------------------# | ||
|
||
include: "snakefiles/preprocessing.snake" | ||
include: "snakefiles/footprinting.snake" | ||
include: "snakefiles/visualization.snake" | ||
|
||
|
||
|
||
#-------------------------------------------------------------------------------# | ||
#------------------------ DEAL WITH SPECIAL ENVIRONMENTS -----------------------# | ||
#-------------------------------------------------------------------------------# | ||
|
||
""" | ||
sys_env = subprocess.check_output(['conda', 'env', 'list'], universal_newlines=True) | ||
env_list = [line.split()[0] for line in sys_env.split("\n") if len(line.split()) > 0] | ||
|
||
# default TOBIAS environment | ||
if "TOBIAS_ENV" not in env_list: | ||
print("Creating TOBIAS environment for the first time") | ||
subprocess.call(["conda", "env", "create", "--file", "environments/tobias.yaml"]) | ||
|
||
# python 2 related envs | ||
if "MACS_ENV" not in env_list: | ||
print("Creating macs environment for the first time") | ||
subprocess.call(["conda", "env", "create", "--file", "environments/macs.yaml"]) | ||
|
||
""" | ||
#-------------------------------------------------------------------------------# | ||
#---------------------------------- RUN :-) ------------------------------------# | ||
#-------------------------------------------------------------------------------# | ||
|
||
rule all: | ||
    input: | ||
        output_files | ||
    message: "Rule all" | ||
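The motif section of this Snakefile derives one ID per motif file from its header line, joining the name and accession as `<name>_<accession>`. The sketch below isolates that step for a single line. It is an illustration under assumptions: `filafy` lives in `snakefiles/helper.snake`, which is not shown here, so `safe_name` is a hypothetical stand-in that replaces characters outside letters, digits, `_`, `.` and `-` with `_`. The function name `motif_id_from_header` is likewise made up for this example.

```python
import re


def safe_name(text):
    """Hypothetical stand-in for filafy(): replace characters that are awkward in file names."""
    return re.sub(r"[^A-Za-z0-9_.-]", "_", text)


def motif_id_from_header(line):
    """Return '<name>_<accession>' for a MEME 'MOTIF' line or a JASPAR '>' header, else None."""
    if line.startswith("MOTIF"):
        # MEME header: MOTIF <accession> <name>
        columns = line.rstrip().split()
        if len(columns) >= 3:
            return safe_name(columns[2] + "_" + columns[1])
    elif line.startswith(">"):
        # JASPAR header: ><accession> <name>
        columns = line[1:].rstrip().split()
        if len(columns) >= 2:
            return safe_name(columns[1] + "_" + columns[0])
    return None


for header in ["MOTIF MA0004.1 Arnt\n", ">MA0004.1 Arnt\n", "A [ 4 19 0 0 ]\n"]:
    print(repr(header), "->", motif_id_from_header(header))
```

Returning `None` for non-header lines (and for headers with too few columns) lets a caller skip files without a recognizable header instead of reusing an ID left over from a previous file.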