Notes on ChIP-seq and other-seq-related tools

ChIP-seq, ATAC-seq related tools and genomics data analysis resources. Please, contribute and get in touch! See MDmisc notes for other programming and genomics-related notes.

Table of contents

Databases
- Motif DBs
ChIP-seq
DNAse-seq
ATAC-seq
- ATAC-seq pipelines
Histone-seq
- Broad peak analysis
Technology
Machine learning
Misc

Databases

UniBind database - TFBS predictions of approx. 56 million TFBSs with experimental and computational support for direct TF-DNA interactions for 644 TFs in > 1000 cell lines and tissues. Processed approx. 10,000 public ChIP-seq datasets from nine species using ChIP-eat. ChIP-eat combines both computational (high PWM score) and experimental (centrality to ChIP-seq peak summit) support to find high-confidence direct TF-DNA interactions in a ChIP-seq experiment-specific manner, uses the DAMO tool. Input data - ReMap 2018 and GTRD. Robust and permissive collections. Over 197,000 Cis-regulatory modules. Downloads of BED, FASTA, PWMs, Tracks for the UCSC GenomeBrowser, API, Enrichment analysis, online with or without background, differential enrichment. UniBind Enrichment BitBucket.

Paper
Puig, Rafael Riudavets, Paul Boddie, Aziz Khan, Jaime Abraham Castro-Mondragon, and Anthony Mathelier. “UniBind: Maps of High-Confidence Direct TF-DNA Interactions across Nine Species” BMC Genomics, (December 2021) https://doi.org/10.1186/s12864-021-07760-6
Gheorghe, Marius, Geir Kjetil Sandve, Aziz Khan, Jeanne Chèneby, Benoit Ballester, and Anthony Mathelier. “A Map of Direct TF–DNA Interactions in the Human Genome.” Nucleic Acids Research 47, no. 4 (February 28, 2019): e21–e21. https://doi.org/10.1093/nar/gky1210

ReMap is an integrative analysis of Homo sapiens, Mus musculus and Arabidopsis thaliana transcriptional regulators from DNA-binding experiments such as ChIP-seq, ChIP-exo, DAP-seq from public sources (GEO, ENCODE, ENA). Human hg38 and Arabidopsis TAIR10. All peaks, non-redundant peaks, cis-Regulatory Modules. GitHub. Download genomic coordinates.

Paper
Chèneby, Jeanne, Zacharie Ménétrier, Martin Mestdagh, Thomas Rosnet, Allyssa Douida, Wassim Rhalloussi, Aurélie Bergon, Fabrice Lopez, and Benoit Ballester. “[ReMap 2020: A Database of Regulatory Regions from an Integrative Analysis of Human and Arabidopsis DNA-Binding Sequencing Experiments](https://doi.org/10.1093/nar/gkz945).” Nucleic Acids Research, October 29, 2019
Hammal, Fayrouz, Pierre de Langen, Aurélie Bergon, Fabrice Lopez, and Benoit Ballester. “ReMap 2022: A Database of Human, Mouse, Drosophila and Arabidopsis Regulatory Regions from an Integrative Analysis of DNA-Binding Sequencing Experiments.” Nucleic Acids Research, November 9, 2021, gkab996. https://doi.org/10.1093/nar/gkab996.

ADASTRA - the database of Allelic Dosage-corrected Allele-Specific human Transcription factor binding sites (over 500K sites across 1073 human TFs and 649 cell types, reprocessed data from GTRD, pipeline at GitHub) at nearly 270K SNPs. Background Allele Dosage (BAD) maps. Many SNPs overlap eQTLs.

Paper
Abramov, Sergey, Alexandr Boytsov, Daria Bykova, Dmitry D. Penzar, Ivan Yevshin, Semyon K. Kolmykov, Marina V. Fridman, et al. “Landscape of Allele-Specific Transcription Factor Binding in the Human Genome.” Nature Communications 12, no. 1 (December 2021): 2751. https://doi.org/10.1038/s41467-021-23007-0.

ANANASTRA - ANnotation and enrichment ANalysis of Allele-Specific TRAnscription factor binding at SNPs. Annotates a given list of SNPs with allele-specific binding events across a wide range of transcription factors and cell types using ADASTRA. Enrichment analysis of SNPs in cell type-specific TFBSs (Fisher's exact, one-sided). API.

Paper
Boytsov, Alexandr, Sergey Abramov, Ariuna Z Aiusheeva, Alexandra M Kasianova, Eugene Baulin, Ivan A Kuznetsov, Yurii S Aulchenko, et al. “ANANASTRA: Annotation and Enrichment Analysis of Allele-Specific Transcription Factor Binding at SNPs.” Nucleic Acids Research, April 21, 2022, gkac262. https://doi.org/10.1093/nar/gkac262.
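
The one-sided Fisher's exact test mentioned for the ANANASTRA enrichment analysis can be illustrated with a minimal sketch. This is not ANANASTRA's code: the 2x2 table layout (query vs. background SNPs, inside vs. outside a TFBS set) and the function name `fisher_exact_greater` are assumptions made for illustration.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided (enrichment) Fisher's exact test p-value for a 2x2 table.

    Hypothetical layout (an assumption, not taken from ANANASTRA):
        a: query SNPs inside the TFBS set       b: query SNPs outside it
        c: background SNPs inside the TFBS set  d: background SNPs outside it
    """
    n_query = a + b            # size of the query list
    n_inside = a + c           # all SNPs inside the TFBS set
    n_total = a + b + c + d    # all SNPs considered
    denominator = comb(n_total, n_query)
    largest_overlap = min(n_query, n_inside)
    # Sum the hypergeometric probabilities of seeing an overlap of a or more.
    numerator = sum(
        comb(n_inside, k) * comb(n_total - n_inside, n_query - k)
        for k in range(a, largest_overlap + 1)
    )
    return numerator / denominator
```

For example, `fisher_exact_greater(3, 1, 1, 3)` sums the probabilities of overlaps of 3 and 4 out of 4 query SNPs. In practice a library routine such as SciPy's `fisher_exact` with `alternative="greater"` would be the usual choice.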

Catchitt - method for predicting TFBSs, leader of ENCODE-DREAM challenge. Other methods - table in supplementary. AUPRC to benchmark performance. DNAse-seq is the best predictor, RNA-seq and sequence-based features are not informative. Java implementation, predicted peaks for 32 transcription factors in 22 primary cell types and tissues (682 total) BED hg19 files, conservative and relaxed predictions, download.

Paper
Keilwagen, Jens, Stefan Posch, and Jan Grau. “Accurate Prediction of Cell Type-Specific Transcription Factor Binding.” Genome Biology 20, no. 1 (December 2019). https://doi.org/10.1186/s13059-018-1614-y
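
AUPRC, the benchmark metric noted above, can be approximated by average precision. The sketch below is a minimal illustration, not the challenge's scoring code: the function name `average_precision` and the toy inputs are assumptions, and tied scores are simply kept in input order.

```python
def average_precision(labels, scores):
    """Average precision, a common estimate of the area under the PR curve.

    labels: iterable of 0/1 (1 = truly bound region)
    scores: iterable of predicted binding scores, same length as labels
    """
    labels = list(labels)
    scores = list(scores)
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("at least one positive label is required")
    # Rank predictions from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_positives += 1
            # Precision at the rank of each recovered positive.
            total += true_positives / rank
    return total / n_positive
```

A perfect ranking gives 1.0. Because bound regions are rare genome-wide, this metric is more sensitive to false positives among top predictions than AUROC.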

C4S DB - Comprehensive Collection and Comparison for ChIP-Seq Database. Over 16K human ChIP-seq experiments. Data aligned to GRCh37 (hs37d5) genome. "Gene browser" and "global similarity" views. Search for gene symbol, tissue/cell line, ChIP target, sample description/ID.

Paper
Anzawa, Hayato, and Kengo Kinoshita. “C4S DB: Comprehensive Collection and Comparison for ChIP-Seq Database.” Journal of Molecular Biology, May 2023, 168157. https://doi.org/10.1016/j.jmb.2023.168157.

RAEdb - enhancer database. Enhancers identified from STARR-seq and MPRA studies. Epromoters - promoters containing enhancers. Human (hg38)/mouse (mm10) data, select cell lines. BED/FASTQ download. Links to EnhancerAtlas, VISTA, SuperEnhancer databases.

Paper
Cai, Zena, Ya Cui, Zhiying Tan, Gaihua Zhang, Zhongyang Tan, Xinlei Zhang, and Yousong Peng. “RAEdb: A Database of Enhancers Identified by High-Throughput Reporter Assays.” Database: The Journal of Biological Databases and Curation 2019 (January 1, 2019). https://doi.org/10.1093/database/bay140.

ChIP-Atlas - a large database and analysis suite of public ChIP-seq and DNAse-seq experiments (Over 76K experiments, SRA uniformly processed data). Analyses: Visualization of peaks in IGV browser, BED file download, Target genes identification, Colocalization of factors (antigens), Enrichment analysis - permutation enrichment of BED regions, with custom background possible. GitHub, Documentation.

Paper
Oki, Shinya, Tazro Ohta, Go Shioi, Hideki Hatanaka, Osamu Ogasawara, Yoshihiro Okuda, Hideya Kawaji, Ryo Nakaki, Jun Sese, and Chikara Meno. “ChIP‐Atlas: A Data‐mining Suite Powered by Full Integration of Public ChIP‐seq Data” EMBO Reports, (December 2018) https://doi.org/10.15252/embr.201846255

TRRUST database (Transcriptional Regulatory Relationships Unraveled by Sentence-based Text mining). Over 8K regulatory interactions for 800 TFs in human, and over 6K interactions for 828 mouse TFs. Mouse and human TF regulatory networks overlap, complement each other. More information than in PAZAR, TFactS, TRED, TFe databases. Download, TSV format. Tools: 1. Search a gene, 2. Enrichment of key regulators for query genes.

Paper
Han, Heonjong, Jae-Won Cho, Sangyoung Lee, Ayoung Yun, Hyojin Kim, Dasom Bae, Sunmo Yang, et al. “TRRUST v2: An Expanded Reference Database of Human and Mouse Transcriptional Regulatory Interactions.” Nucleic Acids Research 46, no. D1 (January 4, 2018): D380–86. https://doi.org/10.1093/nar/gkx1013.

GTRD - transcription factor binding sites and data (ChIP-seq, ChIP-exo, DNAse-seq, MNase-seq, ATAC-seq, RNA-seq), uniformly processed, over 35K experiments. Seven species, TFs linked to CIS-BP. All cell types are assigned ontology. Experiment search, processed data/peaks download (BED, bigBed, bigWig).

Paper
Yevshin, Ivan, Ruslan Sharipov, Tagir Valeev, Alexander Kel, and Fedor Kolpakov. “GTRD: A Database of Transcription Factor Binding Sites Identified by ChIP-Seq Experiments.” Nucleic Acids Research 45, no. D1 (January 4, 2017): D61–67. https://doi.org/10.1093/nar/gkw951.
Kolmykov, Semyon, Ivan Yevshin, Mikhail Kulyashov, Ruslan Sharipov, Yury Kondrakhin, Vsevolod J Makeev, Ivan V Kulakovskiy, Alexander Kel, and Fedor Kolpakov. “GTRD: An Integrated View of Transcription Regulation.” Nucleic Acids Research 49, no. D1 (January 8, 2021): D104–11. https://doi.org/10.1093/nar/gkaa1057.

Cistrome DB v3.0 - a resource of ChIP-seq, ATAC-seq and DNase-seq data from humans and mice. One-page interface to search by target gene and cell type, by genomic region, find similar BED sets for the uploaded BED.

Paper
Taing, Len, Ariaki Dandawate, Sehi L’Yi, Nils Gehlenborg, Myles Brown, and Clifford A Meyer. “Cistrome Data Browser: Integrated Search, Analysis and Visualization of Chromatin Data.” Nucleic Acids Research, November 16, 2023, gkad1069. https://doi.org/10.1093/nar/gkad1069.

Cistrome DB - ChIP-seq peaks for TFs, histone modifications, DNAse/ATAC. Downloadable cell type-specific, hg38 BED files. Toolkit to answer questions like "What factors regulate your gene of interest?", "What factors bind in your interval?", "What factors have a significant binding overlap with your peak set?"

Paper
Zheng R, Wan C, Mei S, Qin Q, Wu Q, Sun H, Chen CH, Brown M, Zhang X, Meyer CA, Liu XS. Cistrome Data Browser: expanded datasets and new tools for gene regulatory analysis. Nucleic Acids Res, 2018 Nov 20. https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gky1094/5193328
Mei S, Qin Q, Wu Q, Sun H, Zheng R, Zang C, Zhu M, Wu J, Shi X, Taing L, Liu T, Brown M, Meyer CA, Liu XS. Cistrome data browser: a data portal for ChIP-Seq and chromatin accessibility data in human and mouse. Nucleic Acids Res, 2017 Jan 4;45(D1):D658-D662. https://academic.oup.com/nar/article/45/D1/D658/2333932

CODEX ChIP-seq - CODEX provides access to processed and curated NGS experiments, including ChIP-Seq (transcription factors and histones), RNA-Seq and DNase-Seq. Human, mouse. Download tracks, analyze correlations, motifs, compare between organisms, more.

Paper
Sánchez-Castillo, Manuel and Ruau, David and Wilkinson, Adam C. and Ng, Felicia S.L. and Hannah, Rebecca and Diamanti, Evangelia and Lombard, Patrick and Wilson, Nicola K. and Gottgens, Berthold. "CODEX: a next-generation sequencing experiment database for the haematopoietic and embryonic stem cell communities" Nucleic Acids Research, Database Issue, September 2014 https://doi.org/10.1093/nar/gku895

hTFtarget - database of TF-gene target regulations from >7K human ChIP-seq experiments.

Motif DBs

CIS-BP (The Catalog of Inferred Sequence Binding Preferences) - database of inferred sequence binding preferences. DNA sequence preferences for >1,000 TFs encompassing 54 different DBD classes from 131 diverse eukaryotes. PBM microarray assays to analyze TF binding preferences. Closely related DBDs (70% Amino Acid identity) almost always have very similar DNA sequence preferences, enabling inference of motifs for approx. 34% of the 70,000 known or predicted eukaryotic TFs. Tools to scan single sequence for TF binding, two sequences for differential TF binding (including SNP effect scan), protein scan, motif scan. Bulk download of PWMs, protein sequences, TF information, logos.

Paper
Weirauch, Matthew T., Ally Yang, Mihai Albu, Atina G. Cote, Alejandro Montenegro-Montero, Philipp Drewe, Hamed S. Najafabadi, et al. “Determination and Inference of Eukaryotic Transcription Factor Sequence Specificity.” Cell 158, no. 6 (September 2014): 1431–43. https://doi.org/10.1016/j.cell.2014.08.009.

HOCOMOCO (Homo sapiens comprehensive model collection) - TFBS models and PWMs. Human- and mouse-specific models. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices. Uniformly processed data from GTRD, peaks called with four peak callers. Used ChIPMunk in four computational models, including using DNA shape. Added MoLoTool, a web app to scan DNA sequences for TFBSs with PWMs. One model per TF is manually selected. Twice as many models as in JASPAR.

Paper
Kulakovskiy, Ivan V., Yulia A. Medvedeva, Ulf Schaefer, Artem S. Kasianov, Ilya E. Vorontsov, Vladimir B. Bajic, and Vsevolod J. Makeev. “HOCOMOCO: A Comprehensive Collection of Human Transcription Factor Binding Sites Models.” Nucleic Acids Research 41, no. D1 (January 1, 2013): D195–202. https://doi.org/10.1093/nar/gks1089.
Kulakovskiy, Ivan V., Ilya E. Vorontsov, Ivan S. Yevshin, Anastasiia V. Soboleva, Artem S. Kasianov, Haitham Ashoor, Wail Ba-alawi, et al. “HOCOMOCO: Expansion and Enhancement of the Collection of Transcription Factor Binding Sites Models.” Nucleic Acids Research 44, no. D1 (January 4, 2016): D116–25. https://doi.org/10.1093/nar/gkv1249.

Kulakovskiy, Ivan V, Ilya E Vorontsov, Ivan S Yevshin, Ruslan N Sharipov, Alla D Fedorova, Eugene I Rumynskiy, Yulia A Medvedeva, et al. “HOCOMOCO: Towards a Complete Collection of Transcription Factor Binding Models for Human and Mouse via Large-Scale ChIP-Seq Analysis.” Nucleic Acids Research 46, no. D1 (January 4, 2018): D252–59. https://doi.org/10.1093/nar/gkx1106.

SwissRegulon - a database of regulatory motifs (PWMs) across model organisms (prokaryotes, eukaryotes). Data partly comes from JASPAR and TRANSFAC, reprocessing of ChIP-seq experiments. GBrowse for browsing TFBSs. Other tools.

Paper
Pachkov, Mikhail, Piotr J. Balwierz, Phil Arnold, Evgeniy Ozonov, and Erik van Nimwegen. “SwissRegulon, a Database of Genome-Wide Annotations of Regulatory Sites: Recent Updates.” Nucleic Acids Research 41, no. D1 (November 23, 2012): D214–20. https://doi.org/10.1093/nar/gks1145.

tangermeme - Python interface to the MEME suite

ChIP-seq

ChIP-seq-analysis - ChIP-seq analysis notes from Ming Tang

ChIP-seq pipelines

ChIP-AP - ChIP-seq analysis pipeline integrating multiple tools and peak callers (FastQC, Clumpify and BBDuk from the BBMap Suite, Trimmomatic, BWA, Samtools, deepTools, MACS2, GEM, SICER2, HOMER, Genrich, IDR, and the MEME-Suite). QC, cleanup, alignment, peak-calling, pathway analysis. High-confidence peaks based on overlaps by different peak callers. Input - single- or paired-end FASTQ files or aligned BAM files. Conda installable. Command line and GUI. Documentation.

Paper
Suryatenggara, Jeremiah, Kol Jia Yong, Danielle E. Tenen, Daniel G. Tenen, and Mahmoud A. Bassal. "ChIP-AP: an integrated analysis pipeline for unbiased ChIP-seq analysis." Briefings in Bioinformatics 23, no. 1 (January 2022) https://doi.org/10.1093/bib/bbab537

Regulatory Genomics Toolbox: Python library and set of tools for the integrative analysis of high throughput regulatory genomics data. http://www.regulatory-genomics.org, https://github.com/CostaLab/reg-gen
- HINT (Hmm-based IdeNtification of Transcription factor footprints) is a framework for detection of DNA footprints from DNase-Seq and histone modification ChIP-Seq data.
- Motif Analysis tools allow the search of motifs with binding sites enriched in particular genomic regions
- ODIN and THOR are HMM-based approaches to detect and analyse differential peaks in pairs of ChIP-seq data.
- RGT-Viz is a collection of tests for association analysis and tools for visualization of genomic data such as files in BED and BAM format
- Triplex Domain Finder (TDF) statistically characterizes the triple helix potential of RNA and DNA regions.
ENCODE3 pipeline v1 specifications, https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#heading=h.9ecc41kilcvq
CHIPS - A Snakemake pipeline for quality control and reproducible processing of chromatin profiling data (ChIP-seq, ATAC-seq). Alignment, extensive QC, peak calling, downstream analysis (annotation, motif finding, putative targets). Generates an HTML report, plotly interactive plots. Distributed as a Conda recipe.
- Taing, Len, Gali Bai, Clara Cousins, Paloma Cejas, Xintao Qiu, Zachary T. Herbert, Myles Brown, et al. “CHIPS: A Snakemake Pipeline for Quality Control and Reproducible Processing of Chromatin Profiling Data.” F1000Research, (June 30, 2021)
AQUAS ChIP-seq processing pipeline - The AQUAS pipeline is based off the ENCODE (phase-3) transcription factor and histone ChIP-seq pipeline specifications, https://github.com/kundajelab/chipseq_pipeline
Crunch - Completely Automated Analysis of ChIP-seq Data, http://crunch.unibas.ch/, https://www.biorxiv.org/content/early/2016/03/09/042903
ChiLin - QC, peak calling, motif analysis for ChIP-seq and DNAse-seq data used by CistromeDb. References to other tools. https://github.com/cfce/chilin
- Qin, Qian, Shenglin Mei, Qiu Wu, Hanfei Sun, Lewyn Li, Len Taing, Sujun Chen, et al. “ChiLin: A Comprehensive ChIP-Seq and DNase-Seq Quality Control and Analysis Pipeline.” BMC Bioinformatics 17, no. 1 (October 3, 2016): 404. https://doi.org/10.1186/s12859-016-1274-4.
ChIP-eat - a pipeline for aligning reads, calling peaks, predicting TFBSs. https://bitbucket.org/CBGR/chip-eat/src/master/
- Gheorghe, Marius, Geir Kjetil Sandve, Aziz Khan, Jeanne Chèneby, Benoit Ballester, and Anthony Mathelier. “A Map of Direct TF–DNA Interactions in the Human Genome.” Nucleic Acids Research 47, no. 4 (February 28, 2019): e21–e21. https://doi.org/10.1093/nar/gky1210.
ChIPLine - a pipeline for ChIP-seq analysis, https://github.com/ay-lab/ChIPLine

Normalization

BAMscale - BAMscale is a one-step tool for either 1) quantifying and normalizing the coverage of peaks or 2) generating scaled BigWig files for easy visualization of commonly used DNA-seq capture based methods.
CHIPIN - ChIP-seq Intersample Normalization using gene expression. Assumption - non-differential genes should have non-differential peaks.
S3norm - ChIP-seq normalization to sequencing depth AND signal-to-noise ratio to the common reference. Negative Binomial for modeling background, convert counts to -log10(p-values), use monotonic nonlinear model to match the means of the common peaks and backgrounds in two datasets. https://github.com/guanjue/S3norm
- Xiang, Guanjue, Cheryl Keller, Belinda Giardine, Lin An, Ross Hardison, and Yu Zhang. “S3norm: Simultaneous Normalization of Sequencing Depth and Signal-to-Noise Ratio in Epigenomic Data.” BioRxiv, January 1, 2018, 506634. https://doi.org/10.1101/506634.

CUT&RUN

CUT&Tag Data Processing and Analysis Tutorial
Methods with detailed commands of CUT&RUN data analysis - from Divya S. Vinjamur et al. "ZNF410 represses fetal globin by devoted control of CHD4/NuRD," bioRxiv, August 31, 2020.
CUT&Tag technology - Cleavage Under Targets and Tagmentation. Compared with CUT&RUN that uses MNase, it uses Tn5 transposase, reactions performed within intact cells, performed on a solid support (tethered). Better suited for low cell numbers, low cost. Tested on H3K27me3 and RNAPII profiling in K562, compared with the same CUT&RUN data. Sharper peaks, nearly 20X more than ChIP-seq. Compared with ATAC-seq in K562, H3K4me2, better signal-to-noise ratio, even at low sequencing depth. Tested using NPAT and CTCF transcription factors. Methods - alignment (bowtie2) and peak calling (MACS2) settings.
- Kaya-Okur, Hatice S., Steven J. Wu, Christine A. Codomo, Erica S. Pledger, Terri D. Bryson, Jorja G. Henikoff, Kami Ahmad, and Steven Henikoff. “CUT&Tag for Efficient Epigenomic Profiling of Small Samples and Single Cells.” Nature Communications 10, no. 1 (December 2019)
CUT&RUN technology, chromatin profiling strategy, antibody-targeted controlled cleavage by micrococcal nuclease. Cost-efficient, low input requirements, easier.
- Skene, Peter J, and Steven Henikoff. “An Efficient Targeted Nuclease Strategy for High-Resolution Mapping of DNA Binding Sites.” Genes and Chromosomes
SEACR (Sparse Enrichment Analysis for CUT&RUN) peak caller for CUT&RUN data. Data-driven, peaks with respect to global background or IgG control. Compared to MACS2 and HOMER, more precise and maintains true positive rate at low read depth. Better at calling wide peaks. Input - bedGraph, output - BED. Command line and web server
- Meers, MP, Tenenbaum, D and Henikoff S (2019). "Peak calling by sparse enrichment analysis for CUT&RUN chromatin profiling". Epigenetics & Chromatin 2019 12:42.
CUT&RUNTools 2.0 - extended functionality to handle single-cell data, data normalization, peak calling (MACS2, SEACR), dimensionality reduction (Latent Semantic Indexing), downstream functional analysis.
- Yu, Fulong, Vijay G Sankaran, and Guo-Cheng Yuan. “CUT&RUNTools 2.0: A Pipeline for Single-Cell and Bulk-Level CUT&RUN and CUT&Tag Data Analysis,” Bioinformatics, 09 July 2021
CUT&RUNTools - a pipeline to fully process CUT&RUN data and identify protein binding and genomic footprinting from antibody-targeted primary cleavage data. Implemented in R, Python, Bash, runs under the SLURM job submission. At the core, creates a cut matrix from enzyme cleavage data. Compared with Atactk and Centipede. (Tested, didn't work)
- Zhu, Qian. “CUT&RUNTools: A Flexible Pipeline for CUT&RUN Processing and Footprint Analysis,” 2019, 12.

Quality control

ChIPQC - Quality metrics for ChIPseq data
phantompeakqualtools - This package computes informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays. https://github.com/kundajelab/phantompeakqualtools
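
The idea of estimating a characteristic tag shift from strand cross-correlation can be sketched as follows. This is a toy illustration, not the phantompeakqualtools implementation: the function names, the single-region coordinate system, and the use of 5' read ends as input are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def estimate_shift(plus_starts, minus_starts, length, max_shift):
    """Return (shift, correlation) maximizing the strand cross-correlation.

    plus_starts / minus_starts: 5' end positions of reads on each strand,
    within a region of `length` bases; max_shift must be smaller than length.
    """
    plus = [0] * length
    minus = [0] * length
    for position in plus_starts:
        plus[position] += 1
    for position in minus_starts:
        minus[position] += 1
    best_shift, best_cc = 0, -2.0
    for shift in range(max_shift + 1):
        # Compare plus-strand counts with minus-strand counts `shift` bases downstream.
        cc = pearson(plus[: length - shift], minus[shift:])
        if cc > best_cc:
            best_shift, best_cc = shift, cc
    return best_shift, best_cc
```

The shift with the highest correlation approximates the predominant fragment length. On real data a second, smaller peak at the read length (the "phantom peak") is expected as well, which this sketch does not separate out.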

Peaks

Benchmarking of 14 ChIP-seq tools for peak calling and differential analysis (described in supplementary). Experimental and simulated data, narrow and broad peaks, with/without input, replicates. DEG enrichment analysis. Poor agreement. MACS2 performs OK for sharp peaks. Figure 7 - decision tree for tool selection.

Paper
Steinhauser, Sebastian, Nils Kurzawa, Roland Eils, and Carl Herrmann. “A Comprehensive Comparison of Tools for Differential ChIP-Seq Analysis,” n.d., 14.

LanceOtron - deep learning-based peak caller from TF and histone ChIP-seq, ATAC-seq, DNAse-seq. Input - bigWig coverage file (+input, if available). Image recognition using wide and deep model (logistic regression producing enrichment scores, CNN, multilayer perceptron, Fig. 1c, Methods). Trained on hand-labeled data. Outperforms MACS2. Visualization using MLV genome visualization software. Website with videos, documentation.

Paper
Hentges, Lance D., Martin J. Sergeant, Damien J. Downes, Jim R. Hughes, and Stephen Taylor. "LanceOtron: a deep learning peak caller for ATAC-seq, ChIP-seq, and DNase-seq." bioRxiv (2021). https://doi.org/10.1101/2021.01.25.428108

epic2 - diffuse ChIP-seq peak caller, Cython reimplementation of SICER, 30X faster, 1/7 memory use. Available on Conda and GitHub
- Stovner, Endre Bakken. “Epic2 Efficiently Finds Diffuse Domains in ChIP-Seq Data,” Bioinformatics. 2019 Nov 1
Genrich - Detecting sites of genomic enrichment in ChIP-seq and ATAC-seq. https://github.com/jsh58/Genrich, unpublished but highly tested and recommended, https://informatics.fas.harvard.edu/atac-seq-guidelines.html
mosaics - This package provides functions for fitting MOSAiCS and MOSAiCS-HMM, a statistical framework to analyze one-sample or two-sample ChIP-seq data of transcription factor binding and histone modification. https://bioconductor.org/packages/release/bioc/html/mosaics.html
RSEG - ChIP-seq broad domain analysis. http://smithlabresearch.org/software/rseg/
- Song, Qiang, and Andrew D Smith. “Identifying Dispersed Epigenomic Domains from ChIP-Seq Data.” Bioinformatics 27, no. 6 (2011): 870–871.
triform - finds enriched regions (peaks) in transcription factor ChIP-sequencing data. https://bioconductor.org/packages/release/bioc/html/triform.html

Enhancers

ROSE - rank-ordering of super-enhancers using H3K27ac ChIP-seq data, by the Young lab.
LILI - a pipeline by Boeva lab for detection of super-enhancers using H3K27ac ChIP-seq data, which includes explicit correction for copy number variation inherent to cancer samples. The pipeline is based on the ROSE algorithm originally developed by the Young lab.
CenhANCER - a cancer enhancer database, curating public H3K27ac ChIP-seq data from 805 primary tissue samples and 671 cell line samples across 41 cancer types. 57 029 408 typical enhancers, 978 411 super-enhancers and 226 726 enriched transcription factors. Annotated with SNPs. Table 1 - comparison with other resources (CancerEnD, OncoBase, OncoCis, ENdb, DiseaseEnhancer, SEdb, SEanalysis).

Paper
Luo, Zhi-Hui, Meng-Wei Shi, Yuan Zhang, Dan-Yang Wang, Yi-Bo Tong, Xue-Ling Pan, and ShanShan Cheng. “CenhANCER: A Comprehensive Cancer Enhancer Database for Primary Tissues and Cell Lines.” Database 2023 (May 18, 2023): baad022. https://doi.org/10.1093/database/baad022.

Visualization

DeepTools - a suite of python tools particularly developed for the efficient analysis and visualization of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq. deepStats - a statistical toolbox with additional tools for deeptools and genomic signals.
Visualizations of ChIP-Seq data using Heatmaps, https://www.biostars.org/p/180314/
ChAsE - Chromatin Analysis & Exploration Tool. http://chase.cs.univie.ac.at/overview
ChromHeatMap - Heat map plotting by genome coordinate. https://bioconductor.org/packages/release/bioc/html/ChromHeatMap.html
BAM2WIG - a flexible tool to generate read coverage profile (WIG file) from a BAM file. http://www.epigenomes.ca/tools-and-software
EaSeq - peak calling (MACS), visualization, and analysis of ChIP-seq experiments. GUI, Windows-based, stand-alone. Figure 1, 3 - range of functionality, compared with other tools. https://easeq.net/downloadeaseq/. Description of tools: http://easeq.net/tools.pdf, Visualization examples: http://easeq.net/plots.pdf, Workflow examples: http://easeq.net/examples.pdf
- Lerdrup, Mads, Jens Vilstrup Johansen, Shuchi Agrawal-Singh, and Klaus Hansen. “An Interactive Environment for Agile Analysis and Visualization of ChIP-Sequencing Data.” Nature Structural & Molecular Biology 23, no. 4 (April 2016): 349–57. https://doi.org/10.1038/nsmb.3180.
pygv - a minimal, scriptable IGV-like genome browser for python
Zerone - combine multiple ChIP-seq profiles into one discretized profile. HMM with zero-inflated negative multinomial emissions across windowed genome. QC using SVM trained on ENCODE data to distinguish good from bad samples. Requires two negative controls. Compared against peaks called by MACS, BayesPeak, JAMM. https://github.com/nanakiksc/zerone
- Cuscó, Pol, and Guillaume J. Filion. “Zerone: A ChIP-Seq Discretizer for Multiple Replicates with Built-in Quality Control.” Bioinformatics 32, no. 19 (October 1, 2016): 2896–2902. https://doi.org/10.1093/bioinformatics/btw336.

Intersections

BedSect - web server for intersection analysis of genomic regions, UpSet and correlation plots. Gene-centric, GREAT enrichment analysis. Integrated with the GTRD database. GitHub.

Paper
Mishra, Gyan Prakash, Arup Ghosh, Atimukta Jha, and Sunil Kumar Raghav. “BedSect: An Integrated Web Server Application to Perform Intersection, Visualization, and Functional Annotation of Genomic Regions From Multiple Datasets.” Frontiers in Genetics 11 (February 5, 2020): 3. https://doi.org/10.3389/fgene.2020.00003.

Intervene - command line and web server for venn diagrams of overlaps of genomic regions (up to six sets), UpSet plot, correlation heatmap. Python (pybedtools, Seaborn, Matplotlib) and R (UpSetR, Corrplot, Venerable). BitBucket.

Paper
Khan, Aziz, and Anthony Mathelier. “Intervene: A Tool for Intersection and Visualization of Multiple Gene or Genomic Region Sets.” BMC Bioinformatics 18, no. 1 (December 2017): 287. https://doi.org/10.1186/s12859-017-1708-7.
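
The pairwise overlap statistics behind intersection tools like these can be sketched in plain Python. This is only an illustration of the idea, not code from BedSect or Intervene: the function names and the choice of the Jaccard index as the summary statistic are assumptions, and intervals are taken to be BED-style (0-based, half-open).

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_bp(a, b):
    """Base pairs shared by two merged, sorted interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        low = max(a[i][0], b[j][0])
        high = min(a[i][1], b[j][1])
        if low < high:
            total += high - low
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard(bed_a, bed_b):
    """Jaccard index (shared bp / covered bp) of two lists of (chrom, start, end)."""
    by_chrom_a, by_chrom_b = defaultdict(list), defaultdict(list)
    for chrom, start, end in bed_a:
        by_chrom_a[chrom].append((start, end))
    for chrom, start, end in bed_b:
        by_chrom_b[chrom].append((start, end))
    shared = covered = 0
    for chrom in set(by_chrom_a) | set(by_chrom_b):
        a = merge(by_chrom_a.get(chrom, []))
        b = merge(by_chrom_b.get(chrom, []))
        common = overlap_bp(a, b)
        shared += common
        covered += sum(e - s for s, e in a) + sum(e - s for s, e in b) - common
    return shared / covered if covered else 0.0
```

Computing `jaccard` for every pair of region sets yields the kind of matrix that a correlation heatmap displays. For real BED files, bedtools or pybedtools handle sorting, strands and large inputs far more efficiently.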

Motif analysis

memes - an R package interfacing MEME suite (DREME, AME, FIMO, TOMTOM). Using universalmotif_df R/Bioconductor object to make results compatible across tools. De novo motif discovery, differential motifs, known motif enrichment analysis. Visualization capabilities. Case example on ChIP-seq peaks in Drosophila wing development. Requires installation of MEME suite. Docker container with RStudio and everything configured. Bioconductor, pkgdown website.

Paper
Nystrom, Spencer L, and Daniel J McKay. “Memes: A Motif Analysis Environment in R Using Tools from the MEME Suite.” PLOS COMPUTATIONAL BIOLOGY, n.d., 14.

ChEA3 - transcription factor enrichment in gene lists. Six reference libraries of TF regulatory signatures (ARCHS4, ENCODE, GTEx, ReMap, Enrichr, Literature). Fisher's exact test. Outperforms VIPER, DoRothEA, BART, TFEA.ChIP, oPOSSUM, MAGICACT.

Paper
Keenan, Alexandra B, Denis Torre, Alexander Lachmann, Ariel K Leong, Megan L Wojciechowicz, Vivian Utti, Kathleen M Jagodnik, Eryk Kropiwnicki, Zichen Wang, and Avi Ma’ayan. “[ChEA3: Transcription Factor Enrichment Analysis by Orthogonal Omics Integration](https://doi.org/10.1093/nar/gkz446).” Nucleic Acids Research, (July 2, 2019)

TFEA.ChIP - R package for transcription factor enrichment of gene lists (hypergeometric and GSEA) using experimental ChIP-seq datasets (ENCODE, GEO). Tested on known signatures, compared with two PWM-based and ChIP-based, performs comparably or better.
- Puente-Santamaria, Laura, Wyeth W. Wasserman, and Luis Del Peso. "TFEA.ChIP: a tool kit for transcription factor binding site enrichment analysis capitalizing on ChIP-seq datasets." Bioinformatics, (2019)
PWMScan - web tool for scanning entire genomes with a position-specific weight matrix. Multiple genomes and assemblies hosted on the server. Multiple PWM collections for Eukaryotic DNA (JASPAR, HOCOMOCO, SwissRegulon, UniPROBE, CIS-BP, from Jolma, Isakova publications). matrix_scan C program for matching PWMs. Compared with other motif scanning tools (PoSSuMsearch, Patser, RSAT, STORM, HOMER), overlap >99%. Output - BEDdetail format. Code.

Paper
Ambrosini, Giovanna, Romain Groux, and Philipp Bucher. “PWMScan: A Fast Tool for Scanning Entire Genomes with a Position-Specific Weight Matrix.” Edited by John Hancock. Bioinformatics 34, no. 14 (July 15, 2018): 2483–84. https://doi.org/10.1093/bioinformatics/bty127.
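
Scanning a sequence with a position weight matrix can be illustrated with a short sketch. It is not PWMScan's `matrix_scan` program: the function names, the pseudocount, the uniform background and the toy count matrix are assumptions chosen for illustration.

```python
from math import log2

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_log_odds(count_matrix, pseudocount=0.5, background=0.25):
    """Convert per-position base counts (list of dicts) to log2-odds scores."""
    pwm = []
    for column in count_matrix:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, end, strand, score) for windows scoring >= threshold.

    Coordinates are 0-based, half-open, as in BED. Windows containing
    characters other than A/C/G/T are skipped.
    """
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) - set("ACGT"):
            continue
        reverse_complement = window.translate(COMPLEMENT)[::-1]
        for strand, site in (("+", window), ("-", reverse_complement)):
            score = sum(column[base] for column, base in zip(pwm, site))
            if score >= threshold:
                hits.append((start, start + width, strand, score))
    return hits
```

With a hypothetical three-column count matrix favoring `TGA`, `scan("CCTGACC", pwm, 5)` reports one plus-strand hit, and the same matrix matches `TCA` on the minus strand. Genome-scale scanners add p-value based thresholds and much faster matching.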

gimmemotifs - framework for TF motif analysis using an ensemble of motif predictors. maelstrom tool to detect differential motif activity between multiple different conditions. Includes manually curated database of motifs. Benchmark of 14 motif detection tools - Homer, MEME, BioProspector are among the top performing. Extensive analysis results. Documentation. Tweet with updates
- Bruse, Niklas, and Simon J. van Heeringen. “GimmeMotifs: An Analysis Framework for Transcription Factor Motif Analysis,” November 20, 2018
DECOD - Differential motif finder. k-mer-based. http://gene.ml.cmu.edu/DECOD/
- Huggins, Peter, Shan Zhong, Idit Shiff, Rachel Beckerman, Oleg Laptenko, Carol Prives, Marcel H. Schulz, Itamar Simon, and Ziv Bar-Joseph. “DECOD: Fast and Accurate Discriminative DNA Motif Finding.” Bioinformatics 27, no. 17 (September 1, 2011): 2361–67. https://doi.org/10.1093/bioinformatics/btr412.
Non-redundant TF motif matches genome-wide - Clustering of 2179 motif models. hg38/mm10 BED files download with coordinates
homerkit - Read HOMER motif analysis output in R.
LISA - epigenetic Landscape In silico Subtraction analysis, enriched TFs and chromatin regulators in a list of genes.
Logolas - R package for Enrichment Depletion Logos (EDLogos) and String Logos.
marge - API for HOMER in R for Genomic Analysis using Tidy Conventions, GitHub
motifbreakR - R package for predicting the disruptiveness of single nucleotide polymorphism on TFBSs. SNPs may be a list of rsIDs or a BED file. Includes MotifDB PWMs and others (ENCODE, Factorbook, Hocomoco, homer).
motifStack - R package for plotting stacked logos for single or multiple DNA, RNA and amino acid sequence.
pyjaspar - A Pythonic interface to query and access JASPAR transcription factor motifs
RcisTarget - R package to identify transcription factor binding motifs enriched on a list of genes or genomic regions.
rGADEM - R package for de novo motif discovery.

Differential peak detection

diffTF - differential TF activity calculation and integration with RNA-seq data for classification of TFs into activators or repressors. Differential analysis on consensus peaks using permutations or statistics (diffPeaks). Input: BAM, fasta files, RNA-seq counts, external TFBS data (HOCOMOCO, JASPAR, TRRUST, or ReMap). Applied to several datasets, including multiomics, recovers known biology, experimental validation. Implemented as a Snakemake pipeline. Singularity, conda installation. Documentation.

Paper
Berest, Ivan, Christian Arnold, Armando Reyes-Palomares, Giovanni Palla, Kasper Dindler Rasmussen, Holly Giles, Peter-Martin Bruch, et al. “Quantification of Differential Transcription Factor Activity and Multiomics-Based Classification into Activators and Repressors: DiffTF.” Cell Reports 29, no. 10 (December 2019): 3147-3159.e12. https://doi.org/10.1016/j.celrep.2019.10.106.

normR - a Bioconductor R package, data-driven normalization and difference calling approach for ChIP-seq data. Models ChIP- and control read counts by binomial mixture model. One component models the background, the other models the signal. Can work without control.
- Helmuth, Johannes, et al. "normR: Regime enrichment calling for ChIP-seq data." BioRxiv (2016)
csaw - Detection of differentially bound regions in ChIP-seq data with sliding windows, with methods for normalization and proper FDR control. https://bioconductor.org/packages/release/bioc/html/csaw.html
DiffBind - Differential Binding Analysis of ChIP-Seq Peak Data. https://bioconductor.org/packages/release/bioc/html/DiffBind.html

Enrichment

UniBind Enrichment Analysis, also differential enrichment. Input - BED file in hg38 version. LOLA as an enrichment and database engine.

Interpretation

Lisa - web server to determine the transcription factors and chromatin regulators that are directly responsible for the perturbation of a differentially expressed gene set (chrom-PR score). Using public and custom human and mouse DNase-seq, and H3K27ac ChIP-seq profiles (CistromeDB). Input: list of differential genes. GitHub.

Paper
Qin, Qian, Jingyu Fan, Rongbin Zheng, Changxin Wan, Shenglin Mei, Qiu Wu, Hanfei Sun, et al. “Lisa: Inferring Transcriptional Regulators through Integrative Modeling of Public Chromatin Accessibility and ChIP-Seq Data.” Genome Biology 21, no. 1 (December 2020): 32. https://doi.org/10.1186/s13059-020-1934-6.

Cistrome-GO - functional enrichment analysis of genes regulated by TFs in human and mouse. Solo mode (ChIP-seq peaks only) or ensemble mode (integrates ChIP-seq peaks and RNA-seq differentially expressed genes). Implementation of BETA method. MACS2 peaks, DESeq2 output. Gene-centric regulatory potential (RP) score (exponentially weighted by distance sum of peaks). Human (hg19/hg38), Mouse (mm9/mm10).
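
The gene-centric regulatory potential idea can be sketched in a few lines. This is a minimal illustration, not Cistrome-GO's code: it assumes the BETA-style decay in which each peak within 100 kb of a TSS contributes exp(-(0.5 + 4Δ)), where Δ is the peak-to-TSS distance divided by 100 kb. The function name, the window size, and the example coordinates are hypothetical, and the decay settings used by the web server may differ.

```python
import math

def regulatory_potential(tss, peak_centers, window=100_000):
    """Sum exponentially decaying contributions of peaks near a TSS.

    Assumed form (BETA-style): each peak within `window` bp of the TSS adds
    exp(-(0.5 + 4 * d / window)), where d is the absolute distance in bp.
    Peaks farther away than `window` contribute nothing.
    """
    score = 0.0
    for center in peak_centers:
        d = abs(center - tss)
        if d <= window:
            score += math.exp(-(0.5 + 4.0 * d / window))
    return score

# Hypothetical gene with a TSS at 1,000,000 and three peaks on the same chromosome.
peaks = [1_000_000, 1_050_000, 1_500_000]
print(round(regulatory_potential(1_000_000, peaks), 4))
```

Peaks close to the TSS dominate the score, and the third peak (500 kb away) is ignored because it falls outside the window.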

Paper
Li, Shaojuan, Changxin Wan, Rongbin Zheng, Jingyu Fan, Xin Dong, Clifford A. Meyer, and X. Shirley Liu. “Cistrome-GO: A Web Server for Functional Enrichment Analysis of Transcription Factor ChIP-Seq Peaks.” Nucleic Acids Research 47, no. W1 (July 2, 2019): W206–11. https://doi.org/10.1093/nar/gkz332.

Toolkit for Cistrome Data Browser - online tool to answer questions like:
- What factors regulate your gene of interest?
- What factors bind in your interval?
- What factors have a significant binding overlap with your peak set?
Chongzhi Zang software page, http://faculty.virginia.edu/zanglab/software.htm
- BART (Binding Analysis for Regulation of Transcription), a bioinformatics tool for predicting functional transcription factors (TFs) that bind at genomic cis-regulatory regions to regulate gene expression in the human or mouse genomes, given a query gene set or a ChIP-seq dataset as input. http://bartweb.uvasomrc.io/
- MARGE (Model-based Analysis of Regulation of Gene Expression), a comprehensive computational method for inference of cis-regulation of gene expression leveraging public H3K27ac genomic profiles in human or mouse. http://cistrome.org/MARGE/
- MANCIE (Matrix Analysis and Normalization by Concordant Information Enhancement), a computational method for high-dimensional genomic data integration. https://cran.r-project.org/web/packages/MANCIE/index.html
- SICER (Spatial-clustering Identification of ChIP-Enriched Regions), a ChIP-Seq data analysis method. https://home.gwu.edu/~wpeng/Software.htm
UROPA - Universal RObust Peak Annotator, a command line based tool intended for genomic region annotation. Definition of overlap/proximity types. Documentation

Excludable

CUT&RUN blacklists for human (hg38) and mouse (mm10) genomes. Different biochemical properties than ChIP-seq, SEACR peak caller that uses global background. 20 C&R negative control datasets per human/mouse genome, consistently called artifactual peaks (the highest 0.1% signals in more than 30% of replicates, peaks extended by 1Kb) are assembled into blacklists. Also contain mitochondrial sequences (NUMTs). Tested bowtie2 and bowtie alignment strategies. Cover approximately 0.2% of the genome, removing reads overlapping them increases variability among samples (PCA). Compared with the Boyle's Blacklist-generated lists. BED coordinates in supplementary.

Paper
Nordin, Anna, Gianluca Zambanini, Pierfrancesco Pagella, and Claudio Cantù. “The CUT&RUN Blacklist of Problematic Regions of the Genome.” Preprint. Genomics, November 14, 2022. https://doi.org/10.1101/2022.11.11.516118.

Blacklist - Application for making ENCODE Blacklists, and links to canonical blacklists. C, C++.

Paper
Amemiya, Haley M., Anshul Kundaje, and Alan P. Boyle. “The ENCODE Blacklist: Identification of Problematic Regions of the Genome.” Scientific Reports 9, no. 1 (December 2019): 9354. https://doi.org/10.1038/s41598-019-45839-z.

GEM - mappability calculations for each genomic region, accounting for mismatches. Pre-calculated UCSC genome browser tracks for human and mouse. Mappability of genes, both protein-coding and non-protein coding. RPKUM - unique exons for quantifying gene expression.

Paper
Derrien, Thomas, Jordi Estellé, Santiago Marco Sola, David G. Knowles, Emanuele Raineri, Roderic Guigó, and Paolo Ribeca. “Fast Computation and Applications of Genome Mappability.” PloS One 7, no. 1 (2012): e30377. https://doi.org/10.1371/journal.pone.0030377.

Greenscreen - an approach for removing false-positive peaks (ultra-high noise) from ChIP-seq data (also, CUT&RUN) using MACS2 (broadpeak setting, optimized significance threshold and merging distance to match Blacklist-created regions). As effective as canonical blacklists, improves true factor binding overlap, improves Standardized Standard Deviation (SSD -> 1), improves replicate correlation structure. Uses as few as three samples, 99.9% overlap with Blacklist-created regions, smaller genomic footprint, same performance as Blacklist-generated.

Paper
Klasfeld, Sammy, and Doris Wagner. “Greenscreen Decreases Type I Errors and Increases True Peak Detection in Genomic Datasets Including ChIP-Seq.” Preprint. Genomics, March 1, 2022. https://doi.org/10.1101/2022.02.27.482177.

Manually annotated GRCh38 blacklisted regions on ENCODE data portal. Tweet by Anshul Kundaje
Repetitive centromeric, telomeric and satellite regions known to have low sequencing confidence - blacklisted regions defined by the ENCODE project - from Upton et al., “Epigenomic Profiling of Neuroblastoma Cell Lines.”
UCSC unusual regions on assembly structure, Tweet

DNAse-seq

DNAse-seq analysis guide. Tools for QC, peak calling, analysis, footprint detection, motif analysis, visualization, all-in-one tools (Table 2)
- Liu, Yongjing, Liangyu Fu, Kerstin Kaufmann, Dijun Chen, and Ming Chen. “A Practical Guide for DNase-Seq Data Analysis: From Data Management to Common Applications.” Briefings in Bioinformatics, July 12, 2018. https://doi.org/10.1093/bib/bby057.

ATAC-seq

awesome-atac-analysis Awesome ATAC-seq analysis by Nathan Sheffield.
Benchmarking ATAC-seq peak calling by Austin Montgomery
ATAC-seq analysis considerations. Considering multiple workflows, settling on csaw-based. Normalization by library complexity (downsampling) is important. Workflow and GitHub with all scripts.

Paper
Reske, Jake J., Mike R. Wilson, and Ronald L. Chandler. “ATAC-Seq Normalization Method Can Significantly Affect Differential Accessibility Analysis and Interpretation.” Epigenetics & Chromatin 13, no. 1 (December 2020): 22. https://doi.org/10.1186/s13072-020-00342-y.

UNMC_ATACseq_Tutorial - An open-source interactive pipeline tutorial for differential ATAC-seq footprint analysis on the cloud (Google, AWS, Azure)
OCHROdb - a database of open chromatin regions (over 1.4M). 828 DNAse-I experiments, 194 cell lines, uniformly processed, QC'd, peaks called using Hotspot, regulatory elements clustered across all samples, batch effect corrected, reproducible peaks statistically selected. Data from ENCODE, Roadmap Epigenomics Mapping Consortium (REMC), Blueprint Epigenome and Genomics of Gene Regulation (GGR). Downloadable metadata, curated DHS dataset (full and chromosome-specific, BED format with cell/tissue-specific columns with accessibility values), visualized in JBrowse.

Paper
Shooshtari, Parisa, Samantha Feng, Viswateja Nelakuditi, Reza Asakereh, Nader Hosseini Naghavi, Justin Foong, Michael Brudno, and Chris Cotsapas. “Developing OCHROdb, a Comprehensive Quality Checked Database of Open Chromatin Regions from Sequencing Data.” Scientific Reports 13, no. 1 (May 18, 2023): 8106. https://doi.org/10.1038/s41598-022-26791-x.

DNAseI hypersensitive sites from 733 biosamples (439 cell and tissue types and states). NMF to simplify pattern detection. NMF patterns better explain heritability. Data at ENCODE and Zenodo, data browser. Twitter, data download

Paper
Meuleman, Wouter, Alexander Muratov, Eric Rynes, Jessica Halow, Kristen Lee, Daniel Bates, Morgan Diegel, et al. “Index and Biological Spectrum of Human DNase I Hypersensitive Sites.” Nature, July 29, 2020. https://doi.org/10.1038/s41586-020-2559-3.

ATAC-seq pipelines

ENCODE ATAC-seq pipeline - ATAC-seq and DNase-seq processing pipeline by Anshul Kundaje
TOBIAS (Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal) - transcription factor footprinting framework for ATAC-seq data. Corrects for Tn5 bias (ATACorrect module, Figure 1). Outperforms HINT-ATAC, PIQ, Wellington, similar or better performance as msCentipede. Validated using paired ATAC-seq and ChIP-seq data. Visualization of aggregated ATAC-seq signals, differential and time course analysis, TF clustering, network building. Input - BAM file, genome FASTA, BED peaks. Output - bigWigs of uncorrected, corrected signals, expected and corrected symbols. Conda, Nextflow implementation.

Paper
Bentsen, Mette, Philipp Goymann, Hendrik Schultheis, Kathrin Klee, Anastasiia Petrova, René Wiegandt, Annika Fust, et al. “ATAC-Seq Footprinting Unravels Kinetics of Transcription Factor Binding during Zygotic Genome Activation.” Nature Communications 11, no. 1 (August 26, 2020): 4267. https://doi.org/10.1038/s41467-020-18035-1.

HINT-ATAC - a footprinting method considering ATAC-seq protocol biases. Uses a position dependency model (PDM) to learn the cleavage preferences (Methods). Compared against three footprinting methods, DNase2TF, PIQ, Wellington. PDMs are crucial for correction of cleavage bias for ATAC-seq for all methods. Also improves correction for DNAse-seq data. Comparison of protocols, Omni-ATAC (best performance), Fast-ATAC. Part of RGT, Regulatory Genomics Toolbox. Tutorial.

Paper
Li, Zhijian, Marcel H. Schulz, Thomas Look, Matthias Begemann, Martin Zenke, and Ivan G. Costa. “Identification of Transcription Factor Binding Sites Using ATAC-Seq.” Genome Biology, (December 2019). https://doi.org/10.1186/s13059-019-1642-2

HMMRATAC - hidden Markov model for ATAC-seq to identify open chromatin regions. Parametric modeling of nucleosome-free regions and three nucleosomal features (mono-, di-, and tri-nucleosomes). First, train on 1000 auto-selected regions, then predict. Tested on "active promoters" and "strong enhancers" chromatin states (positive examples), and "heterochromatin" (negative examples). Compared with MACS2, F-seq.

Paper
Tarbell, Evan D, and Tao Liu. “HMMRATAC: A Hidden Markov ModeleR for ATAC-Seq.” Nucleic Acids Research, June 14, 2019, gkz533. https://doi.org/10.1093/nar/gkz533

ATACseqQC - R package for ATAC-seq quality control and analysis. QC, preprocessing, read shift, peak calling, motif analysis, enrichment in nucleosome-free regions, plotting (heatmaps, library complexity). Table 1 - summary of functions. Additional material - examples of commands, table of comparison with other pipelines.

Paper
Ou, Jianhong, Haibo Liu, Jun Yu, Michelle A. Kelliher, Lucio H. Castilla, Nathan D. Lawson, and Lihua Julie Zhu. “ATACseqQC: A Bioconductor Package for Post-Alignment Quality Assessment of ATAC-Seq Data.” BMC Genomics 19, no. 1 (December 2018): 169. https://doi.org/10.1186/s12864-018-4559-3.

atac_chip_preprocess - Preprocessing workflow for ATAC-seq and ChIP-seq data, Nextflow pipeline.
ATAC-seq peak calling using MACS2: macs2 callpeak --nomodel --nolambda --keep-dup all --call-summits -f BAMPE -g hs
ATACProc - ATAC-seq processing pipeline
atacseq - nf-core ATAC-seq peak-calling and differential analysis pipeline.
pepatac - A modular, containerized pipeline for ATAC-seq data processing. Examples and documentation
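
The MACS2 invocation listed above can be assembled programmatically, which avoids quoting and spacing mistakes in the flags. The sketch below only builds and prints the command. The helper name, the file name `sample.bam`, and the output directory are hypothetical, and `-t`, `-n`, and `--outdir` are added here because the one-line note omits the input and output arguments.

```python
import shlex

def macs2_atac_command(bam, name, outdir="macs2_out"):
    """Build the MACS2 call from the note above for a paired-end ATAC-seq BAM."""
    return [
        "macs2", "callpeak",
        "-t", bam,              # treatment BAM (hypothetical path)
        "-f", "BAMPE",          # paired-end BAM input
        "-g", "hs",             # human effective genome size preset
        "--nomodel",            # skip building the shifting model
        "--nolambda",           # use a fixed background lambda
        "--keep-dup", "all",    # keep duplicate fragments
        "--call-summits",       # report subpeak summits
        "-n", name,             # prefix for output files
        "--outdir", outdir,
    ]

cmd = macs2_atac_command("sample.bam", "sample")
print(shlex.join(cmd))
# To execute (requires MACS2 on PATH): subprocess.run(cmd, check=True)
```

Passing the arguments as a list keeps each flag and its value as separate tokens, so a stray space such as `-- keep-dup` cannot slip in.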

Histone-seq

Homer program ‘findPeaks’ with the style ‘histone’. Peaks within 1 kb were merged into a single peak. Broad peaks in H3K36me3, H3K27me3 and H3K9me3 were called using the Homer program ‘findPeaks’ with the options ‘-region -size 1000 -minDist 2500’. When Homer runs with these options, the initial sets of peaks are 1 kb wide and peaks within 2.5 kb are merged.
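
The merging step described here (1 kb peaks, merged when they lie within 2.5 kb of each other) can be illustrated with a generic interval merge. This is a sketch under stated assumptions rather than HOMER's implementation: the gap is measured edge to edge on a single chromosome, whereas HOMER may define the distance differently, and the function name and coordinates are hypothetical.

```python
def merge_nearby_peaks(peaks, max_gap=2500):
    """Merge (start, end) intervals on one chromosome whose gap is <= max_gap bp."""
    merged = []
    for start, end in sorted(peaks):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]

# Three hypothetical 1 kb peaks: the first two are 1.5 kb apart, the third is 10 kb away.
peaks = [(10_000, 11_000), (12_500, 13_500), (23_500, 24_500)]
print(merge_nearby_peaks(peaks))  # [(10000, 13500), (23500, 24500)]
```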

DEScan2 - broad peak (histone, ATAC, DNAse) analysis (peak caller, peak filtering and alignment across replicates, creation of a count matrix). Peak caller uses a moving window and calculates a Poisson likelihood of a peak as compared to a region outside the window. https://bioconductor.org/packages/release/bioc/html/DEScan2.html
- Righelli, Dario, John Koberstein, Nancy Zhang, Claudia Angelini, Lucia Peixoto, and Davide Risso. “Differential Enriched Scan 2 (DEScan2): A Fast Pipeline for Broad Peak Analysis.” PeerJ Preprints, 2018.
HMCan and HMCan-diff - histone ChIP-seq peak caller (and differential) that accounts for CNV, also for GC bias. Hidden Markov Model to detect peak signal. Control-FREEC to detect CNV in ChIP-seq data. Outperforms others, CCAT second best. https://www.cbrc.kaust.edu.sa/hmcan/
- Ashoor, Haitham, Aurélie Hérault, Aurélie Kamoun, François Radvanyi, Vladimir B. Bajic, Emmanuel Barillot, and Valentina Boeva. “HMCan: A Method for Detecting Chromatin Modifications in Cancer Samples Using ChIP-Seq Data.” Bioinformatics (Oxford, England) 29, no. 23 (December 1, 2013): 2979–86. https://doi.org/10.1093/bioinformatics/btt524.
- Ashoor, Haitham, Caroline Louis-Brennetot, Isabelle Janoueix-Lerosey, Vladimir B. Bajic, and Valentina Boeva. “HMCan-Diff: A Method to Detect Changes in Histone Modifications in Cells with Different Genetic Characteristics.” Nucleic Acids Research 45, no. 8 (05 2017): e58. https://doi.org/10.1093/nar/gkw1319.
RSEG - ChIP-seq analysis for identifying genomic regions and their boundaries marked by diffusive histone modification markers, such as H3K36me3 and H3K27me3, http://smithlabresearch.org/software/rseg/

Broad peak analysis

EDD - Enriched Domain Detector, a ChIP-seq peak caller for detection of megabase domains of enrichment.
epic2 - an ultraperformant reimplementation of SICER. It focuses on speed, low memory overhead and ease of use.
- Stovner, Endre Bakken, and Pål Sætrom. "epic2 efficiently finds diffuse domains in ChIP-seq data." Bioinformatics, (2019)
DEScan2 - Integrated peak and differential caller, specifically designed for broad epigenomic signals, R package.

Technology

ATAC-STARR-seq - updated protocol that combines assay for transposase-accessible chromatin (ATAC-seq) with self-transcribing active regulatory region sequencing (STARR-seq) to selectively assay the regulatory potential of accessible DNA. Includes protocols for plasmid library generation, reporter assay, data analysis (peak-within-peak calling, adapted DESeq2 to normalize reporter RNA read counts to plasmid DNA read counts, duplicates kept). Agrees with ATAC-seq, much less noisy. GSE181317 - data. GitHub - computational pipeline.

Paper
Hansen, Tyler J., and Emily Hodges. “Identifying Transcription Factor-Bound Activators and Silencers in the Chromatin Accessible Human Genome Using ATAC-STARR-Seq.” Preprint. Genomics, March 28, 2022. https://doi.org/10.1101/2022.03.25.485870.

CLIP-seq (cross-linking and immunoprecipitation) technology, detects sites on RNAs bound by a protein. Figure 1 - technology overview, Figure 2 - details of HITS-CLIP/iCLIP/irCLIP/eCLIP/PAR-CLIP/Proximity-CLIP. Computational analysis, Table 3 - peak detection software. Databases (doRiNA, ENCORI, POSTAR3).

Paper
Hafner, Markus, Maria Katsantoni, Tino Köster, James Marks, Joyita Mukherjee, Dorothee Staiger, Jernej Ule, and Mihaela Zavolan. "CLIP and complementary methods." Nature Reviews Methods Primers 1, no. 1 (2021): 1-23. https://doi.org/10.1038/s43586-021-00018-1

STARR-seq (self-transcribing active regulatory region sequencing) technology for enhancer identification. 3 min Video protocol. Applied to Drosophila genome. The majority (55.6%) of identified enhancers were located within introns, especially in the first intron (37.2%), and in intergenic regions (22.6%). Many genes appeared to be regulated by several independently functioning enhancers.

Paper
Arnold, Cosmas D., Daniel Gerlach, Christoph Stelzer, Łukasz M. Boryń, Martina Rath, and Alexander Stark. “Genome-Wide Quantitative Enhancer Activity Maps Identified by STARR-Seq.” Science 339, no. 6123 (March 2013): 1074–77. https://doi.org/10.1126/science.1232542.

Machine learning

maxATAC - TFBS prediction from ATAC-seq (bulk and pseudobulk) in any cell type (whole genome, chromosome, or region). Deep dilated convolutional neural networks, bigWig and BED predictions of TFBSs. Models available for 127 human TFs (h5 files). Outperforms baseline (average ChIP-seq signal, motif scanning) for most TFs and cell lines. AUPR is similar to the top performer in the ENCODE-DREAM in vivo TFBS prediction challenge (0.4). OMNI-ATAC-seq data for three cell lines, to be available. ATAC-seq signal is scaled per replicate to 20 million mapped reads (RP20M) and min-max normalized to the 99th percentile signal. Python, separate functions for each step (prepare, average, normalize, train, predict, benchmark, peaks, variants). Tweet 1, Tweet 2.
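
The two normalization steps mentioned above can be sketched in plain Python. This is an illustration of the general idea, not maxATAC's code: the function names are hypothetical, the percentile uses a simple nearest-rank rule, and whether values above the 99th percentile are clipped is not stated in these notes, so they are left unclipped here.

```python
import math

def rp20m(counts, mapped_reads, target=20_000_000):
    """Scale per-bin coverage to reads per 20 million mapped reads."""
    factor = target / mapped_reads
    return [c * factor for c in counts]

def percentile_nearest_rank(values, q):
    """Nearest-rank percentile (0 < q <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def minmax_to_percentile(values, q=99):
    """Min-max scale so that the q-th percentile maps to 1.0."""
    lo = min(values)
    hi = percentile_nearest_rank(values, q)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical per-bin coverage from a replicate with 40 million mapped reads.
coverage = [0, 2, 4, 10, 50]
scaled = rp20m(coverage, mapped_reads=40_000_000)
normalized = minmax_to_percentile(scaled)
print(scaled)
print(normalized)
```

Scaling by sequencing depth first makes replicates comparable, and anchoring the upper bound at a high percentile rather than the maximum reduces the influence of a few extreme bins.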

Paper
Cazares, Tareian A, Faiz W Rizvi, Balaji Iyer, Xiaoting Chen, Michael Kotliar, Joseph A Wayman, Anthony Bejjani, et al. “MaxATAC: Genome-Scale Transcription-Factor Binding Prediction from ATAC-Seq with Deep Neural Networks.” Preprint. Bioinformatics, January 29, 2022. https://doi.org/10.1101/2022.01.28.478235.

Segmentation and genome annotation (SAGA) algorithms review. Methods and tools for finding patterns from multiple ChIP-seq, histone-seq, etc. measures (Table 1). Hidden Markov Model (HMM), Dynamic Bayesian Network (DBN) algorithms. HMM intuition, math, solution algorithms. Visualization. Future work, challenges.

Paper
Libbrecht, Maxwell W., Rachel C. W. Chan, and Michael M. Hoffman. “Segmentation and Genome Annotation Algorithms for Identifying Chromatin State and Other Genomic Patterns.” Edited by Tamar Schlick. PLOS Computational Biology 17, no. 10 (October 14, 2021): e1009423. https://doi.org/10.1371/journal.pcbi.1009423.

Misc

covtobed - a tool to generate BED coverage tracks from BAM files. https://github.com/telatin/covtobed
UCSC Genome Browser API to retrieve DNA sequence from coordinates.
- https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chrM;start=4321;end=5678
- https://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:4336341,4336599
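
A small helper can build the first URL above and fetch the sequence. This is a sketch under assumptions: the function names are hypothetical, and the `dna` field of the JSON response is assumed from the API's usual output, so check the response if the format changes. Only the URL construction runs here; the download function needs network access.

```python
import json
from urllib.request import urlopen

API = "https://api.genome.ucsc.edu/getData/sequence"

def sequence_url(genome, chrom, start, end):
    """Build a getData/sequence URL using the ';' separators shown above."""
    params = {"genome": genome, "chrom": chrom, "start": start, "end": end}
    return API + "?" + ";".join(f"{key}={value}" for key, value in params.items())

def fetch_sequence(genome, chrom, start, end, timeout=30):
    """Download the region and return the sequence string (needs network access)."""
    with urlopen(sequence_url(genome, chrom, start, end), timeout=timeout) as response:
        payload = json.load(response)
    return payload["dna"]  # assumed field name in the JSON response

print(sequence_url("hg38", "chrM", 4321, 5678))
```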
