A scalable SCENIC workflow for single-cell gene regulatory network analysis

Overview

SCENIC workflow diagram


Quick start

Running the pipeline in a Jupyter notebook

We recommend using this notebook as a template for running an interactive analysis in Jupyter. See the installation instructions for information on setting up a kernel with pySCENIC and other required packages.

Running the Nextflow pipeline on the example dataset

Requirements (Nextflow/containers)

The following tools are required to run the steps in this Nextflow pipeline:

The following container images will be pulled by nextflow as needed:

Download testing dataset

Download a minimal set of SCENIC database files for a human dataset (approximately 78 MB). This small test dataset takes about 30 seconds to run using 6 threads on a standard desktop computer.

mkdir example && cd example/
# Transcription factors:
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/allTFs_hg38.txt 
# Motif to TF annotation database:
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/motifs.tbl
# Ranking databases:
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/genome-ranking.feather
# Finally, get a small sample expression matrix (loom format):
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/expr_mat.loom
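
Before starting the pipeline, it can help to confirm that all four downloads completed. The following is a minimal sketch, not part of the repository: the file names are taken from the wget commands above, while the helper name missing_or_empty is hypothetical. It only checks that each file exists and is not zero bytes; it does not validate file contents.

```python
from pathlib import Path

# File names taken from the download commands above.
EXPECTED_FILES = [
    "allTFs_hg38.txt",
    "motifs.tbl",
    "genome-ranking.feather",
    "expr_mat.loom",
]


def missing_or_empty(directory="."):
    """Return the expected files that are absent or zero bytes in `directory`.

    Hypothetical helper for illustration; it is not part of SCENICprotocol.
    """
    base = Path(directory)
    problems = []
    for name in EXPECTED_FILES:
        path = base / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Run from inside the example/ directory created above.
    problems = missing_or_empty(".")
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All example files are present.")
```

An empty list means every file is in place; any name that is returned should be downloaded again before running the pipeline.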

Running the example pipeline

Either Docker or Singularity images can be used by specifying the appropriate profile (-profile docker or -profile singularity).

Using loom input
nextflow run aertslab/SCENICprotocol \
    -profile docker \
    --loom_input expr_mat.loom \
    --loom_output pyscenic_integrated-output.loom \
    --TFs allTFs_hg38.txt \
    --motifs motifs.tbl \
    --db *feather

By default, the pipeline uses the container tag given by the --pyscenic_tag parameter. This is currently set to 0.9.16, a container with both pySCENIC and Scanpy 1.4.4.post1 installed. To use a custom container (e.g. one built on a local machine), pass its name to the --pyscenic_container parameter.
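
For users who launch the pipeline from a script or notebook, the call above can be assembled as an argument list. This is a sketch under assumptions: build_command is a hypothetical helper, the image name my-pyscenic:local is a placeholder, and the sketch refuses to set both container options because how the pipeline resolves that combination is not stated here. It passes genome-ranking.feather explicitly, which is the single file the *feather glob matches in the example directory.

```python
import shlex


def build_command(pyscenic_tag=None, pyscenic_container=None):
    """Assemble the example `nextflow run` call as an argument list.

    Hypothetical helper for illustration. At most one of the two
    container options may be given.
    """
    if pyscenic_tag is not None and pyscenic_container is not None:
        raise ValueError("Give either pyscenic_tag or pyscenic_container, not both")

    cmd = [
        "nextflow", "run", "aertslab/SCENICprotocol",
        "-profile", "docker",
        "--loom_input", "expr_mat.loom",
        "--loom_output", "pyscenic_integrated-output.loom",
        "--TFs", "allTFs_hg38.txt",
        "--motifs", "motifs.tbl",
        # The only ranking database downloaded for the example.
        "--db", "genome-ranking.feather",
    ]
    if pyscenic_container is not None:
        cmd += ["--pyscenic_container", pyscenic_container]
    elif pyscenic_tag is not None:
        cmd += ["--pyscenic_tag", pyscenic_tag]
    return cmd


# "my-pyscenic:local" is a placeholder name for a locally built image.
cmd = build_command(pyscenic_container="my-pyscenic:local")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run(cmd, check=True) from the example directory.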

Expected output

The output of this pipeline is a loom-formatted file (by default: output/pyscenic_integrated-output.loom) containing:

  • The original expression matrix
  • The pySCENIC-specific results:
      • Regulons (TFs and their target genes)
      • AUCell matrix (cell enrichment scores for each regulon)
      • Dimensionality reduction embeddings based on the AUCell matrix (t-SNE, UMAP)
  • Results from the parallel best-practices analysis using highly variable genes:
      • Dimensionality reduction embeddings (t-SNE, UMAP)
      • Louvain clustering annotations
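
A quick way to see what the output file holds is to list its dimensions and attribute names. The sketch below assumes the third-party loompy package is installed. The helper names summarize and inspect_loom are hypothetical. The attribute names under which regulons, AUCell scores and embeddings are stored are not stated here, so the sketch simply lists whatever attributes are present.

```python
def summarize(ds):
    """Collect basic facts from an open loom connection.

    In the loom format, rows are genes and columns are cells.
    """
    n_genes, n_cells = ds.shape
    return {
        "n_genes": n_genes,
        "n_cells": n_cells,
        "column_attributes": sorted(ds.ca.keys()),  # per-cell annotations
        "row_attributes": sorted(ds.ra.keys()),     # per-gene annotations
    }


def inspect_loom(path="output/pyscenic_integrated-output.loom"):
    """Open the file read-only with loompy and print the summary."""
    # Third-party dependency (assumed installed); imported here so that
    # summarize() above can be used without it.
    import loompy

    with loompy.connect(path, mode="r") as ds:
        info = summarize(ds)
    for key, value in info.items():
        print(key, value)
    return info
```

Calling inspect_loom() after a pipeline run prints the gene and cell counts followed by the per-cell and per-gene attribute names found in the file.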

General requirements for this workflow

  • Python version 3.6 or greater
  • Tested on several Unix-like systems (Ubuntu 18.04, CentOS 7.6.1810, macOS 10.14.5)

References and more information

SCENIC

SCope

Scanpy
