Data analysis and simulations for the ZINB-WaVE paper

drisso/zinb_analysis


This repository is designed to allow interested people to reproduce the results and figures of the paper:

Risso D, Perraudeau F, Gribkova S, Dudoit S, Vert JP. ZINB-WaVE: A general and flexible method for signal extraction from single-cell RNA-seq data. bioRxiv. doi: https://doi.org/10.1101/125112

Dependencies

Running the code in this repository requires R (>= 3.3), Python (>= 2.7), and the following packages.

R packages

  • zinbwave
  • cluster
  • matrixStats
  • magrittr
  • RColorBrewer
  • ggplot2
  • reshape
  • dplyr
  • knitr
  • rmarkdown
  • mclust
  • cowplot
  • rARPACK
  • Rtsne
  • parallel
  • digest

Bioconductor packages

  • EDASeq
  • biomaRt
  • scRNAseq
  • SummarizedExperiment
  • edgeR
  • scran
  • scater
  • scone
  • DESeq2
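
Before running any analysis, it can help to check which of the packages listed above are still missing. The following is a minimal sketch in base R; the helper name `missing_pkgs` is hypothetical and not part of this repository.

```r
# Package names copied from the two lists above.
r_pkgs <- c("zinbwave", "cluster", "matrixStats", "magrittr", "RColorBrewer",
            "ggplot2", "reshape", "dplyr", "knitr", "rmarkdown", "mclust",
            "cowplot", "rARPACK", "Rtsne", "parallel", "digest")
bioc_pkgs <- c("EDASeq", "biomaRt", "scRNAseq", "SummarizedExperiment",
               "edgeR", "scran", "scater", "scone", "DESeq2")

# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_pkgs <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

message("Missing R packages: ",
        paste(missing_pkgs(r_pkgs), collapse = ", "))
message("Missing Bioconductor packages: ",
        paste(missing_pkgs(bioc_pkgs), collapse = ", "))
```

An empty list after the colon means that every package in that group was found.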

Python packages

A note on zinbwave version

To exactly reproduce the analyses of the paper, version 0.1.1 of the zinbwave package is required. This can be installed in R with the following code.

library(devtools)
install_github("drisso/zinbwave@v0.1.1")

The zinbwave package is under active development: we are constantly fixing bugs, adding new features, and improving the documentation. For all purposes other than exactly reproducing the analyses of our paper, we therefore recommend installing the latest stable release from Bioconductor. To do so, use the following code.

install.packages("BiocManager")
BiocManager::install("zinbwave")
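
Since the two installation routes give different versions, it may be worth confirming which one is active before trying to reproduce the paper. The sketch below uses only `requireNamespace()` and `packageVersion()` from base R; the helper name `has_zinbwave_version` is hypothetical.

```r
# Hypothetical helper: TRUE only if zinbwave is installed at exactly `target`.
has_zinbwave_version <- function(target = "0.1.1") {
  if (!requireNamespace("zinbwave", quietly = TRUE)) {
    return(FALSE)
  }
  packageVersion("zinbwave") == package_version(target)
}

if (has_zinbwave_version("0.1.1")) {
  message("zinbwave 0.1.1 is installed (the version used for the paper).")
} else {
  message("zinbwave 0.1.1 is not installed; results may differ from the paper.")
}
```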

Getting started

Real data

For each of the real datasets analyzed in the paper, there is an .Rmd file and an .R file in the real_data folder, e.g., for the Patel data, the files are patel_covariates.Rmd and patel_plots.R.

Compile the .Rmd file first. This has two effects: (i) it creates an HTML report with useful analyses of the dataset; and (ii) it creates a .rda file with the results of zinbwave, PCA, and ZIFA. Once this file has been generated, use the .R file to generate the dataset-specific plots found in the paper.
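
As an illustration, the two steps for the Patel data could be run from an R session as sketched below. This assumes the working directory is the repository root and that the rmarkdown package is installed; the use of `chdir = TRUE` is also an assumption about how the plotting script resolves its paths.

```r
# File names taken from the Patel example above; run from the repository root.
rmd_file  <- file.path("real_data", "patel_covariates.Rmd")
plot_file <- file.path("real_data", "patel_plots.R")

if (file.exists(rmd_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  # Step 1: compile the report, which also writes the .rda file with the results.
  rmarkdown::render(rmd_file)
  # Step 2: generate the dataset-specific plots from those results.
  # chdir = TRUE assumes the script uses paths relative to real_data/.
  source(plot_file, chdir = TRUE)
} else {
  message("Run this from the repository root with rmarkdown installed.")
}
```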

To generate the plots related to silhouette width, one needs to source the silhouette.R file.

To generate the plots related to the goodness-of-fit, run the .Rmd files in the real_data folder starting with goodness_of_fit, e.g., for the Patel data, the file is goodness_of_fit_patel.Rmd.
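
The goodness-of-fit reports can also be compiled in one pass. The sketch below finds them by file name prefix and renders each one; the helper name `gof_reports` is hypothetical, and the code assumes it is run from the repository root.

```r
# Hypothetical helper: list the goodness-of-fit .Rmd files in a folder.
gof_reports <- function(dir = "real_data") {
  list.files(dir, pattern = "^goodness_of_fit.*\\.Rmd$", full.names = TRUE)
}

# Render every report found; the loop does nothing if no file matches.
for (rmd in gof_reports()) {
  message("Rendering ", rmd)
  rmarkdown::render(rmd)
}
```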

The Patel data are stored in real_data/Patel.zip. Please unzip this file before running the Patel analysis.
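
The archive can be extracted from within R as well. In this sketch the helper name `unzip_patel` is hypothetical, and extracting next to the archive (inside real_data) is an assumption; adjust `exdir` if the analysis expects the files elsewhere.

```r
# Hypothetical helper: extract the Patel archive and return the extracted paths.
unzip_patel <- function(zip_file = file.path("real_data", "Patel.zip"),
                        exdir = dirname(zip_file)) {
  if (!file.exists(zip_file)) {
    message("Archive not found: ", zip_file)
    return(character(0))
  }
  # exdir defaults to the folder containing the archive (an assumption).
  unzip(zip_file, exdir = exdir)
}

extracted <- unzip_patel()
```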

Simulations

To create the simulated datasets from the real datasets used in the paper, first run the code in simFunction.R. Then, run the .R files in the folders in sims/figures. Finally, run figuresPaper.Rmd.
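
The order of these three steps can be sketched as follows. The locations assumed for simFunction.R and figuresPaper.Rmd are hypothetical, as is the helper name `sim_scripts`; the order in which the per-folder scripts run is not specified above, so the sketch simply follows the order returned by `list.files()`.

```r
# Hypothetical helper: list the .R files inside the subfolders of `dir`.
sim_scripts <- function(dir = file.path("sims", "figures")) {
  subdirs <- list.dirs(dir, recursive = FALSE)
  if (length(subdirs) == 0) {
    return(character(0))
  }
  list.files(subdirs, pattern = "\\.R$", full.names = TRUE)
}

# Assumed locations (hypothetical): adjust to where the files are in your checkout.
sim_function <- file.path("sims", "figures", "simFunction.R")
paper_rmd    <- file.path("sims", "figures", "figuresPaper.Rmd")

if (file.exists(sim_function)) {
  source(sim_function, chdir = TRUE)   # 1. run simFunction.R first
  for (script in sim_scripts()) {      # 2. run the .R files in each subfolder
    source(script, chdir = TRUE)
  }
  rmarkdown::render(paper_rmd)         # 3. finally, compile figuresPaper.Rmd
} else {
  message("simFunction.R not found at the assumed location: ", sim_function)
}
```

Running every script in a single session may take a long time, so it can be more practical to run the scripts one at a time.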

To simulate the datasets from the Lun & Marioni model, run lunSim.R. It uses the file function.rds, which is generated by the steps described in the Methods section of the paper. Then, run fitZinbLun.R.

To fit the simulated datasets with n=10,000 cells, we used a Makefile to launch jobs on a server. Alternatively, you can just call fitZinb10000.R from your terminal with the arguments you want.

For any questions or issues with the code on this repository, please use the "Issues" tab.
