This repository contains the code needed to reproduce the results and figures of the paper:
Risso D, Perraudeau F, Gribkova S, Dudoit S, Vert JP. ZINB-WaVE: A general and flexible method for signal extraction from single-cell RNA-seq data. bioRxiv. doi: https://doi.org/10.1101/125112
Running the code in this repository requires R (>=3.3), Python (>=2.7), and the following packages:
- zinbwave
- cluster
- matrixStats
- magrittr
- RColorBrewer
- ggplot2
- reshape
- dplyr
- knitr
- rmarkdown
- mclust
- cowplot
- rARPACK
- Rtsne
- parallel
- digest
- EDASeq
- biomaRt
- scRNAseq
- SummarizedExperiment
- edgeR
- scran
- scater
- scone
- DESeq2
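Before running anything, it can help to check which of the packages above are already installed. The following is a minimal sketch, not part of the original repository: the variable names are illustrative, and it assumes `BiocManager::install()` is used to fetch whatever is missing (it can resolve both CRAN and Bioconductor packages).

```r
# Packages listed above. Whether each one comes from CRAN or Bioconductor is
# not stated in this README; BiocManager::install() can resolve both.
pkgs <- c(
  "zinbwave", "cluster", "matrixStats", "magrittr", "RColorBrewer",
  "ggplot2", "reshape", "dplyr", "knitr", "rmarkdown", "mclust",
  "cowplot", "rARPACK", "Rtsne", "parallel", "digest", "EDASeq",
  "biomaRt", "scRNAseq", "SummarizedExperiment", "edgeR", "scran",
  "scater", "scone", "DESeq2"
)

# TRUE for every package that can already be loaded from this R library.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)

# Uncomment to install whatever is missing (assumes internet access).
# if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
# BiocManager::install(missing_pkgs)
```

This check only reports whether zinbwave is present, not which version is installed.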
To exactly reproduce the analyses of the paper, version 0.1.1 of the zinbwave package is required. It can be installed in R with the following code.
library(devtools)
install_github("drisso/zinbwave@v0.1.1")
The zinbwave package is under active development: we are constantly fixing bugs, adding new features, and improving the documentation. We therefore recommend installing the latest stable release from Bioconductor for all purposes other than exactly reproducing the analyses of our paper. To do so, use the following code.
install.packages("BiocManager")
BiocManager::install("zinbwave")
For each of the real datasets analyzed in the paper, there is a .Rmd file and a .R file in the real_data folder; e.g., for the Patel data, the files are patel_covariates.Rmd and patel_plots.R.
One needs to compile the .Rmd file first. This will have two effects: (i) it will create an HTML report with useful analyses of the dataset; and (ii) it will create a .rda file with the results of zinbwave, pca, and zifa. Once this file is generated, one can use the .R file to generate the dataset-specific plots found in the paper.
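The two steps can be scripted from an R session. The sketch below is illustrative only: `dataset_files()` and `run_dataset()` are hypothetical helpers, and they assume that every dataset follows the `<name>_covariates.Rmd` / `<name>_plots.R` naming of the Patel example and that R is started from the repository root.

```r
# Hypothetical helper: builds the two file paths for one dataset, assuming the
# <name>_covariates.Rmd / <name>_plots.R pattern of the Patel example.
dataset_files <- function(name, dir = "real_data") {
  list(
    report = file.path(dir, paste0(name, "_covariates.Rmd")),
    plots  = file.path(dir, paste0(name, "_plots.R"))
  )
}

# Hypothetical helper: runs both steps in order for one dataset.
run_dataset <- function(name, dir = "real_data") {
  f <- dataset_files(name, dir)
  stopifnot(file.exists(f$report), file.exists(f$plots))
  # Step 1: compile the report; this also writes the .rda file with the fits.
  rmarkdown::render(f$report)
  # Step 2: draw the plots; chdir = TRUE runs the script from its own folder.
  source(f$plots, chdir = TRUE)
  invisible(f)
}

# Usage, from the repository root:
# run_dataset("patel")
```

Whether the plotting script expects to run from the real_data folder is an assumption here; if it uses paths relative to the repository root, drop `chdir = TRUE`.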
To generate the plots related to silhouette width, one needs to source the silhouette.R file.
To generate the plots related to the goodness-of-fit, run the .Rmd files in the real_data folder starting with goodness_of_fit, e.g., for the Patel data, the file is goodness_of_fit_patel.Rmd.
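To run all of the goodness-of-fit reports in one go, something like the following might work. It is a sketch under assumptions: `find_gof_reports()` is a hypothetical helper, and the file-name pattern is inferred from the single example given above.

```r
# Hypothetical helper: lists the goodness-of-fit reports in a folder.
find_gof_reports <- function(dir = "real_data") {
  list.files(dir, pattern = "^goodness_of_fit.*\\.Rmd$", full.names = TRUE)
}

# Usage, from the repository root (requires the rmarkdown package):
# for (f in find_gof_reports()) rmarkdown::render(f)
```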
The Patel data are stored in real_data/Patel.zip. Please unzip this file before running the Patel analysis.
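The archive can also be extracted from within R. This is a minimal sketch: `unzip_patel()` is a hypothetical helper, and extracting into the same folder as the zip file is an assumption about where the analysis scripts look for the data.

```r
# Hypothetical helper: extracts the Patel archive next to the zip file.
# The extraction folder expected by the analysis scripts is an assumption.
unzip_patel <- function(zip_path = file.path("real_data", "Patel.zip"),
                        exdir = dirname(zip_path)) {
  if (!file.exists(zip_path)) stop("Archive not found: ", zip_path)
  # utils::unzip returns the extracted file paths (invisibly).
  utils::unzip(zip_path, exdir = exdir)
}

# Usage, from the repository root:
# unzip_patel()
```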
To create the simulated datasets from the real datasets used in the paper, first run the code in simFunction.R. Then, run the .R files in the folders in sims/figures. Finally, run figuresPaper.Rmd.
To simulate the datasets from the Lun & Marioni model, run lunSim.R. It uses the file function.rds, which is generated by the steps described in the Methods section of the paper. Then, run fitZinbLun.R.
To fit the simulated datasets with n=10,000 cells, we used a Makefile to launch jobs on a server. Alternatively, you can just call fitZinb10000.R from your terminal with the arguments you want.
For any questions or issues with the code on this repository, please use the "Issues" tab.