This directory contains source code for reproducing the analysis in:
Sutton, G.J., Poppe D., Simmons R.K., Walsh K., Nawaz U., Lister R., Gagnon-Bartsch J.A., and Voineagu I. Comprehensive evaluation of deconvolution methods for human brain gene expression. Nat Commun 13, 1358 (2022). https://doi.org/10.1038/s41467-022-28655-4
File naming conventions are as follows, and should be followed in order:
- Preprocessing: scripts that preprocess public and generated data
- BenchmarkingDatasets: estimating composition in simulated mixtures, and analyses of its output. Broadly corresponds to Figures 1-3. RNA: In vitro RNA mixtures of cultured neuronal and astrocyte RNA. DM: Mixtures derived by simulating pseudobulks from Darmanis et al.'s single cell RNA-seq. VL and CA: Mixtures derived by simulating pseudobulks from Velmeshev et al.'s or the Human Cell Atlas' single nucleus RNA-seq (VL and CA, respectively). Zhang: deconvolution of Zhang et al.'s immunopanned pure brain cell-types.
- ConfoundingComposition: simulation of datasets with confounded composition between groups, exploring DE driven by this phenomenon. Broadly corresponds to Figures 4-5. CompositionOnly: differential expression driven purely by group confounds in composition. CompositionAndExpression: detection of differential expression when confounded by composition. CIBx: analyses using CIBERSORTx to impute cell-type-specific expression
- BulkTissue: analyses of public bulk brain RNA-seq from Parikshak et al., the GTEx consortium, and an autism spectrum disorder dataset found in Parikshak et al. Broadly corresponds to Figure 6.
- Other: miscellaneous analysis scripts. CrossTissue: deconvolution of heart and pancreas data.
- Fun: contains general custom functions for calling in other scripts
The sequencing data generated in this study have been deposited in the GEO database under accession code GSE175772.
Processed signature data can be accessed as Supplementary Data 5 in our study. Links to access the raw data underlying this can be found in the table below.
Name | Species | Description | Reference and Link | Used In |
---|---|---|---|---|
CA | Human | Post-mortem brain tissue, smartSeq2 snRNA-seq | Hodge, R. D. et al. Conserved cell types with divergent features in human versus mouse cortex. Nature 573, 61–68 (2019). Human Cell Atlas website, specifically this | ------- |
DM | Human | Surgically-resected brain tissue, smartSeq scRNA-seq | Darmanis, S. et al. A survey of human brain transcriptome diversity at the single cell level. Proc. Natl Acad. Sci. USA 112, 7285–7290 (2015). GEO, or processed version on GitHub | ------- |
F5 | Human | Cultured brain cells, bulk CAGE-seq | Forrest, A. R. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014). FANTOM5 Website, see ./Files for easier processed versions in which TSS peaks are pooled to genes | ---- |
NG | Human | Post-mortem brain tissue, 10X snRNA-seq | Nagy, C. et al. Single-nucleus transcriptomics of the prefrontal cortex in major depressive disorder implicates oligodendrocyte precursor cells and excitatory neurons. Nat. Neurosci. 23, 771–781 (2020). GEO | ------- |
VL | Human | Post-mortem brain tissue, 10X snRNA-seq | Velmeshev, D. et al. Single-cell genomics identifies cell type–specific molecular changes in autism. Science (80-.) 364, 685–689 (2019). Data browser, use to access files 1 and 2 | ------- |
LK | Human | Post-mortem brain tissue, 10X snRNA-seq | Lake, B. B. et al. Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain. Nat. Biotechnol. 36, 70–80 (2018). GEO, specifically this please | ---- |
IP | Human | Surgically-resected brain tissue, bulk RNA-seq | Zhang, Y. et al. Purification and characterization of progenitor and mature human astrocytes reveals transcriptional and functional differences with mouse. Neuron 89, 37–53 (2016). Available on GEO, but use Table S4 | ------- |
MM | Mouse | Mouse brain, bulk RNA-seq | Zhang, Y. et al. An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex. J. Neurosci. 34, 11929–11947 (2014). Paper website (We note that this website is now down, and provide the original downloaded file in ./Files/Zhang2014_mouseRNASeq_originalDownload.xlsx) | ---- |
TS | Mouse | Mouse brain, SmartSeq2 scRNA-seq | Tasic, B. et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature 563, 72–78 (2018). GEO, with metadata in Supplementary Table 10 | ------- |
Name | Samples | Details | Reference and Link |
---|---|---|---|
Parikshak | Human | Ribo-depleted RNA-seq from 251 post-mortem samples including frontal cortex, temporal cortex, and cerebellar vermis samples from 48 ASD and 49 control individuals, aged 2–67 | Parikshak, N. N. et al. Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature 540, 423–427 (2016). GitHub, specifically this RDA, including metadata |
GTEx V7 Brain | Human | Poly-A+ RNA-seq of 1671 post-mortem samples, including 13 subregions which we classified as cortical, sub-cortical, cerebellar, or spinal | Consortium, Gte. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017). GTEx website, specifically V7 counts, and two metadata files A, and B |
We also generated cell-type-specific signatures for two non-brain tissues, to test the tissue-specificity of our results.
Name | Tissue | Description | Reference and Link | Used In |
---|---|---|---|---|
EN | Pancreas | FACS-isolated tissue, single-cell RNA-seq | Enge, M. et al. Single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns. Cell 171, 321–330e14 (2017). GEO | ---- |
BL | Pancreas | FACS-isolated tissue, bulk RNA-seq | Blodgett, D. M. et al. Novel observations from next-generation RNA sequencing of highly purified human adult and fetal islet cell subsets. Diabetes 64, 3172–3181 (2015). GEO | ------- |
FS | Pancreas | FACS-isolated tissue, bulk RNA-seq | Furuyama, K. et al. Diabetes relief in mice by glucose-sensing insulin-secreting human α-cells. Nature 567, 43–48 (2019). GEO | ------- |
FG | Pancreas | Cultured FACS-isolated tissue, bulk RNA-seq | Furuyama, K. et al. Diabetes relief in mice by glucose-sensing insulin-secreting human α-cells. Nature 567, 43–48 (2019). GEO | ------- |
F5 | Heart | Cultured cells, bulk CAGE-seq | Forrest, A. R. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014). FANTOM5 Website, see ./Files for easier processed versions in which TSS peaks are pooled to genes | ------- |
EN | Heart | Cultured cells, RNA-seq | Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012). ENCODE | ------- |
SC | Heart | Fresh tissue, scRNA-seq using iCell8 | Wang, L. et al. Single-cell reconstruction of the adult human heart during heart failure and recovery reveals the cellular landscape underlying cardiac function. Nat. Cell Biol. 22, 108–119 (2020). GEO | ------- |
Method | Classification | Reference and installation |
---|---|---|
DeconRNASeq | Partial deconvolution | Gong, T. & Szustakowski, J. D. DeconRNASeq: a statistical framework for deconvolution of heterogeneous tissue samples based on mRNA-Seq data. Bioinformatics 29, 1083–1085 (2013). Bioconductor |
CIBERSORT | Partial deconvolution | Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015). Note: the R source for this package is available by request at the CIBERSORT website |
dtangle | Partial deconvolution | Hunt, G. J., Freytag, S., Bahlo, M. & Gagnon-Bartsch, J. A. dtangle: accurate and robust cell type deconvolution. Bioinformatics 290262 (2018). CRAN |
MuSiC | Partial deconvolution | Wang, X., Park, J., Susztak, K., Zhang, N. R. & Li, M. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat. Commun. 10, 380 (2019). GitHub |
Linseed | Complete deconvolution | Zaitsev, K., Bambouskova, M., Swain, A. & Artyomov, M. N. Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures. Nat. Commun. 10, 2209 (2019). GitHub |
Coex | Complete enrichment | Kelley, K. W., Nakao-Inoue, H., Molofsky, A. V. & Oldham, M. C. Variation among intact tissue samples reveals the core transcriptional features of human CNS cell classes. Nat. Neurosci. 21, 265397 (2018). Our own implementation is provided in the scripts. |
BrainInABlender | Enrichment | Hagenauer, M. H. et al. Inference of cell type content from human brain transcriptomic datasets illuminates the effects of age, manner of death, dissection, and psychiatric diagnosis. PLoS ONE 13, 89391 (2018). GitHub |
xCell | Enrichment | Aran, D., Hu, Z. & Butte, A. J. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 1–14 (2017). GitHub |
In progress
In progress
These scripts assume the below directory structure. Feel free to rename the root as best suits you, however the scripts use ./BrainCellularComposition_GitHub. Please ave all downloaded files into ./Data/Raw
./BrainCellularComposition_GitHub
├── Data
│ ├── Preprocessed
│ │ └── CIB
│ └── Raw
├── Results
│ ├── ConfoundingComposition
│ │ ├── CompositionOnly
│ │ ├── CompositionAndExpression
│ │ └── CIBERSORTx
│ ├── BulkTissue
│ │ ├── Autism
│ │ ├── GTEx
│ │ └── Parikshak
│ ├── Other
│ │ ├── RNAvsCellContent
│ │ └── NonBrain
│ ├── BenchmarkingDatasets
│ │ ├── CA
│ │ ├── DM
│ │ ├── Supplementary
│ │ └── VL
└── Scripts
Deconvolution benchmarking in non-brain tissues, algorithm choice and data normalisation:
Avila Cobos, F., Alquicira-Hernandez, J., Powell, J.E. et al. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat Commun 11, 5650 (2020). https://doi.org/10.1038/s41467-020-19015-1
Jin, H., Liu, Z. A benchmark for RNA-seq deconvolution analysis under dynamic testing environments. Genome Biol 22, 102 (2021). https://doi.org/10.1186/s13059-021-02290-6
Mohammadi, S., Zuckerman, N., Goldsmith, A., & Grama, A. A critical survey of deconvolution methods for separating cell types in complex tissues. Proceedings of the IEEE, 105(2), 340-366 (2017). https://doi.org/10.1109/JPROC.2016.2607121.