Fix the `mean` of genes, only vary `DEFacGroup` between different simulations #165

yuw444 · 2023-08-03T22:17:12Z

Hi,

I would like to simulate two different datasets that share the same gene means and differ only in `DEFacGroup` between groups.

Here is my use case.

I would like to generate 3 clusters, with two conditions (wild type, experiment) within each cluster.
I saw that splatPop might make this possible, but I need to combine both cell-group effects and conditional effects, and I don't know how to do that.
So I decided to simulate 3 datasets, each containing only 2 clusters with very few DE genes. The two clusters represent wild type and experiment, and each dataset represents one cell type. The script is below.

library(splatter)
library(scater)
library(VariantAnnotation)
library(scMerge)

# One parameter object per cell type (identical settings, including the seed)
params_celltype1 <- newSplatParams(batchCells = 500, nGenes = 10000, seed = 926)
params_celltype2 <- newSplatParams(batchCells = 500, nGenes = 10000, seed = 926)
params_celltype3 <- newSplatParams(batchCells = 500, nGenes = 10000, seed = 926)

# Two groups per simulation (later relabelled WT/KO), with very few DE genes
sim_celltype1 <- splatSimulateGroups(params_celltype1, group.prob = c(0.5, 0.5), de.prob = 0.002, verbose = FALSE)
sim_celltype2 <- splatSimulateGroups(params_celltype2, group.prob = c(0.7, 0.3), de.prob = 0.002, verbose = FALSE)
sim_celltype3 <- splatSimulateGroups(params_celltype3, group.prob = c(0.3, 0.7), de.prob = 0.002, verbose = FALSE)

levels(colData(sim_celltype1)$Group) <- c("WT", "KO")
levels(colData(sim_celltype2)$Group) <- c("WT", "KO")
levels(colData(sim_celltype3)$Group) <- c("WT", "KO")

colData(sim_celltype1)$CellType <- "celltype1"
colData(sim_celltype2)$CellType <- "celltype2"
colData(sim_celltype3)$CellType <- "celltype3"

sim_combine <- scMerge::sce_cbind(
    list(
        sim_celltype1,
        sim_celltype2,
        sim_celltype3
    ),
    exprs = "counts",
    colData_names = c("Group", "CellType")
)

sim_combine <- logNormCounts(sim_combine)
sim_combine <- runUMAP(sim_combine)
plotUMAP(sim_combine, colour_by = "Group", shape_by = "CellType", point_size=3)

The UMAP is very encouraging, but I can only determine the true DE genes between conditions within a cell type, not the DE genes between cell types, because the gene means differ across cell types. I read through "Can I get DE gene list for a simulated data set?" (#57), which was helpful. In this case, I guess most genes are DE between cell types in my simulation, which is unrealistic.
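For reference, here is a minimal sketch of how the true DE genes between the two groups of a single simulation can be read from the `DEFacGroup` columns of `rowData`. The simulation sizes and `de.prob = 0.1` are arbitrary values chosen so the example runs quickly, and the `log2FC` column is a name introduced here for illustration only.

```r
library(splatter)

# Small illustrative simulation (sizes are arbitrary, chosen to run quickly)
params <- newSplatParams(batchCells = 200, nGenes = 1000, seed = 926)
sim <- splatSimulateGroups(params, group.prob = c(0.5, 0.5),
                           de.prob = 0.1, verbose = FALSE)

# splatter stores one multiplicative DE factor per gene and group in rowData;
# a factor of 1 means the gene mean is unchanged in that group
gene_info <- as.data.frame(rowData(sim))
is_de <- gene_info$DEFacGroup1 != gene_info$DEFacGroup2

# Log2 fold change implied by the simulated factors (Group2 vs Group1);
# "log2FC" is a column name made up for this sketch
gene_info$log2FC <- log2(gene_info$DEFacGroup2 / gene_info$DEFacGroup1)
true_de <- gene_info[is_de, c("Gene", "DEFacGroup1", "DEFacGroup2", "log2FC")]
head(true_de)
```

This only works within one simulation. It says nothing about how genes compare between separate simulations.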

My questions are:

1. Can I use splatter like this?
2. If so, is there a way to keep the gene means the same across the different simulations, so that only the randomly selected DE genes differ?

Thanks,
Yu

lazappi · 2023-08-04T06:33:58Z

Hi @yuw444

The splat simulation doesn't really have a concept of conditions, so this kind of scenario is difficult to simulate. As you have seen, two different simulations are completely independent, so there is no relationship between Gene 1 in one simulation and Gene 1 in another. You can try using the batch effect parameters as conditions, but note that this will produce a global shift between batches, which may not be what you want.
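A rough sketch of the batch-as-condition idea might look like the following. It is an illustration under assumptions, not a recommended recipe: the cell counts, `de.prob`, and the batch factor values are made up, and the `Condition` and `CellType` columns are names added here for relabelling only.

```r
library(splatter)

# Two batches stand in for the two conditions (WT, KO); three groups stand in
# for the three cell types. All numeric values are illustrative assumptions.
params <- newSplatParams(
    batchCells     = c(750, 750),        # cells per "condition"
    nGenes         = 2000,
    group.prob     = c(1/3, 1/3, 1/3),   # three cell types
    de.prob        = 0.1,                # DE probability between cell types
    batch.facLoc   = c(0.05, 0.05),      # small factors for a mild condition effect
    batch.facScale = c(0.05, 0.05),
    seed           = 926
)
sim <- splatSimulate(params, method = "groups", verbose = FALSE)

# Relabel batches as conditions and groups as cell types
# ("Condition" and "CellType" are column names made up for this sketch)
sim$Condition <- factor(sim$Batch, levels = c("Batch1", "Batch2"),
                        labels = c("WT", "KO"))
sim$CellType <- sim$Group

# All cells share one set of gene means; per-gene batch and DE factors
# are stored alongside them in rowData
fac_cols <- grep("^(BatchFac|DEFac)", colnames(rowData(sim)), value = TRUE)
head(rowData(sim)[, fac_cols])
```

Because every gene receives a batch factor and that factor applies to all cells in the batch, the "condition" effect here is global and identical across cell types. There is no short list of condition-specific DE genes per cell type.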

yuw444 · 2023-08-10T04:54:57Z

@lazappi Thanks for your clarification.

lazappi added the question label Aug 4, 2023

yuw444 closed this as completed Aug 10, 2023
