Fix the mean of genes, only vary DEFacGroup between different simulations #165

Closed
yuw444 opened this issue Aug 3, 2023 · 2 comments

yuw444 commented Aug 3, 2023

Hi,

I would like to simulate different datasets that share the same gene means and differ only in DEFacGroup among groups.

Here is my use case.

  1. I would like to generate 3 clusters, with two conditions (wild type and experiment) within each cluster.
  2. I saw that splatPop might be an option, but I need to combine both cell-group effects and conditional effects, and I don't know how to do that.
  3. So I decided to simulate 3 datasets, each containing only 2 clusters with very few DE genes. The two clusters represent wild type and experiment, and each dataset represents one cell type. The script is below.
```r
library(splatter)
library(scater)
library(VariantAnnotation)
library(scMerge)

# One parameter set per cell type (identical settings and seed)
params_celltype1 <- newSplatParams(batchCells = 500, nGenes = 10000, seed = 926)
params_celltype2 <- newSplatParams(batchCells = 500, nGenes = 10000, seed = 926)
params_celltype3 <- newSplatParams(batchCells = 500, nGenes = 10000, seed = 926)

# Two groups per dataset, with a small fraction of DE genes between them
sim_celltype1 <- splatSimulateGroups(params_celltype1, group.prob = c(0.5, 0.5), de.prob = 0.002, verbose = FALSE)
sim_celltype2 <- splatSimulateGroups(params_celltype2, group.prob = c(0.7, 0.3), de.prob = 0.002, verbose = FALSE)
sim_celltype3 <- splatSimulateGroups(params_celltype3, group.prob = c(0.3, 0.7), de.prob = 0.002, verbose = FALSE)

# Relabel the two groups as conditions
levels(colData(sim_celltype1)$Group) <- c("WT", "KO")
levels(colData(sim_celltype2)$Group) <- c("WT", "KO")
levels(colData(sim_celltype3)$Group) <- c("WT", "KO")

# Tag each dataset with its cell type
colData(sim_celltype1)$CellType <- "celltype1"
colData(sim_celltype2)$CellType <- "celltype2"
colData(sim_celltype3)$CellType <- "celltype3"

# Combine the three datasets on their raw counts
sim_combine <- scMerge::sce_cbind(
    list(
        sim_celltype1,
        sim_celltype2,
        sim_celltype3
    ),
    exprs = "counts",
    colData_names = c("Group", "CellType")
)

# Normalise, embed, and plot
sim_combine <- logNormCounts(sim_combine)
sim_combine <- runUMAP(sim_combine)
plotUMAP(sim_combine, colour_by = "Group", shape_by = "CellType", point_size = 3)
```
  4. The UMAP is very encouraging, but I can only determine the true DE genes between conditions within a cell type, not the DE genes between cell types, because the gene means differ across cell types. I read through "Can I get DE gene list for a simulated data set?" (#57), which was helpful. In this case, I guess that most genes are DE between cell types in my simulation, which is unrealistic.
    [Image: plot]
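
For reference, the within-dataset DE genes mentioned in the last step can be read off the simulation's row data. This is a minimal sketch, assuming the `DEFacGroup1` and `DEFacGroup2` columns that splatter adds to `rowData` for a two-group simulation, where a factor of 1 means the gene is unchanged in that group. The small `params` object and the names `is_de`, `de_genes`, and `log2fc` are illustrative only, not part of the script above.

```r
library(splatter)
library(SingleCellExperiment)

# Small illustrative simulation with two groups (assumed settings)
params <- newSplatParams(batchCells = 200, nGenes = 1000, seed = 926)
sim <- splatSimulateGroups(params, group.prob = c(0.5, 0.5),
                           de.prob = 0.05, verbose = FALSE)

rd <- rowData(sim)

# A gene is truly DE between the two groups when its DE factors differ
is_de <- rd$DEFacGroup1 != rd$DEFacGroup2
de_genes <- rownames(sim)[is_de]

# Simulated log2 fold change of group 2 relative to group 1
log2fc <- log2(rd$DEFacGroup2 / rd$DEFacGroup1)

head(de_genes)
```

This only covers comparisons inside a single simulation. It says nothing about differences between separately simulated datasets.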

My questions are:

  1. Could I use splatter like this?
  2. If so, is there a way to keep the gene means the same across the different simulations, so that only the randomly selected DE genes differ?

Thanks,
Yu

lazappi (Collaborator) commented Aug 4, 2023

Hi @yuw444

The splat simulation doesn't really have a concept of conditions, so this kind of scenario is difficult to simulate. As you have seen, two different simulations are completely independent, so there is no relationship between Gene 1 in one simulation and Gene 1 in another. You can try using the batch effect parameters as conditions, but note that this will produce a global shift between batches, which may not be what you want.
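
A minimal sketch of the batch-as-condition idea, assuming two batches stand in for WT and KO and three groups stand in for the cell types. The parameter values and the `Condition` column are illustrative assumptions, not a recommended configuration. Because all cells come from one simulation, each gene has a single base mean shared across groups and batches.

```r
library(splatter)
library(SingleCellExperiment)

# Two batches (used here as conditions) and three groups (cell types);
# all values are assumed for illustration
params <- newSplatParams(
    nGenes = 1000,
    batchCells = c(150, 150),
    batch.facLoc = 0.1,
    batch.facScale = 0.1,
    group.prob = c(0.4, 0.3, 0.3),
    de.prob = 0.05,
    seed = 926
)

sim <- splatSimulate(params, method = "groups", verbose = FALSE)

# Relabel batches as conditions (hypothetical column name)
sim$Condition <- factor(sim$Batch,
                        levels = c("Batch1", "Batch2"),
                        labels = c("WT", "KO"))

# Cells per cell type and condition
table(sim$Group, sim$Condition)

# Per-gene condition effect implied by the batch factors
cond_fc <- rowData(sim)$BatchFacBatch2 / rowData(sim)$BatchFacBatch1
summary(log2(cond_fc))
```

The summary of `cond_fc` shows the caveat above: the batch factors are drawn for genes across the board rather than for a small selected subset, so the "condition" effect behaves like a global shift rather than a handful of DE genes.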

yuw444 (Author) commented Aug 10, 2023

@lazappi Thanks for your clarification.

yuw444 closed this as completed Aug 10, 2023