Help with reducing number of cells called #256

Open
racng opened this issue Aug 23, 2023 · 7 comments
Labels
user question User question about a specific dataset

@racng

racng commented Aug 23, 2023

I have a particular sample that struggles to work well with cellbender. I am currently using the latest release, cellbender v0.3.0. The sample's UMI curve has a weak knee structure. By eye, I am guessing there are around 30k cells, but cellbender is overestimating that. I have tried running it with the default settings, and also with --total-droplets-included increased to 50000 and --expected-cells set low at 10000, but couldn't get the program to call cells at the expected levels. I have attached the html reports below for a sample that worked well (No. 3, default settings) and for the sample having trouble (No. 8). Suggestions would be greatly appreciated! Thank you!

output_report_8.html.zip
output_report_3.html.zip
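
For readers who want to put a rough number on a "by eye" estimate like the one above, here is a minimal sketch of the idea. It assumes you already have a list of per-barcode UMI totals; the function names, the toy numbers, and the threshold of 1000 are all made up for illustration. This is a crude heuristic and not how cellbender calls cells, and on a curve with a weak knee the largest-drop rank may well be unreliable.

```python
def rank_curve(umi_totals):
    """Return per-barcode UMI totals sorted from largest to smallest."""
    return sorted(umi_totals, reverse=True)


def count_above(umi_totals, threshold):
    """Count barcodes whose total UMI count is strictly above `threshold`."""
    return sum(1 for total in umi_totals if total > threshold)


def largest_drop_rank(curve):
    """Return how many barcodes sit before the largest ratio drop.

    `curve` must already be sorted in descending order. The +1 avoids
    division by zero for barcodes with no counts.
    """
    best_rank, best_ratio = 0, 0.0
    for i in range(len(curve) - 1):
        ratio = (curve[i] + 1) / (curve[i + 1] + 1)
        if ratio > best_ratio:
            best_rank, best_ratio = i + 1, ratio
    return best_rank


# Toy data (hypothetical): 3 cell-like barcodes and 5 empty-like barcodes.
toy_totals = [12000, 300, 9500, 280, 310, 11000, 295, 305]
curve = rank_curve(toy_totals)
n_above_threshold = count_above(toy_totals, threshold=1000)
n_before_drop = largest_drop_rank(curve)
print(n_above_threshold, n_before_drop)
```

With a sharp knee the two numbers agree; when they disagree noticeably, that is a hint the knee is too weak for a simple cutoff.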

@sjfleming
Member

To me, sample number 3 looks like the kind of output I would expect.

I agree that cellbender is not doing a great job calling cells in sample 8. Can you post the first part of the cellbender log file, before the training starts, where cellbender says how many cells and empties there are and estimates the UMI counts in each? That will give me a better idea of what cellbender thinks it's seeing.

@sjfleming sjfleming self-assigned this Aug 23, 2023
@sjfleming sjfleming added the user question User question about a specific dataset label Aug 23, 2023
@racng
Author

racng commented Aug 23, 2023

Here is the beginning of cellbender log file:

cellbender:remove-background: Command:
cellbender remove-background --input /users/rng/proj/tlc/data/10x/cellranger-7.1.0/hg38-t-tropic-virus/20230508_8/outs/multi/count/raw_feature_bc_matrix.h5 --output results/qc/cellbender/8/output.h5 --cuda --posterior-batch-size 256 --total-droplets-included 50000 --expected-cells 10000
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash 525d18ef91)
cellbender:remove-background: 2023-08-22 23:58:26
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from /users/rng/proj/tlc/data/10x/cellranger-7.1.0/hg38-t-tropic-virus/20230508_8/outs/multi/count/raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Features in dataset: 192 Antibody Capture, 36608 Gene Expression
cellbender:remove-background: Trimming features for inference.
cellbender:remove-background: 26853 features have nonzero counts.
cellbender:remove-background: Prior on counts for cells is 10819
cellbender:remove-background: Prior on counts for empty droplets is 1600
cellbender:remove-background: Excluding 9161 features that are estimated to have <= 0.1 background counts in cells.
cellbender:remove-background: Including 17692 features in the analysis.
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 800
cellbender:remove-background: Using 10000 probable cell barcodes, plus an additional 40000 barcodes, and 31805 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 1597 UMI counts.
cellbender:remove-background: Attempting to unpack tarball "ckpt.tar.gz" to /tmp/tmpn2iy66eh
cellbender:remove-background: Successfully unpacked tarball to /tmp/tmpn2iy66eh

@sjfleming
Member

Well things seem to look pretty much how I'd expect.

It's not clear to me why we're not getting something a bit more reasonable...

I am guessing a bit here, but could you try --expected-cells 20000 --total-droplets-included 60000?

@racng
Author

racng commented Aug 24, 2023

I think those settings improved it a bit, but it is still estimating ~50k cells instead of 30k.
Log:
cellbender:remove-background: Command:
cellbender remove-background --input /users/rng/proj/tlc/data/10x/cellranger-7.1.0/hg38-t-tropic-virus/20230508_8/outs/multi/count/raw_feature_bc_matrix.h5 --output results/qc/cellbender/8/output.h5 --cuda --posterior-batch-size 256 --total-droplets-included 60000 --expected-cells 20000
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash ad6a62f361)
cellbender:remove-background: 2023-08-23 21:38:43
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from /users/rng/proj/tlc/data/10x/cellranger-7.1.0/hg38-t-tropic-virus/20230508_8/outs/multi/count/raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Features in dataset: 192 Antibody Capture, 36608 Gene Expression
cellbender:remove-background: Trimming features for inference.
cellbender:remove-background: 26853 features have nonzero counts.
cellbender:remove-background: Prior on counts for cells is 7407
cellbender:remove-background: Prior on counts for empty droplets is 1489
cellbender:remove-background: Excluding 7747 features that are estimated to have <= 0.1 background counts in cells.
cellbender:remove-background: Including 19106 features in the analysis.
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 744
cellbender:remove-background: Using 20000 probable cell barcodes, plus an additional 40000 barcodes, and 22071 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 1486 UMI counts.

output_report_8_new.html.zip

@racng
Author

racng commented Aug 25, 2023

I have a log file from cellbender v0.2.2 that was able to estimate ~30k cells:

cellbender:remove-background: Command:
cellbender remove-background --input /users/rng/proj/tlc/data/10x/cellranger-7.1.0/hg38-t-tropic-virus/20230508_8/outs/multi/count/raw_feature_bc_matrix.h5 --output results/qc/cellbender/8/exp10000_total40000_thresh200_z50.h5 --cuda --epochs 150 --fpr 0.01 --learning-rate 5e-05 --expected-cells 10000 --total-droplets-included 40000 --low-count-threshold 200 --z-dim 50
cellbender:remove-background: 2023-07-12 00:19:02
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from file /users/rng/proj/tlc/data/10x/cellranger-7.1.0/hg38-t-tropic-virus/20230508_8/outs/multi/count/raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Trimming dataset for inference.
cellbender:remove-background: Including 26853 genes that have nonzero counts.
cellbender:remove-background: Prior on counts in empty droplets is 1479
cellbender:remove-background: Prior on counts for cells is 10067
cellbender:remove-background: Excluding barcodes with counts below 739
cellbender:remove-background: Using 10000 probable cell barcodes, plus an additional 30000 barcodes, and 42094 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 1737 UMI counts.
cellbender:remove-background: Running inference...

It shows similar priors for cells and empty droplets, and a similar count threshold. For v0.3.0, could excluding the 7-9k features (9161 and 7747 in the two runs above) that are estimated to have <= 0.1 background counts in cells be reducing the model complexity too much? How do I adjust that with --projected-ambient-count-threshold?

Update: using --projected-ambient-count-threshold 0 didn't help.
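
The pre-training summaries in the logs above can be compared side by side with a small parser. This is a minimal sketch that assumes the log wording shown in this thread (v0.3.0 says "for empty droplets", v0.2.2 says "in empty droplets"); the dictionary keys and the function name are my own, and other cellbender versions may word these lines differently.

```python
import re

# Patterns match the log lines quoted in this thread; key names are hypothetical.
PATTERNS = {
    "cell_prior": re.compile(r"Prior on counts for cells is (\d+)"),
    "empty_prior": re.compile(r"Prior on counts (?:for|in) empty droplets is (\d+)"),
    "low_count_cutoff": re.compile(r"Excluding barcodes with counts below (\d+)"),
    "probable_cells": re.compile(r"Using (\d+) probable cell barcodes"),
    "additional_barcodes": re.compile(r"plus an additional (\d+) barcodes"),
    "empty_droplets": re.compile(r"and (\d+) empty droplets"),
}


def summarize_log(text):
    """Extract the integer after each known phrase; skip phrases not found."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            summary[key] = int(match.group(1))
    return summary


# Lines copied from the first v0.3.0 run and the v0.2.2 run above.
log_v030 = """\
cellbender:remove-background: Prior on counts for cells is 10819
cellbender:remove-background: Prior on counts for empty droplets is 1600
cellbender:remove-background: Excluding barcodes with counts below 800
cellbender:remove-background: Using 10000 probable cell barcodes, plus an additional 40000 barcodes, and 31805 empty droplets.
"""
log_v022 = """\
cellbender:remove-background: Prior on counts in empty droplets is 1479
cellbender:remove-background: Prior on counts for cells is 10067
cellbender:remove-background: Excluding barcodes with counts below 739
cellbender:remove-background: Using 10000 probable cell barcodes, plus an additional 30000 barcodes, and 42094 empty droplets.
"""

summary_v030 = summarize_log(log_v030)
summary_v022 = summarize_log(log_v022)
for name, summary in [("v0.3.0", summary_v030), ("v0.2.2", summary_v022)]:
    # Probable cells plus additional barcodes matches --total-droplets-included.
    total = summary["probable_cells"] + summary["additional_barcodes"]
    print(name, summary, "total droplets included:", total)
```

In both runs the probable cell barcodes plus the additional barcodes add up to the --total-droplets-included value given on the command line (50000 and 40000 respectively).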

@sjfleming
Member

Hi @racng, this is an interesting example. It does seem like cell probability inference is not working as well on this sample in v0.3.0 as it did in v0.2.2.

(It is definitely the case that v0.3.0 does better than v0.2.2 on a lot of samples. But this seems to be an exception.)

You are right that --projected-ambient-count-threshold 0 is the way to include all the features expressed at a nonzero level. But that didn't seem to help...

Is there any chance I could get a copy of that h5 file to try to experiment a bit and see what is going on?

In the meantime, here are two other settings I'd try, in the hope that we can force the outcome we want:

  • --expected-cells 25000 --total-droplets-included 40000
  • --expected-cells 200 --total-droplets-included 40000
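
The two settings above could be queued up with a small helper. This is a minimal sketch: the flags are the ones already used in this thread, but the input and output file names are placeholders, and the helper only builds and prints the commands without running them.

```python
import shlex


def build_command(input_h5, output_h5, expected_cells, total_droplets, use_cuda=True):
    """Build a `cellbender remove-background` argument list (nothing is executed)."""
    cmd = ["cellbender", "remove-background", "--input", input_h5, "--output", output_h5]
    if use_cuda:
        cmd.append("--cuda")
    cmd += [
        "--expected-cells", str(expected_cells),
        "--total-droplets-included", str(total_droplets),
    ]
    return cmd


# (expected_cells, total_droplets_included) pairs from the suggestion above.
settings = [(25000, 40000), (200, 40000)]

# File names below are placeholders; substitute your own paths.
commands = [
    build_command(
        "raw_feature_bc_matrix.h5",
        f"output_exp{cells}_total{total}.h5",
        cells,
        total,
    )
    for cells, total in settings
]

for cmd in commands:
    print(shlex.join(cmd))
```

Writing each run to its own output file keeps the reports from the two settings from overwriting each other. To actually launch a run, each list could be passed to `subprocess.run(cmd, check=True)`.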

@racng
Author

racng commented Aug 28, 2023

@sjfleming I have just sent you an email at your Broad Institute address.

Projects
Status: To Do