FileNotFoundError: [Errno 2] No such file or directory: 'ckpt.tar.gz.tmp' -> 'ckpt.tar.gz' #241

Open
pengxin2019 opened this issue Aug 11, 2023 · 12 comments
@pengxin2019

Dear cellbender team,
Thanks for developing this great tool. I am using cellbender to remove some ambient RNA but ran into some errors that I do not know how to tackle.

I installed cellbender by issuing:

conda create -n cellbender python=3.7
conda activate cellbender
pip install cellbender

Here is my command:

cellbender remove-background \
--input path.to.h5/count/sample_raw_feature_bc_matrix.h5 \
--output cellbender.output.h5 

Here is my output showing the error:

cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash c982ed3028)
cellbender:remove-background: 2023-08-10 17:04:20
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from /work/tpb/lixia/REAPseq/Fixed_RNA/FFPE/CRC_26Jan2023/multi/CRC_1/outs/per_sample_outs/CRC_6204/count/sample_raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Features in dataset: 37143 Gene Expression
cellbender:remove-background: Trimming features for inference.
cellbender:remove-background: 17899 features have nonzero counts.
cellbender:remove-background: Prior on counts for cells is 1444
cellbender:remove-background: Prior on counts for empty droplets is 8
cellbender:remove-background: Excluding 5100 features that are estimated to have <= 0.1 background counts in cells.
cellbender:remove-background: Including 12799 features in the analysis.
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 5
cellbender:remove-background: Using 4495 probable cell barcodes, plus an additional 14874 barcodes, and 21435 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 10 UMI counts.
cellbender:remove-background: Attempting to unpack tarball "ckpt.tar.gz" to /SFS/project/sysadmin/univa/ctcb/ugetmp/86712229.1.all.q/tmp94d_xepq
cellbender:remove-background: Successfully unpacked tarball to /SFS/project/sysadmin/univa/ctcb/ugetmp/86712229.1.all.q/tmp94d_xepq
/SFS/project/sysadmin/univa/ctcb/ugetmp/86712229.1.all.q/tmp94d_xepq/b11a0349da_model.torch
/SFS/project/sysadmin/univa/ctcb/ugetmp/86712229.1.all.q/tmp94d_xepq/b11a0349da_test.loaderstate
/SFS/project/sysadmin/univa/ctcb/ugetmp/86712229.1.all.q/tmp94d_xepq/b11a0349da_params.pyro
/SFS/project/sysadmin/univa/ctcb/ugetmp/86712229.1.all.q/tmp94d_xepq/b11a0349da_train.loaderstate
/SFS/project/sysadmin/univa/ctcb/ugetmp/86712229.1.all.q/tmp94d_xepq/b11a0349da_optim.torch
/SFS/project/sysadmin/univa/ctcb/ugetmp/86712229.1.all.q/tmp94d_xepq/b11a0349da_args.npy
/SFS/project/sysadmin/univa/ctcb/ugetmp/86712229.1.all.q/tmp94d_xepq/b11a0349da_random.pyro
/SFS/project/sysadmin/univa/ctcb/ugetmp/86712229.1.all.q/tmp94d_xepq/b11a0349da_optim.pyro
cellbender:remove-background: Workflow hash does not match that of checkpoint.
cellbender:remove-background: No checkpoint loaded.
cellbender:remove-background: Running inference...
cellbender:remove-background: [epoch 001]  average training loss: 2771.3749
cellbender:remove-background: [epoch 002]  average training loss: 2445.6552  (385.0 seconds per epoch)
cellbender:remove-background: Will checkpoint every 2 epochs
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 003]  average training loss: 2277.2381
cellbender:remove-background: [epoch 004]  average training loss: 2180.8833
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 005]  average training loss: 2135.7810
cellbender:remove-background: [epoch 005] average test loss: 2117.2394
cellbender:remove-background: [epoch 006]  average training loss: 2113.6024
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 007]  average training loss: 2107.6420
cellbender:remove-background: [epoch 008]  average training loss: 2109.4242
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 009]  average training loss: 2106.6678
cellbender:remove-background: [epoch 010]  average training loss: 2101.9939
cellbender:remove-background: [epoch 010] average test loss: 2091.8989
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 011]  average training loss: 2098.1218
cellbender:remove-background: [epoch 012]  average training loss: 2093.0278
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 013]  average training loss: 2088.0385
cellbender:remove-background: [epoch 014]  average training loss: 2084.8984
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 015]  average training loss: 2083.8992
cellbender:remove-background: [epoch 015] average test loss: 2078.8391
cellbender:remove-background: [epoch 016]  average training loss: 2084.4798
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 017]  average training loss: 2082.0866
cellbender:remove-background: [epoch 018]  average training loss: 2069.9764
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 019]  average training loss: 2047.9459
cellbender:remove-background: [epoch 020]  average training loss: 2030.7993
cellbender:remove-background: [epoch 020] average test loss: 2020.9930
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 021]  average training loss: 2021.4861
cellbender:remove-background: [epoch 022]  average training loss: 2017.3603
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 023]  average training loss: 2014.7500
cellbender:remove-background: [epoch 024]  average training loss: 2011.0506
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 025]  average training loss: 2004.5917
cellbender:remove-background: [epoch 025] average test loss: 2002.4808
cellbender:remove-background: [epoch 026]  average training loss: 2004.0337
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 027]  average training loss: 2000.5297
cellbender:remove-background: [epoch 028]  average training loss: 1998.2894
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 029]  average training loss: 1993.4373
cellbender:remove-background: [epoch 030]  average training loss: 1992.5402
cellbender:remove-background: [epoch 030] average test loss: 2000.0420
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 031]  average training loss: 1991.5930
cellbender:remove-background: [epoch 032]  average training loss: 1990.6498
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 033]  average training loss: 1988.9943
cellbender:remove-background: [epoch 034]  average training loss: 1987.4015
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 035]  average training loss: 1987.6341
cellbender:remove-background: [epoch 035] average test loss: 1972.5572
cellbender:remove-background: [epoch 036]  average training loss: 1985.4938
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 037]  average training loss: 1982.6104
cellbender:remove-background: [epoch 038]  average training loss: 1983.4120
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 039]  average training loss: 1981.6671
cellbender:remove-background: [epoch 040]  average training loss: 1979.5357
cellbender:remove-background: [epoch 040] average test loss: 1972.6142
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 041]  average training loss: 1980.4693
cellbender:remove-background: [epoch 042]  average training loss: 1979.8637
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 043]  average training loss: 1975.0156
cellbender:remove-background: [epoch 044]  average training loss: 1975.9838
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 045]  average training loss: 1979.7817
cellbender:remove-background: [epoch 045] average test loss: 1975.3770
cellbender:remove-background: [epoch 046]  average training loss: 1972.6189
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 047]  average training loss: 1974.6877
cellbender:remove-background: [epoch 048]  average training loss: 1970.5287
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 049]  average training loss: 1971.4797
cellbender:remove-background: [epoch 050]  average training loss: 1970.9494
cellbender:remove-background: [epoch 050] average test loss: 1966.9934
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 051]  average training loss: 1972.8832
cellbender:remove-background: [epoch 052]  average training loss: 1973.4166
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Could not save checkpoint
cellbender:remove-background: Traceback (most recent call last):
  File "/home/yanpengx/.conda/envs/cellbender/lib/python3.7/site-packages/cellbender/remove_background/checkpoint.py", line 140, in save_checkpoint
    make_tarball(files=file_list, tarball_name=tarball_name)
  File "/home/yanpengx/.conda/envs/cellbender/lib/python3.7/site-packages/cellbender/remove_background/checkpoint.py", line 300, in make_tarball
    os.rename(tarball_name + '.tmp', tarball_name)
FileNotFoundError: [Errno 2] No such file or directory: 'ckpt.tar.gz.tmp' -> 'ckpt.tar.gz'

cellbender:remove-background: [epoch 053]  average training loss: 1969.6216
cellbender:remove-background: [epoch 054]  average training loss: 1966.2763
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 055]  average training loss: 1974.7479
cellbender:remove-background: [epoch 055] average test loss: 1966.3821
cellbender:remove-background: [epoch 056]  average training loss: 1968.6604
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 057]  average training loss: 1974.4091
cellbender:remove-background: [epoch 058]  average training loss: 1969.8807
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 059]  average training loss: 1974.3389
cellbender:remove-background: [epoch 060]  average training loss: 1968.1519
cellbender:remove-background: [epoch 060] average test loss: 1969.5730
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 061]  average training loss: 1976.2605
cellbender:remove-background: [epoch 062]  average training loss: 1985.0529
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Could not save checkpoint
cellbender:remove-background: Traceback (most recent call last):
  File "/home/yanpengx/.conda/envs/cellbender/lib/python3.7/site-packages/cellbender/remove_background/checkpoint.py", line 140, in save_checkpoint
    make_tarball(files=file_list, tarball_name=tarball_name)
  File "/home/yanpengx/.conda/envs/cellbender/lib/python3.7/site-packages/cellbender/remove_background/checkpoint.py", line 300, in make_tarball
    os.rename(tarball_name + '.tmp', tarball_name)
FileNotFoundError: [Errno 2] No such file or directory: 'ckpt.tar.gz.tmp' -> 'ckpt.tar.gz'

cellbender:remove-background: [epoch 063]  average training loss: 1982.0365
cellbender:remove-background: [epoch 064]  average training loss: 1975.1049
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 065]  average training loss: 1978.7159
cellbender:remove-background: [epoch 065] average test loss: 1992.5103
cellbender:remove-background: [epoch 066]  average training loss: 1974.5693
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 067]  average training loss: 1971.0817
cellbender:remove-background: [epoch 068]  average training loss: 1961.1724
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz
cellbender:remove-background: [epoch 069]  average training loss: 1965.4476
cellbender:remove-background: [epoch 070]  average training loss: 1969.1745
cellbender:remove-background: [epoch 070] average test loss: 1979.7302
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Saved checkpoint as /SFS/user/ctc/yanpengx/CRC/Yan_code_based/cellbender.wd.out/ckpt.tar.gz

As you can see above, there are some problems like

"cellbender:remove-background: No checkpoint loaded."
"File "/home/yanpengx/.conda/envs/cellbender/lib/python3.7/site-packages/cellbender/remove_background/checkpoint.py", line 140, in save_checkpoint
    make_tarball(files=file_list, tarball_name=tarball_name)
  File "/home/yanpengx/.conda/envs/cellbender/lib/python3.7/site-packages/cellbender/remove_background/checkpoint.py", line 300, in make_tarball
    os.rename(tarball_name + '.tmp', tarball_name)
FileNotFoundError: [Errno 2] No such file or directory: 'ckpt.tar.gz.tmp' -> 'ckpt.tar.gz'"

Can you give me some suggestions on how to fix it?

Thanks
Best
Penny

@chris-rands

I encountered this error when launching multiple cellbender jobs from the same directory. My guess is that ckpt.tar.gz was being overwritten by the different jobs, i.e. a race condition. I avoided it by launching the jobs from different subdirectories. Using --checkpoint might be another option, but it didn't work for me.
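
The subdirectory workaround can be scripted. What follows is a minimal sketch, not part of CellBender: `build_job` and `launch` are hypothetical helper names, and the one-directory-per-sample layout is an assumption. It uses only the `--input` and `--output` flags shown in this thread, and relies on the checkpoint being written to the directory the command is launched from.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def build_job(sample: str, input_h5: str, work_root: str) -> Tuple[List[str], Path]:
    """Return the cellbender command and a dedicated working directory for one sample."""
    job_dir = Path(work_root) / sample
    job_dir.mkdir(parents=True, exist_ok=True)
    command = [
        "cellbender", "remove-background",
        # Resolve to an absolute path because the job runs from a different directory.
        "--input", str(Path(input_h5).resolve()),
        "--output", f"{sample}.cellbender.output.h5",
    ]
    return command, job_dir


def launch(sample: str, input_h5: str, work_root: str) -> subprocess.Popen:
    """Start one job with its own working directory (hypothetical helper)."""
    command, job_dir = build_job(sample, input_h5, work_root)
    # cwd=job_dir means each job gets its own ckpt.tar.gz instead of sharing one.
    return subprocess.Popen(command, cwd=job_dir)
```

Calling `launch` once per sample starts the jobs in parallel, each from a separate directory, so no two runs touch the same checkpoint path.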

@OmarSalem12

I also have a similar experience and second what @chris-rands says. When running multiple jobs from the same directory I get this error and when I create subdirs and run each job from a separate dir I don't run into this issue.

@sjfleming
Member

@chris-rands you are quite correct! For some reason, I did not think of this. I typically work in the cloud where one machine is designated to run one job.

Let me explain what the tool is currently doing:

  • The checkpoint file is now an important part of the run. After training, both the creation of the posterior and the final denoised output count matrix depend on being able to read from and write to the checkpoint file.
  • When you run the tool, the directory from which you run it determines the checkpoint file for that run (ckpt.tar.gz in that directory).
  • If you start multiple runs at the same time from the same directory (i.e. the command was executed in the same directory), you will get a race condition, as @chris-rands described. This is very bad, and it wasn't a use case I had anticipated, though I guess if two people have already run into it, it is a normal use case :)

Two options moving forward:

  • Users can run each job from a separate subdirectory, as @chris-rands and @OmarSalem12 suggested. It would seem logical to make this directory the intended output directory.
  • I can escalate this issue and fix the real problem. One way to do this would be to provide a new optional input argument that lets users specify a different (unique) name for the checkpoint file. This would prevent overlap between simultaneous runs initiated from the same directory.
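
The failure mode can be replayed in isolation. The sketch below is not CellBender's code: it only mirrors the `os.rename(tarball_name + '.tmp', tarball_name)` step visible in the traceback and assumes one plausible interleaving, in which two jobs write to the same temporary path and the second rename finds the file already moved. `simulate_shared_tmp_collision` is a hypothetical name.

```python
import os


def simulate_shared_tmp_collision(directory: str) -> str:
    """Replay two jobs sharing ckpt.tar.gz.tmp and return the name of the resulting error."""
    final = os.path.join(directory, "ckpt.tar.gz")
    tmp = final + ".tmp"

    # Jobs A and B both write their archive to the same temporary path.
    for job in ("A", "B"):
        with open(tmp, "w") as handle:
            handle.write(f"checkpoint from job {job}")

    # Job A finishes first and moves the temporary file into place.
    os.rename(tmp, final)

    # Job B attempts the same rename, but the temporary file no longer exists.
    try:
        os.rename(tmp, final)
    except FileNotFoundError as error:
        return type(error).__name__
    return "no error"
```

In this replay the surviving ckpt.tar.gz also holds job B's contents even though job A performed the rename, which hints at why a shared checkpoint is a problem beyond the logged error.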

@OmarSalem12

@sjfleming I think giving the option to run everything from the same directory and specifying the checkpoint name would be nice to have down the line. For the time being it might be worth warning the users not to run multiple jobs from the same directory.

@sjfleming
Member

Penny @pengxin2019 , does this solve your problem as well?

@sjfleming sjfleming self-assigned this Aug 11, 2023
@chris-rands

  • I can escalate this issue and fix the real problem... one way to do this would be to provide a new optional input argument which will allow users to specify a different name (a unique one) for the checkpoint file. this would prevent potential overlap between simultaneous runs initiated from the same directory

I actually misunderstood the existing --checkpoint flag as having this functionality.

My two cents: most users probably don't care about the checkpoint file, and it would be ideal if the default options handled this. So you could add a unique identifier (like a hash or job ID) to the checkpoint file name for each run to avoid these name collisions. Or make the default write location the same as the output directory (I'm assuming users would not launch multiple runs with the same --output flag). That said, I would not say it's a high-priority issue.
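
As an illustration of the unique-identifier idea, here is a minimal sketch that derives a checkpoint name from the run's output path. It is purely hypothetical: `checkpoint_name_for` is not a CellBender function, and the naming scheme (a short SHA-256 prefix of the absolute output path) is just one assumption about how such a default could work.

```python
import hashlib
import os


def checkpoint_name_for(output_path: str, prefix: str = "ckpt") -> str:
    """Build a checkpoint file name that is stable per output path (hypothetical scheme)."""
    absolute = os.path.abspath(output_path)
    # Ten hex characters keep the name short while making collisions unlikely.
    digest = hashlib.sha256(absolute.encode("utf-8")).hexdigest()[:10]
    return f"{prefix}_{digest}.tar.gz"
```

Because the name depends only on the output path, a rerun of the same job would find its own checkpoint again, while runs with different `--output` values would not collide.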

@sjfleming
Member

@chris-rands yeah that --checkpoint input argument is meant to point cellbender to an existing checkpoint file to use as input. If the specified checkpoint file doesn't exist or doesn't work, cellbender will just continue on without it. But it will always still save the checkpoint file as ckpt.tar.gz

I think you're right. I'll do a bit of thinking about the best way to handle this so that users really don't have to think about it.

The reason I wanted to always call the saved checkpoint file ckpt.tar.gz actually has to do with a detail of how checkpointing in Cromwell 55+ works. A lot of people run CellBender using a WDL workflow on Terra, and this can be made to automatically re-run upon preemption and pick up using a checkpoint. This enables the use of preemptible GPU machines in the cloud, and substantially cuts cloud costs of running the workflow.

@makrez

makrez commented Aug 18, 2023

I was looking for this option as well. We typically run jobs in parallel on an HPC using snakemake. @sjfleming From my point of view, it would be enough to just save the checkpoint file in a user-specified output directory and only start from a checkpoint when the --checkpoint flag is used. Looking forward to hearing an update on this. For now, I am fine with just running samples sequentially.

@pengxin2019
Author

Penny @pengxin2019 , does this solve your problem as well?

Hi sjfleming,
Thanks for your reply. In my original post, I ran 4 jobs in the same directory. After reading through your comments, I ran the 4 jobs in different directories but got a different error.

Here are my commands for the sample 6204 job:

cd /MY.PATH/cellbender.wd.out/sample.6204_1.wd

cellbender remove-background \
--input /my.path/sample.6204/count/sample_raw_feature_bc_matrix.h5 \
--output sample.6204_1.cellbender.output.h5 

However, the expected output file sample.6204_1.cellbender.output.h5 was not generated. Here is my log file sample.6204_1.cellbender.output.log:

cellbender:remove-background: Command:
cellbender remove-background --input /work/tpb/lixia/REAPseq/Fixed_RNA/FFPE/CRC_26Jan2023/multi/CRC_1/outs/per_sample_outs/CRC_6204/count/sample_raw_feature_bc_matrix.h5 --output CRC_6204_1.cellbender.output.h5
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash c982ed3028)
cellbender:remove-background: 2023-08-17 14:35:55
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from /work/tpb/lixia/REAPseq/Fixed_RNA/FFPE/CRC_26Jan2023/multi/CRC_1/outs/per_sample_outs/CRC_6204/count/sample_raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Features in dataset: 37143 Gene Expression
cellbender:remove-background: Trimming features for inference.
cellbender:remove-background: 17899 features have nonzero counts.
cellbender:remove-background: Prior on counts for cells is 1444
cellbender:remove-background: Prior on counts for empty droplets is 8
cellbender:remove-background: Excluding 5100 features that are estimated to have <= 0.1 background counts in cells.
cellbender:remove-background: Including 12799 features in the analysis.
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 5
cellbender:remove-background: Using 4495 probable cell barcodes, plus an additional 14874 barcodes, and 21435 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 10 UMI counts.
cellbender:remove-background: Attempting to unpack tarball "ckpt.tar.gz" to /SFS/project/sysadmin/univa/ctcb/ugetmp/86920316.1.all.q/tmpw0oxoo6k
cellbender:remove-background: No saved checkpoint.
cellbender:remove-background: No checkpoint loaded.
cellbender:remove-background: Running inference...
cellbender:remove-background: [epoch 001]  average training loss: 2771.3917
cellbender:remove-background: [epoch 002]  average training loss: 2445.6278  (526.3 seconds per epoch)
cellbender:remove-background: Will checkpoint every 1 epochs
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Could not save checkpoint
cellbender:remove-background: Traceback (most recent call last):
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/cellbender/remove_background/checkpoint.py", line 115, in save_checkpoint
    torch.save(model_obj, filebase + '_model.torch')
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 653, in _save
    pickler.dump(obj)
TypeError: cannot pickle 'weakref' object

cellbender:remove-background: [epoch 003]  average training loss: 2277.3295
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Could not save checkpoint
cellbender:remove-background: Traceback (most recent call last):
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/cellbender/remove_background/checkpoint.py", line 115, in save_checkpoint
    torch.save(model_obj, filebase + '_model.torch')
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 653, in _save
    pickler.dump(obj)
TypeError: cannot pickle 'weakref' object

cellbender:remove-background: [epoch 004]  average training loss: 2180.8738
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Could not save checkpoint
cellbender:remove-background: Traceback (most recent call last):
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/cellbender/remove_background/checkpoint.py", line 115, in save_checkpoint
    torch.save(model_obj, filebase + '_model.torch')
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 653, in _save
    pickler.dump(obj)
TypeError: cannot pickle 'weakref' object

cellbender:remove-background: [epoch 005]  average training loss: 2135.7623
cellbender:remove-background: [epoch 005] average test loss: 2117.2164
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Could not save checkpoint
cellbender:remove-background: Traceback (most recent call last):
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/cellbender/remove_background/checkpoint.py", line 115, in save_checkpoint
    torch.save(model_obj, filebase + '_model.torch')
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 653, in _save
    pickler.dump(obj)
TypeError: cannot pickle 'weakref' object

cellbender:remove-background: [epoch 006]  average training loss: 2113.5286
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Could not save checkpoint
cellbender:remove-background: Traceback (most recent call last):
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/cellbender/remove_background/checkpoint.py", line 115, in save_checkpoint
    torch.save(model_obj, filebase + '_model.torch')
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 653, in _save
    pickler.dump(obj)
TypeError: cannot pickle 'weakref' object

cellbender:remove-background: [epoch 007]  average training loss: 2107.6612
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Could not save checkpoint
cellbender:remove-background: Traceback (most recent call last):
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/cellbender/remove_background/checkpoint.py", line 115, in save_checkpoint
    torch.save(model_obj, filebase + '_model.torch')
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 653, in _save
    pickler.dump(obj)
TypeError: cannot pickle 'weakref' object

This is the end of the output file:

cellbender:remove-background: [epoch 150]  average training loss: 1934.1259
cellbender:remove-background: [epoch 150] average test loss: 1944.6237
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Could not save checkpoint
cellbender:remove-background: Traceback (most recent call last):
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/cellbender/remove_background/checkpoint.py", line 115, in save_checkpoint
    torch.save(model_obj, filebase + '_model.torch')
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/home/yanpengx/.conda/envs/CellBender/lib/python3.8/site-packages/torch/serialization.py", line 653, in _save
    pickler.dump(obj)
TypeError: cannot pickle 'weakref' object

cellbender:remove-background: 2023-08-17 18:53:52
cellbender:remove-background: Inference procedure complete.

Briefly speaking, "TypeError: cannot pickle 'weakref' object" was repeated many times. Can you give me any further comments?

Thanks for your patience.
Penny

@sjfleming
Member

Hi Penny @pengxin2019 ,

Okay, the issue you're having there is #212, which is also mentioned here: #230 (comment)

The problem is that you probably have pytorch version 2.0+ (and probably python > 3.7).

Currently cellbender needs pytorch < 2.0 (and python 3.7), because they changed something in pytorch in v2.0 and I have not quite figured out how to make cellbender work with it. (Progress on that is being tracked here #203 )

Can you try this to install cellbender (assuming you are using conda)?

(base) $ conda create -n cellbender python=3.7
(base) $ conda activate cellbender
(cellbender) $ pip install cellbender
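
To confirm which versions an environment actually has, a small check along these lines may help. It is a sketch based only on the constraints stated in this comment (Python 3.7, PyTorch below 2.0); `parse_major_minor`, `environment_problems`, and `current_environment_problems` are hypothetical helper names, not part of CellBender.

```python
from typing import List, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Turn a version string such as '1.13.1+cu117' into (1, 13)."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def environment_problems(python_version: str, torch_version: str) -> List[str]:
    """List mismatches against the constraints given in this thread."""
    problems = []
    if parse_major_minor(python_version) != (3, 7):
        problems.append(f"Python {python_version}: this thread recommends 3.7")
    if parse_major_minor(torch_version)[0] >= 2:
        problems.append(f"PyTorch {torch_version}: this thread says < 2.0 is needed")
    return problems


def current_environment_problems() -> List[str]:
    """Check the active environment; assumes PyTorch is importable there."""
    import platform
    import torch

    return environment_problems(platform.python_version(), str(torch.__version__))
```

Running `print(current_environment_problems())` inside the activated conda environment should print an empty list when both constraints are met.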

@sjfleming
Member

Thanks for the input @makrez !

@yfarjoun

yfarjoun commented Sep 5, 2023

(Please update the documentation in README.md.)
