Bugs when running 'StageB_ldm_finetune.py' #12
Comments
I have solved this problem by using only one GPU while training.
Hello, I have also encountered this error. Could you tell me how you specifically modified the code to resolve it? I would greatly appreciate your help.
My server parallelizes computation across multiple GPUs by default, which causes some problems (I have forgotten the specific cause, lol). The simple fix is to specify the GPU at runtime, for example: CUDA_VISIBLE_DEVICES=1 python StageB_ldm_finetune.py (replace 1 with your GPU id).
Thank you very much for your help. I have successfully solved this problem, but as you said, it does run very slowly.
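For readers who prefer to keep the fix inside the script rather than on the command line, the same workaround can be sketched in Python. This is only a sketch of the general CUDA_VISIBLE_DEVICES mechanism, not code from mind-vis:

```python
import os

# Make only one GPU visible *before* torch / pytorch_lightning are imported;
# once CUDA has been initialized, changing this variable has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # replace "1" with your GPU id

# ...now import torch / pytorch_lightning and run training as usual.
# The single visible GPU will be indexed as cuda:0 inside the process.
```

The key detail is ordering: the assignment must happen before the first CUDA initialization, which is why the command-line form `CUDA_VISIBLE_DEVICES=1 python StageB_ldm_finetune.py` is the safest option.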
Hi there, can you tell me which version of pytorch_lightning you used?
Hello, the version of pytorch_lightning I'm using is 1.6.5.
This was caused by an update to the torchmetrics API and has already been resolved in #13.
Thank you for your excellent work. Could you please help me with questions below?
It is weird that when I first ran this file, it worked well. However, when I repeated the operation, it threw an error like:
Traceback (most recent call last):
File "code/stageB_ldm_finetune.py", line 245, in <module>
main(config)
File "code/stageB_ldm_finetune.py", line 163, in main
generative_model.finetune(trainer, fmri_latents_dataset_train, fmri_latents_dataset_test,
File "/public1/home/ungradu/home/gra02/lyh_test/mind-vis/mind-vis-main/code/dc_ldm/ldm_for_fmri.py", line 103, in finetune
trainers.fit(self.model, dataloader, val_dataloaders=test_loader)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
self._call_and_handle_interrupt(
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/spawn.py", line 78, in launch
mp.spawn(
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 189, in start_processes
process.start()
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TorchHistory.add_log_parameters_hook..'
P.S. Before the error there is also a UserWarning about wandb, but I guess it is not the cause of this problem.
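For context on the final AttributeError: PyTorch Lightning's default multi-GPU strategy spawns worker processes, which requires pickling the model and everything attached to it, and a function defined inside another function (a "local object", like the wandb hook named in the traceback) cannot be pickled. A minimal stdlib reproduction, with illustrative names that are not from mind-vis or wandb:

```python
import pickle

def attach_hook():
    def hook(x):  # local object: it lives only inside attach_hook's namespace
        return x
    return hook

try:
    # Spawn-based multiprocessing does the equivalent of this dump internally.
    pickle.dumps(attach_hook())
except AttributeError as exc:
    print(exc)  # Can't pickle local object 'attach_hook.<locals>.hook'
```

This is why restricting training to a single GPU (so no spawn/pickle step happens) makes the error disappear.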