
Bugs when running 'StageB_ldm_finetune.py' #12

Open
lyh1028 opened this issue Apr 10, 2023 · 7 comments

Comments


lyh1028 commented Apr 10, 2023

Thank you for your excellent work. Could you please help me with the questions below?
It is strange that the first time I ran this file it worked fine. However, when I repeated the operation, it threw an error like:
Traceback (most recent call last):
File "code/stageB_ldm_finetune.py", line 245, in
main(config)
File "code/stageB_ldm_finetune.py", line 163, in main
generative_model.finetune(trainer, fmri_latents_dataset_train, fmri_latents_dataset_test,
File "/public1/home/ungradu/home/gra02/lyh_test/mind-vis/mind-vis-main/code/dc_ldm/ldm_for_fmri.py", line 103, in finetune
trainers.fit(self.model, dataloader, val_dataloaders=test_loader)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
self._call_and_handle_interrupt(
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/spawn.py", line 78, in launch
mp.spawn(
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 189, in start_processes
process.start()
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/public1/home/ungradu/home/gra02/anaconda3/envs/mind-vis/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TorchHistory.add_log_parameters_hook.<locals>.<lambda>'

P.S.
Before the error there is also a UserWarning about wandb, but I guess it is not the cause of this problem.
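For context, a minimal self-contained sketch (not taken from the mind-vis code; make_local_hook and module_level_hook are made up for illustration) of why this kind of AttributeError appears: the DDP spawn launcher pickles objects in order to send them to worker processes, and a function defined locally inside another function (such as the parameter-logging hook that wandb attaches) cannot be pickled, while a module-level function can.

# Illustration only: why "spawn" multiprocessing fails on locally defined hooks.
import pickle

def make_local_hook():
    # A function defined inside another function cannot be pickled,
    # because pickle stores functions by their module-level name.
    def hook(grad):
        return grad
    return hook

def module_level_hook(grad):
    # A module-level function pickles fine (stored by reference).
    return grad

try:
    pickle.dumps(make_local_hook())
except AttributeError as err:
    # AttributeError: Can't pickle local object 'make_local_hook.<locals>.hook'
    print("local hook:", err)

print("module-level hook pickles to", len(pickle.dumps(module_level_hook)), "bytes")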


lyh1028 commented Apr 12, 2023

I have solved this problem by using only one GPU while training.

@bottle0228

Hello,

I have also encountered this error. May I ask how you specifically modified the code to resolve this error?

I would greatly appreciate your help.


lyh1028 commented May 29, 2023

Hello,

I have also encountered this error. May I ask how you specifically modified the code to resolve this error?

I would greatly appreciate your help.

My server parallelizes training across multiple GPUs by default, which causes some problems (I have forgotten the specific reason, lol). The simple fix is to specify the GPU at runtime, for example: CUDA_VISIBLE_DEVICES=1 (your GPU id) python StageB_ldm_finetune.py
However, this makes the finetuning process very slow. I think it takes about 3 days on one RTX 3090 GPU.
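For anyone hitting the same thing, a minimal sketch of the two single-GPU workarounds described above (the exact Trainer arguments used in stageB_ldm_finetune.py may differ; gpus=1 is the pytorch_lightning 1.6.x style):

# Sketch of the single-GPU workaround; adapt to how stageB_ldm_finetune.py builds its Trainer.
import os
import pytorch_lightning as pl

# Option 1: hide all but one GPU before CUDA is initialised, equivalent to
#   CUDA_VISIBLE_DEVICES=1 python code/stageB_ldm_finetune.py
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # your GPU id

# Option 2: ask Lightning for a single device so it never launches the
# multi-process (DDP spawn) strategy that triggers the pickling error.
trainer = pl.Trainer(gpus=1, max_epochs=1)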

@bottle0228

Thank you very much for your help. I have successfully solved this problem, but as you said, it does run very slowly.

@tejastake

Hi there, can you tell me which version of pytorch_lightning you have used?
Can you please guide me about my error?
My error is:
File "/Major_with_stage3/eval_metrics.py", line 119, in n_way_top_k_acc
acc = accuracy(pred_picked.unsqueeze(0), torch.tensor([0], device=pred.device),
TypeError: accuracy() missing 1 required positional argument: 'task'

@bottle0228

hi there can you tell me which version of pytorch_lightning you have used. can you please guide me about my error. my error is File "/Major_with_stage3/eval_metrics.py", line 119, in n_way_top_k_acc acc = accuracy(pred_picked.unsqueeze(0), torch.tensor([0], device=pred.device), TypeError: accuracy() missing 1 required positional argument: 'task'

Hello, the version of pytorch_lightning I'm using is 1.6.5.

@JoyMei

JoyMei commented Oct 26, 2023

hi there can you tell me which version of pytorch_lightning you have used. can you please guide me about my error. my error is File "/Major_with_stage3/eval_metrics.py", line 119, in n_way_top_k_acc acc = accuracy(pred_picked.unsqueeze(0), torch.tensor([0], device=pred.device), TypeError: accuracy() missing 1 required positional argument: 'task'

This is due to the update of the torchmetrics API; it has already been resolved in #13.
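For reference, a sketch of what the newer torchmetrics API expects (the num_classes/top_k values here are placeholders; see eval_metrics.py and #13 for the actual ones):

# Newer torchmetrics requires an explicit `task` argument for accuracy().
import torch
from torchmetrics.functional import accuracy

pred_picked = torch.rand(50)   # placeholder class scores for one trial
target = torch.tensor([0])     # ground-truth index

# Old call (raises: accuracy() missing 1 required positional argument: 'task'):
# acc = accuracy(pred_picked.unsqueeze(0), target, top_k=1)

# New call: declare the task type and number of classes explicitly.
acc = accuracy(pred_picked.unsqueeze(0), target,
               task="multiclass", num_classes=50, top_k=1)
print(acc)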
