
Training OOM #27

Open
rexainn opened this issue Aug 15, 2024 · 5 comments


rexainn commented Aug 15, 2024

Hi, I want to know which GPU you used for training.
I am using a V100, but it keeps reporting out of memory. I have already turned off 'convert_models_to_fp32'.

Author

rexainn commented Aug 15, 2024

I noticed that in your paper you mention running the experiments on a single 3090.
Could the OOM be because I am training on my own dataset, which contains 7 tasks?

Owner

zwx8981 commented Aug 16, 2024

Maybe; try using a smaller batch size.
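(For context, a minimal sketch of where batch size usually enters a PyTorch training script; the dataset below is a dummy placeholder, not this repo's loader:)

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the training set (placeholder shapes, not the repo's data).
train_dataset = TensorDataset(torch.randn(64, 3, 224, 224), torch.randn(64))

# Lowering batch_size shrinks the activations kept alive for backprop,
# which is usually the dominant memory cost during training.
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True,
                          num_workers=4, pin_memory=True)
```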

Owner

zwx8981 commented Aug 16, 2024

You may also try setting opt = 1 at line 127, which freezes the weights of the text encoder. Empirically, this does not affect the final performance very much, but it can significantly reduce the memory cost.
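(For reference, a minimal sketch of what freezing CLIP's text encoder typically looks like with the OpenAI clip package; the repo's opt flag presumably wires up something equivalent, and the checkpoint name and learning rate here are placeholders, not the project's settings:)

```python
import torch
import clip  # OpenAI CLIP package, which the project builds on

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)  # checkpoint name is a placeholder

# Freeze the text branch: transformer, token/positional embeddings,
# final LayerNorm, and the text projection. With requires_grad off, no
# gradients or optimizer state are kept for these weights, which is
# where most of the memory saving comes from.
for p in model.transformer.parameters():
    p.requires_grad_(False)
for p in model.token_embedding.parameters():
    p.requires_grad_(False)
for p in model.ln_final.parameters():
    p.requires_grad_(False)
model.positional_embedding.requires_grad_(False)
model.text_projection.requires_grad_(False)

# Optimize only the parameters that are still trainable (e.g. the visual branch).
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=5e-6)  # lr is illustrative only
```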

Author

rexainn commented Aug 16, 2024

> Maybe; try using a smaller batch size.

Setting batch_size = 1 still causes OOM, which is quite strange...

Author

rexainn commented Aug 16, 2024

> You may also try setting opt = 1 at line 127, which freezes the weights of the text encoder. Empirically, this does not affect the final performance very much, but it can significantly reduce the memory cost.

This works, thanks! Meanwhile, I will still try to find a way to train without freezing the text encoder.
