
[FIX] remove device type #429

Open
wants to merge 1 commit into
base: main
from


lanking520
Contributor

For HuggingFace Accelerate:

device_map='auto'

means loading the model across all visible GPUs. For weight remapping this is unnecessary, and it makes the process run slower. Loading the full model on the CPU is sufficient.

@byshiue
Collaborator

byshiue commented Feb 2, 2023

However, the CPU memory may not be large enough to hold the full model.

@lanking520
Contributor Author

@byshiue On AWS, the CPU is usually the one that can fit the model. Most GPU instances come with 60 GB to 1.2 TB of CPU memory, whereas total GPU memory is usually only about 1/4 of that, e.g. a 15GB T4 (G4dn single) or 40GB A100 (P4d.24xlarge). Fully loading a model such as OPT-66B onto these GPUs takes a tremendous amount of time: it is first loaded on the CPU and then copied in full to the GPUs.
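To put these sizes in perspective, here is a back-of-the-envelope sketch of the weight-only footprint. It assumes a nominal 66e9 parameters for OPT-66B, 2 bytes per parameter in fp16 (4 in fp32), and decimal gigabytes; it ignores activations and framework overhead. The memory budgets are the figures quoted in the comment, not measured values.

```python
# Back-of-the-envelope weight footprint. Assumptions: nominal parameter count,
# weights only (no activations or framework overhead), 1 GB = 1e9 bytes.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the raw weights in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits(num_params: float, dtype: str, budget_gb: float) -> bool:
    """True if the raw weights alone fit within budget_gb."""
    return weight_gb(num_params, dtype) <= budget_gb


OPT_66B_PARAMS = 66e9  # nominal size implied by the model name

# Memory budgets taken from the figures quoted in the comment above.
BUDGETS_GB = {
    "one T4 GPU": 15,
    "one A100 GPU": 40,
    "CPU RAM, low end": 60,
    "CPU RAM, high end": 1200,
}

if __name__ == "__main__":
    for dtype in ("fp16", "fp32"):
        size = weight_gb(OPT_66B_PARAMS, dtype)
        print(f"OPT-66B in {dtype}: ~{size:.0f} GB")
        for name, budget in BUDGETS_GB.items():
            ok = fits(OPT_66B_PARAMS, dtype, budget)
            print(f"  fits in {name} ({budget} GB)? {ok}")
```

In fp16 the weights alone come to roughly 132 GB: more than a single quoted GPU or the low end of the quoted CPU range, but well within the 1.2 TB high end.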

Moreover, even with this parameter, you still have to load the full model on the CPU and then partition it across the GPUs, which gives FT no benefit for weight remapping.

@byshiue
Collaborator

byshiue commented Feb 7, 2023

But this script is not only for clouds like AWS; we also need to consider other cases. Some development environments may have only a small amount of CPU memory, like 16 GB. So we still prefer to keep the current setting.
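One way to express the trade-off discussed in this thread is a hypothetical helper (not part of this PR or the conversion script) that keeps the CPU-only load when host RAM can hold the weights and falls back to device_map='auto' otherwise. The name `choose_device_map`, the `headroom` safety factor, and passing available memory in as an argument are all assumptions for illustration. In Hugging Face transformers, `from_pretrained` loads onto the CPU when `device_map` is left at its default of `None`, and `device_map='auto'` requires Accelerate to be installed.

```python
from typing import Optional


def choose_device_map(
    model_gb: float, available_cpu_gb: float, headroom: float = 1.2
) -> Optional[str]:
    """Hypothetical helper: pick a device_map value for from_pretrained.

    Returns None (the default, i.e. load everything into host RAM) when the
    weights times a safety factor fit in available CPU memory; otherwise
    returns "auto" so Accelerate places the weights across visible devices.
    """
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    if headroom < 1.0:
        raise ValueError("headroom must be >= 1.0")
    if model_gb * headroom <= available_cpu_gb:
        return None
    return "auto"


if __name__ == "__main__":
    # ~132 GB of fp16 weights on a host with 1.2 TB of RAM: stay on CPU.
    print(choose_device_map(132, 1200))  # None
    # The same model on a 16 GB development machine: let Accelerate place it.
    print(choose_device_map(132, 16))  # auto
```

The result would be passed as `device_map=` to `from_pretrained`. The headroom factor is a guess meant to leave room for the process itself; a real script would also need to measure available memory, for example with a platform-specific call.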


2 participants