
ModelCheckpoint callback saves the encoder weights even though they are frozen #723

Closed
nkaenzig opened this issue Dec 5, 2024 · 0 comments · Fixed by #724
nkaenzig commented Dec 5, 2024

During evals, we currently use lightning.pytorch.callbacks.ModelCheckpoint to save the best model checkpoints during fit. On downstream tasks we only fit the decoder / head while the encoder remains frozen, so there is no need to include the encoder weights in the saved checkpoints.

Furthermore, for large encoders (e.g. ViT-G), checkpoint saving becomes a major runtime bottleneck, taking up to 30 seconds per checkpoint while the GPU sits idle.
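One possible fix is to filter the frozen encoder's entries out of the checkpoint's state_dict before it is written to disk. A minimal sketch of that filtering logic is below; the `strip_frozen_keys` helper and the `"encoder."` prefix are assumptions for illustration and would need to match the actual attribute name of the frozen backbone in the LightningModule.

```python
# Sketch: exclude frozen encoder weights from saved checkpoints.
# Assumes the frozen backbone's parameters live under the "encoder."
# prefix in the module's state_dict; adapt the prefix as needed.

def strip_frozen_keys(state_dict: dict, prefix: str = "encoder.") -> dict:
    """Return a copy of state_dict without the frozen-encoder entries."""
    return {k: v for k, v in state_dict.items() if not k.startswith(prefix)}

# In a LightningModule, this could be wired into Lightning's standard
# on_save_checkpoint hook (hypothetical class, shown for illustration):
#
# class DownstreamModel(lightning.pytorch.LightningModule):
#     def on_save_checkpoint(self, checkpoint: dict) -> None:
#         checkpoint["state_dict"] = strip_frozen_keys(checkpoint["state_dict"])
```

With this approach only the trainable head is persisted, which should shrink checkpoint files and remove most of the save-time stall; the frozen encoder weights would then need to be restored separately (e.g. from the original pretrained weights) when loading such a checkpoint.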
