Commit bd78f63

Reduce peak VRAM by releasing large attention tensors (as soon as they're unnecessary) (huggingface#3463)

Release large tensors in attention (as soon as they're no longer required). Reduces peak VRAM by nearly 2 GB for 1024x1024 (even after slicing), and the savings scale up with image size.
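The underlying pattern: PyTorch keeps a tensor's VRAM allocated for as long as a Python reference to it exists, so dropping each large intermediate with del before the next big allocation lowers the peak. A minimal standalone sketch of the idea (the function name and shapes are illustrative, not from this commit):

import torch

def attention_probs_lowmem(query, key, scale):
    # query, key: (batch * heads, seq_len, head_dim). The (seq_len, seq_len)
    # score matrix is the largest intermediate here, so each reference is
    # dropped as soon as the next tensor has been computed from it.
    baddbmm_input = torch.empty(
        query.shape[0], query.shape[1], key.shape[1],
        dtype=query.dtype, device=query.device,
    )
    attention_scores = torch.baddbmm(
        baddbmm_input, query, key.transpose(-1, -2), beta=0, alpha=scale
    )
    del baddbmm_input  # with beta=0 the buffer's values were never read; drop it

    attention_probs = attention_scores.softmax(dim=-1)
    del attention_scores  # free the raw scores before any further allocation

    return attention_probs

Note that del only drops the Python reference; PyTorch's caching allocator can then reuse the freed block for the next allocation, which is what lowers the reported peak.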
cmdr2 authored May 17, 2023
1 parent 3ebd2d1 commit bd78f63
Showing 1 changed file with 3 additions and 0 deletions.
src/diffusers/models/attention_processor.py
@@ -344,11 +344,14 @@ def get_attention_scores(self, query, key, attention_mask=None):
             beta=beta,
             alpha=self.scale,
         )
+        del baddbmm_input

         if self.upcast_softmax:
             attention_scores = attention_scores.float()

         attention_probs = attention_scores.softmax(dim=-1)
+        del attention_scores
+
         attention_probs = attention_probs.to(dtype)

         return attention_probs
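One way to check the effect (a hypothetical harness, not part of the commit) is to read PyTorch's peak-memory counter around a call such as the attention_probs_lowmem sketch above, once with the del statements and once without:

import torch

# Reset the allocator's high-water mark, run one attention pass on the GPU,
# then report the peak. Tensor shapes here are arbitrary examples.
torch.cuda.reset_peak_memory_stats()
q = torch.randn(16, 4096, 64, device="cuda", dtype=torch.float16)
k = torch.randn(16, 4096, 64, device="cuda", dtype=torch.float16)
probs = attention_probs_lowmem(q, k, scale=64 ** -0.5)
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")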

0 comments on commit bd78f63
