Commit
[Major] Add CPU offloading support for apply_scale, apply_clip, pseudo_quantize_model_weight, real_quantize_model_weight
Abhinav Kulkarni committed Jul 1, 2023
1 parent 95cd9c2 commit d32095a
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions awq/quantize/auto_clip.py
@@ -75,9 +75,11 @@ def auto_clip_block(module,
         # due to qk bmm, it is hard to clip precisely
         if any([_ in name for _ in ["q_", "k_", "query", "key", "Wqkv"]]):
             continue
+        named_linears[name].cuda()
         max_val = auto_clip_layer(
             named_linears[name].weight, input_feat[name], n_bit=w_bit, q_config=q_config)
         clip_list.append((name, max_val))
+        named_linears[name].cpu()
     return clip_list
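The two added lines implement a simple per-layer offloading pattern: each linear layer is moved to the GPU only for the duration of its clip search, then moved back to CPU, so peak GPU memory stays at roughly one layer's weights. A minimal sketch of the same pattern, using a hypothetical MockLinear and find_clip_threshold as stand-ins for torch.nn.Linear and auto_clip_layer (no real GPU is touched here):

```python
# Sketch of the per-layer CPU<->GPU offloading pattern from the commit.
# MockLinear and find_clip_threshold are hypothetical stand-ins for
# torch.nn.Linear and auto_clip_layer.

class MockLinear:
    def __init__(self, name):
        self.name = name
        self.device = "cpu"

    def cuda(self):  # stand-in for nn.Module.cuda()
        self.device = "cuda"
        return self

    def cpu(self):   # stand-in for nn.Module.cpu()
        self.device = "cpu"
        return self


def find_clip_threshold(layer):
    # Placeholder for auto_clip_layer: would search for the weight
    # clipping threshold that minimizes quantization error.
    assert layer.device == "cuda", "clip search expects the layer on GPU"
    return 1.0


def auto_clip_block(named_linears):
    clip_list = []
    for name, layer in named_linears.items():
        # skip query/key projections: qk bmm makes precise clipping hard
        if any(k in name for k in ["q_", "k_", "query", "key", "Wqkv"]):
            continue
        layer.cuda()   # bring one layer onto the GPU
        clip_list.append((name, find_clip_threshold(layer)))
        layer.cpu()    # offload it again immediately
    return clip_list


layers = {"q_proj": MockLinear("q_proj"), "o_proj": MockLinear("o_proj")}
result = auto_clip_block(layers)
```

With this pattern only one layer's weights occupy GPU memory at a time, which is what lets a model larger than available VRAM still be quantized layer by layer.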
