Tags · nmacchioni/ao

v0.4.0

Fixing linear_activation_tensor dynamic quant (pytorch#622)

Summary: Dynamic quant was broken for generate because a repr function was missing.

Test Plan: sh benchmarks.sh

20240806170037, tok/s=  9.54, mem/s=  63.14 GB/s, peak_mem= 8.61 GB, model_size= 6.62 GB
quant: int8dq, mod: Llama-2-7b-chat-hf, kv_quant: False, compile: True, compile_prefill: False, dtype: torch.bfloat16, device: cuda
repro: python generate.py --quantization int8dq --checkpoint_path ../../../checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth --device cuda --precision torch.bfloat16 --compile --num_samples 5 --max_new_tokens 200 --top_k 200 --temperature 0.8
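To compare runs like the one above, the `name= value` metrics in the result line can be pulled into a dictionary. The following is a minimal sketch, not part of the repository: `parse_metrics` is a hypothetical helper, and it assumes the metrics always appear as `name= number` pairs as they do in this one sample.

```python
import re

# Metrics portion of the result line from the Test Plan output
# (the configuration fields and repro command are left out).
line = (
    "20240806170037, tok/s=  9.54, mem/s=  63.14 GB/s, "
    "peak_mem= 8.61 GB, model_size= 6.62 GB"
)


def parse_metrics(text):
    """Hypothetical helper: return {name: float} for each `name= value` pair."""
    # A name is word characters or '/', followed by '=', optional spaces, a number.
    pairs = re.findall(r"([\w/]+)=\s*([\d.]+)", text)
    return {name: float(value) for name, value in pairs}


metrics = parse_metrics(line)
print(metrics)
```

In this sample, tok/s multiplied by model_size (9.54 × 6.62 ≈ 63.15) is close to the reported mem/s of 63.14 GB/s. The commit message does not say how mem/s is computed, so treat that relationship as an observation about this one run.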

Aug 7, 2024
245ab4e

v0.4.0-rc5

fix version check

Aug 6, 2024
febeaac

v0.4.0-rc4

fix atol test again

Aug 6, 2024
8f33f55

v0.4.0-rc3

skip test cases that rely on pt 2.5

Aug 6, 2024
d0c8d49

v0.4.0-rc2

Fix FP6-LLM API and add .to(device) op (pytorch#599)

* fix

* add some ops for convenience

Co-authored-by: Thien Tran

Aug 5, 2024
55d556b

v0.4.0-rc1

Update version.txt

Aug 2, 2024
fa39f90

v0.3.1-rc1

Fix crash when PYTORCH_VERSION is not defined (pytorch#455)

Jun 27, 2024
af052d0

v0.3.0

[Release Only] Pin generate-matrix to release 2.3.1 (pytorch#352)

Jun 25, 2024
a2ba345

v0.3.0-rc3

[Release Only] Add test-infra-ref release/2.3 as input parameter

Jun 25, 2024
241c174

v0.3.0-rc2

[Release Only] Pin generate-matrix to release 2.3.1 (pytorch#352)

Jun 25, 2024
a2ba345
