Tags · bertmaher/ao

ciflow/rocm/999

Merge branch 'main' into msaroufim-patch-21

Oct 25, 2024
fda91b3

v0.6.1

Fix 20x slowdown of FP6 kernel due to device properties query (pytorch#1128)

Fix 20x slowdown of FP6 kernel due to device properties query (pytorch#1092)

Replace `cudaGetDeviceProperties` with `cudaDeviceGetAttribute`

Co-authored-by: Tobias van der Werff <[email protected]>

Oct 21, 2024
99c8d52
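The v0.6.1 fix swaps `cudaGetDeviceProperties`, which fills in the entire `cudaDeviceProp` struct (including fields that can be comparatively slow to collect), for `cudaDeviceGetAttribute`, which reads a single attribute. Below is a minimal host-side sketch of the attribute-based pattern. It assumes, for illustration only, that the kernel needs just the device's compute capability; the helper name `query_compute_capability` is hypothetical and is not taken from the torchao sources.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not from the torchao sources): reads only the compute
// capability via cudaDeviceGetAttribute instead of filling a full
// cudaDeviceProp with cudaGetDeviceProperties.
// Returns true on success; on failure *major and *minor are unspecified.
bool query_compute_capability(int device, int* major, int* minor) {
    // Each call fetches exactly one attribute for the given device ordinal.
    if (cudaDeviceGetAttribute(major, cudaDevAttrComputeCapabilityMajor, device) != cudaSuccess) {
        return false;
    }
    if (cudaDeviceGetAttribute(minor, cudaDevAttrComputeCapabilityMinor, device) != cudaSuccess) {
        return false;
    }
    return true;
}

// The slower pattern being replaced looks like:
//   cudaDeviceProp prop;
//   cudaGetDeviceProperties(&prop, device);
//   int major = prop.major, minor = prop.minor;
```

Because only two scalar attributes are read, a check like this should be cheap enough to stay on a per-call path, which is the kind of hot-path cost the fix targets.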

v0.6.1-rc2

Fix 20x slowdown of FP6 kernel due to device properties query (pytorch#1128)

Fix 20x slowdown of FP6 kernel due to device properties query (pytorch#1092)

Replace `cudaGetDeviceProperties` with `cudaDeviceGetAttribute`

Co-authored-by: Tobias van der Werff <[email protected]>

Oct 21, 2024
99c8d52

v0.6.1-rc1

Update README.md (pytorch#1036)

Oct 9, 2024
900f9ac

v0.6.0

Update README.md (pytorch#1036)

Oct 9, 2024
900f9ac

v0.6.0-rc1

Update README.md (pytorch#1036)

Oct 9, 2024
900f9ac

v0.5.0

Don't run mac builds per commit (pytorch#842)

* Don't run mac builds per commit

* Update and rename build-wheels-m1.yml to build-wheels_m1.yml

* Update build-wheels_m1.yml

* Update build-wheels_m1.yml

Sep 10, 2024
ae8384b

v0.5.0-rc3

Don't run mac builds per commit (pytorch#842)

* Don't run mac builds per commit

* Update and rename build-wheels-m1.yml to build-wheels_m1.yml

* Update build-wheels_m1.yml

* Update build-wheels_m1.yml

Sep 10, 2024
ae8384b

v0.5.0-rc2

Add INT8 mixed-precision training (pytorch#748)

* initial commit

* expose some UX. update test

* add test. update bench

* update test. add doc

* fix ngpu

* fix FSDP

* fix

* fix fsdp test

* fix

* grammar

* simplify fsdp test

* update benchmark script

* update

* make claim more conservative

* register fused adam

* update benchmark script

* add more ops

* update default

* use TorchAOBaseTensor

* fix fsdp param_dtype

* fix param_dtype

* dtype check to prevent unnecessary errors

* move checks

* add note

* fix

* simplify script

* add module-based UX

* fix

* use FP8 impl of __torch_dispatch__

* rename _dynamic interface

* update test

* fix compile on 2.4

* log torch version

* make log interval customizable

* make naming more explicit

* update readme

* some change

* fix big bug

* add docstring. update _get_linear_inserter

* add TorchAOBaseTensor back

* fix FSDP

* update FSDP test. add autocast support

* reduce iter

* update int8_mm fallback

* put leading dims logic to _dynamic_int8_mm

Sep 9, 2024
3f7fc14

v0.5.0-rc1

[StaticQuant] Update how block_size is calculated with Observers (pytorch#815)

Sep 5, 2024
a1b3e67
