Tags: bertmaher/ao
Fix 20x slowdown of FP6 kernel due to device properties query (pytorch#1128)
Fix 20x slowdown of FP6 kernel due to device properties query (pytorch#1092)
Replace `cudaGetDeviceProperties` with `cudaDeviceGetAttribute`
Co-authored-by: Tobias van der Werff
Don't run mac builds per commit (pytorch#842)
* Don't run mac builds per commit
* Update and rename build-wheels-m1.yml to build-wheels_m1.yml
* Update build-wheels_m1.yml
* Update build-wheels_m1.yml
Add INT8 mixed-precision training (pytorch#748)
* initial commit
* expose some UX. update test
* add test. update bench
* update test. add doc
* fix ngpu
* fix FSDP
* fix
* fix fsdp test
* fix
* grammar
* simplify fsdp test
* update benchmark script
* update
* make claim more conservative
* register fused adam
* update benchmark script
* add more ops
* update default
* use TorchAOBaseTensor
* fix fsdp param_dtype
* fix param_dtype
* dtype check to prevent unnecessary errors
* move checks
* add note
* fix
* simplify script
* add module-based UX
* fix
* use FP8 impl of __torch_dispatch__
* rename _dynamice interface
* update test
* fix compile on 2.4
* log torch version
* make log interval customizable
* make naming for explicit
* update readme
* some change
* fix big bug
* add docstring. update _get_linear_inserter
* add TorchAOBaseTensor back
* fix FSDP
* update FSDP test. add autocast support
* reduce iter
* update int8_mm fallback
* put leading dims logic to _dynamic_int8_mm