
Add the possibility to quantize MatMul per-tensor when per_channel=True #12000

Open
regisss wants to merge 3 commits into main
Conversation

@regisss commented Jun 27, 2022

Description: When quantizing a model with per_channel=True, we should have the option to quantize linear layers per-tensor, since quantizing them per feature does not make sense. This PR adds that option for the MatMul operator: setting extra_options["QDQOpTypePerChannelSupportToAxis"]["MatMul"] = None quantizes all layers per-channel except the linear ones, which fall back to per-tensor.
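For illustration, a minimal sketch of how this option would be used with quantize_static (the model paths and the calibration data reader are placeholders, and mapping "MatMul" to None is the behavior this PR proposes, not an existing option):

```python
from onnxruntime.quantization import QuantFormat, quantize_static

# `reader` is a user-provided CalibrationDataReader (placeholder here).
# Mapping "MatMul" to None in QDQOpTypePerChannelSupportToAxis would tell
# the quantizer to keep MatMul per-tensor even though per_channel=True.
quantize_static(
    "model.onnx",          # placeholder input path
    "model_quant.onnx",    # placeholder output path
    calibration_data_reader=reader,
    quant_format=QuantFormat.QDQ,
    per_channel=True,
    extra_options={"QDQOpTypePerChannelSupportToAxis": {"MatMul": None}},
)
```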

Motivation and Context

@regisss (Author) commented Jul 4, 2022

@yufenglee @chilo-ms any feedback on this PR?

@ytaous (Contributor) commented Jul 14, 2022

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

@ytaous (Contributor) commented Jul 14, 2022

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 8 pipeline(s).

@ytaous added the quantization label Jul 15, 2022
@ytaous (Contributor) commented Jul 15, 2022

@yufenglee @chilo-ms

@ghost commented Aug 19, 2022

CLA assistant check
All CLA requirements met.

@yufenglee (Member) commented

/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

@yufenglee (Member) commented

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 6 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 8 pipeline(s).

@yufenglee (Member) commented

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

Labels
quantization

Successfully merging this pull request may close these issues:
add QLinearMatMul do not quantize per channel flag to quantize_static extra options