
Add the possibility to quantize MatMul per-tensor when per_channel=True #12000

Open
regisss wants to merge 3 commits into main
Conversation

@regisss commented Jun 27, 2022

Description: When quantizing a model with per_channel=True, we should have the option to quantize linear layers per-tensor, since quantizing them per feature does not make sense. This PR adds that option for the MatMul operator: setting extra_options["QDQOpTypePerChannelSupportToAxis"]["MatMul"] = None quantizes all layers per-channel except the linear ones, which fall back to per-tensor.
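For illustration, a minimal sketch of how this option would be used with quantize_static (the model paths and the calibration data reader are placeholders, and mapping "MatMul" to None is the behavior this PR proposes, not an existing option):

```python
from onnxruntime.quantization import QuantFormat, quantize_static

# `reader` is a user-provided CalibrationDataReader (placeholder here).
# Mapping "MatMul" to None in QDQOpTypePerChannelSupportToAxis would tell
# the quantizer to keep MatMul per-tensor even though per_channel=True.
quantize_static(
    "model.onnx",          # placeholder input path
    "model_quant.onnx",    # placeholder output path
    calibration_data_reader=reader,
    quant_format=QuantFormat.QDQ,
    per_channel=True,
    extra_options={"QDQOpTypePerChannelSupportToAxis": {"MatMul": None}},
)
```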

Motivation and Context

@regisss (Author) commented Jul 4, 2022

@yufenglee @chilo-ms any feedback on this PR?

@ytaous (Contributor) commented Jul 14, 2022

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

@ytaous (Contributor) commented Jul 14, 2022

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 8 pipeline(s).

@ytaous added the quantization label Jul 15, 2022
@ytaous (Contributor) commented Jul 15, 2022

@yufenglee @chilo-ms

@ghost commented Aug 19, 2022

CLA assistant check
All CLA requirements met.

@yufenglee (Member) commented

/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

@yufenglee (Member) commented

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 6 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 8 pipeline(s).

@yufenglee (Member) commented

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

Labels
quantization

Successfully merging this pull request may close these issues:
add QLinearMatMul do not quantize per channel flag to quantize_static extra options