torch._dynamo.exc.Unsupported: Unsupported: quantized nyi in meta tensors with fake tensor propagation. #8727
Comments
@gpchowdari I'm assuming you're trying to pass quantized inputs into the model. If so, configuring them as qint8 inputs is not well supported. The ideal way to do this in ExecuTorch is to use this pass: https://github.com/pytorch/executorch/blob/main/exir/passes/quantize_io_pass.py You can take a look at this code to see how it's invoked: https://github.com/pytorch/executorch/blob/main/exir/tests/test_quantize_io_pass.py
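For readers landing here, a rough sketch of how that IO-quantization pass might be wired in. The constructor arguments and the transform call below are assumptions inferred from the linked files, not a verified invocation; check test_quantize_io_pass.py for the exact usage.

```python
# Assumed usage of the IO quantization passes; verify the signatures against
# exir/tests/test_quantize_io_pass.py in the ExecuTorch repo.
from executorch.exir.passes.quantize_io_pass import QuantizeInputs, QuantizeOutputs

# `edge` is the EdgeProgramManager returned by to_edge_transform_and_lower(...).
# Mark input 0 and output 0 of "forward" as quantized so the runtime consumes
# and produces int8 tensors instead of fp32.
edge = edge.transform(
    [
        QuantizeInputs(edge, [0], "forward"),   # assumed signature
        QuantizeOutputs(edge, [0], "forward"),  # assumed signature
    ]
)
et_program = edge.to_executorch()
```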
@tarun292 Thank you for the reply. Context: I am trying to bring one of our multimodal LLMs on-device. I am able to export the model without quantization.
I will go through the provided references. Thanks.
@gpchowdari Oh, if that's the error you are seeing then I don't think this is the right fix. Can you share the whole stack trace of the error and the export code that you used (without input quantization etc.)?
This is the function used to export and lower with the XNNPACK backend, along with the stack trace:
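A minimal sketch of what such an export-and-lower function typically looks like with the XNNPACK backend; the model, example inputs, and helper name here are illustrative assumptions, not the reporter's actual code.

```python
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

def export_and_lower(model: torch.nn.Module, example_inputs, dynamic_shapes=None):
    # 1. Capture the eager model into an ExportedProgram.
    exported = torch.export.export(model.eval(), example_inputs, dynamic_shapes=dynamic_shapes)
    # 2. Convert to edge dialect and delegate supported subgraphs to XNNPACK.
    edge = to_edge_transform_and_lower(exported, partitioner=[XnnpackPartitioner()])
    # 3. Serialize to an ExecuTorch program (.pte) for on-device execution.
    return edge.to_executorch()
```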
Seems reasonable to me overall. Tagging @mcr229 @digantdesai, who might be able to help.
@gpchowdari If you're using dynamic quantization you might have to use the following pass:
Unfortunately there is a gap here for a smooth dynamic quantization flow, which makes this step manual, but we are working on resolving it.
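The referenced snippet isn't visible here; judging from the later comments in this thread, a plausible sketch is to swap in XnnpackDynamicallyQuantizedPartitioner for the dynamically quantized linear ops (an assumption, not a confirmed quote of the suggestion above):

```python
from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
    XnnpackDynamicallyQuantizedPartitioner,
)
from executorch.exir import to_edge_transform_and_lower

# Delegate only the dynamically quantized linear ops to XNNPACK.
edge = to_edge_transform_and_lower(
    exported, partitioner=[XnnpackDynamicallyQuantizedPartitioner()]
)
```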
@mcr229 With the suggestion, I am able to export the model, thank you :) Inference is failing with: Error in XNNPACK: failed to define Fully Connected operator with input ID #2, filter ID #3, bias ID #4, and output ID #1: mismatching datatypes across input (QDINT8), filter (QINT8), bias (FP32), and output (FP32) (xnn_define_fully_connected, executorch/backends/xnnpack/third-party/XNNPACK/src/subgraph/fully-connected.c:1216)
Update: with the two changes below, inference is working without issues but is very slow.
@gpchowdari What's the parameter size of your model? Given that it's multimodal, if it's a large model the slowness might be expected. You can profile your model if you're interested: https://pytorch.org/executorch/main/etdump.html
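A minimal sketch of inspecting such a profile with the ExecuTorch devtools Inspector, assuming an ETDump has already been produced by running the .pte with ETDump collection enabled; the file names are placeholders.

```python
# Assumed devtools usage; see the linked ETDump docs for the full flow
# (generating an ETRecord at export time and an ETDump at runtime).
from executorch.devtools import Inspector

inspector = Inspector(etdump_path="etdump.etdp", etrecord="etrecord.bin")
inspector.print_data_tabular()  # per-operator latency breakdown
```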
@gpchowdari That's strange, this should work with XnnpackPartitioner(). Do you mind sharing the graph of the model before and after delegation?
@tarun292 ~1.6 billion parameters. Will check on the profiling. Thank you.
@mcr229 I am really sorry, unfortunately I can't share that info. I will try to reproduce it with a sample and share it if I can. Thank you.
@gpchowdari Understandable. I just want to make sure:
is this change really required to get around the bug you saw? If so, would you be able to try something like [XnnpackDynamicallyQuantizedPartitioner(), XnnpackPartitioner()]? For context, XnnpackDynamicallyQuantizedPartitioner only delegates dynamically quantized linear operations, which means the remaining ops will not be accelerated by XNNPACK. The second XnnpackPartitioner() should lower the remaining ops.
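A sketch of what passing both partitioners could look like, reusing the names from the export sketch earlier in the thread (illustrative, not the reporter's code):

```python
from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
    XnnpackDynamicallyQuantizedPartitioner,
    XnnpackPartitioner,
)
from executorch.exir import to_edge_transform_and_lower

# The first partitioner claims the dynamically quantized linears; the second
# picks up the remaining fp32 ops so they are also delegated to XNNPACK.
edge = to_edge_transform_and_lower(
    exported,
    partitioner=[XnnpackDynamicallyQuantizedPartitioner(), XnnpackPartitioner()],
)
et_program = edge.to_executorch()
```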
Is there an issue we can link here to close the gap, so it can be tracked?
@mcr229
@gpchowdari Do you mind sharing the error that comes up with dynamic_shapes_1? The reason for the inference-time improvement is likely that more ops are being delegated to XNNPACK, which gives a larger speedup. I've actually never seen dynamic shapes specified as in dynamic_shapes_2; do you have an example you're referring to for using Dim.STATIC and Dim.AUTO?
@mcr229 I referred to the example for dynamic_shapes_2 here: https://pytorch.org/tutorials/intermediate/torch_export_tutorial.html
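For reference, a sketch contrasting the two styles being discussed, based on the linked torch.export tutorial; the module, tensor names, and shapes are illustrative assumptions, and the exact Dim.AUTO / Dim.STATIC behavior depends on the PyTorch version.

```python
import torch
from torch.export import Dim, export

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2

example = (torch.randn(4, 16),)

# Style 1: explicit named Dim objects for each dynamic dimension.
dynamic_shapes_1 = {"x": {0: Dim("batch")}}

# Style 2: let the exporter infer dynamism with Dim.AUTO / Dim.STATIC,
# as shown in the torch.export tutorial.
dynamic_shapes_2 = {"x": {0: Dim.AUTO, 1: Dim.STATIC}}

ep1 = export(M(), example, dynamic_shapes=dynamic_shapes_1)
ep2 = export(M(), example, dynamic_shapes=dynamic_shapes_2)
```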
@gpchowdari Interesting, does using dynamic_shapes_2 solve your use case? Otherwise, if you could share the error that happens with dynamic_shapes_1, we can take a look at what's causing the issue.
@mcr229 |
Thanks for the update. Closing this, feel free to reopen or create another issue. |
🐛 Describe the bug
to_edge_transform_and_lower throws the error when a quantized input is passed to the export function.
Sample program to reproduce:
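The original program is collapsed; a hypothetical minimal repro consistent with the description might look like the sketch below. The module, shapes, and quantization parameters are assumptions, and where exactly the "quantized nyi in meta tensors" error surfaces (export vs. lowering) may vary.

```python
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

class Model(torch.nn.Module):
    def forward(self, x):
        # Work on the dequantized values so the module itself runs in eager mode.
        return x.dequantize() + 1.0

# Quantized example input: this is the kind of input the report describes and
# what the export/lowering flow does not support.
x = torch.quantize_per_tensor(torch.randn(1, 16), scale=0.1, zero_point=0, dtype=torch.qint8)

exported = torch.export.export(Model().eval(), (x,))
edge = to_edge_transform_and_lower(exported, partitioner=[XnnpackPartitioner()])
```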
Logs
Versions
cc @digantdesai @mcr229 @cbilgin @mergennachin @byjlw