
[FR]: prepare_qnn_config should not overwrite op_types_to_quantize if explicitly set #1552

xieofxie opened this issue Jan 15, 2025 · 12 comments

@xieofxie

Proposal Summary

I suggest we add the following logic in prepare_qnn_config, right after this line:

run_config = {k: v for k, v in inspect.getmembers(qnn_config) if not k.startswith("_")}

so that an explicitly set op_types_to_quantize does not get overwritten:

if config["op_types_to_quantize"]:
    run_config["op_types_to_quantize"] = config["op_types_to_quantize"]

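For context, here is a rough sketch of where this sits (paraphrased rather than the actual Olive source; model_path, data_reader, and config are placeholders for values the pass already has in scope):

```python
import inspect

# get_qnn_qdq_config comes from onnxruntime's quantization tooling
from onnxruntime.quantization.execution_providers.qnn import get_qnn_qdq_config

# placeholders standing in for what the Olive pass already has
model_path = "model.onnx"
data_reader = None  # a CalibrationDataReader in the real pass
config = {"op_types_to_quantize": ["Conv", "MatMul"]}  # example of an explicit user setting

# Olive builds its run config from the QNN QDQ config returned by onnxruntime
qnn_config = get_qnn_qdq_config(model_path, data_reader)
run_config = {k: v for k, v in inspect.getmembers(qnn_config) if not k.startswith("_")}

# proposed addition: an explicitly configured op type list wins over the one
# that get_qnn_qdq_config picked for the whole model
if config["op_types_to_quantize"]:
    run_config["op_types_to_quantize"] = config["op_types_to_quantize"]
```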
What component(s) does this request affect?

  • OliveModels
  • OliveSystems
  • OliveEvaluator
  • Metrics
  • Engine
  • Passes
  • Other
xieofxie added the enhancement (New feature or request) label on Jan 15, 2025
@jambayk
Contributor

jambayk commented Jan 21, 2025

Hi, do you have an example use case where you want to only quantize some ops?

When using QNN EP, the entire model needs to be fully quantized so that the whole model runs in a single QNN graph. Otherwise, I think it gets split into multiple subgraphs or becomes incompatible with QNN EP. Is your intention to split the model on purpose?

Regardless, I think it might be better to ask for this feature in onnxruntime's get_qnn_qdq_config https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/execution_providers/qnn/quant_config.py#L43. We can then update the call to it accordingly in Olive. This way, we keep the QNN EP related logic in the ORT quantization code instead of doing the op_types_to_quantize overwrite in Olive, especially since we don't know whether this should be allowed or not.

@xieofxie
Author

I am quantizing the unet of stable diffusion 2.1, and I found that it only works well when I don't quantize the Constant ops. From the log, it looks like everything is running on QNN EP:

2025-01-22 14:25:50.2318910 [I:onnxruntime:, qnn_execution_provider.cc:788 onnxruntime::QNNExecutionProvider::GetCapability] Number of partitions supported by QNN EP: 1, number of nodes in the graph: 6201, number of nodes supported by QNN: 6201
2025-01-22 14:25:53.3775343 [I:onnxruntime:, qnn_execution_provider.cc:788 onnxruntime::QNNExecutionProvider::GetCapability] Number of partitions supported by QNN EP: 1, number of nodes in the graph: 7101, number of nodes supported by QNN: 7101

From my understanding of https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/SupportedOps.html, we could run unquantized ops on QNN EP in fp16 as long as they are supported? So it should be acceptable not to quantize everything.
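For reference, this is roughly how one would ask QNN EP to run the non-quantized parts in fp16 at session creation time. The provider option names below are from my reading of the ORT QNN EP docs and may differ across ORT versions; the paths are placeholders:

```python
import onnxruntime as ort

# placeholder model path; "backend_path" points at the HTP backend library
session = ort.InferenceSession(
    "model_qdq.onnx",
    providers=["QNNExecutionProvider"],
    provider_options=[{
        "backend_path": "QnnHtp.dll",
        "enable_htp_fp16_precision": "1",  # run float ops in fp16 on the HTP
    }],
)
```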

Yes, I will create a PR there.

@jambayk
Contributor

jambayk commented Jan 22, 2025

I see. Thanks for the context.

For the time being, what if you apply the [peephole optimizer](https://microsoft.github.io/Olive/how-to/configure-workflows/model-opt-and-transform/onnx.html#peeophole-optimizer) (needs onnxoptimizer installed) to do constant folding before the quantization step? This step should convert the constants into initializers, so they won't get separately quantized.

It looks like what's happening is that the Constant tensors get weight-quantized, but the outputs of the Constants also get activation-quantized.
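If you want to try that constant folding outside of Olive, a minimal sketch with onnxoptimizer directly might look like this (the pass name comes from onnxoptimizer's built-in pass list; file names are placeholders):

```python
import onnx
import onnxoptimizer  # pip install onnxoptimizer

model = onnx.load("unet.onnx")  # placeholder path

# turn Constant nodes into initializers so the quantizer only weight-quantizes them
# instead of also inserting activation QDQ nodes on their outputs
folded = onnxoptimizer.optimize(model, ["extract_constant_to_initializer"])

# very large models (>2GB) would need to be saved with external data instead
onnx.save(folded, "unet_folded.onnx")
```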

@xieofxie
Author

Looks like it's not working. I compared the number of initializers with and without peephole; both are 4930.

@jambayk
Contributor

jambayk commented Jan 23, 2025

Thanks for checking. By the way, did you have onnxoptimizer installed? My response originally said onnxscript instead of onnxoptimizer; I just wanted to make sure you saw the updated response.

@jambayk
Contributor

jambayk commented Jan 23, 2025

Could you try another thing: in your quantization pass config, please set "quant_preprocess": true.
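For anyone following along, a minimal sketch of what that pass entry could look like (shown as a Python dict; the pass type name is my assumption, only quant_preprocess comes from this thread):

```python
# corresponds to an entry in the "passes" section of the Olive workflow JSON
quantization_pass = {
    "type": "OnnxStaticQuantization",  # assumed pass type; use whatever pass you already have
    "quant_preprocess": True,          # run ORT's quantization pre-processing first
}
```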

@xieofxie
Author

Oh, it is working now. Not sure why the doc says it defaults to true but I have to set it explicitly in the JSON. Once the model is preprocessed, there are no Constant ops any more, so peephole is not needed.

At least this is not needed for my case. So it's up to the onnxruntime team to decide whether this:

> From my understanding of https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/SupportedOps.html, we could run unquantized ops on QNN EP in fp16 as long as they are supported? So it should be acceptable not to quantize everything.

makes sense.

@jambayk
Contributor

jambayk commented Jan 23, 2025

Great! Thanks for checking.

I set the default to False when the EP is QNN EP because I thought a preprocessed model would not be compatible with QNN EP:

config["quant_preprocess"].default_value = False

But I have recently found that preprocessing is needed most of the time. Along with constant folding, the quantizer also needs a shape-inferred model, which the preprocessing provides as well.

I am planning to remove this False default for QNN EPs.
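For reference, the preprocessing that quant_preprocess toggles corresponds roughly to ONNX Runtime's quant_pre_process helper, which runs symbolic shape inference and ORT graph optimization (including constant folding) before quantization. A sketch of calling it directly, with placeholder paths:

```python
from onnxruntime.quantization.shape_inference import quant_pre_process

# writes a shape-inferred, optimized copy of the model that the static quantizer can handle
quant_pre_process("unet.onnx", "unet_preprocessed.onnx")
```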

jambayk added a commit that referenced this issue Jan 24, 2025
…nn ep (#1565)

## Describe your changes
`quant_preprocess` was originally set to `False` by default because I
thought it might not be compatible with QNN EP. However, it is needed
for most models for the quantization to work properly:
- The quantizer needs a shape inferred model to quantize some tensors.
- Model quality is bad if constants are not made into initializers (see
#1552).

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link
@xieofxie
Author

I found that I could not use Add and Softmax in the text encoder of stable diffusion. Do you have any idea about this? @jambayk

@jambayk
Contributor

jambayk commented Jan 24, 2025

Sorry, I am not familiar with this problem. Do you mean that Add and Softmax in the text encoder don't work with QNN EP?

@xieofxie
Author

I mean the accuracy drops when I test the QDQ model on CPU. I am wondering if there is some optimization technique to make them more accurate? Like for the Constant op, where we could use the preprocessing to turn them into initializers.

@jambayk
Contributor

jambayk commented Jan 27, 2025

I see. I don't think the constant->initializer change can be applied to other ops. The Constant fix worked because it avoids quantizing the same tensor twice, but for other operators like Add, we have to quantize the op if we want to run the model on the NPU as a single graph. You could manually set tensor overrides as described in https://github.com/microsoft/onnxruntime/blob/8db97a68f2629aa32a3ab318e741555f72151aca/onnxruntime/python/tools/quantization/quantize.py#L179, but that might be too complicated.
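For completeness, a rough sketch of what those tensor overrides look like when calling the ORT quantizer directly (the tensor name and quant types are made up for illustration; in Olive this would need to be plumbed through the pass's extra options):

```python
from onnxruntime.quantization import QuantType, quantize_static

# hypothetical: keep one problematic Softmax output in 16-bit activations instead of 8-bit
extra_options = {
    "TensorQuantOverrides": {
        "/text_model/encoder/layers.0/self_attn/Softmax_output_0": [  # made-up tensor name
            {"quant_type": QuantType.QUInt16}
        ]
    }
}

quantize_static(
    model_input="text_encoder.onnx",      # placeholder paths
    model_output="text_encoder_qdq.onnx",
    calibration_data_reader=data_reader,  # your CalibrationDataReader (not defined here)
    extra_options=extra_options,
)
```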
