add BAAI/bge-small-en-v1.5 Optimization #1634

Draft · wants to merge 16 commits into main
Conversation

xieofxie (Contributor) commented Feb 21, 2025

Describe your changes

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
  • Is this PR including examples changes? If yes, please remember to update example documentation in a follow-up PR.

(Optional) Issue link

model_output = self.model.run_session(self.session, model_inputs)
model_output = model_output.last_hidden_state.numpy()
# select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding.
return model_output[:, 0, :]

Check failure: Code scanning / CodeQL

Potentially uninitialized local variable (Error): Local variable 'model_output' may be used before it is initialized.
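For context, CodeQL raises this when some code path can reach the use of model_output without assigning it, typically an if/elif chain with no final else. A minimal, hypothetical sketch of the pattern and one fix; run_cpu, run_npu, and device are illustrative names, not the PR's actual code:

# Hypothetical sketch of the uninitialized-local pattern and its fix.
def get_sentence_embedding(device, session, model_inputs):
    if device == "cpu":
        model_output = run_cpu(session, model_inputs)  # illustrative helper
    elif device == "npu":
        model_output = run_npu(session, model_inputs)  # illustrative helper
    else:
        # without this branch, `model_output` may be used before assignment
        raise ValueError(f"unsupported device: {device}")
    # [CLS] token's last hidden state as the sentence embedding, as above
    return model_output.last_hidden_state.numpy()[:, 0, :]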

Precision drops when Add or Softmax op types are quantized, so they are not included in op_types_to_quantize.

| Quantized Ops | Precision |
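Concretely, the exclusion shows up in the quantization pass config: op_types_to_quantize enumerates the op types that keep precision, and Add and Softmax are simply left out. A hypothetical sketch of such a pass config (the op list is illustrative, not the PR's exact list):

{
  "type": "OnnxStaticQuantization",
  "calibrate_method": "MinMax",
  "quant_preprocess": true,
  "prepare_qnn_config": true,
  "op_types_to_quantize": ["MatMul", "Gemm", "LayerNormalization", "Gather"]
}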
Contributor
This is interesting. Would it be possible to measure the latency for each case, so we can see what the accuracy vs. latency tradeoff is?

If the tradeoff is large, maybe we can spend some time investigating the cause of the accuracy drop.
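One way to collect those numbers in the same workflow would be a latency metric in the Olive evaluator config, so every quantization variant reports average latency next to its accuracy. A sketch assuming Olive's usual metric shape (values are illustrative):

{
  "name": "latency",
  "type": "latency",
  "sub_types": [{ "name": "avg", "priority": 2 }]
}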

Contributor Author

Sure. Changed to draft since I am still working on the NPU part; the PR was created on a CPU machine.

"calibrate_method": "MinMax",
"quant_preprocess": true,
"prepare_qnn_config": true,
"op_types_to_quantize": [
jambayk (Contributor) commented Feb 21, 2025

I have a dev branch where I introduce an option called op_types_to_exclude, which is used to modify op_types_to_quantize and nodes_to_exclude.

"op_types_to_exclude": PassConfigParam(

Looks like it might be useful here too once it gets merged.
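For readers without access to that dev branch, the option would presumably be declared like other quantization pass options. A hypothetical sketch; the field values are illustrative, and only the PassConfigParam shape follows Olive's existing pass configs:

"op_types_to_exclude": PassConfigParam(
    type_=list,
    default_value=None,
    description=(
        "Op types to exclude from quantization: removed from"
        " op_types_to_quantize, and their nodes added to nodes_to_exclude."
    ),
),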

Contributor

Otherwise, we need to know all of the op types present in the model.

Contributor Author

Currently, using append_first_op_types_to_quantize_list with nodes_to_exclude will do this. Will we also update this logic?

if run_config["append_first_op_types_to_quantize_list"]:
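A plausible reading of that flag, sketched below: append every op type found in the model graph to op_types_to_quantize, then carve out exceptions per node via nodes_to_exclude. Hypothetical code, not Olive's actual implementation:

import onnx

def append_graph_op_types(model_path, op_types_to_quantize):
    # collect every op type present in the graph and merge it into the list
    model = onnx.load(model_path)
    graph_op_types = {node.op_type for node in model.graph.node}
    return sorted(set(op_types_to_quantize) | graph_op_types)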

Contributor

Honestly, I am not sure why this option was added or whether it is used for anything right now.

I am not sure if we will touch this option and the related logic, but I plan to update the logic to support op_types_to_exclude together with nodes_to_exclude. op_types_to_exclude has been very useful for me when I know I don't want to quantize any nodes of a given op type.
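Once that lands, this PR's intent could presumably be expressed by exclusion instead of enumeration, e.g. (hypothetical config shape):

"op_types_to_exclude": ["Add", "Softmax"]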

Contributor

Also created this PR in ONNX Runtime: microsoft/onnxruntime#23779.

xieofxie marked this pull request as draft on February 21, 2025, 09:06