Additional option for alternative model Paths to load different model formats #335

John42506176Linux · 2024-08-13T20:54:53Z

John42506176Linux
Aug 13, 2024

Feature request

Additional optional for Docker CLI to load alternative paths of models
port=7997
mid_rerank_model=mixedbread-ai/mxbai-rerank-xsmall-v1
volume=$PWD/data

sudo docker run -it --gpus all
-v $volume:/app/.cache
-p $port:$port
michaelf34/infinity:latest
v2
--batch-size 32
--model-id $mid_rerank_model
--alternative_path onnx/model_quantized.onnx
--port $port

Motivation

I'm currently trying to load the quantized ONNX version of a Reranker model I'm using and I currently can't see an easy solution to doing this with the docker CLI.

Your contribution

Help test or fix few simple bugs.

michaelfeil · 2024-08-13T22:41:22Z

michaelfeil
Aug 13, 2024
Maintainer

@John42506176Linux Your request is not realistic. Besides model weights, you also need to load e.g. the config files, tokenizer, and infer the model class from the config.

I would suggest you to pack your custom reranker into a huggingface repo.

I am confused, as for loading onnx models you need to set —engine optimum, which you are not doing.

0 replies

John42506176Linux · 2024-08-14T01:13:56Z

John42506176Linux
Aug 14, 2024
Author

Thanks, for the information I didn't know if this was an option so wanted to double-check.

I didn't include the --engine optimum option in the contribution but that was used in testing.

0 replies

michaelfeil · 2024-08-14T01:50:31Z

michaelfeil
Aug 14, 2024
Maintainer

Thanks, I'll move it to discussions.

0 replies

ashokrajab · 2025-02-28T06:25:58Z

ashokrajab
Feb 28, 2025

def get_onnx_files(
    *,
    model_name_or_path: str,
    revision: Union[str, None] = None,
    use_auth_token: Union[bool, str] = True,
    prefer_quantized=False,
) -> Path:
    """gets the onnx files from the repo"""
    repo_files = _list_all_repo_files(
        model_name_or_path=model_name_or_path,
        revision=revision,
        use_auth_token=use_auth_token,
    )
    pattern = "**.onnx"
    onnx_files = [p for p in repo_files if p.match(pattern)]
    prefered_regex = "quantize" if prefer_quantized else "model.onnx"
    prefered_onnx = [f for f in onnx_files if prefered_regex in f.name]
    if len(onnx_files) > 1:
        logger.info(f"Found {len(onnx_files)} onnx files: {onnx_files}")
        if prefered_onnx:
            onnx_files = prefered_onnx
        onnx_file = onnx_files[-1]
        logger.info(f"Using {onnx_file} as the model")
        return onnx_file
    elif len(onnx_files) == 1:
        return onnx_files[0]
    else:
        raise ValueError(f"No onnx files found for {model_name_or_path} and revision {revision}")

This function decides which onnx file is chosen based on the model_name_or_path and revision.

But consider a hf model like https://huggingface.co/Xenova/bge-small-en-v1.5/tree/main/onnx where it has 3 kinds of onnx files which vary in quantization and they have their own naming convention.

I would be great if we can pass the exact onnx file name that needs to be chosen instead of implicity infering it based on provider to decide whether to chose the quantized onnx file. I'm refering to how the value of prefer_quantized is derived.

prefer_quantized=("cpu" in provider.lower() or "openvino" in provider.lower())

Expectation:
As mentioned by the author of this discussion, it would be great to have a command line argument that takes in the exact onnx file name.

@michaelfeil looking forward to your take on this.

1 reply

michaelfeil Feb 28, 2025
Maintainer

Very valid question!

The convention came from Benchmarking:

int8/qint8 is fast on CPU, but it seems like the kernels of the recent ONNX versions do not seem optimized for GPU. fp16 is often faster on GPU, and quality is better. For CPU, the decision is fp32 or int8, where int8 is much faster.
float16/float32 of https://huggingface.co/Xenova/bge-small-en-v1.5/tree/main/onnx should both be running in fp16.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Additional option for alternative model Paths to load different model formats #335

{{title}}

Replies: 4 comments 1 reply

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Additional option for alternative model Paths to load different model formats #335

John42506176Linux Aug 13, 2024

Feature request

Motivation

Your contribution

Replies: 4 comments · 1 reply

michaelfeil Aug 13, 2024 Maintainer

John42506176Linux Aug 14, 2024 Author

michaelfeil Aug 14, 2024 Maintainer

ashokrajab Feb 28, 2025

michaelfeil Feb 28, 2025 Maintainer

John42506176Linux
Aug 13, 2024

Replies: 4 comments 1 reply

michaelfeil
Aug 13, 2024
Maintainer

John42506176Linux
Aug 14, 2024
Author

michaelfeil
Aug 14, 2024
Maintainer

ashokrajab
Feb 28, 2025

michaelfeil Feb 28, 2025
Maintainer