System Info
LAMP stack, Debian 10
Python 3.10
pip-installed ORTModel (optimum), onnx and transformers, upgraded to the latest versions as of 19 March 2025:
pip install --upgrade optimum transformers
pip install --upgrade huggingface huggingface-hub
Who can help?
@xenova
Hi Josh
Actual behaviour:
The quantized encoder (encoder_model_quantized.onnx) is cached correctly via the "encoder_file_name" parameter,
BUT "file_name" caches the original full-sized decoder_model_merged.onnx instead of the version I named, "decoder_model_quantized.onnx".
This behaviour mimics the fallback: if I remove the "file_name" parameter entirely, the same thing happens.
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
Issue:
When running the model directly from the repo, I use the following settings and expect the named 'quantised' versions of the model to be cached and used:
encoder_file_name="encoder_model_quantized.onnx" and file_name="decoder_model_quantized.onnx".
I have also tested with decoder_file_name="decoder_model_quantized.onnx".
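For reference, a minimal sketch of the kind of loading call described above (the model class ORTModelForSeq2SeqLM and the repo id are assumptions; the reporter's exact snippet was not preserved):

```python
# Hypothetical sketch only; the repo id below is a placeholder.
# With optimum installed, the loading call would look roughly like:
#
#   from optimum.onnxruntime import ORTModelForSeq2SeqLM
#   model = ORTModelForSeq2SeqLM.from_pretrained(
#       "some-user/some-seq2seq-model",  # placeholder repo id
#       encoder_file_name="encoder_model_quantized.onnx",
#       file_name="decoder_model_quantized.onnx",
#   )
#
# The filenames requested vs. the decoder that reportedly gets cached:
requested = {
    "encoder_file_name": "encoder_model_quantized.onnx",  # honoured
    "file_name": "decoder_model_quantized.onnx",          # reportedly ignored
}
observed_decoder = "decoder_model_merged.onnx"  # cached instead of the requested file
print(requested["file_name"], "!=", observed_decoder)
```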
Expected behavior
Expected behaviour, alternatives:
When explicitly stating the names of the models to use in the file_name and encoder_file_name parameters, those are the files cached and used from the repo; or
when only the encoder name is stated explicitly, e.g. encoder_file_name="encoder_model_quantized.onnx", the fallback should be the corresponding decoder model, i.e. decoder_model_quantized.onnx; or
for encoder_file_name="encoder_model_fp16.onnx", the fallback should be the corresponding decoder model, i.e. decoder_model_merged_fp16.onnx.
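The fallback described above can be sketched as a small hypothetical helper; this illustrates the mapping the reporter expects, not optimum's actual implementation, and the function name is an assumption:

```python
# Hypothetical sketch of the expected fallback (NOT optimum's actual code):
# when only encoder_file_name is given, derive the matching decoder file
# from the encoder filename's variant suffix.
def expected_decoder_fallback(encoder_file_name: str) -> str:
    """Map an encoder ONNX filename to the decoder file the reporter
    would expect to be selected by default."""
    mapping = {
        "encoder_model_quantized.onnx": "decoder_model_quantized.onnx",
        "encoder_model_fp16.onnx": "decoder_model_merged_fp16.onnx",
    }
    # Fall back to the full-precision merged decoder for anything else,
    # which is what reportedly happens today regardless of the suffix.
    return mapping.get(encoder_file_name, "decoder_model_merged.onnx")

print(expected_decoder_fallback("encoder_model_quantized.onnx"))
# → decoder_model_quantized.onnx
```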
Apologies if I've overlooked something in the docs or other issues, but I've checked both via GitHub search and Google (site:github.com) and came up with bupkis.