
Caching xenova repo onnx files - filename parameter not working #2218

Open · 2 of 4 tasks
gidzr opened this issue Mar 18, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@gidzr

gidzr commented Mar 18, 2025

System Info

LAMP stack, Debian 10
Python 3.10
pip-installed optimum (ORTModel), onnx, and transformers, upgraded to the latest versions (as at 19 March 2025):

pip install --upgrade optimum transformers
pip install --upgrade huggingface huggingface-hub

Who can help?

@xenova

Hi Josh

Actual behaviour:

The encoder model is cached correctly using "encoder_file_name",
BUT "file_name" caches the original full-sized decoder_model_merged.onnx instead of the "decoder_model_quantized.onnx" I specify.

This looks like fallback behaviour: if I remove the "file_name" parameter entirely, I get exactly the same result.
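
For reference, here's how I confirm what actually lands in the cache - a minimal sketch using huggingface_hub's scan_cache_dir, assuming the default cache location:

from huggingface_hub import scan_cache_dir

# List the decoder files cached for the repo; after the from_pretrained call
# below, this prints decoder_model_merged.onnx rather than the quantized file.
for repo in scan_cache_dir().repos:
    if repo.repo_id == "Xenova/opus-mt-en-de":
        for revision in repo.revisions:
            for cached_file in revision.files:
                if "decoder" in cached_file.file_name:
                    print(cached_file.file_name)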

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Issue:
When running the model directly from the repo, I use the following settings and expect the named quantised versions of the model to be cached and used.

e.g.

from transformers import AutoConfig
from optimum.onnxruntime import ORTModelForSeq2SeqLM

path = "Xenova/opus-mt-en-de"  # config comes from the same repo

config = AutoConfig.from_pretrained(f'{path}')
model = ORTModelForSeq2SeqLM.from_pretrained(
    "Xenova/opus-mt-en-de",
    config=config,
    subfolder="onnx",
    encoder_file_name="encoder_model_quantized.onnx",
    file_name="decoder_model_quantized.onnx",
    accelerator="ort",
)

I have also tested with decoder_file_name="decoder_model_quantized.onnx", with the same result.
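
A possible stopgap (a sketch only, untested - it assumes that local-directory loading honours the explicit file names) is to pin the exact files with huggingface_hub's hf_hub_download and load from the resulting snapshot:

import os
from huggingface_hub import hf_hub_download
from optimum.onnxruntime import ORTModelForSeq2SeqLM

repo = "Xenova/opus-mt-en-de"
hf_hub_download(repo, "config.json")
enc = hf_hub_download(repo, "onnx/encoder_model_quantized.onnx")
hf_hub_download(repo, "onnx/decoder_model_quantized.onnx")

# The downloads land in a local cache snapshot; loading from that directory
# should leave no room for remote fallback resolution.
snapshot = os.path.dirname(os.path.dirname(enc))
model = ORTModelForSeq2SeqLM.from_pretrained(
    snapshot,
    subfolder="onnx",
    encoder_file_name="encoder_model_quantized.onnx",
    decoder_file_name="decoder_model_quantized.onnx",
    use_cache=False,  # the quantized past-KV decoder isn't downloaded here
)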

Expected behavior

Expected behaviour - alternatives:

  1. When the model file names are stated explicitly in the file_name and encoder_file_name parameters, those are the files cached and used from the repo.

  2. When only the encoder name is stated explicitly, e.g. encoder_file_name="encoder_model_quantized.onnx", the fallback should be the corresponding decoder model, i.e. decoder_model_quantized.onnx; likewise encoder_file_name="encoder_model_fp16.onnx" should fall back to decoder_model_merged_fp16.onnx.

Apologies if I've overlooked something in the docs or other issues, but I've checked both GitHub search and Google site:github.com searches and came up with bupkis.

@gidzr gidzr added the bug Something isn't working label Mar 18, 2025
@xenova xenova transferred this issue from huggingface/transformers Mar 19, 2025
@xenova xenova transferred this issue from huggingface/transformers.js Mar 19, 2025
@xenova
Contributor

xenova commented Mar 19, 2025

Hi there 👋 I've transferred this to the Optimum repo, since that is the library for ORTModelForSeq2SeqLM.

It looks like we don't currently support passing the merged decoder, but this is probably something we should add!

(referencing DECODER_MERGED_ONNX_FILE_PATTERN in the optimum source)
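
For illustration only (an assumed pattern shape, not the actual optimum source - the real constant lives at the reference above): if decoder resolution is regex-based along these lines, a name like decoder_model_quantized.onnx never matches the merged pattern, which would explain the fallback:

import re

# Assumed shape of the pattern, for illustration only.
DECODER_MERGED_ONNX_FILE_PATTERN = r"(.*)?decoder(.*)?_merged(.*)?\.onnx"

print(bool(re.search(DECODER_MERGED_ONNX_FILE_PATTERN, "decoder_model_merged.onnx")))     # True
print(bool(re.search(DECODER_MERGED_ONNX_FILE_PATTERN, "decoder_model_quantized.onnx")))  # False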

cc @echarlaix
