System Info
LAMP stack, Debian 10
Python 3.10
pip-installed ORTModel (optimum), onnx and transformers, upgraded to the latest versions as of 19 March 2025:
pip install --upgrade optimum transformers
pip install --upgrade huggingface huggingface-hub
Who can help?
@xenova
Hi Josh
Actual behaviour:
The quantized encoder (encoder_model_quantized.onnx) is cached correctly via the "encoder_file_name" parameter,
BUT "file_name" caches the original full-sized decoder_model_merged.onnx instead of the version I named, "decoder_model_quantized.onnx".
This behaviour mimics the fallback: if I remove the "file_name" parameter entirely, the same thing happens.
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
Issue:
When running the model directly from the repo, I use the following settings and expect the named 'quantised' versions of the model to be cached and used:
encoder_file_name="encoder_model_quantized.onnx" and file_name="decoder_model_quantized.onnx".
I have also tested with decoder_file_name="decoder_model_quantized.onnx".
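For reference, a minimal sketch of the kind of loading call described above (the model class ORTModelForSeq2SeqLM and the repo id are assumptions; the reporter's exact snippet was not preserved):

```python
# Hypothetical sketch only; the repo id below is a placeholder.
# With optimum installed, the loading call would look roughly like:
#
#   from optimum.onnxruntime import ORTModelForSeq2SeqLM
#   model = ORTModelForSeq2SeqLM.from_pretrained(
#       "some-user/some-seq2seq-model",  # placeholder repo id
#       encoder_file_name="encoder_model_quantized.onnx",
#       file_name="decoder_model_quantized.onnx",
#   )
#
# The filenames requested vs. the decoder that reportedly gets cached:
requested = {
    "encoder_file_name": "encoder_model_quantized.onnx",  # honoured
    "file_name": "decoder_model_quantized.onnx",          # reportedly ignored
}
observed_decoder = "decoder_model_merged.onnx"  # cached instead of the requested file
print(requested["file_name"], "!=", observed_decoder)
```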
Expected behavior
Expected behaviour, alternatives:
When explicitly stating the names of the models to use in the file_name and encoder_file_name parameters, those are the files cached and used from the repo; or
when only the encoder name is stated explicitly, e.g. encoder_file_name="encoder_model_quantized.onnx", the fallback should be the corresponding decoder model, i.e. decoder_model_quantized.onnx; or
for encoder_file_name="encoder_model_fp16.onnx", the fallback should be the corresponding decoder model, i.e. decoder_model_merged_fp16.onnx.
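The fallback described above can be sketched as a small hypothetical helper; this illustrates the mapping the reporter expects, not optimum's actual implementation, and the function name is an assumption:

```python
# Hypothetical sketch of the expected fallback (NOT optimum's actual code):
# when only encoder_file_name is given, derive the matching decoder file
# from the encoder filename's variant suffix.
def expected_decoder_fallback(encoder_file_name: str) -> str:
    """Map an encoder ONNX filename to the decoder file the reporter
    would expect to be selected by default."""
    mapping = {
        "encoder_model_quantized.onnx": "decoder_model_quantized.onnx",
        "encoder_model_fp16.onnx": "decoder_model_merged_fp16.onnx",
    }
    # Fall back to the full-precision merged decoder for anything else,
    # which is what reportedly happens today regardless of the suffix.
    return mapping.get(encoder_file_name, "decoder_model_merged.onnx")

print(expected_decoder_fallback("encoder_model_quantized.onnx"))
# → decoder_model_quantized.onnx
```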
Apologies if I've overlooked something in the docs or other issues, but I've checked both via GitHub search and Google (site:github.com) and came up with bupkis.