
Issue when converting Exaone 3.0 7.8B model #2202

Zhaeong opened this issue Feb 27, 2025 · 3 comments
Labels
bug Something isn't working

Comments

Zhaeong commented Feb 27, 2025

System Info

optimum==1.24.0
Python 3.12.4

Who can help?

Hi,

When trying to convert this model to ONNX format:
https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

With this code:

from transformers import AutoModelForCausalLM, AutoConfig
from optimum.exporters.onnx import onnx_export_from_model

from optimum.exporters.onnx.config import TextDecoderWithPositionIdsOnnxConfig
from optimum.utils import NormalizedTextConfig

class ExaoneOnnxConfig(TextDecoderWithPositionIdsOnnxConfig):
    DEFAULT_ONNX_OPSET = 19
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig

model_id = "C:\\huggingface\\EXAONE-3.0-7.8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
# At this point, we could override some submodules, forward methods, weights, etc. from the model.

onnx_config = ExaoneOnnxConfig(
    config=config,
    task="text-generation",
    use_past=True,
    use_past_in_inputs=True,
)

custom_onnx_configs = {
    "model": onnx_config
}


onnx_export_from_model(model, custom_onnx_configs=custom_onnx_configs, output="ex_onnx/", task="text-generation")

I'm getting this error:

File "C:\Users\amd\.cache\huggingface\modules\transformers_modules\EXAONE-3.0-7.8B-Instruct\modeling_exaone.py", line 850, in forward
    key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\amd\miniconda3\envs\exaone\Lib\site-packages\transformers\cache_utils.py", line 449, in update
    self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 32 but got size 8 for tensor number 1 in the list.

I think the issue is the difference between num_attention_heads and num_key_value_heads in the model config:

"num_attention_heads": 32,
"num_key_value_heads": 8,

Is there a way to configure the export?
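For illustration, here is roughly what I believe is happening (the shapes are my assumption from the config, with head_dim = 4096 / 32 = 128): the dummy past key/values are generated with 32 heads, while grouped-query attention produces key/value states with only 8 heads, so the cache concatenation fails:

import torch

# Illustration only, with assumed shapes: the dummy KV cache is built with
# num_attention_heads (32) heads, but the model emits new key/value states
# with num_key_value_heads (8) heads, so torch.cat along the sequence
# dimension fails exactly as in the traceback above.
batch, past_len, new_len, head_dim = 1, 16, 1, 128
dummy_past_key = torch.zeros(batch, 32, past_len, head_dim)  # from the dummy input generator
new_key_states = torch.zeros(batch, 8, new_len, head_dim)    # from the model's attention layer

try:
    torch.cat([dummy_past_key, new_key_states], dim=-2)
except RuntimeError as e:
    print(e)  # "Sizes of tensors must match except in dimension 2 ..."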

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

Can be reproduced with the publicly available model.

Expected behavior

Without use_past_in_inputs=True, the model exports normally.

Zhaeong added the bug (Something isn't working) label Feb 27, 2025
Zhaeong commented Feb 28, 2025

The issue seems to be this line:
https://github.com/huggingface/optimum/blob/main/optimum/utils/input_generators.py#L657

It uses self.num_attention_heads instead of num_key_value_heads.
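
If that's the case, one possible workaround (just an untested sketch on my side) would be to give the custom config a past-key-values generator that already uses num_key_value_heads, such as the Mistral one shipped with optimum:

from optimum.exporters.onnx.config import TextDecoderWithPositionIdsOnnxConfig
from optimum.utils import (
    DummyTextInputGenerator,
    MistralDummyPastKeyValuesGenerator,
    NormalizedTextConfig,
)

class ExaoneOnnxConfig(TextDecoderWithPositionIdsOnnxConfig):
    DEFAULT_ONNX_OPSET = 14
    # MistralDummyPastKeyValuesGenerator builds the dummy cache with shape
    # (batch, num_key_value_heads, past_len, head_dim) rather than using
    # num_attention_heads, which should match what the EXAONE cache expects.
    DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator, MistralDummyPastKeyValuesGenerator)
    DUMMY_PKV_GENERATOR_CLASS = MistralDummyPastKeyValuesGenerator
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig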

jl749 commented Mar 13, 2025

It seems like Exaone and Llama share the same input/output pattern.

I was able to export Exaone by passing LlamaOnnxConfig:

# transformers==4.47.1
# optimum==1.24.0
import transformers
import optimum.exporters

model_name = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
)
custom_onnx_configs = {
    "model": optimum.exporters.onnx.model_configs.LlamaOnnxConfig(
        config=model.config,
        task="text-generation",
    )
}
optimum.exporters.onnx.onnx_export_from_model(
    model=model,
    task="text-generation",
    output="./hidad",
    opset=17,
    custom_onnx_configs=custom_onnx_configs,
)
Output Log
Found different candidate ONNX initializers (likely duplicate) for the tied weights:
        lm_head.weight: {'onnx::MatMul_11641'}
        transformer.wte.weight: {'transformer.wte.weight'}
                -[x] values not close enough, max diff: 2.956390380859375e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- logits: max diff = 2.956390380859375e-05.
 The exported model was saved at: hidad

Maybe you were missing DUMMY_INPUT_GENERATOR_CLASSES and DUMMY_PKV_GENERATOR_CLASS?

class LlamaOnnxConfig(TextDecoderWithPositionIdsOnnxConfig):
    DEFAULT_ONNX_OPSET = 14  # Llama now uses F.scaled_dot_product_attention by default for torch>=2.1.1.
    DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator, MistralDummyPastKeyValuesGenerator)
    DUMMY_PKV_GENERATOR_CLASS = MistralDummyPastKeyValuesGenerator
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig
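
If you need the KV cache as inputs like in your original snippet, my guess (untested) is that the same borrowed config also works with use_past enabled, since MistralDummyPastKeyValuesGenerator builds the cache with num_key_value_heads. Something like:

# Untested sketch: same export as above, but with past key/values as ONNX inputs.
import transformers
from optimum.exporters.onnx import onnx_export_from_model
from optimum.exporters.onnx.model_configs import LlamaOnnxConfig

model_name = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"  # or the 3.0 7.8B checkpoint
model = transformers.AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

custom_onnx_configs = {
    "model": LlamaOnnxConfig(
        config=model.config,
        task="text-generation",
        use_past=True,             # export with a KV cache
        use_past_in_inputs=True,   # past key/values become model inputs
    )
}

onnx_export_from_model(
    model=model,
    task="text-generation",
    output="./exaone_onnx_with_past",
    opset=17,
    custom_onnx_configs=custom_onnx_configs,
)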

Zhaeong commented Mar 17, 2025

Hi @jl749,

Thanks for the response. That works for the export; however, when I try to use onnxruntime-genai to run inference with the model, I'm running into this error:

genai\examples\csharp\HelloPhi\bin\x64\Debug_DirectML\net6.0\runtimes\win-x64\native\2025-03-16 22:40:36.9555206 [E:onnxruntime:onnxruntime-genai, sequential_executor.cc:572 onnxruntime::ExecuteKernel] Non-zero status code returned while running DmlFusedNode_6_81 node. Name:'DmlFusedNode_6_81' Status Message: onnxruntime\core\framework\execution_frame.cc:173 onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current shape:{1,8,500,80} Requested shape:{1,8,518,80}
Stacktrace:
onnxruntime\onnxruntime\core\framework\op_kernel.cc(82): onnxruntime!onnxruntime::OpKernelContext::OutputMLValue+0x117

Any clues for finding the cause?
