
Issue when converting Exaone 3.0 7.8B model #2202

Zhaeong opened this issue Feb 27, 2025 · 3 comments
Labels
bug Something isn't working

Comments

Zhaeong commented Feb 27, 2025

System Info

optimum==1.24.0
Python 3.12.4

Who can help?

Hi,

When trying to convert this model to ONNX format:
https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

With this code:

from transformers import AutoModelForCausalLM, AutoConfig
from optimum.exporters.onnx import onnx_export_from_model

from optimum.exporters.onnx.config import TextDecoderWithPositionIdsOnnxConfig
from optimum.utils import NormalizedTextConfig

class ExaoneOnnxConfig(TextDecoderWithPositionIdsOnnxConfig):
    DEFAULT_ONNX_OPSET = 19
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig

model_id = "C:\\huggingface\\EXAONE-3.0-7.8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
# At this point, we could override some submodules, forward methods, weights, etc. from the model.

onnx_config = ExaoneOnnxConfig(
    config=config,
    task="text-generation",
    use_past=True,
    use_past_in_inputs=True,
)

custom_onnx_configs = {
    "model": onnx_config
}


onnx_export_from_model(model, custom_onnx_configs=custom_onnx_configs, output="ex_onnx/", task="text-generation")

I'm getting this error:

File "C:\Users\amd\.cache\huggingface\modules\transformers_modules\EXAONE-3.0-7.8B-Instruct\modeling_exaone.py", line 850, in forward
    key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\amd\miniconda3\envs\exaone\Lib\site-packages\transformers\cache_utils.py", line 449, in update
    self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 32 but got size 8 for tensor number 1 in the list.

I think the issue is the difference between num_attention_heads and num_key_value_heads in the model config:

"num_attention_heads": 32,
"num_key_value_heads": 8,

Is there a way to configure the export?
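For illustration, here is roughly what I believe is happening (the shapes are my assumption from the config, with head_dim = 4096 / 32 = 128): the dummy past key/values are generated with 32 heads, while grouped-query attention produces key/value states with only 8 heads, so the cache concatenation fails:

import torch

# Illustration only, with assumed shapes: the dummy KV cache is built with
# num_attention_heads (32) heads, but the model emits new key/value states
# with num_key_value_heads (8) heads, so torch.cat along the sequence
# dimension fails exactly as in the traceback above.
batch, past_len, new_len, head_dim = 1, 16, 1, 128
dummy_past_key = torch.zeros(batch, 32, past_len, head_dim)  # from the dummy input generator
new_key_states = torch.zeros(batch, 8, new_len, head_dim)    # from the model's attention layer

try:
    torch.cat([dummy_past_key, new_key_states], dim=-2)
except RuntimeError as e:
    print(e)  # "Sizes of tensors must match except in dimension 2 ..."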

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

Can be reproduced with the publicly available model.

Expected behavior

Without use_past_in_inputs=True, the model exports normally.

Zhaeong added the bug (Something isn't working) label Feb 27, 2025
Zhaeong commented Feb 28, 2025

The issue seems to be this line:
https://github.com/huggingface/optimum/blob/main/optimum/utils/input_generators.py#L657

It uses self.num_attention_heads instead of num_key_value_heads.
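
If that's the case, one possible workaround (just an untested sketch on my side) would be to give the custom config a past-key-values generator that already uses num_key_value_heads, such as the Mistral one shipped with optimum:

from optimum.exporters.onnx.config import TextDecoderWithPositionIdsOnnxConfig
from optimum.utils import (
    DummyTextInputGenerator,
    MistralDummyPastKeyValuesGenerator,
    NormalizedTextConfig,
)

class ExaoneOnnxConfig(TextDecoderWithPositionIdsOnnxConfig):
    DEFAULT_ONNX_OPSET = 14
    # MistralDummyPastKeyValuesGenerator builds the dummy cache with shape
    # (batch, num_key_value_heads, past_len, head_dim) rather than using
    # num_attention_heads, which should match what the EXAONE cache expects.
    DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator, MistralDummyPastKeyValuesGenerator)
    DUMMY_PKV_GENERATOR_CLASS = MistralDummyPastKeyValuesGenerator
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig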

jl749 commented Mar 13, 2025

It seems like Exaone and Llama share the same input/output pattern.

I was able to export Exaone by passing LlamaOnnxConfig:

# transformers==4.47.1
# optimum==1.24.0
import transformers
import optimum.exporters

model_name = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
)
custom_onnx_configs = {
    "model": optimum.exporters.onnx.model_configs.LlamaOnnxConfig(
        config=model.config,
        task="text-generation",
    )
}
optimum.exporters.onnx.onnx_export_from_model(
    model=model,
    task="text-generation",
    output="./hidad",
    opset=17,
    custom_onnx_configs=custom_onnx_configs,
)
Output Log
Found different candidate ONNX initializers (likely duplicate) for the tied weights:
        lm_head.weight: {'onnx::MatMul_11641'}
        transformer.wte.weight: {'transformer.wte.weight'}
                -[x] values not close enough, max diff: 2.956390380859375e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- logits: max diff = 2.956390380859375e-05.
 The exported model was saved at: hidad

Maybe you were missing DUMMY_INPUT_GENERATOR_CLASSES and DUMMY_PKV_GENERATOR_CLASS?

class LlamaOnnxConfig(TextDecoderWithPositionIdsOnnxConfig):
    DEFAULT_ONNX_OPSET = 14  # Llama now uses F.scaled_dot_product_attention by default for torch>=2.1.1.
    DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator, MistralDummyPastKeyValuesGenerator)
    DUMMY_PKV_GENERATOR_CLASS = MistralDummyPastKeyValuesGenerator
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig
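
If you need the KV cache as inputs like in your original snippet, my guess (untested) is that the same borrowed config also works with use_past enabled, since MistralDummyPastKeyValuesGenerator builds the cache with num_key_value_heads. Something like:

# Untested sketch: same export as above, but with past key/values as ONNX inputs.
import transformers
from optimum.exporters.onnx import onnx_export_from_model
from optimum.exporters.onnx.model_configs import LlamaOnnxConfig

model_name = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"  # or the 3.0 7.8B checkpoint
model = transformers.AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

custom_onnx_configs = {
    "model": LlamaOnnxConfig(
        config=model.config,
        task="text-generation",
        use_past=True,             # export with a KV cache
        use_past_in_inputs=True,   # past key/values become model inputs
    )
}

onnx_export_from_model(
    model=model,
    task="text-generation",
    output="./exaone_onnx_with_past",
    opset=17,
    custom_onnx_configs=custom_onnx_configs,
)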

Zhaeong commented Mar 17, 2025

Hi @jl749,

Thanks for the response. That works for the export; however, when I try to use onnxruntime-genai to run inference with the model, I'm running into this error:

genai\examples\csharp\HelloPhi\bin\x64\Debug_DirectML\net6.0\runtimes\win-x64\native\2025-03-16 22:40:36.9555206 [E:onnxruntime:onnxruntime-genai, sequential_executor.cc:572 onnxruntime::ExecuteKernel] Non-zero status code returned while running DmlFusedNode_6_81 node. Name:'DmlFusedNode_6_81' Status Message: onnxruntime\core\framework\execution_frame.cc:173 onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue shape && tensor.Shape() == *shape was false. OrtValue shape verification failed. Current shape:{1,8,500,80} Requested shape:{1,8,518,80}
Stacktrace:
onnxruntime\onnxruntime\core\framework\op_kernel.cc(82): onnxruntime!onnxruntime::OpKernelContext::OutputMLValue+0x117

Any clues for finding the cause?
