
Feature request: allow user to provide tokenizer when loading transformer model #320

Open
jessecambon opened this issue Jul 27, 2022 · 2 comments

Comments

@jessecambon
Contributor

Feature request

When I try to load a locally saved transformers model with ORTModelForSequenceClassification.from_pretrained(<path>, from_transformers=True), an error is raised ("Unable to generate dummy inputs for the model") unless the tokenizer has also been saved to the checkpoint. A reproducible example is below.

A way to pass a tokenizer object directly to from_pretrained() would be helpful to avoid this problem.

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, AutoModelForSequenceClassification

orig_model = "prajjwal1/bert-tiny"
saved_model_path = "saved_model"

# Load a model from the hub and save it locally
model = AutoModelForSequenceClassification.from_pretrained(orig_model)
model.save_pretrained(saved_model_path)

# The tokenizer is loaded here but deliberately not saved to saved_model_path
tokenizer = AutoTokenizer.from_pretrained(orig_model)

# Attempt to load the locally saved model and convert it to ONNX
loaded_model = ORTModelForSequenceClassification.from_pretrained(
    saved_model_path,
    from_transformers=True,
)

This produces the following error:

Traceback (most recent call last):
  File "optimum_loading_reprex.py", line 21, in <module>
    loaded_model=ORTModelForSequenceClassification.from_pretrained(
  File "/home/cambonator/anaconda3/envs/onnx/lib/python3.8/site-packages/optimum/modeling_base.py", line 201, in from_pretrained
    return cls._from_transformers(
  File "/home/cambonator/anaconda3/envs/onnx/lib/python3.8/site-packages/optimum/onnxruntime/modeling_ort.py", line 275, in _from_transformers
    export(
  File "/home/cambonator/anaconda3/envs/onnx/lib/python3.8/site-packages/transformers/onnx/convert.py", line 335, in export
    return export_pytorch(preprocessor, model, config, opset, output, tokenizer=tokenizer, device=device)
  File "/home/cambonator/anaconda3/envs/onnx/lib/python3.8/site-packages/transformers/onnx/convert.py", line 142, in export_pytorch
    model_inputs = config.generate_dummy_inputs(preprocessor, framework=TensorType.PYTORCH)
  File "/home/cambonator/anaconda3/envs/onnx/lib/python3.8/site-packages/transformers/onnx/config.py", line 334, in generate_dummy_inputs
    raise ValueError(
ValueError: Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor.
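
As noted above, the export succeeds when the tokenizer files are also written into the checkpoint directory before loading. A minimal sketch of that workaround, continuing the example above:

# Saving the tokenizer next to the model weights gives the exporter a
# preprocessor it can use to generate dummy inputs
tokenizer.save_pretrained(saved_model_path)

loaded_model = ORTModelForSequenceClassification.from_pretrained(
    saved_model_path,
    from_transformers=True,
)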

Package versions

  • transformers: 4.20.1
  • optimum: 1.3.0
  • onnxruntime: 1.11.1
  • torch: 1.11.0

Motivation

Saving the tokenizer to the model checkpoint is an extra step that could be eliminated if there were a way to provide a tokenizer directly to ORTModelForSequenceClassification.from_pretrained().
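
Purely as an illustrative sketch, the call could look something like the following (the tokenizer keyword argument is hypothetical and does not exist in the current optimum API):

tokenizer = AutoTokenizer.from_pretrained(orig_model)

# Hypothetical keyword argument: pass the in-memory tokenizer so it does not
# have to be saved into the checkpoint directory first
loaded_model = ORTModelForSequenceClassification.from_pretrained(
    saved_model_path,
    from_transformers=True,
    tokenizer=tokenizer,
)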

Your contribution

I'm not currently sure where to start on implementing this feature, but I would be happy to help with some guidance.

@jmwoloso

+1
Related to #210, which is about supporting alternative workflows when run-time tokenization isn't possible or feasible.

@michaelbenayoun
Member

Hi @jessecambon,
This seems related to the ONNX export. We are currently working on adding support for this in optimum, and once that lands, providing a tokenizer will no longer be needed to perform the export.
