Hi @jinfagang,
Yes, optimum allows you to apply both dynamic and static quantization to a GPT2 model.
However, we currently only support a subset of tasks, such as text classification, token classification, and question answering.
We plan to add many more in the future (including tasks more relevant to decoder-only and encoder-decoder architectures).
@echarlaix thanks for your reply. What I am more interested in is whether you have any experience quantizing a huge GPT2 model, e.g. one that is 7.6 GB in ONNX size?
I managed to quantize a BERT model using onnxruntime's built-in quantization, but when I apply the same approach to a large GPT2 model, it fails.
It would help a lot if optimum had an example for large models (specifically models with more than 125M parameters and an ONNX size larger than 7 GB). Small models won't have problems; large models are where the trouble actually comes from, e.g. ONNX doesn't support a single unified model file larger than 2 GB.
So if there were any tutorials on quantizing very large models, that would be very useful.
Is optimum able to quantize a GPT2 model?