Hi @jinfagang,
Yes, optimum allows you to apply both dynamic and static quantization to a GPT2 model.
However, we currently only support a subset of tasks, such as text classification, token classification, and question answering.
We plan to add many more in the future (including tasks more relevant to decoder-only and encoder-decoder architectures).
@echarlaix thanks for your reply. What I am more interested in is whether you have any experience quantizing a huge GPT2 model, e.g. one that is 7.6 GB in ONNX size?
I managed to quantize a BERT model using onnxruntime's built-in quantization, but when I apply the same approach to a large GPT2 model, it fails.
It would help a lot if optimum had an example for large models (specifically models with more than 125M parameters and an ONNX size larger than 7 GB). Small models won't have problems; large models are where the trouble actually comes from, e.g. ONNX doesn't support a single unified model file larger than 2 GB.
So if there were any tutorials on quantizing very large models, that would be very useful.
Is optimum able to quantize a GPT2 model?