
Is optimum able to quantize a GPT2 model? #90

Open · lucasjinreal opened this issue Mar 9, 2022 · 2 comments

@lucasjinreal

Is optimum able to quantize a GPT2 model?

@echarlaix (Collaborator)

Hi @jinfagang,
Yes, optimum allows you to apply both dynamic and static quantization to a GPT2 model.
However, we currently support only a subset of tasks, such as text classification, token classification, and question answering.
We plan to add many more in the future (including tasks more relevant to decoder-only and encoder-decoder architectures).
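
For reference, dynamic quantization through optimum looks roughly like the sketch below. This follows the current `optimum.onnxruntime` API, which may differ from the version available when this thread was written; the model name and output directory are placeholders:

```python
# Minimal sketch: dynamic quantization of an ONNX-exported model with
# optimum's ONNX Runtime integration. Names follow recent optimum docs
# and may differ across versions; model/save_dir are placeholders.
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export a PyTorch checkpoint to ONNX (text classification is one of the
# supported tasks mentioned above).
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", export=True
)

quantizer = ORTQuantizer.from_pretrained(model)

# Dynamic (weight-only) quantization config targeting AVX2 CPUs;
# is_static=True would instead require a calibration dataset.
dqconfig = AutoQuantizationConfig.avx2(is_static=False, per_channel=False)

quantizer.quantize(save_dir="quantized_model", quantization_config=dqconfig)
```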

@lucasjinreal (Author)

@echarlaix thanks for your reply. What I am more interested in is whether you have any experience quantizing a huge GPT2 model, e.g. one around 7.6GB in ONNX size.

I managed to quantize a BERT model using ONNX Runtime's built-in quantization, but it fails when applied to a GPT2-large model.

It would help if optimum had an example for large models (specifically models with 125M parameters or more, where the ONNX file exceeds 7GB). Small models won't cause problems; large models are where the trouble actually starts, for example ONNX does not support a single unified model file larger than 2GB (see the sketch below).
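
For illustration, ONNX Runtime's built-in dynamic quantization can handle a model above the 2GB protobuf limit by storing tensors as external data. A hedged sketch, assuming the model was already exported with external data; the file names are placeholders, and behavior for very large GPT2 exports should be verified against the onnxruntime version in use:

```python
# Sketch: dynamic quantization of a >2GB ONNX model.
# File names are placeholders; verify against your onnxruntime version.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="gpt2-large.onnx",    # model exported with external data
    model_output="gpt2-large-int8.onnx",
    weight_type=QuantType.QInt8,
    use_external_data_format=True,    # keep tensors outside the 2GB protobuf
)
```

The export side has the same constraint: `onnx.save_model(model, path, save_as_external_data=True)` is the usual way to write a graph larger than 2GB in the first place.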

So any tutorial on quantizing very large models would be very useful.
