Fix bug causing random initialization of bias when using GPTQ quantization with models without bias #1827
Conversation
Thx for fixing @B-201! Did you run any perplexity measurements to see if we indeed get better performance? Thanks!
@SunMarc Sure! This is the code I used to measure perplexity:

```python
from evaluate import load
from datasets import load_dataset

# Build 50 prompt+response pairs from the open-instruct dataset.
input_texts = load_dataset("VMware/open-instruct", split="train")
input_texts = [
    p + r for p, r in zip(input_texts["alpaca_prompt"], input_texts["response"])
][:50]

# Compute mean perplexity with the `evaluate` perplexity metric.
perplexity = load("perplexity", module_type="metric")
results = perplexity.compute(
    predictions=input_texts,
    model_id="Yi-34B-Chat-GPTQ",  # model quantized with auto_gptq; its Linear layers have no bias
    device="cuda",
)
print(round(results["mean_perplexity"], 2))
```

After executing the code, you will receive the following warning message:
The ppl comparison results before and after the fix are as follows:
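As a purely illustrative aside (not part of the PR), one hypothetical way to observe the symptom is to load the checkpoint mentioned in the PR description with transformers and scan the quantized layers for bias tensors that should not exist. The use of `qweight` to identify quantized linear layers is an assumption about the layer's attribute names:

```python
# Hypothetical diagnostic (not part of the PR): look for bias tensors on
# quantized linear layers of a checkpoint that was exported without bias.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Yi-34B-Chat-GPTQ",  # GPTQ checkpoint mentioned in the PR description
    device_map="auto",
)

for name, module in model.named_modules():
    # Assumption: GPTQ linear layers expose their packed weights as `qweight`.
    if hasattr(module, "qweight"):
        bias = getattr(module, "bias", None)
        if isinstance(bias, torch.Tensor) and bias.abs().max() > 0:
            print(f"unexpected non-zero bias on {name}")
```

Before the fix, such a scan would be expected to report non-zero bias tensors on layers that the original checkpoint exported without any bias.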
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Fix bug causing random initialization of bias when using GPTQ quantization with models without bias (huggingface#1827)

* Fix gptq quantization for models without bias
* Fix gptq quantization for models without bias
What does this PR do?
For some GPTQ-quantized models (e.g. TheBloke/Yi-34B-Chat-GPTQ), the Linear layers do not include a bias, but when these GPTQ-quantized models are loaded with transformers, optimum randomly initializes a bias in the Linear layers, which degrades the model's accuracy. This PR fixes that issue.
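A minimal sketch of the idea, not the actual diff in optimum: the class `QuantLinearStub` and helper `replace_linear` below are invented for illustration, and real GPTQ layers take additional arguments (bits, group size, packed weights). The only point is the `bias=module.bias is not None` check, which prevents a bias tensor from being created for layers that never had one:

```python
import torch
import torch.nn as nn


class QuantLinearStub(nn.Module):
    """Stand-in for a GPTQ quantized linear layer (illustrative only)."""

    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # A real QuantLinear stores packed integer weights; a zero placeholder is enough here.
        self.register_buffer("qweight", torch.zeros(out_features, in_features))
        if bias:
            self.register_buffer("bias", torch.zeros(out_features))
        else:
            self.register_buffer("bias", None)


def replace_linear(module: nn.Linear) -> QuantLinearStub:
    # The spirit of the fix: only give the replacement layer a bias when the
    # original nn.Linear actually has one, instead of creating a bias tensor
    # unconditionally and leaving it to be randomly initialized at load time.
    return QuantLinearStub(
        module.in_features,
        module.out_features,
        bias=module.bias is not None,
    )
```

With this check, `replace_linear(nn.Linear(16, 16, bias=False)).bias` is `None`, whereas an unconditional bias would have produced a tensor that ends up randomly initialized when the (bias-free) checkpoint weights are loaded.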
Before submitting
Who can review?