Add HQQ to weight compression algorithms for LLMs #3347

Open · 1 task done
hello-fri-end opened this issue Mar 16, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@hello-fri-end

🚀 Feature request

HQQ is a popular data-free weight quantization algorithm for LLMs. It would be super cool to add it to NNCF's weight compression algorithms. I would like to work on this myself. I understand I need to create a hqq.py file inside the nncf/quantization/algorithms/weight_compression dir, and I'm currently diving into the implementations of awq and gptq. At the moment, I'm having trouble understanding the NNCFGraph object that needs to be passed to the apply method. Are there any docs explaining this Graph object? It would also be super helpful if you could point me to some code/docs I can look into to understand the workflow better. Looking forward to contributing 🚀

Feature Use Case

HQQ is a fast and accurate model quantizer that skips the need for calibration data. It offers compression quality competitive with that of calibration-based methods. For instance, HQQ takes less than 5 minutes to process the colossal Llama-2-70B, over 50x faster than the widely adopted GPTQ.
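
For context, here is a minimal sketch of HQQ's core solver, paraphrased from the algorithm the HQQ authors describe (hyperparameters, tensor shapes, and function names here are illustrative, not NNCF API). It optimizes the zero point directly on the weights via half-quadratic splitting, which is why no calibration data is needed:

```python
import torch

def shrink_lp(x: torch.Tensor, beta: float, p: float = 0.7) -> torch.Tensor:
    # Generalized soft-thresholding: approximate proximal operator of the
    # non-convex l_p norm that HQQ uses to model the quantization error.
    return torch.sign(x) * torch.relu(torch.abs(x) - (1.0 / beta) * torch.abs(x).pow(p - 1))

def optimize_zero_point(w, scale, zero, n_bits=4, iters=20, beta=1.0, kappa=1.01, axis=1):
    # Half-quadratic splitting: alternate a closed-form shrinkage step on the
    # quantization error with a closed-form (mean) update of the zero point,
    # annealing beta each iteration. Only the weights are needed.
    qmin, qmax = 0, 2**n_bits - 1
    for _ in range(iters):
        w_q = torch.round(w * scale + zero).clamp(qmin, qmax)  # quantize
        w_r = (w_q - zero) / scale                              # dequantize
        w_e = shrink_lp(w - w_r, beta)                          # error estimate
        zero = torch.mean(w_q - (w - w_e) * scale, dim=axis, keepdim=True)
        beta *= kappa
    return zero  # note: a floating-point tensor
```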

Are you going to submit a PR?

  • Yes, I'd like to help by submitting a PR!
@hello-fri-end added the enhancement (New feature or request) label on Mar 16, 2025
@alexsu52
Contributor

Hello @hello-fri-end,

Thank you for your feature request and for wanting to contribute to NNCF. We are open to contributions, especially new algorithms that improve compression speed or the accuracy of the compressed model. I would like to highlight some details:

  • As far as I know, HQQ uses floating-point zero points, so changes will likely be required on the OpenVINO side: OpenVINO currently supports only the u8 zero-point type (see the sketch after this list). You will need to open an issue in the OpenVINO repository once you have a model built with HQQ.
  • The HQQ algorithm should support being combined with the AWQ and Scale Estimation algorithms.
  • Supporting a single backend, OpenVINO or PyTorch, is enough for merging; I would recommend the OpenVINO backend.
  • HQQ must demonstrate an improvement in accuracy and/or performance, on its own or in combination with other algorithms, on several models, for example microsoft/Phi-3.5-mini-instruct, Qwen/Qwen2.5-VL-3B-Instruct, or Qwen/Qwen2.5-1.5B-Instruct.
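
To make the zero-point mismatch in the first bullet concrete, a minimal sketch (the values are made up; 4-bit asymmetric quantization assumed, consistent with the solver sketch above):

```python
import torch

w = torch.randn(128)
scale = (2**4 - 1) / (w.max() - w.min())

# OpenVINO-style u8 zero point: an integer value, directly representable.
zp_u8 = torch.round(-w.min() * scale)
deq_u8 = (torch.clamp(torch.round(w * scale + zp_u8), 0, 15) - zp_u8) / scale

# HQQ-style zero point: the solver returns a *float* tensor, so storing it
# as u8 forces a round() that perturbs the optimized solution.
zp_fp = -w.min() * scale + 0.37  # 0.37 stands in for an HQQ-optimized offset
deq_fp = (torch.clamp(torch.round(w * scale + zp_fp), 0, 15) - zp_fp) / scale
```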

Yes, you are absolutely right: you should add a hqq.py file inside the nncf/quantization/algorithms/weight_compression dir. NNCFGraph is used as a cross-backend representation of the framework-specific model graph, so that an algorithm can be implemented once for all backends. We don't have documentation beyond the code itself; you can look at the existing weight compression algorithms to see how NNCFGraph is used.
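
For instance, a typical pattern in the weight compression algorithms is to traverse the NNCFGraph to locate the nodes whose weights should be compressed. A rough sketch (get_all_nodes and get_next_nodes are NNCFGraph methods; the HQQ class shape and the metatype filtering below are hypothetical, mirroring how the other algorithms plug in):

```python
from nncf.common.graph import NNCFGraph


class HQQ:
    """Hypothetical skeleton of nncf/quantization/algorithms/weight_compression/hqq.py."""

    def apply(self, model, graph: NNCFGraph, statistic_points=None, dataset=None):
        # NNCFGraph is framework-agnostic: the same traversal works whether
        # the graph was built from an OpenVINO or a PyTorch model.
        # HQQ is data-free, so `dataset` would go unused.
        for node in graph.get_all_nodes():
            if not self._is_weighted(node):
                continue
            consumers = graph.get_next_nodes(node)  # e.g. to inspect patterns around a matmul
            ...  # read the weight, run the HQQ solver, write the result back
        return model

    def _is_weighted(self, node) -> bool:
        # Hypothetical helper: match node.metatype against the backend's
        # weighted-op metatypes (matmul, embedding, ...).
        ...
```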
