NF4 per-channel support for AWQ and Scale Estimation #2898
I think we can change the code in scale_estimation to avoid using calculate_normalized_weight_and_fp4_scale, but that requires a bigger refactoring.
Wrong target in scale estimation with NF4.
@ljaljushkin, can you update it?
97703b8 to b6f840d
@andreyanufr Fixed, and added a test for that. Please take a look.
Rebased on latest changes.
Changes
Added NF4 mode support for Scale Estimation and AWQ.
All results below were collected with and without the Scale Estimation algorithm, and with the LoRA Correction algorithm.
Reason for changes
NF4 per-channel quantization with Scale Estimation may give promising results on NPU, since its accuracy is on par with INT4 group-wise quantization.
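For readers unfamiliar with the term, here is a minimal NumPy sketch of what "NF4 per-channel" means: each output channel (row) of a weight matrix gets a single scale, and every normalized weight is snapped to the nearest of the 16 NF4 levels. This is an illustration only, not the code from this PR; the function names `quantize_nf4_per_channel` and `dequantize_nf4` are hypothetical, the max-abs scale is the plain baseline that Scale Estimation would refine, and the level table is the one published with QLoRA.

```python
import numpy as np

# The 16 NF4 levels, normalized to [-1, 1], as published with QLoRA.
NF4_LEVELS = np.array(
    [
        -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
        -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
        0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
        0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
        0.7229568362236023, 1.0,
    ],
    dtype=np.float32,
)


def quantize_nf4_per_channel(weight):
    """Quantize a 2-D weight [out_channels, in_channels] with one scale per row.

    Hypothetical helper for illustration; assumes a plain max-abs scale.
    """
    # Per-channel scale: the largest absolute value in each output channel.
    scale = np.max(np.abs(weight), axis=1, keepdims=True)
    # All-zero channels keep a scale of 1 to avoid dividing by zero.
    scale = np.where(scale == 0, 1.0, scale)
    normalized = weight / scale
    # Index of the nearest NF4 level for every normalized weight.
    distances = np.abs(normalized[..., None] - NF4_LEVELS)
    indices = np.argmin(distances, axis=-1).astype(np.uint8)
    return indices, scale


def dequantize_nf4(indices, scale):
    """Map 4-bit indices back to floating point using the per-channel scale."""
    return NF4_LEVELS[indices] * scale
```

Group-wise quantization would instead split each row into fixed-size groups and keep one scale per group, which costs more scale storage than the single scale per channel used here.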
Related tickets
150560
Tests
job/NNCF/job/manual/job/post_training_weight_compression/182
job/NNCF/job/manual/job/post_training_weight_compression/181
job/NNCF/job/manual/job/post_training_weight_compression/180