
FP8 types support in NNCF graph building #3344

Merged
4 commits merged into openvinotoolkit:develop on Mar 18, 2025

Conversation

alexsu52 (Contributor) commented Mar 14, 2025

Changes

  • Added support for "f8e4m3" and "f8e5m2" types in NNCF graph building.
  • Extended TensorDataType with TensorDataType.f8e5m2, TensorDataType.f8e4m3, and TensorDataType.nf4 (a sketch follows below).
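
As a rough illustration, the extension could look like the following minimal sketch; it assumes TensorDataType is a plain Python Enum, and the pre-existing members shown are an illustrative subset, not the exact NNCF definition:

```python
from enum import Enum, auto


class TensorDataType(Enum):
    # Pre-existing members (illustrative subset, not the full NNCF list).
    float16 = auto()
    bfloat16 = auto()
    float32 = auto()
    float64 = auto()
    int8 = auto()
    uint8 = auto()
    # Members added by this PR.
    f8e4m3 = auto()  # fp8: 4 exponent bits, 3 mantissa bits
    f8e5m2 = auto()  # fp8: 5 exponent bits, 2 mantissa bits
    nf4 = auto()     # 4-bit NormalFloat
```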

Reason for changes

Enable compression of models that contain fp8 and nf4 weights.
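
A minimal usage sketch of the enabled scenario, assuming a hypothetical IR file that already stores some weights in fp8; nncf.compress_weights is the real entry point, but the model path is illustrative:

```python
import nncf
import openvino as ov

core = ov.Core()
# Hypothetical IR whose weight constants include f8e4m3/f8e5m2 tensors.
model = core.read_model("model_with_fp8_weights.xml")

# With this PR, the fp8 weight constants are recognized during NNCF graph
# building and skipped; the remaining weights are compressed as usual.
compressed_model = nncf.compress_weights(model)
ov.save_model(compressed_model, "model_compressed.xml")
```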

Related tickets

ref: 164161

Tests

test_compare_nncf_graph_precision_synthetic_models

@github-actions github-actions bot added the NNCF OpenVINO label Mar 14, 2025
@alexsu52 alexsu52 marked this pull request as ready for review March 17, 2025 05:24
@alexsu52 alexsu52 requested a review from a team as a code owner March 17, 2025 05:24
nikita-savelyevv (Collaborator) commented:

Which usage scenario does this PR enable? Compression/quantization of already fp8-quantized models?

jane-intel left a comment:

This PR enabled us to convert the DeepSeek R1 model (with its original f8 weights) via Optimum Intel. Thank you.
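
For reference, a conversion along these lines might look like the sketch below; the checkpoint id and options are assumptions, since the exact command used is not shown in the thread:

```python
# Illustrative sketch only: assumes optimum-intel with OpenVINO extras
# is installed; the checkpoint id and options are assumptions.
from optimum.intel import OVModelForCausalLM

ov_model = OVModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1",  # hypothetical id of the fp8-weighted checkpoint
    export=True,                # convert to OpenVINO IR on load
)
ov_model.save_pretrained("deepseek-r1-openvino")
```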

nikita-savelyevv (Collaborator) left a comment:

If we claim FP8 model support for weight compression, we should add corresponding tests, for example in tests/openvino/native/quantization/test_weights_compression.py::TestActivationWeightDtype.test_compression_for_different_dtypes.
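
A hypothetical sketch of what such a test could look like; the function name, model construction, and shapes are assumptions rather than actual test-suite code:

```python
# Hypothetical test sketch; names and shapes are assumptions.
import numpy as np
import openvino as ov
import pytest

import nncf


@pytest.mark.parametrize("fp8_type", [ov.Type.f8e4m3, ov.Type.f8e5m2])
def test_compress_weights_with_fp8_constants(fp8_type):
    param = ov.opset13.parameter([1, 4], ov.Type.f32, name="x")
    # fp8 weights typically appear in IR as an fp8 constant followed by a
    # Convert back to f32; emulate that pattern with two Convert nodes.
    weight_f32 = ov.opset13.constant(np.ones((4, 4), dtype=np.float32))
    weight_fp8 = ov.opset13.convert(weight_f32, fp8_type)
    weight = ov.opset13.convert(weight_fp8, ov.Type.f32)
    matmul = ov.opset13.matmul(param, weight, transpose_a=False, transpose_b=False)
    model = ov.Model([matmul], [param], name="fp8_matmul")

    # Before this PR, NNCF graph building raised a runtime error on the
    # fp8 element types; now compression should complete without it.
    nncf.compress_weights(model)
```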

As I understand it, only fp8 support is required for now. I would suggest not claiming nf4 support just yet, because additional effort is required to enable it: Tensor.reshape() needs to be implemented for nf4 NNCF Tensors in the OV backend.

FP8 compression should work after alexsu52#20

Edit:
Discussed in alexsu52#20

@alexsu52 alexsu52 changed the title FP8 types support FP8 types support in NNCF graph building Mar 18, 2025
alexsu52 (Contributor, Author) commented Mar 18, 2025

> Which usage scenario does this PR enable? Compression/quantization of already fp8-quantized models?

This PR makes NNCF skip fp8 weights during compression. Before this fix, NNCF threw a runtime error:
[screenshot of the runtime error]
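
Conceptually, the fix amounts to the NNCF graph builder recognizing the fp8 element types when it maps OpenVINO tensor types to NNCF ones. A hypothetical sketch of that mapping follows; the real function and table in NNCF may be named and structured differently, and it requires an OpenVINO version that exposes these element types:

```python
import openvino as ov

# Hypothetical sketch of the OV -> NNCF dtype mapping extended by this PR.
OV_TYPE_TO_NNCF_DTYPE = {
    ov.Type.f32: "f32",
    ov.Type.f16: "f16",
    ov.Type.bf16: "bf16",
    ov.Type.f8e4m3: "f8e4m3",  # added by this PR
    ov.Type.f8e5m2: "f8e5m2",  # added by this PR
    ov.Type.nf4: "nf4",        # added by this PR
}


def nncf_dtype_of(output: ov.Output) -> str:
    # Before the fix, an fp8 element type fell through this lookup and
    # surfaced as a runtime error during graph building.
    return OV_TYPE_TO_NNCF_DTYPE[output.get_element_type()]
```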

@github-actions github-actions bot added the NNCF PTQ label Mar 18, 2025
@alexsu52 alexsu52 merged commit 833f13c into openvinotoolkit:develop Mar 18, 2025
18 checks passed
Labels: Code Freeze, NNCF OpenVINO, NNCF PTQ
3 participants