FP8 types support in NNCF graph building #3344
Conversation
Which usage scenario does this PR enable? Compression/quantization of already fp8-quantized models?
This PR enabled us to convert the DeepSeek R1 model (with original fp8 weights) via Optimum Intel. Thank you.
If we claim FP8 model support for weight compression, we should add corresponding tests, for example in tests/openvino/native/quantization/test_weights_compression.py::TestActivationWeightDtype.test_compression_for_different_dtypes. A rough sketch of such a case is given below.
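A minimal sketch of the kind of parametrized case that could be added there. The helpers `build_matmul_model_with_weight_dtype` and `get_constant_dtypes` are hypothetical stand-ins for whatever model-building and inspection utilities the test file actually provides, and the parametrization would need to follow the existing TestActivationWeightDtype structure:

```python
# Illustrative sketch only: the two helpers below are hypothetical
# placeholders, not the real utilities from test_weights_compression.py.
import pytest

import nncf
from nncf.tensor import TensorDataType


@pytest.mark.parametrize("weight_dtype", [TensorDataType.f8e4m3, TensorDataType.f8e5m2])
def test_compression_for_fp8_weight_dtypes(weight_dtype):
    # Hypothetical helper: builds a tiny OV MatMul model whose weight constant
    # is stored in `weight_dtype` (fp8) followed by a Convert node.
    model = build_matmul_model_with_weight_dtype(weight_dtype)

    compressed = nncf.compress_weights(model)

    # Hypothetical helper: collects the element types of all weight constants.
    dtypes = get_constant_dtypes(compressed)
    # After compression the fp8 weights should have been replaced by integer ones.
    assert weight_dtype not in dtypes
```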
As I understand it, only fp8 support is required for now. I would suggest not claiming nf4 support just yet, because some additional effort is needed to enable it: Tensor.reshape() still has to be implemented for nf4 NNCF Tensors in the OV backend.
FP8 compression should work after alexsu52#20
Edit: discussed in alexsu52#20
Changes
Added TensorDataType.f8e5m2, TensorDataType.f8e4m3, and TensorDataType.nf4 to TensorDataType (a sketch of the corresponding OpenVINO-type mapping follows below).
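For context, a minimal sketch of how the new dtype members could be matched against OpenVINO element types during graph building; the actual mapping used by the NNCF OpenVINO backend may live elsewhere, cover more types, and be structured differently:

```python
# Sketch only: illustrates the intended correspondence between OpenVINO
# element types and the new TensorDataType members.
import openvino as ov

from nncf.tensor import TensorDataType

OV_TO_NNCF_DTYPE = {
    ov.Type.f8e4m3: TensorDataType.f8e4m3,
    ov.Type.f8e5m2: TensorDataType.f8e5m2,
    ov.Type.nf4: TensorDataType.nf4,
    ov.Type.f32: TensorDataType.float32,
    ov.Type.f16: TensorDataType.float16,
}


def nncf_dtype_for(ov_type: ov.Type) -> TensorDataType:
    # Unknown element types fall back to float32 purely for this sketch.
    return OV_TO_NNCF_DTYPE.get(ov_type, TensorDataType.float32)
```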
Reason for changes
Support for compression of models with fp8 and nf4 weights.
Related tickets
ref: 164161
Tests
test_compare_nncf_graph_precision_synthetic_models