
[TorchFX] Multi-quantizers conformance check #38

Closed
wants to merge 16 commits

Conversation

daniil-lyakhov
### Changes

Enable mypy for nncf/common/pruning
AlexanderDokuchaev and others added 12 commits January 22, 2025 13:10
Disable pip cache to avoid issues with cache
### Changes

- Remove levit model_128
- Fix: do not catch exceptions without a message
 
### Reason for changes

```
  File "/home/jenkins/agent/workspace/NNCF/manual/Daemons/PTQ/ptq_single_model/venv/lib/python3.10/site-packages/timm/models/_builder.py", line 393, in build_model_with_cfg
    load_pretrained(
  File "/home/jenkins/agent/workspace/NNCF/manual/Daemons/PTQ/ptq_single_model/venv/lib/python3.10/site-packages/timm/models/_builder.py", line 193, in load_pretrained
    state_dict = filter_fn(state_dict, model)
  File "/home/jenkins/agent/workspace/NNCF/manual/Daemons/PTQ/ptq_single_model/venv/lib/python3.10/site-packages/timm/models/levit.py", line 707, in checkpoint_filter_fn
    assert 'head' in ka or 'stem.conv1.linear' in ka
AssertionError

```
### Changes

- black - 24.10.0
- isort - 5.13.0
- ruff - v0.9.2
- markdownlint - 0.43.0
### Changes

Include SDPA metatype in scales unification map for all MinMax backends

### Reason for changes

Scales were not being unified in quantizers inserted for such a
subgraph:

```
x  y
\  |
 concat   Q   V
     |   /  /
      SDPA  
```
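The unification requirement can be illustrated with a minimal numpy sketch (the helper names below are illustrative, not NNCF API): when two tensors feed a concat, their fake-quantizers must share a single scale so that the concatenated tensor lies on one uniform quantization grid.

```python
import numpy as np

def compute_scale(t, num_bits=8):
    # Symmetric per-tensor scale for a signed integer range.
    return np.max(np.abs(t)) / (2 ** (num_bits - 1) - 1)

def fake_quantize(t, scale, num_bits=8):
    q_min, q_max = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return np.clip(np.round(t / scale), q_min, q_max) * scale

x = np.array([0.5, -0.25], dtype=np.float32)
y = np.array([2.0, -1.5], dtype=np.float32)

# Unified scale: both concat inputs share one scale (here, the larger of
# the two per-tensor scales), so the concatenated result is quantized
# onto a single uniform grid instead of two incompatible ones.
unified_scale = max(compute_scale(x), compute_scale(y))
concat = np.concatenate([fake_quantize(x, unified_scale),
                         fake_quantize(y, unified_scale)])
```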

### Tests

A template test was created at
`tests/cross_fw/test_templates/test_unified_scales.py`.
The template test uses a synthetic SDPA model with a concat operation.
It then calls the `_find_quantization_target_points` method of the
MinMaxQuantization algorithm to obtain the unified scale groups, which
are then used for assertions.
### Changes

Run tests from the precommit test scope in the nightly trigger

- pt2: use fixed parameters for `.T`; `_saved_dim0/1` depends on the OS,
but the actual values are always -1, -2
- onnx: build the model only inside the test function, not during
parameter collection, to use memory more effectively
- common: use `1.2` s for the test timer to be sure that `00:00:01` is
caught

NOTE: the pt test fails on parallel runs when building the quantization
extension.
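The `1.2` s choice can be checked with a quick sketch (the formatting helper below is hypothetical): a duration strictly above one second always renders as `00:00:01` when truncated to whole seconds, whereas a sleep of exactly `1.0` s can measure marginally short and miss the tick.

```python
import time

def format_elapsed(seconds):
    # Truncate to whole seconds, as an HH:MM:SS test timer typically does.
    return time.strftime("%H:%M:%S", time.gmtime(seconds))

print(format_elapsed(1.2))    # always 00:00:01
print(format_elapsed(0.999))  # 00:00:00 -- why exactly 1.0 s is fragile
```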

### Tests


https://github.com/openvinotoolkit/nncf/actions/runs/12835288745/job/35794407928?pr=3194
### Changes

Add quantization_aware_training_tensorflow_mobilenet_v2 to test scope
…lkit#3207)

### Changes

Changes the model from using the same input tensor for Q, K, and V to
using a different input for each

### Reason for changes

The former model was causing an error with OpenVINO
…olkit#2727)

### Changes

- Implemented OpenVINO model graphs which are used for calculation of
compressed and decompressed weights. Since these models are compiled,
calculation becomes significantly faster, especially for larger models
and int4 compression.
- This functionality is exposed by two methods at `weight_lowering.py`:
  - `do_int_quantization()` is used for computing a compressed weight.
  Possible signatures:
    - `weight` -> `compressed_weight`, `scale`, (`zero_point` for
    asymmetric compression)
    - `weight`, `scale`, (`zero_point`) -> `compressed_weight`, `scale`,
    (`zero_point`)
  - `calculate_quantized_dequantized_weight()` is used for computing a
  decompressed weight. Possible signatures:
    - `weight` -> `decompressed_weight`
    - `weight`, `scale`, (`zero_point`) -> `decompressed_weight`
    - `weight` -> `decompressed_weight`, `compressed_weight`, `scale`,
    (`zero_point`)
    - `weight`, `scale`, (`zero_point`) -> `decompressed_weight`,
    `compressed_weight`, `scale`, (`zero_point`)
  - The output `scale` and `zero_point` are the same as the ones given as
  input (if they were given at all).
  - Computation is done via OV models only if the openvino package is
  installed and the input tensors are not torch tensors.
- Introduced a new NNCF Tensor backend for storing instances of
`openvino.Tensor`. The implementation for this backend is limited to only
the required functionality; e.g., addition of OV Tensors is not supported
because it is not needed.
  - The introduction of OV Tensors is required for seamless handling of
  tensors in `bf16`, `u4` and `i4` data types. For example, `bf16`
  constants are read from an OpenVINO LLM and given as inputs to a
  compressing OpenVINO model, and `u4` and `i4` compressed weights are
  seamlessly inserted into the resulting compressed OpenVINO model.
  - Added an `as_numpy_tensor()` method to convert an NNCF Tensor to the
  numpy backend. Currently only OV -> NP conversion is required.
- All calculations are aligned with the reference numpy implementation.
Some performance and memory sacrifices had to be made for such alignment.
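As a reference point for the signatures listed above, here is a minimal numpy sketch of symmetric integer weight quantization (illustrative only; the actual `do_int_quantization()` runs compiled OpenVINO models and also covers the asymmetric `zero_point` case):

```python
import numpy as np

def do_int_quantization_ref(weight, scale=None, num_bits=8):
    # weight [, scale] -> compressed_weight, scale (symmetric case, so no
    # zero_point). A precomputed scale is reused and returned unchanged.
    level_high = 2 ** (num_bits - 1) - 1
    if scale is None:
        scale = np.max(np.abs(weight), axis=-1, keepdims=True) / level_high
    compressed = np.clip(np.round(weight / scale), -level_high - 1, level_high)
    return compressed.astype(np.int8), scale

def calculate_quantized_dequantized_weight_ref(weight, scale=None):
    # weight [, scale] -> decompressed_weight
    compressed, scale = do_int_quantization_ref(weight, scale)
    return compressed.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 8)).astype(np.float32)
decompressed = calculate_quantized_dequantized_weight_ref(w)
```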

Data-free asymmetric compression:

![image](https://github.com/user-attachments/assets/efd76b2f-1a3e-4037-8165-0bd5812de94d)

Data-free symmetric compression:

![image](https://github.com/user-attachments/assets/c61b98c6-cc96-4125-b21e-90c7d0827e22)

Data-aware compression:

![image](https://github.com/user-attachments/assets/b9823594-9915-4ca5-9e50-7bffa6777104)


### Reason for changes

Reducing model compression time. Only OpenVINO model compression backend
is affected.

### Related tickets

139047

### Tests

- `tests/openvino/native/quantization/test_ov_modeling_compression.py::test_quantization_alignment`
-- checks alignment with the reference numpy implementation
- `tests/openvino/native/test_openvino_modeling.py` -- checks OV
modeling framework hyperparameters
- `tests/openvino/native/test_tensor.py` -- NNCF OV Tensor backend tests

Validation jobs:
- `NNCF/job/manual/job/post_training_weight_compression/299/`
- `NNCF/job/nightly/job/test_examples/650`
- OVVP validation ✅
- optimum-intel test job
https://github.com/huggingface/optimum-intel/actions/runs/12912964434/job/36009036879?pr=734
### Changes

- Added `nf4` precision for OV `GraphConverter`.

### Reason for changes

- `nf4` precision support.

### Related tickets

- 153357

### Tests

- Added
### Changes

Follow up to openvinotoolkit#2727

1. Do not use `infer_request.results`
2. Replace `>=` with `opset.greater_equal()`
3. Rename `ov_numeric.py` to `openvino_numeric.py`

### Reason for changes

1. Improve int4 compression time by up to ~10%
2. Avoid warning: `DeprecationWarning: greater_equal is deprecated and
will be removed in version 2025.3. Use ops.greater_equal instead`
3. Fix onnx install test

### Related tickets

139047

### Tests

- https://github.com/openvinotoolkit/nncf/actions/runs/12947249537
- NNCF/job/manual/job/post_training_weight_compression/301/
- NNCF/job/nightly/job/test_examples/653/
### Changes

- As stated in the title

### Reason for changes

- Upcoming release

### Related tickets

- 161230

### Tests

- N/A
### Changes

- Skip `test_non_convertable_division` on macOS
- Cut values to 1e-10; more digits after zero don't change the result
- Use a one-line `pip install` in GHA

### Reason for changes


https://github.com/openvinotoolkit/nncf/actions/runs/12969549430/job/36208496733

```
FAILED tests/openvino/native/test_node_utils.py::test_non_convertable_division[0.058599039912223816-15-True-0.003906603] - AssertionError: 
Not equal to tolerance rtol=0, atol=0

Mismatched elements: 1 / 1 (100%)
Max absolute difference among violations: 3.5297126e-07
Max relative difference among violations: 9.035248e-05
 ACTUAL: array([0.003906], dtype=float32)
 DESIRED: array([0.003907], dtype=float32)
FAILED tests/openvino/native/test_node_utils.py::test_non_convertable_division[0.058599039912223816-15-False-0.003906602505594492] - AssertionError: 
Not equal to tolerance rtol=0, atol=0

Mismatched elements: 1 / 1 (100%)
Max absolute difference among violations: 3.525056e-07
Max relative difference among violations: 9.023329e-05
 ACTUAL: array([0.003906], dtype=float32)
 DESIRED: array([0.003907], dtype=float32)
```
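The failure mode is easy to reproduce in a small sketch (values approximate those in the log): with `rtol=0, atol=0`, `np.testing.assert_allclose` rejects any nonzero difference, so a float32 discrepancy of ~3.5e-07 is enough to fail, while a tolerance just above it passes reliably.

```python
import numpy as np

desired = np.float32(0.003906603)
actual = desired - np.float32(3.53e-07)  # mimics the observed float32 drift

# With rtol=0, atol=0 any nonzero difference fails, exactly as in the log.
exact_fails = False
try:
    np.testing.assert_allclose(actual, desired, rtol=0, atol=0)
except AssertionError:
    exact_fails = True

# An atol slightly above the observed ~3.5e-07 difference passes.
np.testing.assert_allclose(actual, desired, rtol=0, atol=1e-6)
```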

### Tests


https://github.com/openvinotoolkit/nncf/actions/runs/12991342528/job/36228616650

---------

Co-authored-by: Nikita Savelyev <nikita.savelyev@intel.com>
daniil-lyakhov force-pushed the dl/fx/xnnpack_conformance branch from 5c12b78 to ba6b7ba on January 27, 2025 17:06
daniil-lyakhov force-pushed the dl/fx/xnnpack_conformance branch from ba6b7ba to 389c796 on January 27, 2025 18:03
github-actions bot added the `documentation` and `NNCF TF` labels on Jan 27, 2025
daniil-lyakhov and others added 2 commits January 28, 2025 11:52
### Changes

* torch.ao `OpenVINOQuantizer` as well as `OpenVINOQuantizerAdapter` are
introduced
* `quantize_pt2e` function is updated to work with `OpenVINOQuantizer` 

### Reason for changes

* To enable OpenVINO quantization for torch.ao quantization pipelines
(`torch.ao.quantization.prepare_pt2e`,
`torch.ao.quantization.convert_pt2e`) and the `quantize_pt2e` API function

### Related tickets

openvinotoolkit#2766 

### Tests

tests/torch/fx/test_quantizer.py is updated with use cases:
- `OpenVINOQuantizer` + `quantize_pt2e`
- `OpenVINOQuantizer` + `torch.ao.quantization.prepare_pt2e` ->
`torch.ao.quantization.convert_pt2e`
### Changes

- Introduced `PT2OpLayerAttribute` to collect the called function, its
attributes, and constant ports
- `FunctionMeta` stores the function instead of the function name

### Reason for changes

Needed to implement a subgraph extractor for FBC

### Related tickets

152996

### Tests

tests/torch2/function_hook/nncf_graph/test_layer_attributes.py
No grad during the TorchFX model validation

quantization params are being forwarded to quantize_pt2e/OpenVINOQuantizer

[Conformance] Ultralytics yolov8n and yolo11n

XNNPACK conformance attempt

ARM and QUALCOMM eager backends

Shared quantization spec support

[HWConfig] narrow_range parameter is introduced in hardware config

Embedding qconfig list is extended for CPU devices / tests fixes

References update