Support transposed input for data-aware Weights Compression #3296
base: develop
Conversation
input_data = [np.ones(inp.shape) for inp in model.inputs] * dataset_size
dataset = Dataset(input_data)

with pytest.raises(nncf.UnsupportedModelError):
Since there's some issue with transpose_a=False, transpose_b=False, could you please make the test more general to cover all 4 combinations, as follows?
from contextlib import nullcontext
@pytest.mark.parametrize(
("transpose_a", "transpose_b", "raises_error"),
(
(False, True, False),
(True, True, False),
(False, False, True),
(True, False, True),
),
ids=["tb_nota", "ta_tb", "nota_notb", "ta_notb"]
)
@pytest.mark.parametrize(
"kwargs",
(
dict(scale_estimation=True),
dict(lora_correction=True),
dict(
gptq=True,
awq=True,
scale_estimation=True,
advanced_parameters=CompressionParams(gptq_params=GPTQParams(subset_size=2)),
),
),
ids=["se", "lora", "gptq_se_awq"]
)
def test_compression_with_transpose(transpose_a, transpose_b, raises_error, kwargs):
dataset_size = 4
model = LMLinearModel(transpose_a=transpose_a, transpose_b=transpose_b).ov_model
input_data = [np.ones(inp.shape) for inp in model.inputs] * dataset_size
dataset = Dataset(input_data)
with pytest.raises(nncf.UnsupportedModelError) if raises_error else nullcontext():
compress_weights(
model,
mode=CompressWeightsMode.INT4_SYM,
ratio=1.0,
group_size=8,
subset_size=2,
dataset=dataset,
all_layers=True,
**kwargs,
)
For kwargs such as dict(lora_correction=True) and dict(awq=True), it seems that the following cases:
@pytest.mark.parametrize(
("transpose_a", "transpose_b", "raises_error"),
(
(False, False, True),
(True, False, True),
),
)
do not raise an error and seem to pass for those two algorithms. Would you suggest I still implement a check to raise an unsupported-model error for these algorithms when transpose_b=False, or not?
The issue with transpose_b=False wasn't mentioned in the GFI, so I can't insist on implementing it within this PR.
However, it would be wonderful if you could resolve both issues! :) In that case, raise_error won't be needed, and the test wouldn't expect an exception for any combination.
No need for a template test: #3230 (comment), and dict(awq=True) is not needed either.
The issue with transpose_b=False wasn't mentioned in the GFI, so I can't insist on implementing it within this PR. However, it would be wonderful if you could resolve both issues! :) In that case, raise_error won't be needed, and the test wouldn't expect an exception for any combination.

So, should I try implementing transpose_b=False in this PR, or would that be out of scope? I wouldn't mind either :)
Both options are possible. The choice is yours.
But regardless of your choice, please extend the test to verify all combinations as originally suggested.
If you decide not to implement transpose_b=False, that's fine; the test will expect an error so the known issue isn't forgotten.
If you fix the issue for transpose_b=False, then all combinations should pass without expecting an error.
reduction_axes = tuple(
    i
    for i in range(n_dims)
    if i != n_dims + self._backend_entity.get_input_hidden_dim(output_edge.to_node)
)
What if get_input_hidden_dim is positive?
What if get_input_hidden_dim is positive?

I created this backend function mainly because, in contrast to OV, Torch tensors are reduced across all dimensions except the last one (transposed or not), hence -1 is returned by default. The method is overridden only for OV: when transpose_a=True it returns -2, excluding the second-to-last dimension from reduction, since the opset.matmul operation will later swap it with the last dim for transpose_a=True. In the other case, transpose_a=False, it returns -1 as well.
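To make that concrete, here is a rough sketch of the behaviour described above (illustrative only, not the actual NNCF backend classes; attribute names mirror this thread and may differ in the final code):

class TorchLikeBackend:
    @staticmethod
    def get_input_hidden_dim(node) -> int:
        # Torch activations are reduced over every dimension except the last,
        # transposed or not, so the hidden dimension is always the last axis.
        return -1


class OVLikeBackend:
    @staticmethod
    def get_input_hidden_dim(node) -> int:
        # With transpose_a=True, opset.matmul swaps the last two activation
        # dims, so the hidden dimension is the second-to-last axis instead.
        return -2 if node.layer_attributes.input_attributes["transpose"] else -1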
I think you can use node.metatype.output_channel_axis directly without defining a backend method.
Unfortunately, node.metatype.output_channel_axis returns None here, hence I'm not sure how to utilize it.
I assume it should work for output_edge.to_node.
I suggest keeping get_input_hidden_dim, but probably renaming it to get_activation_channel_axis and implementing it for the OpenVINO backend a bit more generally, to cover cases with Convolutions.
You can call this get_activation_channel_axis from here for OpenVINO: https://github.com/openvinotoolkit/nncf/blob/develop/nncf/openvino/graph/node_utils.py#L631
node = output_edge.to_node
input_port_id = output_edge.input_port_id
input_shape = output_edge.tensor_shape
input_channel_axis = self._backend_entity.get_activation_channel_axis(node, input_port_id, input_shape)
For torch, this way should work:
output_edge.to_node.metatype.output_channel_axis
Thank you! I was trying to look for an existing implementation to obtain the dim but could not find it. My bad. Yep, this approach seems better.
@@ -364,6 +366,12 @@ def filter_func(point: StatisticPoint) -> bool:
        return filter_func

    @staticmethod
    def get_input_hidden_dim(node: NNCFNode) -> int:
        if (node is not None) and (node.layer_attributes is not None):
I wonder what the valid case is when this condition is False? If it's an invalid case, I'd rather not silently return -1.
Initially, I did not have that conditional check and only returned return -2 if node.layer_attributes.input_attributes["transpose"] else -1, which caused this test to fail: while collecting statistics, the node passed to get_input_hidden_dim was None, so the check node.layer_attributes.input_attributes["transpose"] could not be performed and an error was raised.
I agree that silently returning -1 wouldn't seem ideal, but the current implementation always assumes that the last dim is the hidden one by default and excludes it using n_dims - 1.
That's interesting, why does this test fail? Could it be because the node corresponds to an embedding instead of a matmul?
Probably, another approach will avoid this situation: #3296 (comment)
Probably, another approach will avoid this situation: #3296 (comment)

I implemented the approach specified in #3296 (comment), yet the test is still failing in the checks.

That's interesting, why does this test fail? Could it be because the node corresponds to an embedding instead of a matmul?

I am not sure, since I am unable to debug this test properly on my local machine due to the Windows fatal exception: access violation error.
Hi @ljaljushkin,
I was able to run the test on my local machine by downgrading the onnx version, as it was causing some errors. I noticed an inconsistency in the test failures.
Test run 1: passes for _test_basic_configurations but fails in _test_advanced_gptq_scale_estimation for the configuration Testing: AWQ=True, GPTQ=False, Scale=True, LoRA=False.
Consecutive test run 2: _test_basic_configurations now fails for the configuration Testing: AWQ=True, Group=1, Ratio=0.4, Metric=hessian_input_activation, Scope=IgnoredScope(names=[], patterns=[], types=[], subgraphs=[], validate=True).
Both test runs fail with the same error.
Furthermore, I debugged the test failures and checked whether the configurations are working as intended. The code does not seem to have any flaws in its logic, and those earlier failures execute properly in the debugger. It seems to be an issue with how statistics are cached internally (I am not exactly sure how to word it properly). I could be totally wrong about this, but the inconsistency seems to validate it for me so far. However, I am also not sure why this inconsistency is triggered now and not before my changes were implemented. Would be grateful for some guidance and clarity on this :)
@ljaljushkin Alright then. Before I implement and push those changes, I thought I could explain my findings before adding the sort.
The NNCF graphs are different for the tests that pass vs. the one that fails. In the passing tests, one MVN node has its edges pointing to child nodes in one order; in the graph of the failing model the order differs, and this line of code accesses the first edge's node, which later causes the error 'NoneType' object has no attribute 'get_const_port_ids'.
I also noticed in the raw NNCF graph code that the nodes added are the same and the differences arise when the edges are added, confirming that the order in which the edges were added was probably causing the issue. I debugged further to see why the nodes are ordered the same but the edge lists differ. The child nodes returned for the same MVN node in two successful test runs come back in a non-deterministic order from out.get_target_inputs(), yet they are sorted before the nodes are added to the graph. The same child nodes returned while adding the edges are not sorted, hence why I made the present change; a sketch of the idea follows.
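Roughly, the change sorts the consumers of each output before the edges are created; a schematic sketch (not the exact PR code, and the sort key may differ):

def get_sorted_target_inputs(output):
    # `output` is an openvino Output; get_target_inputs() returns an unordered
    # set of consumer inputs, so sort them to get a deterministic edge order.
    return sorted(
        output.get_target_inputs(),
        key=lambda inp: (inp.get_node().get_friendly_name(), inp.get_index()),
    )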
Good to have the order of edges deterministic!
But I doubt it will help resolve the original problem with the channel axis. IMO, it only solves the problem for this particular instance with MVN, but it doesn't necessarily guarantee that the first edge will always lead to a matmul with non-empty layer_attributes.input_attributes, does it?
but it doesn't necessarily guarantee that the first edge will always lead to a matmul with non-empty layer_attributes.input_attributes, does it?

Ah yes! You are right about it not always leading to a matmul :) I'll make the relevant changes and push them sometime.

Good to have the order of edges deterministic!

So, would you recommend I keep these changes or revert them?
Yes, you could add this to avoid confusion between different graphs for the same model. Hope it doesn't significantly affect graph building for large models.
Alright. I just made a minor change to keep it consistent with the sort condition for adding the nodes.
@@ -365,7 +365,7 @@ def calculate_quantization_params(
        return result_scale, zp

    @staticmethod
-   def activations_to_wc_statistics(activations: List[Tensor]) -> WCTensorStatistic:
+   def activations_to_wc_statistics(activations: List[Tensor], transpose: bool) -> WCTensorStatistic:
I'd pass output_channel_axis explicitly and re-implement the definition of reduction_shape using this info.
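For illustration, something along these lines (a NumPy-based sketch of the idea, not the actual NNCF implementation; names are placeholders):

import numpy as np

def activations_to_wc_statistics(activations, channel_axis):
    # The caller passes the activation channel axis explicitly, and the
    # reduction axes are derived from it instead of from a transpose flag.
    mean_values, shapes = [], []
    for act in activations:
        axis = channel_axis % act.ndim  # normalize a negative axis such as -1 or -2
        reduction_axes = tuple(i for i in range(act.ndim) if i != axis)
        mean_values.append(np.mean(act, axis=reduction_axes))
        shapes.append(act.shape)
    return mean_values, shapes  # stand-in for WCTensorStatistic(mean_values, shapes)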
Changes
Added support for transpose of activations in the matmul operation in Weights Compression. While collecting statistics, the reduction axes are set according to whether the input is transposed or not: if the input is transposed, the second-to-last dimension is the hidden dimension; otherwise the last dimension is hidden (see the sketch after this list).
Pass the input transpose value to the matmul operation in insert_adapters for the OV backend to match the inner dimensions.
Implemented a common backend function get_activation_channel_axis for input transpose_a=True support.
Closes Issue #3230
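For illustration, the reduction-axes rule from the first item above behaves as follows for a 3D activation (a hypothetical example, not taken from the PR code):

def reduction_axes(n_dims, transposed):
    # Hidden dimension is the last axis normally, second-to-last if transposed.
    hidden_axis = n_dims - 2 if transposed else n_dims - 1
    return tuple(i for i in range(n_dims) if i != hidden_axis)

assert reduction_axes(3, transposed=False) == (0, 1)  # reduce batch and sequence dims
assert reduction_axes(3, transposed=True) == (0, 2)   # hidden dim moved to axis 1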