
[Good First Issue][NNCF]: Support transposed input for data-aware weight compression methods #3230

Open
ljaljushkin opened this issue Jan 30, 2025 · 13 comments
Labels: good first issue (Good for newcomers)

ljaljushkin (Contributor)

Context

The MatMul operation in OpenVINO performs an implicit shape alignment of its input arguments: it applies the transpositions specified by the optional transpose_a and transpose_b attributes (see the OV spec).
Currently, weight compression in NNCF does not support transpose_a=True.
Here's the check and test.
This potentially affects the Mixed-Precision, AWQ, Scale Estimation, and Lora Correction algorithms.
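As a quick illustration of those attributes, their shape semantics can be sketched with plain NumPy (an illustrative helper mirroring the behavior described in the OV spec, not NNCF code):

```python
import numpy as np

def matmul_like_ov(a, b, transpose_a=False, transpose_b=False):
    # Mimic OpenVINO MatMul's transpose_a/transpose_b attributes:
    # only the two innermost dimensions are swapped; leading (batch)
    # dimensions broadcast as usual.
    if transpose_a:
        a = np.swapaxes(a, -2, -1)
    if transpose_b:
        b = np.swapaxes(b, -2, -1)
    return a @ b

activations = np.ones((1, 24, 16))  # [batch, seq_len, hidden]
weight = np.ones((32, 16))          # [out, hidden], the currently supported layout
out = matmul_like_ov(activations, weight, transpose_b=True)
print(out.shape)  # (1, 24, 32)
```

The unsupported case is the mirror image: with transpose_a=True the input's two innermost dimensions are swapped instead of the weight's.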

What needs to be done?

The task is to enable data-aware weight compression methods (Mixed-Precision, AWQ, Scale Estimation, Lora Correction) for models with transposed input matrix multiplications.

  1. At least one function, process_stats, should be corrected, and the check removed.
  2. The test should pass and become a templated one in order to cover the OpenVINO and Torch backends at once.
  3. Tests that use LMLinearModel with transpose_a=False by default should also pass with transpose_a=True.
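The gist of step 1 can be sketched as follows. This is a hypothetical illustration of handling transposed activation statistics, not the signature or body of NNCF's actual process_stats; the idea is to bring activations to a canonical [..., seq_len, hidden] layout so the reduction axis is the same regardless of transpose_a:

```python
import numpy as np

def process_stats_sketch(activations: np.ndarray, transpose_a: bool) -> np.ndarray:
    # Hypothetical sketch (not NNCF's real process_stats): normalize the
    # layout first so per-channel statistics always reduce over the same axes.
    if transpose_a:
        # With transpose_a=True the hidden dimension sits at axis -2.
        activations = np.swapaxes(activations, -2, -1)
    # Mean absolute value per hidden channel -> shape [hidden]
    return np.mean(np.abs(activations), axis=tuple(range(activations.ndim - 1)))

x = np.random.rand(1, 24, 16)        # canonical [batch, seq_len, hidden]
xt = np.swapaxes(x, -2, -1)          # layout seen when transpose_a=True
assert np.allclose(process_stats_sketch(x, False), process_stats_sketch(xt, True))
```

With such normalization in place, the hard failure on transpose_a=True could simply be dropped.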

Example Pull Requests

#3179
#3129

Resources

Contact points

@ljaljushkin

Ticket

No response

rk119 (Contributor) commented Jan 30, 2025

.take


Thank you for looking into this issue! Please let us know if you have any questions or require any help.

github-project-automation bot moved this to Contributors Needed in Good first issues Jan 30, 2025
ljaljushkin moved this from Contributors Needed to Assigned in Good first issues Jan 30, 2025
rk119 (Contributor) commented Feb 3, 2025

Hi @ljaljushkin,

I have started working on this issue. While investigating why transpose_a=True raises an error and how to fix it, I noticed something I would like to clarify.

The model in test_compression_with_transposed_activations generates the following NNCF Graphs in compress_weights_impl:

Case 1 -> transpose_a = False, transpose_b = True (presently supported)

[screenshot: NNCF graph, Case 1]

input shape: [1,24,16] # batch, seq_length, hidden dim

hidden: 16

output: 32

weight = [32, 16] # [out, hdim] for inner dimensions to match after transpose and then multiply

pass to matmul(input, weight, transpose_a=False, transpose_b=True):

input is the same as [1,24,16]

weight is transposed to [16, 32]

matrix-matrix multiplication output = [1, 24, 32]

Case 2 -> transpose_a = True, transpose_b = False

[screenshot: NNCF graph, Case 2]

input shape: [1,24,16]

hidden: 24

output: 32

weight = [24, 32] # [hdim, out] for inner dimensions to match after transpose of input and then multiply

pass to matmul(input, weight, transpose_a=True, transpose_b=False):

input is transposed to [1, 16, 24] # per the OpenVINO docs

weight is the same as [24, 32]

matrix-matrix multiplication output = [1, 16, 32]
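The shape arithmetic of both cases can be double-checked with plain NumPy (a sketch mirroring the OV transpose semantics, not OpenVINO itself):

```python
import numpy as np

x = np.zeros((1, 24, 16))  # [batch, seq_len, hidden]

# Case 1: weight stored as [32, 16], transpose_b=True
w1 = np.zeros((32, 16))
out1 = x @ np.swapaxes(w1, -2, -1)   # weight becomes [16, 32]
print(out1.shape)  # (1, 24, 32)

# Case 2: weight stored as [24, 32], transpose_a=True
w2 = np.zeros((24, 32))
out2 = np.swapaxes(x, -2, -1) @ w2   # input becomes [1, 16, 24]
print(out2.shape)  # (1, 16, 32)
```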

However, I wanted to clarify whether the NNCF graph for Case 2 is built correctly. For the supported Case 1, create_nncf_graph obtains the layer_attributes via get_weighted_layer_attributes, which derives the weight_layout from the transpose flag (transpose_b=True in this model), extracts the in/out feature dimensions, and passes them to LinearLayerAttributes, which returns an object with the layer attributes.

For Case 1, it's correct:

[screenshot: layer attributes, Case 1]

and when I call some methods of the layer_attributes object, they return the correct details:

[screenshot: layer_attributes method outputs, Case 1]

In Case 2, the following information is correct:

[screenshot: graph details, Case 2]

but the information retrieved from layer_attributes is not accurate:

[screenshot: layer_attributes output, Case 2]

I noticed that get_weight_shape (which returns [out, in] even though for Case 2 the layout is [in, out]) and get_target_dim_for_compression in the LinearLayerAttributes class only support the case where the weights are transposed, not Case 2.

I would like to clarify whether this is intended, or whether it would be better to store a transpose flag in the class and return the weight shape and target dim accordingly.

ljaljushkin (Contributor Author)

Hi @rk119!

Thanks for the detailed explanation! This is a good question. The get_weight_shape and get_target_dim_for_compression methods of layer_attributes don't look valid in these cases. We're going to create a GFI for deleting them, since they are used in obsolete methods only and are not supposed to be used in new algorithms. Instead, all backends implement their own get_weight_shape: openvino, torch, torch_fx.

daniil-lyakhov (Collaborator) commented Feb 4, 2025

Hi @rk119!

Thanks for your analysis! I've opened a new GFI to cover this inconsistency: #3249

rk119 (Contributor) commented Feb 5, 2025

> Hi @rk119!
>
> Thanks for the detailed explanation! This is a good question. The get_weight_shape and get_target_dim_for_compression methods of layer_attributes don't look valid in these cases. We're going to create a GFI for deleting them, since they are used in obsolete methods only and are not supposed to be used in new algorithms. Instead, all backends implement their own get_weight_shape: openvino, torch, torch_fx.

@ljaljushkin Ohh, thank you so much for clarifying this. Makes sense.

> Hi @rk119!
>
> Thanks for your analysis! I've opened a new GFI to cover this inconsistency: #3249

@daniil-lyakhov Alright :D

rk119 (Contributor) commented Feb 19, 2025

Hi @ljaljushkin,

I apologize for the delay on this issue; I had many important commitments to attend to and am finally available. I fixed the errors when lora_correction was enabled so that transpose_a=True is supported (this can be viewed in my draft PR #3296), but I would appreciate your guidance on how to address the following:

When I run the test with only scale_estimation enabled and transpose_a=True in the model, I get the following error:

[screenshot: error message]

This is because the reduction axis and the transposed weights here cause result_scale to have shape (32, 3, 1). However, in the compress_weight method, in do_int_quantization, reshape_weight_for_grouped_quantization returns a weight of shape (3, 8, 32). This incompatibility causes the above error in _build_compress_model. The following is the stack trace:

[screenshot: stack trace]
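The incompatibility can be reproduced with plain NumPy using the shapes from the trace above (a sketch of the shape arithmetic only; the actual reshaping lives in NNCF's reshape_weight_for_grouped_quantization):

```python
import numpy as np

group_size = 8
weight = np.zeros((24, 32))  # weight in the [in, out] layout from this scenario

# Grouped-quantization style reshape: split the reduction axis into groups.
grouped = weight.reshape(24 // group_size, group_size, 32)
print(grouped.shape)  # (3, 8, 32)

scale = np.zeros((32, 3, 1))  # result_scale produced by scale estimation here

# Broadcasting (3, 8, 32) against (32, 3, 1) fails, matching the reported error.
try:
    _ = grouped / scale
except ValueError as err:
    print("shape mismatch:", err)
```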

ljaljushkin (Contributor Author)

@rk119 thank you for the PR and findings!
It seems the issue you are describing is not related to your changes.
There's a separate issue when transpose_b=False, even on develop:

model = LMLinearModel(transpose_a=False, transpose_b=False).ov_model

So, I propose to handle transpose_b=True only for now and fix the issue with transpose_b=False separately.

@alexsu52 alexsu52 moved this from Assigned to In Review in Good first issues Feb 24, 2025
rk119 (Contributor) commented Mar 2, 2025

Hi @ljaljushkin,

I have a doubt about making the test a templated one, since I'm not sure it's necessary (and I could be wrong). The torch.matmul function has no arguments for transposing the input or weight, so they must be transposed explicitly before being passed to matmul. Additionally, the transpose operation is represented as a separate node in the NNCF graph. As a result, the hidden_dim index remains the same in either case. In OpenVINO, however, the transpose is handled inside the MatMul operation itself.

Example:

import torch


class Model(torch.nn.Module):
    def __init__(self, weight: torch.Tensor = torch.ones(size=(24, 32), dtype=torch.float32)):
        super().__init__()
        self.w = torch.nn.Parameter(weight)

    def forward(self, input):
        # Explicit transpose: appears as a separate node in the NNCF graph.
        return torch.matmul(torch.transpose(input, -2, -1), self.w)

NNCF_Graph:

[screenshot: NNCF graph with a separate transpose node]

ljaljushkin (Contributor Author)

Hi @rk119! Yes, the models for OpenVINO and Torch backends are different.

But you can encapsulate this difference in the get_one_linear_model(transpose_a=True, transpose_b=True) similar to this:
https://github.com/openvinotoolkit/nncf/blob/develop/tests/cross_fw/test_templates/template_test_weights_compression.py#L235
Once the model is ready, the API call and body of the test should be common. That's why the template test is reasonable for me.

One thing is slightly different: the Torch backend requires an nncf.Dataset even for the data-free case in order to build the graph. The OpenVINO backend doesn't, but it's no problem to pass one in both cases. Here's an example:
https://github.com/openvinotoolkit/nncf/blob/develop/tests/cross_fw/test_templates/template_test_weights_compression.py#L253
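The template pattern could look roughly like this. All names below are illustrative placeholders, not NNCF's actual test classes (the real template lives in template_test_weights_compression.py): the backend-specific part is confined to model creation, while the test body stays shared.

```python
# Hedged sketch of the templated-test pattern: a shared test body with
# backend-specific model construction (names are hypothetical).

class TemplateTransposedMatmulTest:
    def get_one_linear_model(self, transpose_a: bool, transpose_b: bool):
        raise NotImplementedError("each backend builds its own model")

    def run_compression(self, model):
        # Placeholder for the common compression call shared by all backends.
        return f"compressed({model})"

    def test_transposed_input(self):
        model = self.get_one_linear_model(transpose_a=True, transpose_b=True)
        return self.run_compression(model)


class OVTransposedMatmulTest(TemplateTransposedMatmulTest):
    def get_one_linear_model(self, transpose_a, transpose_b):
        # A real implementation would build an OpenVINO model here.
        return f"ov_model(a={transpose_a}, b={transpose_b})"


print(OVTransposedMatmulTest().test_transposed_input())
# -> compressed(ov_model(a=True, b=True))
```

A Torch subclass would only override get_one_linear_model; passing the dataset in the shared body keeps the call backend-agnostic.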

rk119 (Contributor) commented Mar 4, 2025

@ljaljushkin Oh yah, alright then :)

ljaljushkin (Contributor Author)

> @ljaljushkin Oh yah, alright then :)

But on the other hand, this is purely OpenVINO-backend-specific behavior of the matmul operation. I agree with you that a test for torch is unnecessary.

rk119 (Contributor) commented Mar 5, 2025

> @ljaljushkin Oh yah, alright then :)
>
> But on the other hand, this is purely OpenVINO-backend-specific behavior of the matmul operation. I agree with you that a test for torch is unnecessary.

ohh haha, okay :D
