
[Good First Issue][NNCF]: Support transposed input for data-aware weight compression methods #3230

Open
ljaljushkin opened this issue Jan 30, 2025 · 13 comments
Labels: good first issue (Good for newcomers)

ljaljushkin (Contributor)

Context

The MatMul operation in OpenVINO performs an implicit shape alignment of its input arguments: it applies the transpositions specified by the optional transpose_a and transpose_b attributes (see the OV spec).
Currently, weight compression in NNCF does not support transpose_a=True.
Here's the check and test.
This potentially affects the Mixed-Precision, AWQ, Scale Estimation, and Lora Correction algorithms.
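As a quick illustration of those attributes, their shape semantics can be sketched with plain NumPy (an illustrative helper mirroring the behavior described in the OV spec, not NNCF code):

```python
import numpy as np

def matmul_like_ov(a, b, transpose_a=False, transpose_b=False):
    # Mimic OpenVINO MatMul's transpose_a/transpose_b attributes:
    # only the two innermost dimensions are swapped; leading (batch)
    # dimensions broadcast as usual.
    if transpose_a:
        a = np.swapaxes(a, -2, -1)
    if transpose_b:
        b = np.swapaxes(b, -2, -1)
    return a @ b

activations = np.ones((1, 24, 16))  # [batch, seq_len, hidden]
weight = np.ones((32, 16))          # [out, hidden], the currently supported layout
out = matmul_like_ov(activations, weight, transpose_b=True)
print(out.shape)  # (1, 24, 32)
```

The unsupported case is the mirror image: with transpose_a=True the input's two innermost dimensions are swapped instead of the weight's.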

What needs to be done?

The task is to enable data-aware weight compression methods (Mixed-Precision, AWQ, Scale Estimation, Lora Correction) for models with transposed input matrix multiplications.

  1. At least one function, process_stats, should be corrected, and the check removed.
  2. The test should pass and become a templated one in order to cover the OpenVINO and Torch backends at once.
  3. Tests that use LMLinearModel with transpose_a=False by default should also pass with transpose_a=True.
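The gist of step 1 can be sketched as follows. This is a hypothetical illustration of handling transposed activation statistics, not the signature or body of NNCF's actual process_stats; the idea is to bring activations to a canonical [..., seq_len, hidden] layout so the reduction axis is the same regardless of transpose_a:

```python
import numpy as np

def process_stats_sketch(activations: np.ndarray, transpose_a: bool) -> np.ndarray:
    # Hypothetical sketch (not NNCF's real process_stats): normalize the
    # layout first so per-channel statistics always reduce over the same axes.
    if transpose_a:
        # With transpose_a=True the hidden dimension sits at axis -2.
        activations = np.swapaxes(activations, -2, -1)
    # Mean absolute value per hidden channel -> shape [hidden]
    return np.mean(np.abs(activations), axis=tuple(range(activations.ndim - 1)))

x = np.random.rand(1, 24, 16)        # canonical [batch, seq_len, hidden]
xt = np.swapaxes(x, -2, -1)          # layout seen when transpose_a=True
assert np.allclose(process_stats_sketch(x, False), process_stats_sketch(xt, True))
```

With such normalization in place, the hard failure on transpose_a=True could simply be dropped.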

Example Pull Requests

#3179
#3129

Resources

Contact points

@ljaljushkin

Ticket

No response

rk119 (Contributor) commented Jan 30, 2025

.take


Thank you for looking into this issue! Please let us know if you have any questions or require any help.

github-project-automation bot moved this to Contributors Needed in Good first issues Jan 30, 2025
ljaljushkin moved this from Contributors Needed to Assigned in Good first issues Jan 30, 2025
rk119 (Contributor) commented Feb 3, 2025

Hi @ljaljushkin,

I have started working on this issue. While investigating why transpose_a=True raises an error and how to fix it, I noticed something I would like to clarify.

The model in test_compression_with_transposed_activations generates the following NNCF Graphs in compress_weights_impl:

Case 1 -> transpose_a = False, transpose_b = True (presently supported)

[screenshot: NNCF graph, Case 1]

input shape: [1,24,16] # batch, seq_length, hidden dim

hidden: 16

output: 32

weight = [32, 16] # [out, hdim] for inner dimensions to match after transpose and then multiply

pass to matmul(input, weight, transpose_a=False, transpose_b=True):

input is the same as [1,24,16]

weight is transposed to [16, 32]

matrix-matrix multiplication output = [1, 24, 32]

Case 2 -> transpose_a = True, transpose_b = False

[screenshot: NNCF graph, Case 2]

input shape: [1,24,16]

hidden: 24

output: 32

weight = [24, 32] # [hdim, out] for inner dimensions to match after transpose of input and then multiply

pass to matmul(input, weight, transpose_a=True, transpose_b=False):

input is transposed to [1, 16, 24] # per the OpenVINO docs

weight is the same as [24, 32]

matrix-matrix multiplication output = [1, 16, 32]
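The shape arithmetic of both cases can be double-checked with plain NumPy (a sketch mirroring the OV transpose semantics, not OpenVINO itself):

```python
import numpy as np

x = np.zeros((1, 24, 16))  # [batch, seq_len, hidden]

# Case 1: weight stored as [32, 16], transpose_b=True
w1 = np.zeros((32, 16))
out1 = x @ np.swapaxes(w1, -2, -1)   # weight becomes [16, 32]
print(out1.shape)  # (1, 24, 32)

# Case 2: weight stored as [24, 32], transpose_a=True
w2 = np.zeros((24, 32))
out2 = np.swapaxes(x, -2, -1) @ w2   # input becomes [1, 16, 24]
print(out2.shape)  # (1, 16, 32)
```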

However, I wanted to clarify whether the NNCF graph for Case 2 is built correctly. For the supported Case 1, create_nncf_graph obtains the layer_attributes via get_weighted_layer_attributes, which derives the weight_layout from the transpose flag (transpose_b=True in this model), extracts the in/out feature dimensions, and passes them to LinearLayerAttributes, which returns an object with the layer attributes.

For Case 1, it's correct:

[screenshot: layer attributes, Case 1]

and when I call some methods of the layer_attributes object, they return the correct details:

[screenshot: layer_attributes method outputs, Case 1]

In Case 2, the following information is correct:

[screenshot: graph details, Case 2]

but the information retrieved from layer_attributes is not accurate:

[screenshot: layer_attributes output, Case 2]

I noticed that get_weight_shape (which returns [out, in] even though for Case 2 the layout is [in, out]) and get_target_dim_for_compression in the LinearLayerAttributes class only support the case where the weights are transposed, not Case 2.

I would like to clarify whether this is intended, or whether it would be better to store a transpose flag in the class and return the weight shape and target dim accordingly.

ljaljushkin (Contributor Author)

Hi @rk119!

Thanks for the detailed explanation! This is a good question. The get_weight_shape and get_target_dim_for_compression methods of layer_attributes don't look valid in these cases. We're going to create a GFI for deleting them, since they are used in obsolete methods only and are not supposed to be used in new algorithms. Instead, all backends implement their own get_weight_shape: openvino, torch, torch_fx.

daniil-lyakhov (Collaborator) commented Feb 4, 2025

Hi @rk119!

Thanks for your analysis! I've opened a new GFI to cover this inconsistency: #3249

rk119 (Contributor) commented Feb 5, 2025

> Hi @rk119!
>
> Thanks for the detailed explanation! This is a good question. The get_weight_shape and get_target_dim_for_compression methods of layer_attributes don't look valid in these cases. We're going to create a GFI for deleting them, since they are used in obsolete methods only and are not supposed to be used in new algorithms. Instead, all backends implement their own get_weight_shape: openvino, torch, torch_fx.

@ljaljushkin Ohh, thank you so much for clarifying this. Makes sense.

> Hi @rk119!
>
> Thanks for your analysis! I've opened a new GFI to cover this inconsistency: #3249

@daniil-lyakhov Alright :D

rk119 (Contributor) commented Feb 19, 2025

Hi @ljaljushkin,

I apologize for the delay on this issue; I had many important commitments to attend to and am finally available. I fixed the errors when lora_correction was enabled so that transpose_a=True is supported (this can be viewed in my draft PR #3296), but I would appreciate your guidance on how to address the following:

When I run the test with only scale_estimation enabled and transpose_a=True in the model, I get the following error:

[screenshot: error message]

This is because the reduction axis and the transposed weights here cause result_scale to have shape (32, 3, 1). However, in the compress_weight method, in do_int_quantization, reshape_weight_for_grouped_quantization returns a weight of shape (3, 8, 32). This incompatibility causes the above error in _build_compress_model. The following is the stack trace:

[screenshot: stack trace]
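The incompatibility can be reproduced with plain NumPy using the shapes from the trace above (a sketch of the shape arithmetic only; the actual reshaping lives in NNCF's reshape_weight_for_grouped_quantization):

```python
import numpy as np

group_size = 8
weight = np.zeros((24, 32))  # weight in the [in, out] layout from this scenario

# Grouped-quantization style reshape: split the reduction axis into groups.
grouped = weight.reshape(24 // group_size, group_size, 32)
print(grouped.shape)  # (3, 8, 32)

scale = np.zeros((32, 3, 1))  # result_scale produced by scale estimation here

# Broadcasting (3, 8, 32) against (32, 3, 1) fails, matching the reported error.
try:
    _ = grouped / scale
except ValueError as err:
    print("shape mismatch:", err)
```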

ljaljushkin (Contributor Author)

@rk119 thank you for the PR and findings!
It seems the issue you are describing is not related to your changes.
There's a separate issue when transpose_b=False, even on develop:

model = LMLinearModel(transpose_a=False, transpose_b=False).ov_model

So, I propose to handle transpose_b=True only for now and fix the issue with transpose_b=False separately.

@alexsu52 alexsu52 moved this from Assigned to In Review in Good first issues Feb 24, 2025
rk119 (Contributor) commented Mar 2, 2025

Hi @ljaljushkin,

I have a doubt about making the test a templated one, since I'm not sure it's necessary (and I could be wrong). The torch.matmul function has no arguments for transposing the input or weight, so they must be transposed explicitly before being passed to matmul. Additionally, the transpose operation is represented as a separate node in the NNCF graph. As a result, the hidden_dim index remains the same in either case. In OpenVINO, however, the transpose is handled inside the MatMul operation itself.

Example:

import torch


class Model(torch.nn.Module):
    def __init__(self, weight: torch.Tensor = torch.ones(size=(24, 32), dtype=torch.float32)):
        super().__init__()
        self.w = torch.nn.Parameter(weight)

    def forward(self, input):
        # Explicit transpose: appears as a separate node in the NNCF graph.
        return torch.matmul(torch.transpose(input, -2, -1), self.w)

NNCF_Graph:

[screenshot: NNCF graph with a separate transpose node]

ljaljushkin (Contributor Author)

Hi @rk119! Yes, the models for OpenVINO and Torch backends are different.

But you can encapsulate this difference in the get_one_linear_model(transpose_a=True, transpose_b=True) similar to this:
https://github.com/openvinotoolkit/nncf/blob/develop/tests/cross_fw/test_templates/template_test_weights_compression.py#L235
Once the model is ready, the API call and body of the test should be common. That's why the template test is reasonable for me.

One thing is slightly different: the Torch backend requires an nncf.Dataset even for the data-free case in order to build the graph. The OpenVINO backend doesn't, but it's no problem to pass one in both cases. Here's an example:
https://github.com/openvinotoolkit/nncf/blob/develop/tests/cross_fw/test_templates/template_test_weights_compression.py#L253
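The template pattern could look roughly like this. All names below are illustrative placeholders, not NNCF's actual test classes (the real template lives in template_test_weights_compression.py): the backend-specific part is confined to model creation, while the test body stays shared.

```python
# Hedged sketch of the templated-test pattern: a shared test body with
# backend-specific model construction (names are hypothetical).

class TemplateTransposedMatmulTest:
    def get_one_linear_model(self, transpose_a: bool, transpose_b: bool):
        raise NotImplementedError("each backend builds its own model")

    def run_compression(self, model):
        # Placeholder for the common compression call shared by all backends.
        return f"compressed({model})"

    def test_transposed_input(self):
        model = self.get_one_linear_model(transpose_a=True, transpose_b=True)
        return self.run_compression(model)


class OVTransposedMatmulTest(TemplateTransposedMatmulTest):
    def get_one_linear_model(self, transpose_a, transpose_b):
        # A real implementation would build an OpenVINO model here.
        return f"ov_model(a={transpose_a}, b={transpose_b})"


print(OVTransposedMatmulTest().test_transposed_input())
# -> compressed(ov_model(a=True, b=True))
```

A Torch subclass would only override get_one_linear_model; passing the dataset in the shared body keeps the call backend-agnostic.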

rk119 (Contributor) commented Mar 4, 2025

@ljaljushkin Oh yah, alright then :)

ljaljushkin (Contributor Author)

> @ljaljushkin Oh yah, alright then :)

But on the other hand, this is purely OpenVINO-backend-specific behavior of the matmul operation. I agree with you that a test for torch is unnecessary.

rk119 (Contributor) commented Mar 5, 2025

> @ljaljushkin Oh yah, alright then :)
>
> But on the other hand, this is purely OpenVINO-backend-specific behavior of the matmul operation. I agree with you that a test for torch is unnecessary.

ohh haha, okay :D
