[Good First Issue][NNCF]: Support transposed input for data-aware weight compression methods #3230
Comments
.take

Thank you for looking into this issue! Please let us know if you have any questions or require any help.
Hi @ljaljushkin, I have started working on this issue. While investigating, I noticed that the model in `test_compression_with_transposed_activations` generates the following NNCF graphs in `compress_weights_impl`:

**Case 1: `transpose_a=False`, `transpose_b=True` (presently supported)**
- input shape: `[1, 24, 16]` (batch, seq_length, hidden dim)
- hidden: 16, output: 32
- weight: `[32, 16]` (`[out, hdim]`, so the inner dimensions match after the transpose and multiply)
- passed to `matmul(input, weight, transpose_a=False, transpose_b=True)`:
  - input stays `[1, 24, 16]`
  - weight is transposed to `[16, 32]`
  - matrix-matrix multiplication output: `[1, 24, 32]`

**Case 2: `transpose_a=True`, `transpose_b=False`**
- input shape: `[1, 24, 16]`
- hidden: 24, output: 32
- weight: `[24, 32]` (`[hdim, out]`, so the inner dimensions match after the input is transposed and multiplied)
- passed to `matmul(input, weight, transpose_a=True, transpose_b=False)`:
  - input is transposed to `[1, 16, 24]` (according to the OpenVINO docs)
  - weight stays `[24, 32]`
  - matrix-matrix multiplication output: `[1, 16, 32]`

However, I wanted to clarify whether the NNCF graph for Case 2 is being built properly, because for the supported Case 1 in `create_nncf_graph`, while obtaining the `layer_attributes`, the function `get_weighted_layer_attributes` gets the `weight_layout`. For Case 1 this is correct, and when I call some methods of the `layer_attributes` object they provide the correct details. In Case 2 the basic information is correct, but the information retrieved from `layer_attributes` is not accurate. I noticed that `get_weight_shape` (which returns `[out, in]` even though for Case 2 the layout is `[in, out]`) and `get_target_dim_for_compression` in the `LinearLayerAttributes` class only support the case when the weights are transposed, not Case 2. I want to clarify whether this is intended, or whether it would be ideal to utilize a …
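The two shape cases above can be checked with a quick NumPy sketch (NumPy stands in for OpenVINO's MatMul transpose semantics here; this is not NNCF code):

```python
import numpy as np

# Case 1: transpose_a=False, transpose_b=True (presently supported)
inp = np.zeros((1, 24, 16))     # [batch, seq_length, hidden]
weight = np.zeros((32, 16))     # [out, hdim]
out1 = inp @ weight.T           # weight transposed to [16, 32]
assert out1.shape == (1, 24, 32)

# Case 2: transpose_a=True, transpose_b=False
inp = np.zeros((1, 24, 16))
weight = np.zeros((24, 32))     # [hdim, out], where hidden = 24
out2 = inp.transpose(0, 2, 1) @ weight   # input transposed to [1, 16, 24]
assert out2.shape == (1, 16, 32)
```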
Hi @rk119! Thanks for the detailed explanation! This is a good question. The …
@ljaljushkin Ohh, thank you so much for clarifying this. Makes sense.
@daniil-lyakhov Alright :D
Hi @ljaljushkin, I apologize for delaying this issue; I had many important commitments to attend to and am finally available. I fixed the errors when … When I run the test with just … This is because the reduction axis and the weights being transposed here cause the …
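To illustrate the reduction-axis point (illustrative NumPy only, not NNCF's actual statistics code): a per-output-channel reduction must target the input-channel axis, and the position of that axis depends on the weight layout.

```python
import numpy as np

# Same weight in the two layouts discussed above.
w_out_in = np.arange(512, dtype=np.float32).reshape(32, 16)  # [out, in]
w_in_out = w_out_in.T                                        # [in, out]

# Reducing over the input-channel axis yields one value per output channel,
# but that axis is 1 in the [out, in] layout and 0 in the [in, out] layout.
scale_a = np.max(np.abs(w_out_in), axis=1)
scale_b = np.max(np.abs(w_in_out), axis=0)
assert scale_a.shape == (32,)
assert np.array_equal(scale_a, scale_b)
```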
@rk119 thank you for the PR and findings!
So, I propose to consider …
Hi @ljaljushkin, I have a doubt regarding the test being a templated one, since I am not sure it’s necessary to do so (and I could be wrong). The … Example:

NNCF_Graph: …
Hi @rk119! Yes, the models for the OpenVINO and Torch backends are different. But you can encapsulate this difference in the … One thing that is slightly different is that the Torch backend requires …
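The encapsulation being suggested could look roughly like the following sketch. All class and method names here are illustrative, not NNCF's actual templated-test API: the backend-specific model construction lives in one overridable method, so the shared test body stays identical across backends.

```python
from abc import ABC, abstractmethod


class TemplateTransposedMatmulTest(ABC):
    """Shared test logic; backends override only model creation."""

    @abstractmethod
    def get_model(self, transpose_a: bool):
        """Return a backend-specific model with the given matmul layout."""

    def run_compression_case(self, transpose_a: bool):
        model = self.get_model(transpose_a)
        # ... common compression call and assertions would go here ...
        return model


class OVTransposedMatmulTest(TemplateTransposedMatmulTest):
    def get_model(self, transpose_a: bool):
        # Stand-in for building an OpenVINO model with the given layout.
        return {"backend": "openvino", "transpose_a": transpose_a}
```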
@ljaljushkin Oh yah, alright then :)
But on the other hand, this is specific to the OpenVINO backend's matmul operation. I agree with you that a test for Torch is unnecessary.
ohh haha, okay :D
Context
The MatMul operation in OpenVINO assumes an implicit shape alignment for its input arguments. It applies the transpositions specified by the optional `transpose_a` and `transpose_b` attributes (see the OV spec). Currently, weight compression in NNCF does not support `transpose_a=True`. Here are the check and the test.
This potentially affects the Mixed-Precision, AWQ, Scale Estimation, and Lora Correction algorithms.
What needs to be done?
The task is to enable data-aware weight compression methods (Mixed-Precision, AWQ, Scale Estimation, Lora Correction) for models with transposed input matrix multiplications.
- `process_stats` should be corrected, and the check removed.
- The test for `LMLinearModel` (which uses `transpose_a=False` by default) should also pass with `transpose_a=True`.
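A hedged sketch of the layout-aware axis selection that correcting `process_stats` implies. The function name and signature below are hypothetical, not NNCF's actual API; the point is only that the channel axis of the activations depends on `transpose_a`:

```python
import numpy as np


def mean_abs_per_channel(acts: np.ndarray, transpose_a: bool = False) -> np.ndarray:
    # Hypothetical helper: the channel axis is the one contracted with the
    # weight -- the last axis normally, the second-to-last when MatMul's
    # transpose_a attribute transposes the input first.
    channel_axis = acts.ndim - 2 if transpose_a else acts.ndim - 1
    reduce_axes = tuple(i for i in range(acts.ndim) if i != channel_axis)
    return np.abs(acts).mean(axis=reduce_axes)


acts = np.ones((1, 24, 16), dtype=np.float32)  # [batch, seq, hidden]
assert mean_abs_per_channel(acts).shape == (16,)
assert mean_abs_per_channel(acts, transpose_a=True).shape == (24,)
```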
.Example Pull Requests
#3179
#3129
Resources
Contact points
@ljaljushkin
Ticket
No response