Support transposed input for data-aware Weights Compression #3296
File 1 of 3
@@ -110,9 +110,6 @@ def mean_statistic_collector(

     @staticmethod
     def get_activation_port_id(node: NNCFNode, nncf_graph: NNCFGraph) -> int:
-        if node.layer_attributes.input_attributes["transpose"]:
-            msg = "Transposed input is not supported"
-            raise nncf.UnsupportedModelError(msg)
         constant_ports = node.layer_attributes.get_const_port_ids()
         activation_ports = [
             e.input_port_id for e in nncf_graph.get_input_edges(node) if e.input_port_id not in constant_ports
@@ -204,7 +201,12 @@ def insert_adapters(
         A_W = opset.constant(lora_A.data)
         B_W = opset.constant(lora_B.data)

-        A_MM = opset.matmul(input_node, A_W, transpose_a=False, transpose_b=True)
+        A_MM = opset.matmul(
+            input_node,
+            A_W,
+            transpose_a=wc_params.node_with_weight.layer_attributes.input_attributes["transpose"],
+            transpose_b=True,
+        )
         B_MM = opset.matmul(A_MM, B_W, transpose_a=False, transpose_b=True)

         node_output_port = mm_node.output(0)
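Why the adapter path must mirror the original MatMul's transpose flag: when the activation arrives transposed, its hidden dimension sits second-to-last, so the LoRA "A" multiplication only lines up if the first input is transposed as well. A rough numpy illustration (shapes are invented for the example, not taken from the PR):

```python
import numpy as np

hidden, seq, rank = 8, 4, 2
x_plain = np.ones((1, seq, hidden))       # transpose_a=False layout: [..., seq, hidden]
x_transposed = np.ones((1, hidden, seq))  # transpose_a=True layout:  [..., hidden, seq]
lora_A = np.ones((rank, hidden))          # stored as [rank, hidden], used with transpose_b=True

out_plain = x_plain @ lora_A.T                               # [1, seq, rank]
out_transposed = x_transposed.transpose(0, 2, 1) @ lora_A.T  # transpose the activation first
assert out_plain.shape == out_transposed.shape == (1, seq, rank)
```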
@@ -364,6 +366,12 @@ def filter_func(point: StatisticPoint) -> bool:

         return filter_func

+    @staticmethod
+    def get_input_hidden_dim(node: NNCFNode) -> int:
+        if (node is not None) and (node.layer_attributes is not None):
Inline review thread:

Reviewer: I wonder what the valid case is when this condition is False? If it's an invalid case, I'd not silently return -1.

Author: Initially, I did not have that conditional check and only returned this, since while collecting statistics the … I agree returning …

Reviewer: That's interesting, why this test fails. Can it be because the node corresponds to an embedding instead of a matmul?

Author: I implemented the approach specified in #3296 (comment) so far, and yet the test is failing again in the checks. I am not sure, since I am unable to debug this test properly on my local machine due to the …

Author: Hi @ljaljushkin, I was able to run the test on my local machine by downgrading the onnx version, as it was causing some errors. I noticed an inconsistency in the test failures. Test run 1 passes for … Consecutive test run 2: … Both test runs fail with the same error: … Furthermore, I debugged the test failures and confirmed that the configurations work as intended. The logic does not seem to have any flaws, and the earlier failures execute properly in the debugger. It seems to be an issue with how statistics are cached internally (I am not exactly sure how to word it properly). I could be totally wrong about this, but the inconsistency seems to validate it for me so far. However, I am also not sure why this inconsistency is triggered now and not earlier, before my changes were implemented. Would be grateful for some guidance and clarity on this :)

Author: @ljaljushkin Alright then. Before I implement and push those changes, I thought I could explain my findings before adding the sort. The NNCF graphs are different for the tests that pass vs the one that fails. The first screenshot shows one MVN node with edges pointing to child nodes in the order produced by the passing tests; the second shows the NNCF graph of the model that leads to a failure, since this line of code accesses the first edge node, which causes the error. The screenshots above are of the following node, with the last edge pointing to Shape: … I also noticed in the raw NNCF graph code that the added nodes are the same and the differences arise when edges are added, confirming that the order in which the edges were added was probably causing the issue. I debugged further to see why the nodes are ordered the same but the edge lists differ, and noticed that the child nodes returned for the same MVN node in two successful test runs come back in a non-deterministic order in … The same child nodes returned here while adding the edges are not sorted, hence why I made the present change.

Reviewer: Good to have the order of edges deterministic! But I doubt it will help resolve the original problem with the channel axis. IMO, it only solves the problem for this particular instance with MVN, but it doesn't necessarily guarantee that the first edge will always lead to a …

Author: Ah yes! You are right about it not always leading to … So, would you recommend I keep these changes or revert them?

Reviewer: Yes, you could add this to avoid confusion between different graphs for the same model. Hope it doesn't significantly affect graph building for large models.

Author: Alright. I just made a minor change to keep it consistent with the sort condition for adding the nodes.
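A minimal, self-contained sketch of the deterministic-ordering idea discussed in this thread, using hypothetical names (NNCF's real graph builder differs): sorting child nodes before edges are inserted keeps the edge list stable across runs.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TinyGraph:
    edges: Dict[str, List[str]] = field(default_factory=dict)

    def add_edges(self, parent: str, children: List[str]) -> None:
        # Sort by name so repeated builds of the same model yield identical edge order,
        # even if the framework returns successors in arbitrary order.
        self.edges[parent] = sorted(children)

g = TinyGraph()
g.add_edges("MVN_1", ["ShapeOf_3", "MatMul_2"])  # order as returned by traversal
assert g.edges["MVN_1"] == ["MatMul_2", "ShapeOf_3"]
```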
+            return -2 if node.layer_attributes.input_attributes["transpose"] else -1
+        return -1
+

 class OVAWQAlgoAlgoBackend(AWQAlgoBackend, OVWeightCompressionAlgoBackend):
     @staticmethod
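A quick illustration of the two activation layouts this helper distinguishes (shapes are illustrative only): with a regular MatMul the hidden dimension is the last axis (-1), while a transposed activation carries it second-to-last (-2).

```python
import numpy as np

act_plain = np.zeros((1, 4, 8))       # [batch, seq, hidden]  -> hidden axis is -1
act_transposed = np.zeros((1, 8, 4))  # [batch, hidden, seq]  -> hidden axis is -2

def hidden_dim(transposed: bool) -> int:
    return -2 if transposed else -1

assert act_plain.shape[hidden_dim(False)] == 8
assert act_transposed.shape[hidden_dim(True)] == 8
```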
File 2 of 3
@@ -365,7 +365,7 @@ def calculate_quantization_params(
         return result_scale, zp

     @staticmethod
-    def activations_to_wc_statistics(activations: List[Tensor]) -> WCTensorStatistic:
+    def activations_to_wc_statistics(activations: List[Tensor], transpose: bool) -> WCTensorStatistic:
Inline review comment (Reviewer): I'd pass …
         """
         Mimic the activation reducing logic from WeightCompression.get_statistic_points.
@@ -376,7 +376,9 @@ def activations_to_wc_statistics(activations: List[Tensor]) -> WCTensorStatistic
         shapes = []
         for act in activations:
             shapes.append(act.shape)
-            reduction_shape = tuple(range(act.ndim - 1))
+            reduction_shape = (
+                tuple(i for i in range(act.ndim) if i != act.ndim - 2) if transpose else tuple(range(act.ndim - 1))
+            )
             mean_values.append(fns.mean(act, axis=reduction_shape))
         wc_statistics = WCTensorStatistic(mean_values, shapes)
         return wc_statistics
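A small numpy check of the reduction axes chosen above for a 3-D activation, mirroring the diff's logic (values and shapes are illustrative): keep only the hidden axis (-2 when transposed, -1 otherwise) and average over the rest.

```python
import numpy as np

act = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)

axes_plain = tuple(range(act.ndim - 1))                                   # (0, 1): hidden is last
axes_transposed = tuple(i for i in range(act.ndim) if i != act.ndim - 2)  # (0, 2): hidden is second-to-last

assert np.mean(act, axis=axes_plain).shape == (4,)
assert np.mean(act, axis=axes_transposed).shape == (3,)
```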
File 3 of 3
@@ -1440,21 +1440,20 @@ def test_compression_with_different_algo_combinations(input_shape, kwargs):
 )
 def test_compression_with_transposed_activations(kwargs):
     dataset_size = 4
-    model = LMLinearModel(transpose_a=True, transpose_b=False).ov_model
+    model = LMLinearModel(transpose_a=True, transpose_b=True).ov_model
     input_data = [np.ones(inp.shape) for inp in model.inputs] * dataset_size
     dataset = Dataset(input_data)

-    with pytest.raises(nncf.UnsupportedModelError):
Inline review thread:

Reviewer: Since there's some issue with …:

    from contextlib import nullcontext

    @pytest.mark.parametrize(
        ("transpose_a", "transpose_b", "raises_error"),
        (
            (False, True, False),
            (True, True, False),
            (False, False, True),
            (True, False, True),
        ),
        ids=["tb_nota", "ta_tb", "nota_notb", "ta_notb"]
    )
    @pytest.mark.parametrize(
        "kwargs",
        (
            dict(scale_estimation=True),
            dict(lora_correction=True),
            dict(
                gptq=True,
                awq=True,
                scale_estimation=True,
                advanced_parameters=CompressionParams(gptq_params=GPTQParams(subset_size=2)),
            ),
        ),
        ids=['se', 'lora', 'gptq_se_awq']
    )
    def test_compression_with_transpose(transpose_a, transpose_b, raises_error, kwargs):
        dataset_size = 4
        model = LMLinearModel(transpose_a=transpose_a, transpose_b=transpose_b).ov_model
        input_data = [np.ones(inp.shape) for inp in model.inputs] * dataset_size
        dataset = Dataset(input_data)
        with pytest.raises(nncf.UnsupportedModelError) if raises_error else nullcontext():
            compress_weights(
                model,
                mode=CompressWeightsMode.INT4_SYM,
                ratio=1.0,
                group_size=8,
                subset_size=2,
                dataset=dataset,
                all_layers=True,
                **kwargs,
            )

Author: For … do not raise an error and seem to pass for those two algorithms. Would you suggest I still implement a check to raise an unsupported error for these algorithms when …?

Reviewer: The issue with …

Reviewer: No need in template test: #3230 (comment)

Author: So, should I try implementing …?

Reviewer: Both options are possible. The choice is yours. But regardless of your choice, please extend the test to verify all combinations as originally suggested.
-        compress_weights(
-            model,
-            mode=CompressWeightsMode.INT4_SYM,
-            ratio=1.0,
-            group_size=8,
-            subset_size=2,
-            dataset=dataset,
-            all_layers=True,
-            **kwargs,
-        )
+    compress_weights(
+        model,
+        mode=CompressWeightsMode.INT4_SYM,
+        ratio=1.0,
+        group_size=8,
+        subset_size=2,
+        dataset=dataset,
+        all_layers=True,
+        **kwargs,
+    )


 class TestOVTemplateWeightCompression(TemplateWeightCompression):
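The reviewer's suggested test above leans on contextlib.nullcontext as a no-op context manager, so a single with statement serves both the raising and the non-raising parametrizations. A self-contained illustration of the pattern (the test name and exception are placeholders, not from the PR):

```python
from contextlib import nullcontext
import pytest

@pytest.mark.parametrize("should_raise", [True, False])
def test_maybe_raises(should_raise):
    # nullcontext() does nothing, so the same `with` either expects an
    # exception or expects none, depending on the parametrization.
    ctx = pytest.raises(ValueError) if should_raise else nullcontext()
    with ctx:
        if should_raise:
            raise ValueError("expected failure")
```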
Review thread:

Reviewer: What if get_input_hidden_dim is positive?

Author: I created this backend function mainly because, in comparison to OV, Torch tensors are reduced across all dimensions except for the last one (transposed or not), hence -1 is returned by default. Only for OV is the method overridden: when transpose_a=True, -2 is returned, which keeps the second-to-last dimension out of the reduction, since later in the opset.matmul operation it will be swapped with the last dim for transpose_a=True. In the other case, transpose_a=False, it returns -1 as well. Unfortunately, node.metatype.output_channel_axis returns None over here, hence I'm not sure how to utilize it:

Reviewer: I assume it should work for output_edge.to_node. Suggest to keep get_input_hidden_dim, but probably rename it to get_activation_channel_axis and implement it for the OpenVINO backend a bit more generally, to cover cases with Convolutions. You can call this get_activation_channel_axis from here for OpenVINO: https://github.com/openvinotoolkit/nncf/blob/develop/nncf/openvino/graph/node_utils.py#L631 For torch, this way should work:

Author: Thank you! I was trying to look for an existing implementation to obtain the dim but could not find it. My bad. Yep, this approach seems better.
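Purely as a hypothetical sketch of the behavior described above (reuse a known output channel axis, otherwise fall back to the last dimension); the helper name and signature here are assumptions, not NNCF's actual API:

```python
from typing import Optional

def get_activation_channel_axis(output_channel_axis: Optional[int], ndim: int) -> int:
    # Reuse the producer's advertised output channel axis when available;
    # otherwise fall back to the last dimension, matching the
    # "reduce over everything except the last axis" behavior described above.
    if output_channel_axis is not None:
        return output_channel_axis % ndim
    return ndim - 1

assert get_activation_channel_axis(None, 3) == 2
assert get_activation_channel_axis(-1, 3) == 2
```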