Remove KV-cache compression disabling flag for compressed models #1141
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
[{"int8": 14, "int4": 16}, {"int8": 9}, {"int8": 1}], | ||
[{"int8": 14, "int4": 16}, {"int8": 1}, {"int8": 9}], |
The order of some reference values changed because of the addition of this line: `submodels = list(model.submodels.values())`. This alters the order of submodels compared to before.
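A hypothetical illustration of why the reference order shifted, assuming `model.submodels` is a dict mapping submodel names to models; the keys and values below are placeholders, not the repository's actual submodel names:

```python
# Iterating a dict preserves insertion order (Python 3.7+), so building the
# submodel list this way follows the order in which submodels were registered,
# which can differ from a previously hand-ordered list.
submodels = {
    "language_model": "lm",            # placeholder values for illustration
    "text_embeddings": "text_emb",
    "vision_embeddings": "vision_emb",
}

# New approach: order follows dict insertion order.
ordered = list(submodels.values())
print(ordered)  # ['lm', 'text_emb', 'vision_emb']
```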
LGTM!
What does this PR do?
"KV_CACHE_PRECISION": "f16"
rt info flag for all weight-compressed models. This way applying weight compression also enables KV cache compression.OVDynamicQuantizationConfig
.Tests:
KV_CACHE_PRECISION
rt info flag is not present for compressed models.image-text-to-text
models aslist(model.submodels.values())
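A minimal sketch of such a check, assuming optimum-intel's `OVModelForCausalLM` export API and OpenVINO's `has_rt_info`; the model id and the `["runtime_options", "KV_CACHE_PRECISION"]` rt info path are assumptions for illustration, not taken from this PR:

```python
# Sketch: verify a weight-compressed model no longer carries the
# KV_CACHE_PRECISION rt info flag, so OpenVINO's default (compressed)
# KV cache precision applies.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

model = OVModelForCausalLM.from_pretrained(
    "hf-internal-testing/tiny-random-gpt2",  # hypothetical tiny test model
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=8),
)

# With the disabling flag removed, the rt info entry should be absent.
assert not model.model.has_rt_info(["runtime_options", "KV_CACHE_PRECISION"])
```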
Before submitting