[TESTS] Use FP32 inference precision, FP16 KV cache precision for pipelines (#1485)
OpenVINO plugins enable several optimizations by default, such as
int8 KV cache compression and FP16 inference precision. In GenAI tests,
however, we want to test the pipelines themselves and how they compare
against HF / Optimum without these extra optimizations:
https://github.com/openvinotoolkit/openvino.genai/blob/4db67aecac78885c6d1e302f348c9489e2154388/tests/python_tests/common.py#L318-L325
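A minimal sketch of the kind of property set the tests pin, assuming OpenVINO's string keys `INFERENCE_PRECISION_HINT` (for `ov::hint::inference_precision`) and `KV_CACHE_PRECISION` (for `ov::hint::kv_cache_precision`) and element-type strings `"f32"` / `"f16"`. The helper name `get_test_properties` is hypothetical and is not necessarily what the linked `common.py` contains:

```python
# Hypothetical helper (name is illustrative). The keys are assumed to be the
# OpenVINO string names for ov::hint::inference_precision and
# ov::hint::kv_cache_precision.
def get_test_properties() -> dict[str, str]:
    """Return plugin properties that override default precision optimizations in tests."""
    return {
        # Run inference in FP32 instead of the plugin's default lower precision.
        "INFERENCE_PRECISION_HINT": "f32",
        # Keep the KV cache in FP16 instead of letting the plugin compress it to int8.
        "KV_CACHE_PRECISION": "f16",
    }


if __name__ == "__main__":
    print(get_test_properties())
```

The resulting dict would be passed to a pipeline or `compile_model` call together with the device name. Pinning both hints explicitly keeps test comparisons against HF / Optimum independent of whatever defaults a given plugin version ships with.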
Hopefully, this lets us merge int8 KV cache by default for CB
(#1206): the tests will still compare against an FP16 KV cache,
while official validation should be responsible for checking accuracy
against the reference via WWB metrics.