
Commit 983a94d (1 parent: 18750e7)

3 files changed: +3 -3 lines

examples/llm_compression/openvino/tiny_llama_find_hyperparams/README.md (+1 -1)
@@ -1,7 +1,7 @@
 # Find the appropriate hyperparameters to compress the TinyLLama model

 This example demonstrates how to find appropriate `awq`, `ratio` and `group_size` parameters to compress the weights of the TinyLLama model from HuggingFace Transformers. The OpenVINO backend supports inference of mixed-precision models with weights compressed to a 4-bit data type as the primary precision. The fastest mixed-precision mode is `INT4_SYM`, but it may lead to significant accuracy degradation, especially for models of moderate size. In this example, the maximum allowed deviation from the original model is `0.2` points of the similarity metric. If the similarity of the compressed model is not satisfactory, there are 3 hyperparameters to tune: `awq`, `group_size` and `ratio`. A smaller `group_size` and a lower `ratio` of 4-bit layers usually improve accuracy at the cost of model size and inference latency. Generally, the accuracy of 4-bit compressed models can also be improved by applying the AWQ algorithm on top of the data-based mixed-precision algorithm.
-To evaluate the accuracy of the compressed model, we measure the similarity between texts generated by the baseline and compressed models using the [WhoWhatBench](https://github.com/openvinotoolkit/openvino.genai/tree/master/llm_bench/python/who_what_benchmark) library.
+To evaluate the accuracy of the compressed model, we measure the similarity between texts generated by the baseline and compressed models using the [WhoWhatBench](https://github.com/openvinotoolkit/openvino.genai/tree/master/tools/who_what_benchmark) library.

 The example includes the following steps:

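For orientation, tuning these three knobs looks roughly like the sketch below, shown here through optimum-intel's weight-quantization config on top of NNCF. The parameter values, the `wikitext2` calibration set, and the `quant_method="awq"` spelling are illustrative assumptions, not the example's actual settings:

```python
# A sketch of 4-bit weight compression with the three tunable hyperparameters.
# All values below are illustrative assumptions, not the example's defaults.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

quant_config = OVWeightQuantizationConfig(
    bits=4,
    sym=True,             # INT4_SYM: the fastest mixed-precision mode
    ratio=0.8,            # share of weights kept in 4 bits (the rest in 8 bits)
    group_size=64,        # smaller groups -> better accuracy, larger model
    quant_method="awq",   # AWQ on top of mixed precision (assumed spelling)
    dataset="wikitext2",  # data-aware methods need a calibration set
    tokenizer=model_id,
)

model = OVModelForCausalLM.from_pretrained(
    model_id, export=True, quantization_config=quant_config
)
model.save_pretrained("tiny_llama_int4")
```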
examples/llm_compression/openvino/tiny_llama_synthetic_data/README.md (+1 -1)
@@ -1,7 +1,7 @@
 # Compress TinyLLama model using synthetic data

 This example demonstrates how to optimize Large Language Models (LLMs) with the NNCF weight compression API, using synthetic data for the advanced, data-aware algorithms. The example applies 4/8-bit mixed-precision quantization and the Scale Estimation algorithm to the weights of the Linear (fully-connected) layers of the [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) model.
-To evaluate the accuracy of the compressed model, we measure the similarity between texts generated by the baseline and compressed models using the [WhoWhatBench](https://github.com/openvinotoolkit/openvino.genai/tree/master/llm_bench/python/who_what_benchmark) library.
+To evaluate the accuracy of the compressed model, we measure the similarity between texts generated by the baseline and compressed models using the [WhoWhatBench](https://github.com/openvinotoolkit/openvino.genai/tree/master/tools/who_what_benchmark) library.

 The example includes the following steps:

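A minimal sketch of this flow, assuming NNCF's synthetic-text helper `generate_text_data` (its exact signature is an assumption) and optimum-intel's weight-quantization config; all sizes and values are illustrative:

```python
# Sketch: 4/8-bit mixed precision + Scale Estimation, calibrated on synthetic
# text that the model generates for itself instead of a real dataset.
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
from nncf.data import generate_text_data  # assumed helper location

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
hf_model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a small synthetic calibration corpus (size is illustrative).
synthetic_texts = generate_text_data(hf_model, tokenizer, dataset_size=96)

quant_config = OVWeightQuantizationConfig(
    bits=4,
    ratio=0.8,                # 4/8-bit mix: ~80% of weights in 4 bits
    group_size=64,
    scale_estimation=True,    # data-aware refinement of quantization scales
    dataset=synthetic_texts,  # a list of strings, tokenized internally
    tokenizer=model_id,
)
compressed = OVModelForCausalLM.from_pretrained(
    model_id, export=True, quantization_config=quant_config
)
compressed.save_pretrained("tiny_llama_int4_synthetic")
```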
tools/README.md (+1 -1)
@@ -146,7 +146,7 @@ The input file should contain the following columns:
 - `wikitext, word perplexity` - Word perplexity on the Wikitext dataset, measured using rolling loglikelihoods in the [lm_eval tool](https://github.com/EleutherAI/lm-evaluation-harness).
 - `lambada-openai, acc` - Accuracy on the Lambada-OpenAI dataset, measured using the [lm_eval tool](https://github.com/EleutherAI/lm-evaluation-harness).
 - `lambada-openai, perplexity` - Perplexity on the Lambada-OpenAI dataset, measured using the [lm_eval tool](https://github.com/EleutherAI/lm-evaluation-harness).
-- `WWB, similarity` - Similarity, measured using the [WWB tool](https://github.com/openvinotoolkit/openvino.genai/tree/master/llm_bench/python/).
+- `WWB, similarity` - Similarity, measured using the [WWB tool](https://github.com/openvinotoolkit/openvino.genai/tree/master/tools/llm_bench).

 ### Example of script usage

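For reference, a minimal sketch of computing that similarity with the `whowhatbench` package, following its documented `Evaluator` usage; the baseline is scored against itself here only to keep the snippet self-contained, in practice the compressed model goes there:

```python
# Sketch: measuring generation similarity with WhoWhatBench (whowhatbench).
from transformers import AutoModelForCausalLM, AutoTokenizer
import whowhatbench

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(model_id)

# The evaluator caches the baseline model's answers on its internal prompt set.
evaluator = whowhatbench.Evaluator(base_model=base_model, tokenizer=tokenizer)

# Score a target model against the cached baseline answers. A compressed model
# would normally be passed here; the baseline itself keeps this runnable.
per_prompt, metrics = evaluator.score(base_model)
print(metrics["similarity"][0])  # 1.0 when comparing the baseline to itself
```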