Visualisation of weight compression results (#3009)
### Changes
Tables and images have been added to illustrate the trade-off
between accuracy and footprint for the INT4_ASYM mode.
A script has been created to automate generating this
visualization from a CSV file containing all the necessary raw data.
The script calculates the compression rate and the average
relative error from the given model sizes and metric values.
This should reduce the likelihood of errors and simplify the
maintenance of the results.
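
For reference, the core computation can be sketched as follows; the function name, the metric list, and the column handling are illustrative assumptions rather than the actual script's API:

```python
# A minimal sketch (not the actual script) of how compression rate and
# average relative error could be derived from the raw CSV data.
import pandas as pd

# Metric columns assumed present in the CSV; names mirror the documented format.
METRICS = ["wikitext, word perplexity", "lambada-openai, acc", "WWB, similarity"]

def add_tradeoff_columns(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Use the uncompressed model as the baseline.
    fp32_row = df[df["mode"].str.contains("fp32")].iloc[0]
    # Compression rate: fp32 model size divided by the compressed model size.
    df["compression rate"] = fp32_row["model size, Gb"] / df["model size, Gb"]
    # Average relative deviation of each metric from its fp32 value.
    rel_errors = [(df[m] - fp32_row[m]).abs() / abs(fp32_row[m]) for m in METRICS]
    df["average relative error"] = sum(rel_errors) / len(rel_errors)
    return df

df = add_tradeoff_columns(pd.read_csv("results.csv"))  # placeholder file name
```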
### Reason for changes
INT4_ASYM is more accurate and therefore the preferable mode for weight compression.
The previous results were obtained using the INT4_SYM mode.
### Related tickets
n/a
### Tests
tests/tools/test_compression_visualization.py
docs/usage/post_training_compression/weights_compression/Usage.md (+36 −222)
@@ -569,14 +569,13 @@ Here is the perplexity and accuracy with data-free and data-aware mixed-precision
 #### Accuracy/Footprint trade-off

-Below are the tables showing the accuracy/footprint trade-off for `Qwen/Qwen2-7B` and
+Below are the tables showing the accuracy/footprint trade-off for `meta-llama/Llama-2-7b-chat-hf` and
 `microsoft/Phi-3-mini-4k-instruct` compressed with different options.

 Compression ratio is defined as the ratio between the size of fp32 model and size of the compressed one.
-Accuracy metrics are measured on 4 tasks [lambada openai](https://huggingface.co/datasets/EleutherAI/lambada_openai), [wikitext](https://arxiv.org/pdf/1609.07843.pdf),

…

-- More layers in 8 bit does improve accuracy, but it increases the footprint a lot.
-- Scale Estimation, AWQ, GPTQ do improve accuracy of the baseline int4 model without footprint increase.
-- Lora correction algorithm improves the accuracy of int4 models further with a footprint much less compared to mixed-precision models with the same or worse accuracy.
+- More layers in 8 bit does improve accuracy, but it also increases the footprint significantly.
+- Scale Estimation, AWQ, GPTQ improve the accuracy of the baseline int4 model without increasing the footprint.
+- The Lora Correction algorithm further improves the accuracy of int4 models with a much smaller footprint compared to mixed-precision models that have the same or worse accuracy.

-Accuracy/footprint trade-off for `Qwen/Qwen2-7B`:
+Accuracy/footprint trade-off for `meta-llama/Llama-2-7b-chat-hf`:
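
As a quick illustration of the compression ratio definition above (numbers are made up): an fp32 model of 26 GB compressed to 4 GB has a compression ratio of 26 / 4 = 6.5.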
From the documentation added for the visualization script:

The script also plots the trade-off between accuracy and footprint by processing a CSV file in a specific format.
The resulting images are used in [the relevant section](/docs/usage/post_training_compression/weights_compression/Usage.md#accuracyfootprint-trade-off) of the Weight Compression documentation.

The input file should contain the following columns:

- `mode` - The string indicating the compression method used for the model. The 'fp32' mode corresponds to the uncompressed version. To calculate the accuracy-footprint trade-off, the following words must be present in at least one row: "gptq", "int4", "fp32", "int8".
- `%int4` - The ratio of int4 layers.
- `%int8` - The ratio of int8 layers.
- `lora rank` - The rank of the adapters used in the Lora Correction algorithm.
- `plot name` - A short name used to annotate the point in the plot.
- `model size, Gb` - The size of the corresponding model in Gb.
- `wikitext, word perplexity` - Word perplexity on the Wikitext dataset, measured using rolling loglikelihoods in the [lm_eval tool](https://github.com/EleutherAI/lm-evaluation-harness).
- `lambada-openai, acc` - Accuracy on the Lambada-OpenAI dataset, measured using the [lm_eval tool](https://github.com/EleutherAI/lm-evaluation-harness).
- `lambada-openai, perplexity` - Perplexity on the Lambada-OpenAI dataset, measured using the [lm_eval tool](https://github.com/EleutherAI/lm-evaluation-harness).
- `WWB, similarity` - Similarity, measured using the [WWB tool](https://github.com/openvinotoolkit/openvino.genai/tree/master/llm_bench/python/).
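
To make the expected input concrete, below is a minimal sketch that writes such a CSV; all values are placeholders, not measured results:

```python
# Builds an illustrative input file with the documented columns.
# All numbers are placeholders; note that the "mode" values cover
# the required words: fp32, int8, int4, gptq.
import pandas as pd

rows = [
    {"mode": "fp32", "%int4": 0, "%int8": 0, "lora rank": None,
     "plot name": "fp32", "model size, Gb": 26.0,
     "wikitext, word perplexity": 9.0, "lambada-openai, acc": 0.70,
     "lambada-openai, perplexity": 4.0, "WWB, similarity": 1.0},
    {"mode": "int8", "%int4": 0, "%int8": 100, "lora rank": None,
     "plot name": "int8", "model size, Gb": 6.5,
     "wikitext, word perplexity": 9.1, "lambada-openai, acc": 0.70,
     "lambada-openai, perplexity": 4.1, "WWB, similarity": 0.99},
    {"mode": "int4_asym + gptq", "%int4": 100, "%int8": 0, "lora rank": None,
     "plot name": "int4+gptq", "model size, Gb": 4.0,
     "wikitext, word perplexity": 9.5, "lambada-openai, acc": 0.68,
     "lambada-openai, perplexity": 4.3, "WWB, similarity": 0.95},
]
pd.DataFrame(rows).to_csv("results.csv", index=False)  # placeholder file name
```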