examples/llm_compression/openvino/tiny_llama_synthetic_data/README.md (+1 -5)
@@ -1,16 +1,12 @@
 # Compress TinyLLama model using synthetic data
 
-This example demonstrates how to optimize Large Language Models (LLMs) using NNCF weight compression API & synthetic data for the advanced algorithms usage. The example applies 4/8-bit mixed-precision quantization & Scale Estimation algorithm to weights of Linear (Fully-connected) layers of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) model.
-To evaluate the accuracy of the compressed model we measure similarity between two texts generated by the baseline and compressed models using [WhoWhatBench](https://github.com/openvinotoolkit/openvino.genai/tree/master/tools/who_what_benchmark) library.
+This example demonstrates how to optimize Large Language Models (LLMs) using NNCF weight compression API & synthetic data for the advanced algorithms usage. The example applies 4/8-bit mixed-precision quantization & Scale Estimation algorithm to weights of Linear (Fully-connected) layers of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) model. This leads to a significant decrease in model footprint and performance improvement with OpenVINO.
 
 The example includes the following steps:
 
-- Prepare `wikitext` dataset.
 - Prepare `TinyLlama/TinyLlama-1.1B-Chat-v1.0` text-generation model in OpenVINO representation using [Optimum-Intel](https://huggingface.co/docs/optimum/intel/inference).
-- Compress weights of the model with NNCF Weight compression algorithm with Scale Estimation & `wikitext` dataset.
 - Prepare `synthetic` dataset using `nncf.data.generate_text_data` method.
 - Compress weights of the model with NNCF Weight compression algorithm with Scale Estimation & `synthetic` dataset.
-- Measure the similarity of the two models optimized with different datasets.
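
The "significant decrease in model footprint" mentioned in the updated description can be made concrete with a back-of-the-envelope estimate. The sketch below is only an illustration, not part of the example: it assumes all 1.1B parameters (the figure in the model name) are compressible weights, and the 4-bit share (`int4_ratio=0.8`), the group size (`128`), and FP16 scales are hypothetical values chosen for the arithmetic. The helper `compressed_size_bytes` is likewise a made-up name and not an NNCF API.

```python
# Rough footprint estimate for 4/8-bit mixed-precision weight compression.
# Every input here is an illustrative assumption, not a value taken from
# the NNCF example or measured on the real model.


def compressed_size_bytes(num_params, int4_ratio, group_size, scale_bytes=2):
    """Estimate weight storage after mixed 4/8-bit compression.

    int4_ratio:  fraction of weights stored in 4 bits (the rest use 8 bits).
    group_size:  number of 4-bit weights that share one scale.
    scale_bytes: bytes per scale (2 for FP16).
    Scales of the 8-bit weights are ignored as negligible.
    """
    int4_params = num_params * int4_ratio
    int8_params = num_params - int4_params
    int4_bytes = int4_params * 0.5  # 4 bits per weight
    int4_scale_bytes = int4_params / group_size * scale_bytes
    int8_bytes = int8_params * 1.0  # 8 bits per weight
    return int4_bytes + int4_scale_bytes + int8_bytes


NUM_PARAMS = 1.1e9  # taken from the "1.1B" in the model name

fp16_bytes = NUM_PARAMS * 2  # baseline: 2 bytes per FP16 weight
mixed_bytes = compressed_size_bytes(NUM_PARAMS, int4_ratio=0.8, group_size=128)
reduction = fp16_bytes / mixed_bytes

print(f"FP16 weights:  {fp16_bytes / 1e9:.2f} GB")
print(f"Mixed 4/8-bit: {mixed_bytes / 1e9:.2f} GB ({reduction:.1f}x smaller)")
```

Under these assumptions the compressed weights take roughly a third of the FP16 size. The real ratio depends on which layers NNCF actually compresses and on the settings used in the example.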