
Commit 1749aac: notebook update
1 parent 2fd0416

File tree: 1 file changed (+27, -13 lines)


notebooks/openvino/phi-2_on_mtl.ipynb (+27, -13)
@@ -5,10 +5,13 @@
    "id": "aeb16663-be53-4260-b62d-44611b6771ec",
    "metadata": {},
    "source": [
-    "# Chat and Code with Phi-2 with OpenVINO and 🤗 Optimum on Intel® Meteor Lake iGPU\n",
-    "In this notebook we will show how to export and quantize Phi-2 to 4 bits.\n",
+    "# Chat and Code with Phi-2 with OpenVINO and 🤗 Optimum on Intel Meteor Lake iGPU\n",
+    "In this notebook we will show how to export and apply weight only quantization on Phi-2 to 4 bits.\n",
     "Then using the quantized model we will show how to generate code completions with the model running on Intel Meteor Lake iGPU presenting a good experience of running GenAI locally on Intel PC marking the start of the AIPC Era!\n",
-    "Then we will show how to talk with Phi-2 in a ChatBot demo running completley locally on your Laptop!"
+    "Then we will show how to talk with Phi-2 in a ChatBot demo running completely locally on your Laptop!\n",
+    "\n",
+    "[Phi-2](https://huggingface.co/microsoft/phi-2) is a 2.7 billion-parameter language model trained by Microsoft. Microsoft in the model's release [blog post](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/) states that Phi-2:\n",
+    "> demonstrates outstanding reasoning and language understanding capabilities, showcasing state-of-the-art performance among base language models with less than 13 billion parameters. On complex benchmarks Phi-2 matches or outperforms models up to 25x larger, thanks to new innovations in model scaling and training data curation."
    ]
   },
   {
@@ -19,7 +22,7 @@
     "## Install dependencies\n",
     "Make sure you have the latest GPU drivers installed on your machine: https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html.\n",
     "\n",
-    "We will start by installing the dependencies, you can either uncomment the following cell and run it."
+    "We will start by installing the dependencies, that can be done by uncommenting the following cell and run it."
    ]
   },
   {
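The pip cell itself falls outside this hunk. For reference, a minimal sketch of what that commented-out install cell typically contains, assuming the standard 🤗 Optimum packaging (the exact packages and version pins are not shown in this diff):

```python
# Hypothetical install cell; the real notebook cell may pin different versions.
# The "openvino" extra of optimum pulls in optimum-intel, openvino, and nncf,
# which provides the weight-only quantization used later in the notebook.
%pip install --upgrade --quiet "optimum[openvino]" gradio
```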
@@ -51,7 +54,18 @@
    "metadata": {},
    "source": [
     "## Configuration\n",
-    "Here we will configure which model to load and other attributes. We will explain everything :)"
+    "Here we will configure which model to load and other attributes. We will explain everything 😄\n",
+    "* `model_name`: the name or path of the model we want to export and quantize, can be either on the 🤗 Hub or a local directory on your laptop.\n",
+    "* `save_name`: directory where the exported & quantized model will be saved.\n",
+    "* `precision`: the compute data type we will use for inference of the model, can be either `f32` or `f16`. We use FP32 precision due to Phi-2 overflow issues in FP16.\n",
+    "* `quantization_config`: here we set the attributes for the weight only quantization algorithm:\n",
+    "  * `bits`: number of bits to use for quantization, can be either `8` or `4`.\n",
+    "  * `sym`: whether to use symmetric quantization or not, can be either `True` or `False`.\n",
+    "  * `group_size`: number of weights to group together for quantization. We use groups of 128 to ensure no accuracy degradation.\n",
+    "  * `ratio`: the ratio of the model to quantize to #`bits`. The rest will be quantize to the default bits number, `8`.\n",
+    "* `device`: the device to use for inference, can be either `cpu` or `gpu`.\n",
+    "* `stateful`: Optimize model by setting the KV cache as part of the models state instead of as an input\n",
+    "\n"
    ]
   },
   {
@@ -62,16 +76,16 @@
    "outputs": [],
    "source": [
     "model_name = 'microsoft/phi-2'\n",
-    "save_name = './phi-2-woq4' # Directory where the exported & quantized model will be saved\n",
-    "precision = 'f32' # We use FP32 precision due to Phi-2 overflow issues in FP16.\n",
+    "save_name = './phi-2-woq4'\n",
+    "precision = 'f32'\n",
     "quantization_config = OVWeightQuantizationConfig(\n",
     "    bits=4,\n",
-    "    sym=False, # Use asymmetric quantization\n",
-    "    group_size=128, # Quantize weights in groups of 128 to ensure no accuracy degradation\n",
-    "    ratio=0.8, # 80% of the model layers will be quantized to 4bit, the rest will be quantized to 8bit.\n",
+    "    sym=False,\n",
+    "    group_size=128,\n",
+    "    ratio=0.8,\n",
     ")\n",
-    "device = 'gpu' # choose from ['cpu', 'gpu']\n",
-    "stateful = True # Optimize model by setting the KV cache as part of the models state instead of as an input"
+    "device = 'gpu'\n",
+    "stateful = True "
    ]
   },
   {
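The cell that consumes these variables sits outside the hunk. A minimal sketch of how they are typically passed to 🤗 Optimum Intel, assuming the usual `OVModelForCausalLM` export path (the `ov_config` precision hint and the save/compile steps are assumptions, not part of this diff):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Export the PyTorch checkpoint to OpenVINO IR and apply 4-bit weight-only
# quantization in one step (sketch; the notebook's actual export cell may differ).
model = OVModelForCausalLM.from_pretrained(
    model_name,
    export=True,
    quantization_config=quantization_config,
    stateful=stateful,  # keep the KV cache inside the model state
    ov_config={'INFERENCE_PRECISION_HINT': precision},  # f32 avoids Phi-2 FP16 overflow
)
model.save_pretrained(save_name)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained(save_name)

# Move inference to the Meteor Lake iGPU and compile once up front.
model.to(device)
model.compile()
```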
@@ -444,7 +458,7 @@
     "\n",
     "\n",
     "with gr.Blocks(theme=gr.themes.Soft()) as demo:\n",
-    "    gr.Markdown('<h1 style=\"text-align: center;\">Talk with Phi-2 on Meteor Lake iGPU</h1>')\n",
+    "    gr.Markdown('<h1 style=\"text-align: center;\">Chat with Phi-2 on Meteor Lake iGPU</h1>')\n",
     "    chatbot = gr.Chatbot()\n",
     "    with gr.Row():\n",
     "        msg = gr.Textbox(placeholder=\"Enter message here...\", show_label=False, autofocus=True, scale=75)\n",
