|
13 | 13 | "id": "2bb9b46a-d9c5-42dc-8e50-6700180aad0c",
|
14 | 14 | "metadata": {},
|
15 | 15 | "source": [
|
16 |
| - "This notebook illustrates the usage of the `LLAMA_CPP` plugin for OpenVINO, which enables the `llama.cpp`-powered inference of corresponding GGUF-format model files via OpenVINO API. The user flow will be demonstrated on an LLM inferencing task with the Qwen-7B-Chat model in Chinese." |
| 16 | + "This notebook illustrates the usage of the `LLAMA_CPP` plugin for OpenVINO, which enables `llama.cpp`-powered inference of corresponding GGUF-format model files via the OpenVINO API. The user flow is demonstrated on a Chinese-language LLM inference task with the Qwen-7B-Chat model. This notebook executes Linux shell commands directly and should therefore be run in a Linux environment." |
17 | 17 | ]
|
18 | 18 | },
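For orientation, the end-to-end flow this notebook demonstrates reduces to compiling a GGUF file on the `LLAMA_CPP` device through the regular OpenVINO API. A minimal sketch, assuming a recent `openvino` Python package built with the plugin and a locally available GGUF file (the filename below is a placeholder):

```python
import openvino as ov

core = ov.Core()
# The GGUF file is passed to compile_model() directly; "LLAMA_CPP" selects the
# plugin as the target device. "qwen7b-chat-q5_0.gguf" is a placeholder path.
compiled = core.compile_model("qwen7b-chat-q5_0.gguf", "LLAMA_CPP")
infer_request = compiled.create_infer_request()
```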
|
19 | 19 | {
|
|
43 | 43 | "scrolled": true
|
44 | 44 | },
|
45 | 45 | "source": [
|
46 |
| - "The `LLAMA_CPP` plugin must be built from [sources](https://github.com/openvinotoolkit/openvino_contrib/modules/llama_cpp_plugin) in a standard [OpenVINO extra module user flow](https://github.com/openvinotoolkit/openvino_contrib/#how-to-build-openvino-with-extra-modules). " |
| 46 | + "The `LLAMA_CPP` plugin must be built from [sources](https://github.com/openvinotoolkit/openvino_contrib/modules/llama_cpp_plugin) in a standard [OpenVINO extra module user flow](https://github.com/openvinotoolkit/openvino_contrib/#how-to-build-openvino-with-extra-modules):" |
47 | 47 | ]
|
48 | 48 | },
|
49 | 49 | {
|
|
53 | 53 | "metadata": {},
|
54 | 54 | "outputs": [],
|
55 | 55 | "source": [
|
| 56 | + "!apt-get install -y cmake python3.8-dev build-essential\n", |
56 | 57 | "!git clone https://github.com/openvinotoolkit/openvino_contrib\n",
|
57 | 58 | "!git clone --recurse-submodules https://github.com/openvinotoolkit/openvino\n",
|
58 | 59 | "\n",
|
59 |
| - "# Add -DLLAMA_CUBLAS=1 to the cmake line below to build the plugin with the CUDA backend.\n", |
| 60 | + "!pip install --upgrade pip\n", |
| 61 | + "!pip install --upgrade build setuptools wheel\n", |
| 62 | + "!pip install -r openvino/src/bindings/python/wheel/requirements-dev.txt\n", |
| 63 | + "\n", |
| 64 | + "# Add -DLLAMA_CUBLAS=1 to the cmake line below to build the plugin with the CUDA backend.\n", |
60 | 65 | "# The underlying llama.cpp inference code will be executed on CUDA-powered GPUs on your host.\n",
|
61 |
| - "!cmake -B build -DCMAKE_BUILD_TYPE=Release -DOPENVINO_EXTRA_MODULES=../openvino_contrib/modules/llama_cpp_plugin -DENABLE_PLUGINS_XML=ON -DENABLE_LLAMA_CPP_PLUGIN_REGISTRATION=ON -DENABLE_PYTHON=1 -DENABLE_WHEEL=ON openvino #-DLLAMA_CUBLAS=1\n", |
| 66 | + "!cmake -B build -DCMAKE_BUILD_TYPE=Release -DOPENVINO_EXTRA_MODULES=../openvino_contrib/modules/llama_cpp_plugin -DENABLE_PLUGINS_XML=ON -DENABLE_LLAMA_CPP_PLUGIN_REGISTRATION=ON -DENABLE_PYTHON=1 -DPYTHON_EXECUTABLE=`which python3.8` -DENABLE_WHEEL=ON openvino #-DLLAMA_CUBLAS=1\n", |
62 | 67 | "\n",
|
63 |
| - "!cmake --build build --parallel -- llama_cpp_plugin pyopenvino ie_wheel" |
| 68 | + "!cmake --build build --parallel `nproc` -- llama_cpp_plugin pyopenvino ie_wheel" |
64 | 69 | ]
|
65 | 70 | },
|
66 | 71 | {
|
67 | 72 | "cell_type": "markdown",
|
68 | 73 | "id": "458258b7-2e00-44b3-b3ee-e73238a47019",
|
69 | 74 | "metadata": {},
|
70 | 75 | "source": [
|
71 |
| - "After the build, the plugin binaries should be installed into the same directory as the rest of the OpenVINO plugin binaries, and the plugin itself should be registered in the `plugins.xml` file in the same directory. In our case, we configured the build above to generate a Python `.whl`, which will contain the necessary binaries and files automatically after installation into a virtual environment." |
| 76 | + "After the build, the plugin binaries should be installed into the same directory as the rest of the OpenVINO plugin binaries, and the plugin itself should be registered in the `plugins.xml` file in the same directory. In our case, we configured the build above to generate a Python `.whl`, which will place the necessary binaries and files in their required locations automatically after installation into a virtual environment:" |
72 | 77 | ]
|
73 | 78 | },
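A minimal sketch of that installation step (the `build/wheels` location is an assumption based on the default layout of an OpenVINO build with `-DENABLE_WHEEL=ON`; the exact wheel filename encodes version and platform tags):

```
# Install the wheel produced by the build above into the active virtual environment.
!pip install build/wheels/openvino-*.whl
```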
|
74 | 79 | {
|
|
253 | 258 | "id": "7ba9e1a2-beda-426a-9979-0e1ab2970430",
|
254 | 259 | "metadata": {},
|
255 | 260 | "source": [
|
256 |
| - "Note the last line in the cell above - since the model inference is stateful, we should reset the model's internal state if we want to process new text inputs that are unrelated to the current chatbot interaction. The same OpenVINO API call may be used to do this, just like with other OpenVINO \"stateful\" models:" |
| 261 | + "Note the last line in the cell above: since the model inference is stateful, we should reset the model's internal state if we want to process new text inputs that are unrelated to the current chatbot interaction." |
257 | 262 | ]
|
258 | 263 | }
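For reference, a minimal sketch of such a reset, assuming `infer_request` is the request object used for generation in the preceding cells; `reset_state()` is the standard OpenVINO call for clearing a stateful model's internal state:

```python
# reset_state() clears the internal state held by the stateful model, so the next
# prompt is processed independently of the previous conversation.
infer_request.reset_state()
```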
|
259 | 264 | ],
|