# samples/python/multinomial_causal_lm/README.md
This example showcases inference of text-generation Large Language Models (LLMs): `chatglm`, `LLaMA`, `Qwen`, and other models with the same signature. The application deliberately has few configuration options, to encourage the reader to explore and modify the source code; for example, change the device for inference to GPU. The sample features `ov::genai::LLMPipeline` and configures it to run a random sampling algorithm. There is also a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) which provides an example of an LLM-powered chatbot in Python.
This sample also contains an example implementation of an iterable streamer with bufferization.
## Download and convert the model and tokenizers
The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.
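As an illustration, a typical download-and-convert flow looks like the following sketch (the requirements path and the `TinyLlama` model name are examples, not prescribed by this sample; any supported model works):

```shell
# Upgrade optimum-intel and related export-time dependencies.
pip install --upgrade-strategy eager -r requirements.txt

# Export a model to OpenVINO IR together with its tokenizer/detokenizer models.
optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-Chat-v1.0
```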
See https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md#supported-models for the list of supported models.
## Streaming
This Python example demonstrates custom detokenization with bufferization. The streamer receives integer tokens corresponding to each word or subword, one by one. If tokens are decoded individually, the resulting text misses the necessary spaces, because `detokenize(tokenize(" a")) == "a"`.
To address this, the detokenizer needs a larger context. We accumulate tokens in a `tokens_cache` buffer and decode multiple tokens together, adding the text to the streaming queue only when a complete decoded chunk is ready. We run a separate thread to print all new elements arriving in this queue from the generation pipeline. Each generated chunk of text is put into a synchronized queue, ensuring that all `put` and `get` operations are thread-safe and block until they can proceed.
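The buffering scheme above can be sketched in pure Python. The `FakeDetokenizer` below is a stand-in for the real tokenizer (it is an assumption made for the sketch); the cache, queue, and consumer-thread logic mirrors the description:

```python
import queue
import threading

class FakeDetokenizer:
    """Stand-in for a real detokenizer: maps token ids to text fragments."""
    def __init__(self, vocab):
        self.vocab = vocab

    def decode(self, token_ids):
        return "".join(self.vocab[t] for t in token_ids)

class IterableStreamer:
    """Accumulates tokens in tokens_cache and emits complete decoded chunks."""
    def __init__(self, detokenizer):
        self.detokenizer = detokenizer
        self.tokens_cache = []
        self.text_queue = queue.Queue()  # synchronized: put/get are thread-safe
        self.printed_len = 0

    def __iter__(self):
        return self

    def __next__(self):
        chunk = self.text_queue.get()  # blocks until a chunk is ready
        if chunk is None:
            raise StopIteration
        return chunk

    def put(self, token_id) -> bool:
        self.tokens_cache.append(token_id)
        text = self.detokenizer.decode(self.tokens_cache)
        # Emit only the newly decoded tail; a real implementation also waits
        # for complete words/unicode characters before flushing.
        if len(text) > self.printed_len:
            self.text_queue.put(text[self.printed_len:])
            self.printed_len = len(text)
        return False  # False means "do not stop generation"

    def end(self):
        self.text_queue.put(None)  # sentinel: generation finished

# A consumer thread collects chunks while the producer feeds tokens.
vocab = {0: "The", 1: " Sun", 2: " is", 3: " yellow"}
streamer = IterableStreamer(FakeDetokenizer(vocab))
chunks = []
consumer = threading.Thread(target=lambda: chunks.extend(streamer))
consumer.start()
for token in (0, 1, 2, 3):
    streamer.put(token)
streamer.end()
consumer.join()
print("".join(chunks))  # The Sun is yellow
```

Because `queue.Queue` is synchronized, the consumer simply blocks on `get` until the producer flushes a chunk or signals the end with the `None` sentinel.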
# src/README.md
## Install OpenVINO™ GenAI
> **NOTE**: Please make sure that you follow the version compatibility rules; refer to the [OpenVINO™ GenAI Dependencies](#openvino-genai-dependencies) section for more information.
The OpenVINO™ GenAI flavor is available for installation via Archive and PyPI distributions.
To install OpenVINO™ GenAI, refer to the [Install Guide](https://docs.openvino.ai/2024/get-started/install-openvino.html).
To build OpenVINO™ GenAI library from source, refer to the [Build Instructions](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/3/src/docs/BUILD.md).
### OpenVINO™ GenAI Dependencies
OpenVINO™ GenAI depends on [OpenVINO](https://github.com/openvinotoolkit/openvino) and [OpenVINO Tokenizers](https://github.com/openvinotoolkit/openvino_tokenizers).
When installing OpenVINO™ GenAI from PyPI, the same versions of OpenVINO and OpenVINO Tokenizers are used (e.g. `openvino==2024.3.0` and `openvino-tokenizers==2024.3.0.0` are installed for `openvino-genai==2024.3.0`).
If you update one of the dependency packages (e.g. install `openvino-nightly`), the versions might become incompatible due to ABI differences, and running OpenVINO GenAI can result in errors (e.g. `ImportError: libopenvino.so.2440: cannot open shared object file: No such file or directory`).
Package versions have the format `<MAJOR>.<MINOR>.<PATCH>.<REVISION>`. Only the `<REVISION>` part of the full version can vary while preserving ABI compatibility; changing the `<MAJOR>`, `<MINOR>`, or `<PATCH>` parts might break ABI.
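As an illustration, this rule can be expressed as a small helper (a hypothetical function written for this document, not part of any OpenVINO package):

```python
def abi_compatible(installed: str, required: str) -> bool:
    """ABI-compatible when <MAJOR>.<MINOR>.<PATCH> match; <REVISION> may differ."""
    return installed.split(".")[:3] == required.split(".")[:3]

# Only the revision differs: still ABI-compatible.
print(abi_compatible("2024.3.0.0", "2024.3.0.1"))  # True
# The minor version differs: ABI may break.
print(abi_compatible("2024.3.0", "2024.4.0"))      # False
```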
GenAI, Tokenizers, and OpenVINO wheels for Linux on PyPI are compiled with `_GLIBCXX_USE_CXX11_ABI=0` to cover a wider range of platforms. In contrast, C++ archive distributions for Ubuntu are compiled with `_GLIBCXX_USE_CXX11_ABI=1`. It is not possible to mix different Application Binary Interfaces (ABIs) because doing so results in a link error. This incompatibility prevents the use of, for example, OpenVINO from C++ archive distributions alongside GenAI from PyPI.
If you want to try OpenVINO GenAI with different dependency versions (**not** the prebuilt packages distributed as archives or Python wheels), build the OpenVINO GenAI library from source.
## Usage
### Prerequisites
1. Installed OpenVINO™ GenAI
> If OpenVINO GenAI is installed via an archive distribution or built from source, you will need to install additional Python dependencies (e.g. `optimum-cli` for simplified model downloading and exporting; installing [./samples/requirements.txt](./samples/requirements.txt) is not required for deployment if the model has already been exported):
>
> ```sh
> # (Optional) Clone OpenVINO GenAI repository if it does not exist
> git clone https://github.com/openvinotoolkit/openvino.genai.git
> cd openvino.genai
> ```
C++ template for a streamer.
```cpp
#include "openvino/genai/streamer_base.hpp"

class CustomStreamer: public ov::genai::StreamerBase {
public:
    bool put(int64_t token) {
        // Custom decoding/tokens processing logic.

        // Returns a flag whether generation should be stopped; if true, generation stops.
        return false;
    };

    void end() {
        // Custom finalization logic.
    };
};
```
```cpp
int main(int argc, char* argv[]) {
    auto custom_streamer = std::make_shared<CustomStreamer>();

    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");
    std::cout << pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(15), ov::genai::streamer(custom_streamer));
}
```
Python template for a streamer.
```py
import openvino_genai as ov_genai

class CustomStreamer(ov_genai.StreamerBase):
    def __init__(self):
        super().__init__()
        # Initialization logic.

    def put(self, token_id) -> bool:
        # Custom decoding/tokens processing logic.

        # Returns a flag whether generation should be stopped; if True, generation stops.
        return False

    def end(self):
        # Custom finalization logic.
        pass

pipe = ov_genai.LLMPipeline(model_path, "CPU")
custom_streamer = CustomStreamer()

pipe.generate("The Sun is yellow because", max_new_tokens=15, streamer=custom_streamer)
```
For a fully implemented iterable `CustomStreamer`, refer to the [multinomial_causal_lm](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/3/samples/python/multinomial_causal_lm/README.md) sample.
### Performance Metrics
`openvino_genai.PerfMetrics` (referred to as `PerfMetrics` for simplicity) is a structure that holds performance metrics for each `generate` call. `PerfMetrics` holds fields with the mean and standard deviation for the following metrics:
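A typical access pattern looks like the following sketch (it assumes a converted model at `model_path` and follows the `get_<metric>()` getter naming used by `PerfMetrics`; treat exact getter names as an assumption and check the API reference):

```py
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline(model_path, "CPU")
result = pipe.generate(["The Sun is yellow because"], max_new_tokens=20)
perf_metrics = result.perf_metrics

# Each getter returns a mean/std pair aggregated over the generate call.
print(f"TTFT: {perf_metrics.get_ttft().mean:.2f} ms")
print(f"TPOT: {perf_metrics.get_tpot().mean:.2f} ms/token")
print(f"Throughput: {perf_metrics.get_throughput().mean:.2f} tokens/s")
```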
## How It Works
For information on how OpenVINO™ GenAI works, refer to the [How It Works Section](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/3/src/docs/HOW_IT_WORKS.md).
## Supported Models
For a list of supported models, refer to the [Supported Models Section](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/3/src/docs/SUPPORTED_MODELS.md).