
Commit faf0bee

Merge branch 'releases/2024/3' of https://github.com/openvinotoolkit/openvino.genai into guozhong/add_requirements_2024_3

2 parents: a923804 + 33667bf

File tree: 11 files changed, +511 −99 lines

.github/workflows/causal_lm_cpp.yml (+3 −3)

```diff
@@ -13,9 +13,9 @@ concurrency:
   cancel-in-progress: true
 
 env:
-  l_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc1/linux/l_openvino_toolkit_ubuntu20_2024.3.0.dev20240711_x86_64.tgz
-  m_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc1/macos/m_openvino_toolkit_macos_12_6_2024.3.0.dev20240711_x86_64.tgz
-  w_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc1/windows/w_openvino_toolkit_windows_2024.3.0.dev20240711_x86_64.zip
+  l_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc2/linux/l_openvino_toolkit_ubuntu20_2024.3.0.dev20240719_x86_64.tgz
+  m_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc2/macos/m_openvino_toolkit_macos_12_6_2024.3.0.dev20240719_x86_64.tgz
+  w_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc2/windows/w_openvino_toolkit_windows_2024.3.0.dev20240719_x86_64.zip
 jobs:
   cpp-multinomial-greedy_causal_lm-ubuntu:
     runs-on: ubuntu-20.04-8-cores
```

.github/workflows/genai_package.yml (+3 −3)

```diff
@@ -5,9 +5,9 @@ concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
   cancel-in-progress: true
 env:
-  l_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc1/linux/l_openvino_toolkit_ubuntu20_2024.3.0.dev20240711_x86_64.tgz
-  m_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc1/macos/m_openvino_toolkit_macos_12_6_2024.3.0.dev20240711_x86_64.tgz
-  w_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc1/windows/w_openvino_toolkit_windows_2024.3.0.dev20240711_x86_64.zip
+  l_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc2/linux/l_openvino_toolkit_ubuntu20_2024.3.0.dev20240719_x86_64.tgz
+  m_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc2/macos/m_openvino_toolkit_macos_12_6_2024.3.0.dev20240719_x86_64.tgz
+  w_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc2/windows/w_openvino_toolkit_windows_2024.3.0.dev20240719_x86_64.zip
 jobs:
   ubuntu_genai_package:
     strategy:
```

.github/workflows/genai_python_lib.yml (+3 −3)

```diff
@@ -5,9 +5,9 @@ concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
   cancel-in-progress: true
 env:
-  l_ov_centos_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc1/linux/l_openvino_toolkit_centos7_2024.3.0.dev20240711_x86_64.tgz
-  m_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc1/macos/m_openvino_toolkit_macos_12_6_2024.3.0.dev20240711_x86_64.tgz
-  w_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc1/windows/w_openvino_toolkit_windows_2024.3.0.dev20240711_x86_64.zip
+  l_ov_centos_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc2/linux/l_openvino_toolkit_centos7_2024.3.0.dev20240719_x86_64.tgz
+  m_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc2/macos/m_openvino_toolkit_macos_12_6_2024.3.0.dev20240719_x86_64.tgz
+  w_ov_link: https://storage.openvinotoolkit.org/repositories/openvino/packages/pre-release/2024.3.0rc2/windows/w_openvino_toolkit_windows_2024.3.0.dev20240719_x86_64.zip
 jobs:
   ubuntu_genai_python_lib:
     # A tokenizers' dependency fails to compile on ubuntu-20 and CentOS7 envs.
```

llm_bench/python/requirements.txt (+9 −8)

```diff
@@ -1,17 +1,18 @@
 --extra-index-url https://download.pytorch.org/whl/cpu
 numpy
 --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
-openvino
-openvino-tokenizers
-openvino_genai
+openvino~=2024.3.0
+openvino-tokenizers~=2024.3.0
+openvino_genai~=2024.3.0
 auto-gptq>=0.5.1 # for gptq
 pillow
-torch
-transformers>=4.40.0
+torch<2.5.0
+torchvision<0.20.0
+transformers>=4.40.0,<4.43.0
 diffusers>=0.22.0
-#optimum is in dependency list of optimum-intel
-git+https://github.com/huggingface/optimum-intel.git@439d61f79cf55d5d0b28334f577b6ac3c5ced28f#egg=optimum-intel
-git+https://github.com/openvinotoolkit/nncf.git@develop#egg=nncf
+#optimum is in dependency list of optimum-intel
+git+https://github.com/huggingface/optimum-intel.git@6388aeb8738b63e28fc594af84df94590e77cb9a#egg=optimum-intel
+nncf~=2.12.0
 packaging
 psutil
 timm
```
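The new `~=` pins above use pip's compatible-release operator: `openvino~=2024.3.0` accepts `2024.3.*` versions at or above `2024.3.0`, but not `2024.4.0`. A rough sketch of that rule, simplified to numeric release segments only (the function name is illustrative, not part of pip):

```python
def compatible(spec: str, version: str) -> bool:
    """Simplified PEP 440 '~=' check: version must be >= the base version
    and must match the base with its last component dropped."""
    base = [int(p) for p in spec.removeprefix("~=").split(".")]
    ver = [int(p) for p in version.split(".")]
    prefix = base[:-1]
    return ver >= base and ver[: len(prefix)] == prefix
```

So `2024.3.1` satisfies `~=2024.3.0`, while `2024.4.0` does not.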

samples/python/multinomial_causal_lm/README.md (+8 −0)

```diff
@@ -2,6 +2,8 @@
 
 This example showcases inference of text-generation Large Language Models (LLMs): `chatglm`, `LLaMA`, `Qwen` and other models with the same signature. The application doesn't have many configuration options to encourage the reader to explore and modify the source code. For example, change the device for inference to GPU. The sample features `ov::genai::LLMPipeline` and configures it to run the random sampling algorithm. There is also a Jupyter [notebook](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot) which provides an example of an LLM-powered chatbot in Python.
 
+This sample also contains an example implementation of an iterable streamer with buffering.
+
 ## Download and convert the model and tokenizers
 
 The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.
@@ -22,6 +24,12 @@ Discrete GPUs (dGPUs) usually provide better performance compared to CPUs. It is
 
 See https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md#supported-models for the list of supported models.
 
+## Streaming
+
+This Python example demonstrates custom detokenization with buffering. The streamer receives integer tokens corresponding to each word or subword, one by one. If tokens are decoded individually, the resulting text misses necessary spaces because `detokenize(tokenize(" a")) == "a"`.
+
+To address this, the detokenizer needs a larger context. We accumulate tokens in a `tokens_cache` buffer and decode multiple tokens together, adding the text to the streaming queue only when a complete decoded chunk is ready. We run a separate thread to print all new elements arriving in this queue from the generation pipeline. Each generated chunk of text is put into a synchronized queue, ensuring that all put and get operations are thread-safe and block until they can proceed.
+
 ### Troubleshooting
 
 #### Unicode characters encoding error on Windows
```
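The streaming design described above, a synchronized queue between the generation pipeline and a printer thread with `None` as the end-of-stream sentinel, can be sketched independently of any tokenizer; all names here are illustrative:

```python
import queue
import threading

def demo_stream(chunks):
    """Producer/consumer sketch of the sample's streaming pattern: the
    producer puts decoded text chunks into a synchronized queue, and a
    consumer thread drains it until a None sentinel signals the end."""
    q = queue.Queue()
    received = []

    def consumer():
        while True:
            chunk = q.get()  # blocks until a chunk is available
            if chunk is None:  # sentinel: generation finished
                break
            received.append(chunk)

    t = threading.Thread(target=consumer, daemon=True)
    t.start()
    for chunk in chunks:  # stands in for the generation pipeline
        q.put(chunk)
    q.put(None)
    t.join()
    return "".join(received)
```

Because `queue.Queue` is thread-safe, no extra locking is needed around the put/get calls.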

samples/python/multinomial_causal_lm/multinomial_causal_lm.py (+125 −8)

```diff
@@ -4,11 +4,120 @@
 
 import argparse
 import openvino_genai
+import queue
+import threading
 
 
-def streamer(subword):
-    print(subword, end='', flush=True)
-    return False
+class IterableStreamer(openvino_genai.StreamerBase):
+    """
+    A custom streamer class for handling token streaming and detokenization with buffering.
+
+    Attributes:
+        tokenizer (Tokenizer): The tokenizer used for encoding and decoding tokens.
+        tokens_cache (list): A buffer to accumulate tokens for detokenization.
+        text_queue (Queue): A synchronized queue for storing decoded text chunks.
+        print_len (int): The length of the printed text to manage incremental decoding.
+    """
+
+    def __init__(self, tokenizer):
+        """
+        Initializes the IterableStreamer with the given tokenizer.
+
+        Args:
+            tokenizer (Tokenizer): The tokenizer to use for encoding and decoding tokens.
+        """
+        super().__init__()
+        self.tokenizer = tokenizer
+        self.tokens_cache = []
+        self.text_queue = queue.Queue()
+        self.print_len = 0
+
+    def __iter__(self):
+        """
+        Returns the iterator object itself.
+        """
+        return self
+
+    def __next__(self):
+        """
+        Returns the next value from the text queue.
+
+        Returns:
+            str: The next decoded text chunk.
+
+        Raises:
+            StopIteration: If there are no more elements in the queue.
+        """
+        value = self.text_queue.get()  # get() blocks until a value is available.
+        if value is None:
+            raise StopIteration
+        return value
+
+    def get_stop_flag(self):
+        """
+        Checks whether the generation process should be stopped.
+
+        Returns:
+            bool: Always returns False in this implementation.
+        """
+        return False
+
+    def put_word(self, word: str):
+        """
+        Puts a word into the text queue.
+
+        Args:
+            word (str): The word to put into the queue.
+        """
+        self.text_queue.put(word)
+
+    def put(self, token_id: int) -> bool:
+        """
+        Processes a token and manages the decoding buffer. Adds decoded text to the queue.
+
+        Args:
+            token_id (int): The token_id to process.
+
+        Returns:
+            bool: True if generation should be stopped, False otherwise.
+        """
+        self.tokens_cache.append(token_id)
+        text = self.tokenizer.decode(self.tokens_cache)
+
+        word = ''
+        if len(text) > self.print_len and '\n' == text[-1]:
+            # Flush the cache after the new line symbol.
+            word = text[self.print_len:]
+            self.tokens_cache = []
+            self.print_len = 0
+        elif len(text) > 0 and text[-1] == chr(65533):
+            # Don't print incomplete text: the cache currently decodes to the
+            # U+FFFD replacement character, so more tokens are needed.
+            pass
+        elif len(text) > self.print_len:
+            # It is possible to have a shorter text after adding a new token.
+            # Print to output only if the text length has increased.
+            word = text[self.print_len:]
+            self.print_len = len(text)
+        self.put_word(word)
+
+        if self.get_stop_flag():
+            # When generation is stopped from the streamer, end() is not called
+            # automatically, so call it here manually.
+            self.end()
+            return True  # True means stop generation.
+        else:
+            return False  # False means continue generation.
+
+    def end(self):
+        """
+        Flushes residual tokens from the buffer and puts a None value in the queue to signal the end.
+        """
+        text = self.tokenizer.decode(self.tokens_cache)
+        if len(text) > self.print_len:
+            word = text[self.print_len:]
+            self.put_word(word)
+            self.tokens_cache = []
+            self.print_len = 0
+        self.put_word(None)
 
 
 def main():
@@ -19,17 +128,25 @@ def main():
 
     device = 'CPU'  # GPU can be used as well.
     pipe = openvino_genai.LLMPipeline(args.model_dir, device)
-
+
+    text_print_streamer = IterableStreamer(pipe.get_tokenizer())
+    def token_printer():
+        # Getting the next element from the iterable blocks until a new token is available.
+        for word in text_print_streamer:
+            print(word, end='', flush=True)
+    printer_thread = threading.Thread(target=token_printer, daemon=True)
+    printer_thread.start()
+
     config = openvino_genai.GenerationConfig()
     config.max_new_tokens = 100
     config.do_sample = True
    config.top_p = 0.9
     config.top_k = 30
 
-    # Since the streamer is set, the results will
-    # be printed each time a new token is generated.
-    pipe.generate(args.prompt, config, streamer)
-
+    # Since the streamer is set, the results will be printed
+    # every time a new token is generated and put into the streamer queue.
+    pipe.generate(args.prompt, config, text_print_streamer)
+    printer_thread.join()
 
 if '__main__' == __name__:
     main()
```

Note: the incomplete-text guard in `put()` originally read `text[-3:] == chr(65533)`, which compares a three-character slice against a single character and can never be true; it is corrected above to check the last character.
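The `print_len` bookkeeping in `put()` can be simulated without a real tokenizer. Here a toy `decode` that joins token strings with spaces stands in for `tokenizer.decode`; the helper shows that emitting only the new suffix of the full decoded cache preserves the spaces that per-token decoding would drop (all names are illustrative):

```python
def incremental_chunks(token_strings, decode):
    """Toy model of the sample's print_len logic: after each new token,
    decode the whole cache and emit only the text added since the last emit."""
    tokens_cache, print_len, chunks = [], 0, []
    for tok in token_strings:
        tokens_cache.append(tok)
        text = decode(tokens_cache)
        if len(text) > print_len:
            chunks.append(text[print_len:])
            print_len = len(text)
    return chunks

# A toy decoder that joins tokens with spaces. Decoding each token alone would
# lose its leading space, which is why the sample decodes the whole cache.
def toy_decode(tokens):
    return " ".join(tokens)
```

Concatenating the emitted chunks reproduces the full decoded text, spaces included.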

src/README.md (+60 −23)

````diff
@@ -5,27 +5,41 @@ It hides the complexity of the generation process and minimizes the amount of co
 
 ## Install OpenVINO™ GenAI
 
+> **NOTE**: Please make sure that you are following the version compatibility rules; refer to the [OpenVINO™ GenAI Dependencies](#openvino-genai-dependencies) section for more information.
+
 The OpenVINO™ GenAI flavor is available for installation via Archive and PyPI distributions.
 To install OpenVINO™ GenAI, refer to the [Install Guide](https://docs.openvino.ai/2024/get-started/install-openvino.html).
 
-To build OpenVINO™ GenAI library from source, refer to the [Build Instructions](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/2/src/docs/BUILD.md).
+To build the OpenVINO™ GenAI library from source, refer to the [Build Instructions](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/3/src/docs/BUILD.md).
+
+### OpenVINO™ GenAI Dependencies
+
+OpenVINO™ GenAI depends on [OpenVINO](https://github.com/openvinotoolkit/openvino) and [OpenVINO Tokenizers](https://github.com/openvinotoolkit/openvino_tokenizers).
+
+When installing OpenVINO™ GenAI from PyPI, matching versions of OpenVINO and OpenVINO Tokenizers are used (e.g. `openvino==2024.3.0` and `openvino-tokenizers==2024.3.0.0` are installed for `openvino-genai==2024.3.0`).
+If you update one of the dependency packages (e.g. install `openvino-nightly`), the versions may become incompatible due to different ABIs, and running OpenVINO GenAI can result in errors (e.g. `ImportError: libopenvino.so.2440: cannot open shared object file: No such file or directory`).
+With package versions in the format `<MAJOR>.<MINOR>.<PATCH>.<REVISION>`, only the `<REVISION>` part of the full version can vary while preserving ABI compatibility; changing the `<MAJOR>`, `<MINOR>`, or `<PATCH>` parts may break the ABI.
+
+GenAI, Tokenizers, and OpenVINO wheels for Linux on PyPI are compiled with `_GLIBCXX_USE_CXX11_ABI=0` to cover a wider range of platforms. In contrast, C++ archive distributions for Ubuntu are compiled with `_GLIBCXX_USE_CXX11_ABI=1`. It is not possible to mix different Application Binary Interfaces (ABIs) because doing so results in a link error. This incompatibility prevents the use of, for example, OpenVINO from C++ archive distributions alongside GenAI from PyPI.
+
+If you want to try OpenVINO GenAI with different dependency versions (**not** prebuilt packages such as archives or Python wheels), build the OpenVINO GenAI library from source.
 
 ## Usage
 
 ### Prerequisites
 
 1. Installed OpenVINO™ GenAI
 
-    > If OpenVINO GenAI is installed via archive distribution or built from source, you will need to install additional python dependencies (e.g. `optimum-cli` for simplified model downloading and exporting, it's not required to install [./samples/requirements.txt](./samples/requirements.txt) for deployment if the model has already been exported):
-    >
-    > ```sh
-    > # (Optional) Clone OpenVINO GenAI repository if it does not exist
-    > git clone --recursive https://github.com/openvinotoolkit/openvino.genai.git
-    > cd openvino.genai
-    > # Install python dependencies
-    > python -m pip install ./thirdparty/openvino_tokenizers/[transformers] --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
-    > python -m pip install --upgrade-strategy eager -r ./samples/requirements.txt
-    > ```
+    > To use OpenVINO GenAI with models that are already in OpenVINO format, no additional Python dependencies are needed. To
+    > convert models with optimum-cli and to run the examples, install the dependencies in [./samples/requirements.txt](./samples/requirements.txt):
+    ```sh
+    # (Optional) Clone OpenVINO GenAI repository if it does not exist
+    git clone --recursive https://github.com/openvinotoolkit/openvino.genai.git
+    cd openvino.genai
+    # Install python dependencies
+    python -m pip install ./thirdparty/openvino_tokenizers/[transformers] --pre --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
+    python -m pip install --upgrade-strategy eager -r ./samples/requirements.txt
+    ```
 
 2. A model in OpenVINO IR format
 
````
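The `<MAJOR>.<MINOR>.<PATCH>.<REVISION>` rule added in the Dependencies section can be expressed as a small check: two package versions are ABI-compatible only if they differ at most in the `<REVISION>` component. A sketch under that stated rule (the function is illustrative, not part of the GenAI API):

```python
def abi_compatible(a: str, b: str) -> bool:
    """True if the versions differ at most in the <REVISION> part, i.e.
    their <MAJOR>.<MINOR>.<PATCH> prefixes match."""
    return a.split(".")[:3] == b.split(".")[:3]
```

For example, `2024.3.0.0` and `2024.3.0.1` are compatible, while `2024.3.0` and `2024.4.0` are not.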
````diff
@@ -164,6 +178,8 @@ int main(int argc, char* argv[]) {
 ```
 
 Streaming with a custom class:
+
+C++ template for a streamer.
 ```cpp
 #include "openvino/genai/streamer_base.hpp"
 #include "openvino/genai/llm_pipeline.hpp"
@@ -172,18 +188,14 @@ Streaming with a custom class:
 class CustomStreamer: public ov::genai::StreamerBase {
 public:
     bool put(int64_t token) {
-        bool stop_flag = false;
-        /*
-        custom decoding/tokens processing code
-        tokens_cache.push_back(token);
-        std::string text = m_tokenizer.decode(tokens_cache);
-        ...
-        */
-        return stop_flag; // flag whether generation should be stoped, if true generation stops.
+        // Custom decoding/tokens processing logic.
+
+        // Returns a flag of whether generation should be stopped; if true, generation stops.
+        return false;
     };
 
     void end() {
-        /* custom finalization */
+        // Custom finalization logic.
     };
 };
````
````diff
@@ -192,10 +204,35 @@ int main(int argc, char* argv[]) {
 
     std::string model_path = argv[1];
     ov::genai::LLMPipeline pipe(model_path, "CPU");
-    std::cout << pipe.generate("The Sun is yellow because", ov::genai::streamer(custom_streamer), ov::genai::max_new_tokens(200));
+    std::cout << pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(15), ov::genai::streamer(custom_streamer));
 }
 ```
 
+Python template for a streamer.
+```py
+import openvino_genai as ov_genai
+
+class CustomStreamer(ov_genai.StreamerBase):
+    def __init__(self):
+        super().__init__()
+        # Initialization logic.
+
+    def put(self, token_id) -> bool:
+        # Custom decoding/tokens processing logic.
+
+        # Returns a flag of whether generation should be stopped; if true, generation stops.
+        return False
+
+    def end(self):
+        # Custom finalization logic.
+        pass
+
+pipe = ov_genai.LLMPipeline(model_path, "CPU")
+custom_streamer = CustomStreamer()
+
+pipe.generate("The Sun is yellow because", max_new_tokens=15, streamer=custom_streamer)
+```
+For a fully implemented iterable CustomStreamer, please refer to the [multinomial_causal_lm](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/3/samples/python/multinomial_causal_lm/README.md) sample.
+
 ### Performance Metrics
 
 `openvino_genai.PerfMetrics` (referred to as `PerfMetrics` for simplicity) is a structure that holds performance metrics for each generate call. `PerfMetrics` holds fields with mean and standard deviations for the following metrics:
@@ -289,8 +326,8 @@ For more examples of how metrics are used, please refer to the Python [benchmark
 
 ## How It Works
 
-For information on how OpenVINO™ GenAI works, refer to the [How It Works Section](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/2/src/docs/HOW_IT_WORKS.md).
+For information on how OpenVINO™ GenAI works, refer to the [How It Works Section](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/3/src/docs/HOW_IT_WORKS.md).
 
 ## Supported Models
 
-For a list of supported models, refer to the [Supported Models Section](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/2/src/docs/SUPPORTED_MODELS.md).
+For a list of supported models, refer to the [Supported Models Section](https://github.com/openvinotoolkit/openvino.genai/tree/releases/2024/3/src/docs/SUPPORTED_MODELS.md).
````
