
Commit 5018f73

Readme draft (openvinotoolkit#930)

Updates to describe:
- latest changes in the library with focus on the GenAI API
- expose more functionality that we have (continuous batching, speculative decoding, etc.)
- provide lightweight samples as a beginning

Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
1 parent 6d2763a commit 5018f73

File tree

- .gitattributes
- README.md
- samples/generation.gif

3 files changed: +302 −45 lines changed

.gitattributes (+67, new file)

@@ -0,0 +1,67 @@
###############################################################################
# Set default behavior to automatically normalize line endings.
###############################################################################
* text=auto
###############################################################################
# Set default behavior for command prompt diff.
#
# This is need for earlier builds of msysgit that does not have it on by
# default for csharp files.
# Note: This is only used by command line
###############################################################################
#*.cs diff=csharp
*.py text eol=lf
###############################################################################
# Set the merge driver for project and solution files
#
# Merging from the command prompt will add diff markers to the files if there
# are conflicts (Merging from VS is not affected by the settings below, in VS
# the diff markers are never inserted). Diff markers may cause the following
# file extensions to fail to load in VS. An alternative would be to treat
# these files as binary and thus will always conflict and require user
# intervention with every merge. To do so, just uncomment the entries below
###############################################################################
#*.sln merge=binary
#*.csproj merge=binary
#*.vbproj merge=binary
#*.vcxproj merge=binary
#*.vcproj merge=binary
#*.dbproj merge=binary
#*.fsproj merge=binary
#*.lsproj merge=binary
#*.wixproj merge=binary
#*.modelproj merge=binary
#*.sqlproj merge=binary
#*.wwaproj merge=binary
###############################################################################
# behavior for image files
#
# image files are treated as binary by default.
###############################################################################
#*.jpg binary
#*.png binary
#*.gif binary
###############################################################################
# diff behavior for common document formats
#
# Convert binary document formats to text before diffing them. This feature
# is only available from the command line. Turn it on by uncommenting the
# entries below.
###############################################################################
#*.doc diff=astextplain
#*.DOC diff=astextplain
#*.docx diff=astextplain
#*.DOCX diff=astextplain
#*.dot diff=astextplain
#*.DOT diff=astextplain
#*.pdf diff=astextplain
#*.PDF diff=astextplain
#*.rtf diff=astextplain
#*.RTF diff=astextplain
*.PNG filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text
*.vsdx filter=lfs diff=lfs merge=lfs -text
*.bmp filter=lfs diff=lfs merge=lfs -text
*.svg filter=lfs diff=lfs merge=lfs -text

README.md (+235 −45)

@@ -1,50 +1,240 @@

# OpenVINO™ GenAI

Removed:

The OpenVINO™ GenAI repository consists of the GenAI library and additional GenAI samples.

## OpenVINO™ GenAI Library

OpenVINO™ GenAI is a flavor of OpenVINO, aiming to simplify running inference of generative AI models.
It hides the complexity of the generation process and minimizes the amount of code required.

For installation and usage instructions, refer to the [GenAI Library README](./src/README.md).

## OpenVINO™ GenAI Samples

The OpenVINO™ GenAI repository contains pipelines that implement image and text generation tasks.
The implementation uses OpenVINO capabilities to optimize the pipelines. Each sample covers
a family of models and suggests certain modifications to adapt the code to specific needs.
It includes the following pipelines:

1. [Benchmarking script for large language models](./llm_bench/python/README.md)
2. Text generation samples that support most popular models like LLaMA 2:
   - Python:
     1. [beam_search_causal_lm](./samples/python/beam_search_causal_lm/README.md)
     2. [benchmark_genai](./samples/python/benchmark_genai/README.md)
     3. [chat_sample](./samples/python/chat_sample/README.md)
     4. [greedy_causal_lm](./samples/python/greedy_causal_lm/README.md)
     5. [multinomial_causal_lm](./samples/python/multinomial_causal_lm/README.md)
   - C++:
     1. [beam_search_causal_lm](./samples/cpp/beam_search_causal_lm/README.md)
     2. [benchmark_genai](./samples/cpp/benchmark_genai/README.md)
     3. [chat_sample](./samples/cpp/chat_sample/README.md)
     4. [continuous_batching_accuracy](./samples/cpp/continuous_batching_accuracy)
     5. [continuous_batching_benchmark](./samples/cpp/continuous_batching_benchmark)
     6. [greedy_causal_lm](./samples/cpp/greedy_causal_lm/README.md)
     7. [multinomial_causal_lm](./samples/cpp/multinomial_causal_lm/README.md)
     8. [prompt_lookup_decoding_lm](./samples/cpp/prompt_lookup_decoding_lm/README.md)
     9. [speculative_decoding_lm](./samples/cpp/speculative_decoding_lm/README.md)
3. [Stable Diffusion and Latent Consistency Model (with LoRA) C++ image generation pipeline](./samples/cpp/text2image/README.md)

### Requirements

Requirements may vary for different samples. See the respective README files for more details,
and make sure to install the OpenVINO version listed there. Refer to the documentation to see
[how to install OpenVINO](https://docs.openvino.ai/install).

The supported devices are CPU and GPU, including Intel discrete GPU.

See also: https://docs.openvino.ai/2023.3/gen_ai_guide.html.

Added:

OpenVINO™ GenAI is a library of the most popular Generative AI model pipelines, optimized execution methods, and samples that run on top of the highly performant [OpenVINO Runtime](https://github.com/openvinotoolkit/openvino).

The library is friendly to PC and laptop execution and optimized for resource consumption. It requires no external dependencies to run generative models and includes all required functionality (e.g. tokenization via openvino-tokenizers).

![Text generation using LLaMa 3.2 model running on Intel ARC770 dGPU](./samples/generation.gif)

## Supported Generative AI scenarios

The OpenVINO™ GenAI library provides very lightweight C++ and Python APIs to run the following generative scenarios:
- Text generation using Large Language Models, for example chat with a local LLaMa model
- Image generation using diffusion models, for example generation with Stable Diffusion models
- Speech recognition using Whisper family models
- Text generation using Large Visual Models, for instance image analysis with the LLaVa or miniCPM model families
17+
Library efficiently supports LoRA adapters for Text and Image generation scenarios:
18+
- Load multiple adapters per model
19+
- Select active adapters for every generation
20+
- Mix multiple adapters with coefficients via alpha blending
21+
22+

All scenarios run on top of OpenVINO Runtime, which supports inference on CPU, GPU and NPU. See [here](https://docs.openvino.ai/2024/about-openvino/release-notes-openvino/system-requirements.html) for the platform support matrix.

## Supported Generative AI optimization methods

The OpenVINO™ GenAI library provides a transparent way to use state-of-the-art generation optimizations:
- Speculative decoding, which employs two models of different size and uses the large model to periodically correct the results of the small model (a Python sketch follows this list). See [here](https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/) for a more detailed overview
- KVCache token eviction algorithm that reduces the size of the KVCache by pruning less impactful tokens.

Additionally, the OpenVINO™ GenAI library implements a continuous batching approach for using OpenVINO within LLM serving. The continuous batching functionality can be used in LLM serving frameworks and supports the following features (a Python sketch follows below):
- Prefix caching, which internally caches fragments of previous generation requests together with the corresponding KVCache entries and reuses them for repeated queries.

Continuous batching functionality is used within OpenVINO Model Server (OVMS) to serve LLMs; see [here](https://docs.openvino.ai/2024/ovms_docs_llm_reference.html) for more details.
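
A rough Python sketch of the continuous batching pipeline with prefix caching enabled is given below. The model path is a placeholder, and the `SchedulerConfig` fields and result attributes follow the current Python bindings and may differ between releases.

```python
import openvino_genai

# Scheduler settings for the continuous batching backend.
scheduler_config = openvino_genai.SchedulerConfig()
scheduler_config.cache_size = 2                # KV cache size, in GB
scheduler_config.enable_prefix_caching = True  # reuse KVCache entries for repeated prompt prefixes

pipe = openvino_genai.ContinuousBatchingPipeline(
    "./TinyLlama-1.1B-Chat-v1.0/",  # placeholder path to a converted model
    scheduler_config,
    "CPU",
)

generation_config = openvino_genai.GenerationConfig()
generation_config.max_new_tokens = 64

# Requests sharing a common prefix benefit from prefix caching.
prompts = [
    "You are a helpful assistant. What is OpenVINO?",
    "You are a helpful assistant. What is continuous batching?",
]

results = pipe.generate(prompts, [generation_config] * len(prompts))
for prompt, result in zip(prompts, results):
    # m_generation_ids holds the generated string(s) for each request in the current bindings.
    print(prompt, "->", result.m_generation_ids[0])
```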

## Installing OpenVINO GenAI

```sh
# Installing OpenVINO GenAI via pip
pip install openvino-genai

# Install optimum-intel to be able to download, convert and optimize LLMs from Hugging Face
# Optimum is not required to run models, only to convert and compress
pip install optimum[openvino]

# (Optional) Install (TBD) to be able to download models from Model Scope
#pip install optimum[openvino]
```
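
After installation, a quick import check can confirm that the package is usable (assuming the wheel exposes a `__version__` attribute):

```python
import openvino_genai

# Print the installed GenAI package version.
print(openvino_genai.__version__)
```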

## Performing text generation
<details>

For more examples check out our [LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html)

### Converting and compressing a text generation model from the Hugging Face library

```sh
# (Basic) download and convert the TinyLlama-Chat-v1.0 model to OpenVINO
optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --weight-format fp16 --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"

# (Recommended) download, convert to OpenVINO and compress the TinyLlama-Chat-v1.0 model to int4
optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --weight-format int4 --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"
```

### Run generation using LLMPipeline API in Python

```python
import openvino_genai as ov_genai
# This will run the model on CPU; GPU and NPU are possible options as well
pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0/", "CPU")
print(pipe.generate("The Sun is yellow because", max_new_tokens=100))
```
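
The same pipeline can stream tokens as they are produced by passing a callback to `generate`; a minimal sketch (reusing the converted TinyLlama folder from above, with the callback returning False to keep generation running):

```python
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0/", "CPU")

# Called for every newly decoded piece of text; returning False means "do not stop generation".
def streamer(subword: str) -> bool:
    print(subword, end="", flush=True)
    return False

pipe.generate("The Sun is yellow because", max_new_tokens=100, streamer=streamer)
print()
```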

### Run generation using LLMPipeline in C++

The code below requires installation of the C++ compatible package (see [here](https://docs.openvino.ai/2024/get-started/install-openvino/install-openvino-genai.html#archive-installation) for more details)

```cpp
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");
    std::cout << pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(100));
}
```

### Sample notebooks using this API

(TBD)

</details>

## Performing image generation

<details>

For more examples check out our [LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html)

### Converting and compressing an image generation model from the Hugging Face library

```sh
# Download and convert the dreamlike-anime-1.0 model to OpenVINO
optimum-cli export openvino --model dreamlike-art/dreamlike-anime-1.0 --task stable-diffusion --weight-format fp16 dreamlike_anime_1_0_ov/FP16
```

### Run generation using Text2Image API in Python

```python
# WIP
```
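
Since the Python binding is still marked as work in progress above, the following is only a hypothetical sketch mirroring the C++ `Text2ImagePipeline` API shown in the next section; the `openvino_genai.Text2ImagePipeline` Python class, its parameters and the returned tensor layout are assumptions, not a confirmed API.

```python
import openvino_genai
from PIL import Image

# Assumed Python counterpart of the C++ Text2ImagePipeline; the model folder is the one
# exported with optimum-cli above.
pipe = openvino_genai.Text2ImagePipeline("./dreamlike_anime_1_0_ov/FP16", "CPU")

image_tensor = pipe.generate(
    "an anime-style landscape at sunset",  # placeholder prompt
    width=512,
    height=512,
    num_inference_steps=20,
)

# Assuming the result is an ov.Tensor with an NHWC uint8 layout, convert it to a PIL image and save it.
image = Image.fromarray(image_tensor.data[0])
image.save("image.bmp")
```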

### Run generation using Text2Image API in C++

The code below requires installation of the C++ compatible package (see [here](https://docs.openvino.ai/2024/get-started/install-openvino/install-openvino-genai.html#archive-installation) for more details)

```cpp
#include "openvino/genai/text2image/pipeline.hpp"
#include "imwrite.hpp"

int main(int argc, char* argv[]) {
    const std::string models_path = argv[1], prompt = argv[2];
    const std::string device = "CPU";  // GPU, NPU can be used as well

    ov::genai::Text2ImagePipeline pipe(models_path, device);
    ov::Tensor image = pipe.generate(prompt,
        ov::genai::width(512),
        ov::genai::height(512),
        ov::genai::num_inference_steps(20));

    imwrite("image.bmp", image, true);
}
```

### Sample notebooks using this API

(TBD)

</details>

## Speech-to-text processing using Whisper Pipeline

<details>

For more examples check out our [LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html)

NOTE: Whisper Pipeline requires preprocessing of the audio input (to adjust the sampling rate and normalize it)

### Converting and compressing a speech-to-text model from the Hugging Face library

```sh
# Download and convert the whisper-base model to OpenVINO
optimum-cli export openvino --trust-remote-code --model openai/whisper-base whisper-base
```

### Run generation using Whisper Pipeline API in Python

NOTE: this is a simplified version of the full sample that is available [here](./samples/python/whisper_speech_recognition/whisper_speech_recognition.py)

```python
import argparse
import openvino_genai
import librosa


def read_wav(filepath):
    raw_speech, samplerate = librosa.load(filepath, sr=16000)
    return raw_speech.tolist()


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("model_dir")
    parser.add_argument("wav_file_path")
    args = parser.parse_args()

    raw_speech = read_wav(args.wav_file_path)

    pipe = openvino_genai.WhisperPipeline(args.model_dir)

    def streamer(word: str) -> bool:
        print(word, end="")
        return False

    pipe.generate(
        raw_speech,
        max_new_tokens=100,
        # 'task' and 'language' parameters are supported for multilingual models only
        language="<|en|>",
        task="transcribe",
        streamer=streamer,
    )

    print()


if __name__ == "__main__":
    main()
```

### Run generation using Whisper Pipeline API in C++

NOTE: this is a simplified version of the full sample that is available [here](./samples/cpp/whisper_speech_recognition/whisper_speech_recognition.cpp)

```cpp
#include <iostream>

#include "audio_utils.hpp"
#include "openvino/genai/whisper_pipeline.hpp"

int main(int argc, char* argv[]) try {
    std::string model_path = argv[1];
    std::string wav_file_path = argv[2];

    ov::genai::RawSpeechInput raw_speech = utils::audio::read_wav(wav_file_path);

    ov::genai::WhisperPipeline pipeline{model_path};

    ov::genai::WhisperGenerationConfig config{model_path + "/generation_config.json"};
    config.max_new_tokens = 100;
    // 'task' and 'language' parameters are supported for multilingual models only
    config.language = "<|en|>";
    config.task = "transcribe";

    auto streamer = [](std::string word) {
        std::cout << word;
        return false;
    };

    pipeline.generate(raw_speech, config, streamer);

    std::cout << std::endl;
} catch (const std::exception& error) {
    // A function-try-block on main needs a handler; report the error and fail.
    std::cerr << error.what() << std::endl;
    return 1;
}
```

### Sample notebooks using this API

(TBD)

</details>

## Additional materials

- [List of supported models](https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/SUPPORTED_MODELS.md) (NOTE: models may work, but have not been tried yet)
- [OpenVINO LLM inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html)
- [Optimum-intel and OpenVINO](https://huggingface.co/docs/optimum/intel/openvino/export)

## License
50240

samples/generation.gif

5.29 MB

0 commit comments
