|
# OpenVINO™ GenAI

OpenVINO™ GenAI is a library of the most popular generative AI model pipelines, optimized execution methods, and samples that run on top of the highly performant [OpenVINO Runtime](https://github.com/openvinotoolkit/openvino).

The library is friendly to PC and laptop execution and is optimized for resource consumption. It requires no external dependencies to run generative models, as it includes all required functionality (e.g. tokenization via openvino-tokenizers).

## Supported Generative AI scenarios

The OpenVINO™ GenAI library provides very lightweight C++ and Python APIs to run the following generative AI scenarios:
 - Text generation using Large Language Models (LLMs), for example, chat with a local LLaMA model
 - Image generation using diffusion models, for example, generation with Stable Diffusion models
 - Speech recognition using Whisper family models
 - Text generation using Large Visual Models, for instance, image analysis with the LLaVA or MiniCPM model families

The library efficiently supports LoRA adapters for text and image generation scenarios (a sketch follows this list):
- Load multiple adapters per model
- Select active adapters for every generation
- Mix multiple adapters with coefficients via alpha blending
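
Below is a minimal sketch of how adapters might be plugged into the Python `LLMPipeline` API. The `Adapter`/`AdapterConfig` names, the `adapters` keyword, and the file paths are assumptions based on the LoRA samples, not a definitive reference:

```python
import openvino_genai as ov_genai

# Hypothetical paths: an exported OpenVINO model directory and a LoRA adapter in safetensors format
adapter = ov_genai.Adapter("./my_adapter.safetensors")
adapter_config = ov_genai.AdapterConfig(adapter)

# Register the adapter(s) when the pipeline is created ...
pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0/", "CPU", adapters=adapter_config)

# ... and select which adapters are active for a particular generation call
print(pipe.generate("The Sun is yellow because", max_new_tokens=100, adapters=adapter_config))
```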

All scenarios run on top of OpenVINO Runtime, which supports inference on CPU, GPU, and NPU. See [here](https://docs.openvino.ai/2024/about-openvino/release-notes-openvino/system-requirements.html) for the platform support matrix.

## Supported Generative AI optimization methods

The OpenVINO™ GenAI library provides a transparent way to use state-of-the-art generation optimizations (a sketch follows this list):
- Speculative decoding, which employs two models of different sizes and uses the large model to periodically correct the results of the small model. See [here](https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/) for a more detailed overview.
- A KV cache token eviction algorithm that reduces the size of the KV cache by pruning less impactful tokens.
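
A hedged sketch of how speculative decoding might be enabled through the Python API; the `draft_model` helper, the `draft_model` keyword, `num_assistant_tokens`, and both model paths are assumptions taken from the speculative decoding samples rather than confirmed API:

```python
import openvino_genai as ov_genai

# Hypothetical model directories: a large "main" model and a small "draft" model
draft = ov_genai.draft_model("./TinyLlama-1.1B-Chat-v1.0/", "CPU")
pipe = ov_genai.LLMPipeline("./Llama-2-7b-chat-hf-ov/", "CPU", draft_model=draft)

config = ov_genai.GenerationConfig()
config.max_new_tokens = 100
config.num_assistant_tokens = 5  # number of tokens the draft model proposes per step

print(pipe.generate("The Sun is yellow because", config))
```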

Additionally, the OpenVINO™ GenAI library implements a continuous batching approach to using OpenVINO within LLM serving. The continuous batching library can be used in LLM serving frameworks and supports the following features:
- Prefix caching, which internally caches fragments of previous generation requests and the corresponding KV cache entries and reuses them for repeated queries (see the sketch below).

The continuous batching functionality is used within OpenVINO Model Server (OVMS) to serve LLMs; see [here](https://docs.openvino.ai/2024/ovms_docs_llm_reference.html) for more details.
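
A minimal sketch of enabling continuous batching with prefix caching from Python; the `SchedulerConfig` fields and the `scheduler_config` keyword are assumptions based on the continuous batching samples:

```python
import openvino_genai as ov_genai

# Assumed scheduler settings; cache_size is the KV cache budget in GB
scheduler_config = ov_genai.SchedulerConfig()
scheduler_config.cache_size = 2
scheduler_config.enable_prefix_caching = True

# Passing a scheduler config is assumed to switch the pipeline to the continuous batching backend
pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0/", "CPU", scheduler_config=scheduler_config)
print(pipe.generate("The Sun is yellow because", max_new_tokens=100))
```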

## Installing OpenVINO GenAI

```sh
# Install OpenVINO GenAI via pip
pip install openvino-genai

# Install optimum-intel to be able to download, convert and optimize LLMs from Hugging Face
# Optimum is not required to run models, only to convert and compress them
pip install optimum[openvino]

# (Optional) Install (TBD) to be able to download models from ModelScope
```

## Performing text generation

<details>

For more examples check out our [LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).

### Converting and compressing a text generation model from the Hugging Face library

```sh
# (Basic) Download the TinyLlama-1.1B-Chat-v1.0 model and convert it to OpenVINO
optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --weight-format fp16 --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"

# (Recommended) Download the TinyLlama-1.1B-Chat-v1.0 model, convert it to OpenVINO and compress the weights to int4
optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --weight-format int4 --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"
```

### Run generation using LLMPipeline API in Python

```python
import openvino_genai as ov_genai

# Runs the model on CPU; GPU and NPU are possible options as well
pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0/", "CPU")
print(pipe.generate("The Sun is yellow because", max_new_tokens=100))
```
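
Generation can also be streamed token by token through a callback, the same mechanism the Whisper sample further below uses; a short sketch (the callback signature is assumed to match that sample):

```python
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0/", "CPU")

# Print each subword as soon as it is generated; returning False tells the pipeline to continue
def streamer(subword: str) -> bool:
    print(subword, end="", flush=True)
    return False

pipe.generate("The Sun is yellow because", max_new_tokens=100, streamer=streamer)
print()
```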

### Run generation using LLMPipeline API in C++

The code below requires installation of the C++ compatible package (see [here](https://docs.openvino.ai/2024/get-started/install-openvino/install-openvino-genai.html#archive-installation) for more details).

```cpp
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");
    std::cout << pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(100));
}
```

### Sample notebooks using this API

(TBD)

</details>

## Performing image generation

<details>

For more examples check out our [LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).

### Converting and compressing an image generation model from the Hugging Face library

```sh
# Download the dreamlike-anime-1.0 model and convert it to OpenVINO
optimum-cli export openvino --model dreamlike-art/dreamlike-anime-1.0 --task stable-diffusion --weight-format fp16 dreamlike_anime_1_0_ov/FP16
```

### Run generation using Text2Image API in Python

```python
# WIP
```
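
While the official Python example is still a work in progress, the following is a minimal sketch that mirrors the C++ Text2Image example in the next subsection; the `Text2ImagePipeline` Python binding, its keyword parameters, the output tensor layout, and the prompt are all assumptions:

```python
import openvino_genai as ov_genai
from PIL import Image

# Device and model path mirror the C++ example; GPU can be used as well
pipe = ov_genai.Text2ImagePipeline("./dreamlike_anime_1_0_ov/FP16", "CPU")

# Keyword names assumed to mirror ov::genai::width/height/num_inference_steps in the C++ API
image_tensor = pipe.generate(
    "a beautiful landscape with mountains and a lake",  # illustrative prompt
    width=512,
    height=512,
    num_inference_steps=20,
)

# The C++ sample writes a [1, H, W, 3] uint8 tensor; the same layout is assumed here
Image.fromarray(image_tensor.data[0]).save("image.bmp")
```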

### Run generation using Text2Image API in C++

The code below requires installation of the C++ compatible package (see [here](https://docs.openvino.ai/2024/get-started/install-openvino/install-openvino-genai.html#archive-installation) for more details).

```cpp
#include "openvino/genai/text2image/pipeline.hpp"
#include "imwrite.hpp"

int main(int argc, char* argv[]) {
    const std::string models_path = argv[1], prompt = argv[2];
    const std::string device = "CPU";  // GPU, NPU can be used as well

    ov::genai::Text2ImagePipeline pipe(models_path, device);
    ov::Tensor image = pipe.generate(prompt,
        ov::genai::width(512),
        ov::genai::height(512),
        ov::genai::num_inference_steps(20));

    imwrite("image.bmp", image, true);
}
```

### Sample notebooks using this API

(TBD)

</details>

## Speech-to-text processing using Whisper Pipeline

<details>

For more examples check out our [LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).

NOTE: the Whisper Pipeline requires preprocessing of the audio input (to adjust the sampling rate and normalize it).

### Converting and compressing a speech-to-text model from the Hugging Face library

```sh
# Download the whisper-base model and convert it to OpenVINO
optimum-cli export openvino --trust-remote-code --model openai/whisper-base whisper-base
```

### Run generation using Whisper Pipeline API in Python

NOTE: this sample is a simplified version of the full sample that is available [here](./samples/python/whisper_speech_recognition/whisper_speech_recognition.py).

```python
import argparse
import openvino_genai
import librosa


def read_wav(filepath):
    # Whisper models expect 16 kHz mono audio
    raw_speech, samplerate = librosa.load(filepath, sr=16000)
    return raw_speech.tolist()


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("model_dir")
    parser.add_argument("wav_file_path")
    args = parser.parse_args()

    raw_speech = read_wav(args.wav_file_path)

    pipe = openvino_genai.WhisperPipeline(args.model_dir)

    def streamer(word: str) -> bool:
        print(word, end="")
        return False

    pipe.generate(
        raw_speech,
        max_new_tokens=100,
        # 'task' and 'language' parameters are supported for multilingual models only
        language="<|en|>",
        task="transcribe",
        streamer=streamer,
    )

    print()


if __name__ == "__main__":
    main()
```

### Run generation using Whisper Pipeline API in C++

NOTE: this sample is a simplified version of the full sample that is available [here](./samples/cpp/whisper_speech_recognition/whisper_speech_recognition.cpp).

```cpp
#include <iostream>

#include "audio_utils.hpp"
#include "openvino/genai/whisper_pipeline.hpp"

int main(int argc, char* argv[]) try {
    std::string model_path = argv[1];
    std::string wav_file_path = argv[2];

    ov::genai::RawSpeechInput raw_speech = utils::audio::read_wav(wav_file_path);

    ov::genai::WhisperPipeline pipeline{model_path};

    ov::genai::WhisperGenerationConfig config{model_path + "/generation_config.json"};
    config.max_new_tokens = 100;
    // 'task' and 'language' parameters are supported for multilingual models only
    config.language = "<|en|>";
    config.task = "transcribe";

    auto streamer = [](std::string word) {
        std::cout << word;
        return false;
    };

    pipeline.generate(raw_speech, config, streamer);

    std::cout << std::endl;
} catch (const std::exception& error) {
    std::cerr << error.what() << '\n';
    return 1;
}
```

### Sample notebooks using this API

(TBD)

</details>

## Additional materials

- [List of supported models](https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/SUPPORTED_MODELS.md) (NOTE: models may work, but have not been tried yet)
- [OpenVINO LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html)
- [Optimum Intel and OpenVINO](https://huggingface.co/docs/optimum/intel/openvino/export)

## License
