|
# OpenVINO™ GenAI

OpenVINO™ GenAI is a library of the most popular generative AI model pipelines, optimized execution methods, and samples that run on top of the highly performant [OpenVINO Runtime](https://github.com/openvinotoolkit/openvino).

The library is friendly to PC and laptop execution and is optimized for resource consumption. It requires no external dependencies to run generative models, as it includes all required functionality (e.g. tokenization via openvino-tokenizers).

## Supported Generative AI scenarios

The OpenVINO™ GenAI library provides very lightweight C++ and Python APIs to run the following generative AI scenarios:
 - Text generation using Large Language Models (LLMs), for example, chat with a local LLaMA model
 - Image generation using diffusion models, for example, generation with Stable Diffusion models
 - Speech recognition using Whisper family models
 - Text generation using Large Visual Models, for instance, image analysis with the LLaVA or MiniCPM model families

The library efficiently supports LoRA adapters for text and image generation scenarios (a sketch follows this list):
- Load multiple adapters per model
- Select active adapters for every generation
- Mix multiple adapters with coefficients via alpha blending
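
Below is a minimal sketch of how adapters might be plugged into the Python `LLMPipeline` API. The `Adapter`/`AdapterConfig` names, the `adapters` keyword, and the file paths are assumptions based on the LoRA samples, not a definitive reference:

```python
import openvino_genai as ov_genai

# Hypothetical paths: an exported OpenVINO model directory and a LoRA adapter in safetensors format
adapter = ov_genai.Adapter("./my_adapter.safetensors")
adapter_config = ov_genai.AdapterConfig(adapter)

# Register the adapter(s) when the pipeline is created ...
pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0/", "CPU", adapters=adapter_config)

# ... and select which adapters are active for a particular generation call
print(pipe.generate("The Sun is yellow because", max_new_tokens=100, adapters=adapter_config))
```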

All scenarios run on top of OpenVINO Runtime, which supports inference on CPU, GPU, and NPU. See [here](https://docs.openvino.ai/2024/about-openvino/release-notes-openvino/system-requirements.html) for the platform support matrix.

## Supported Generative AI optimization methods

The OpenVINO™ GenAI library provides a transparent way to use state-of-the-art generation optimizations (a sketch follows this list):
- Speculative decoding, which employs two models of different sizes and uses the large model to periodically correct the results of the small model. See [here](https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/) for a more detailed overview.
- A KV cache token eviction algorithm that reduces the size of the KV cache by pruning less impactful tokens.
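
A hedged sketch of how speculative decoding might be enabled through the Python API; the `draft_model` helper, the `draft_model` keyword, `num_assistant_tokens`, and both model paths are assumptions taken from the speculative decoding samples rather than confirmed API:

```python
import openvino_genai as ov_genai

# Hypothetical model directories: a large "main" model and a small "draft" model
draft = ov_genai.draft_model("./TinyLlama-1.1B-Chat-v1.0/", "CPU")
pipe = ov_genai.LLMPipeline("./Llama-2-7b-chat-hf-ov/", "CPU", draft_model=draft)

config = ov_genai.GenerationConfig()
config.max_new_tokens = 100
config.num_assistant_tokens = 5  # number of tokens the draft model proposes per step

print(pipe.generate("The Sun is yellow because", config))
```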

Additionally, the OpenVINO™ GenAI library implements a continuous batching approach to using OpenVINO within LLM serving. The continuous batching library can be used in LLM serving frameworks and supports the following features:
- Prefix caching, which internally caches fragments of previous generation requests and the corresponding KV cache entries and reuses them for repeated queries (see the sketch below).

The continuous batching functionality is used within OpenVINO Model Server (OVMS) to serve LLMs; see [here](https://docs.openvino.ai/2024/ovms_docs_llm_reference.html) for more details.
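
A minimal sketch of enabling continuous batching with prefix caching from Python; the `SchedulerConfig` fields and the `scheduler_config` keyword are assumptions based on the continuous batching samples:

```python
import openvino_genai as ov_genai

# Assumed scheduler settings; cache_size is the KV cache budget in GB
scheduler_config = ov_genai.SchedulerConfig()
scheduler_config.cache_size = 2
scheduler_config.enable_prefix_caching = True

# Passing a scheduler config is assumed to switch the pipeline to the continuous batching backend
pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0/", "CPU", scheduler_config=scheduler_config)
print(pipe.generate("The Sun is yellow because", max_new_tokens=100))
```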

## Installing OpenVINO GenAI

```sh
# Install OpenVINO GenAI via pip
pip install openvino-genai

# Install optimum-intel to be able to download, convert and optimize LLMs from Hugging Face
# Optimum is not required to run models, only to convert and compress them
pip install optimum[openvino]

# (Optional) Install (TBD) to be able to download models from ModelScope
```

## Performing text generation

<details>

For more examples check out our [LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).

### Converting and compressing a text generation model from the Hugging Face library

```sh
# (Basic) Download the TinyLlama-1.1B-Chat-v1.0 model and convert it to OpenVINO
optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --weight-format fp16 --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"

# (Recommended) Download the TinyLlama-1.1B-Chat-v1.0 model, convert it to OpenVINO and compress the weights to int4
optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --weight-format int4 --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"
```

### Run generation using LLMPipeline API in Python

```python
import openvino_genai as ov_genai

# Runs the model on CPU; GPU and NPU are possible options as well
pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0/", "CPU")
print(pipe.generate("The Sun is yellow because", max_new_tokens=100))
```
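
Generation can also be streamed token by token through a callback, the same mechanism the Whisper sample further below uses; a short sketch (the callback signature is assumed to match that sample):

```python
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0/", "CPU")

# Print each subword as soon as it is generated; returning False tells the pipeline to continue
def streamer(subword: str) -> bool:
    print(subword, end="", flush=True)
    return False

pipe.generate("The Sun is yellow because", max_new_tokens=100, streamer=streamer)
print()
```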

### Run generation using LLMPipeline API in C++

The code below requires installation of the C++ compatible package (see [here](https://docs.openvino.ai/2024/get-started/install-openvino/install-openvino-genai.html#archive-installation) for more details).

```cpp
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");
    std::cout << pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(100));
}
```

### Sample notebooks using this API

(TBD)

</details>

## Performing image generation

<details>

For more examples check out our [LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).

### Converting and compressing an image generation model from the Hugging Face library

```sh
# Download the dreamlike-anime-1.0 model and convert it to OpenVINO
optimum-cli export openvino --model dreamlike-art/dreamlike-anime-1.0 --task stable-diffusion --weight-format fp16 dreamlike_anime_1_0_ov/FP16
```

### Run generation using Text2Image API in Python

```python
# WIP
```
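
While the official Python example is still a work in progress, the following is a minimal sketch that mirrors the C++ Text2Image example in the next subsection; the `Text2ImagePipeline` Python binding, its keyword parameters, the output tensor layout, and the prompt are all assumptions:

```python
import openvino_genai as ov_genai
from PIL import Image

# Device and model path mirror the C++ example; GPU can be used as well
pipe = ov_genai.Text2ImagePipeline("./dreamlike_anime_1_0_ov/FP16", "CPU")

# Keyword names assumed to mirror ov::genai::width/height/num_inference_steps in the C++ API
image_tensor = pipe.generate(
    "a beautiful landscape with mountains and a lake",  # illustrative prompt
    width=512,
    height=512,
    num_inference_steps=20,
)

# The C++ sample writes a [1, H, W, 3] uint8 tensor; the same layout is assumed here
Image.fromarray(image_tensor.data[0]).save("image.bmp")
```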

### Run generation using Text2Image API in C++

The code below requires installation of the C++ compatible package (see [here](https://docs.openvino.ai/2024/get-started/install-openvino/install-openvino-genai.html#archive-installation) for more details).

```cpp
#include "openvino/genai/text2image/pipeline.hpp"
#include "imwrite.hpp"

int main(int argc, char* argv[]) {
    const std::string models_path = argv[1], prompt = argv[2];
    const std::string device = "CPU";  // GPU, NPU can be used as well

    ov::genai::Text2ImagePipeline pipe(models_path, device);
    ov::Tensor image = pipe.generate(prompt,
        ov::genai::width(512),
        ov::genai::height(512),
        ov::genai::num_inference_steps(20));

    imwrite("image.bmp", image, true);
}
```

### Sample notebooks using this API

(TBD)

</details>

## Speech-to-text processing using Whisper Pipeline

<details>

For more examples check out our [LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).

NOTE: the Whisper Pipeline requires preprocessing of the audio input (to adjust the sampling rate and normalize it).

### Converting and compressing a speech-to-text model from the Hugging Face library

```sh
# Download the whisper-base model and convert it to OpenVINO
optimum-cli export openvino --trust-remote-code --model openai/whisper-base whisper-base
```

### Run generation using Whisper Pipeline API in Python

NOTE: this sample is a simplified version of the full sample that is available [here](./samples/python/whisper_speech_recognition/whisper_speech_recognition.py).

```python
import argparse
import openvino_genai
import librosa


def read_wav(filepath):
    # Whisper models expect 16 kHz mono audio
    raw_speech, samplerate = librosa.load(filepath, sr=16000)
    return raw_speech.tolist()


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("model_dir")
    parser.add_argument("wav_file_path")
    args = parser.parse_args()

    raw_speech = read_wav(args.wav_file_path)

    pipe = openvino_genai.WhisperPipeline(args.model_dir)

    def streamer(word: str) -> bool:
        print(word, end="")
        return False

    pipe.generate(
        raw_speech,
        max_new_tokens=100,
        # 'task' and 'language' parameters are supported for multilingual models only
        language="<|en|>",
        task="transcribe",
        streamer=streamer,
    )

    print()


if __name__ == "__main__":
    main()
```

### Run generation using Whisper Pipeline API in C++

NOTE: this sample is a simplified version of the full sample that is available [here](./samples/cpp/whisper_speech_recognition/whisper_speech_recognition.cpp).

```cpp
#include <iostream>

#include "audio_utils.hpp"
#include "openvino/genai/whisper_pipeline.hpp"

int main(int argc, char* argv[]) try {
    std::string model_path = argv[1];
    std::string wav_file_path = argv[2];

    ov::genai::RawSpeechInput raw_speech = utils::audio::read_wav(wav_file_path);

    ov::genai::WhisperPipeline pipeline{model_path};

    ov::genai::WhisperGenerationConfig config{model_path + "/generation_config.json"};
    config.max_new_tokens = 100;
    // 'task' and 'language' parameters are supported for multilingual models only
    config.language = "<|en|>";
    config.task = "transcribe";

    auto streamer = [](std::string word) {
        std::cout << word;
        return false;
    };

    pipeline.generate(raw_speech, config, streamer);

    std::cout << std::endl;
} catch (const std::exception& error) {
    std::cerr << error.what() << '\n';
    return 1;
}
```

### Sample notebooks using this API

(TBD)

</details>

## Additional materials

- [List of supported models](https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/SUPPORTED_MODELS.md) (NOTE: models may work, but have not been tried yet)
- [OpenVINO LLM Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html)
- [Optimum Intel and OpenVINO](https://huggingface.co/docs/optimum/intel/openvino/export)

## License
