[Docs] Add VLM use case #1907

Merged
38 commits merged on Mar 20, 2025
Commits (38)
6e3d7c5
Add short description of image generation pipelines
yatarkan Mar 7, 2025
09d975c
Add model preparation nav menu, add page for download pre-converted m…
yatarkan Mar 7, 2025
c02e124
Fix position
yatarkan Mar 7, 2025
e17b4c9
Add link to model preparation page in introduction
yatarkan Mar 7, 2025
bb91823
Add optimum cli command component
yatarkan Mar 10, 2025
a1eb33e
Add content for convert to openvino page
yatarkan Mar 10, 2025
dd480fd
Disable separate guides page
yatarkan Mar 10, 2025
89fb5b5
Remove tm
yatarkan Mar 10, 2025
3bb188e
Add link to dedicated model preparation guide page, reuse in llm and …
yatarkan Mar 10, 2025
5602e88
Add code examples for all image generation pipelines
yatarkan Mar 10, 2025
86639f7
Move lora adapters to separate page, add references
yatarkan Mar 10, 2025
fc99109
Fix links
yatarkan Mar 10, 2025
3261fa8
Merge branch 'master' into docs-pages-image-generation
yatarkan Mar 11, 2025
fdc38b1
Update site/docs/guides/lora-adapters.mdx
yatarkan Mar 11, 2025
d14e98f
Reuse use cases note
yatarkan Mar 11, 2025
b6a0b02
Move convert model section to shared
yatarkan Mar 11, 2025
94de0f3
Rename text generation directory
yatarkan Mar 11, 2025
cc836a9
Rename image generation directory
yatarkan Mar 11, 2025
e21d48f
Remove digits from use cases directory names
yatarkan Mar 11, 2025
d2a6fa1
Update use cases links
yatarkan Mar 11, 2025
c115768
Add link to supported llm models
yatarkan Mar 11, 2025
4b2aac0
Add initial page for VLM use case
yatarkan Mar 11, 2025
7714109
Reuse vlm code examples
yatarkan Mar 11, 2025
0e51bbb
Move basic generation configuration to separate file and reuse in llm…
yatarkan Mar 12, 2025
ed1001d
Move chat scenario and streaming to separate page, reuse in LLM and V…
yatarkan Mar 12, 2025
feb6b90
Fix link
yatarkan Mar 12, 2025
360323e
Merge branch 'master' into docs-pages-vlm
yatarkan Mar 18, 2025
c557f7d
Fix VLM run model section
yatarkan Mar 18, 2025
09baa88
Add links to model sources
yatarkan Mar 18, 2025
61899d6
Remove mentions of models
yatarkan Mar 18, 2025
d880d3e
Add links to pipeline classes api reference
yatarkan Mar 18, 2025
f4dafd7
Add example model export commands for use cases
yatarkan Mar 18, 2025
c6b50c1
Add link to KV cache page
yatarkan Mar 18, 2025
fb592eb
Add link to chat samples
yatarkan Mar 18, 2025
a463de0
Fix c++ custom streamer type
yatarkan Mar 18, 2025
6bbe8b9
Add comment
yatarkan Mar 18, 2025
58e6287
Add direct links to samples for use cases
yatarkan Mar 18, 2025
0a8c967
Merge branch 'master' into docs-pages-vlm
yatarkan Mar 19, 2025
82 changes: 82 additions & 0 deletions site/docs/guides/chat-scenario.mdx
@@ -0,0 +1,82 @@
---
sidebar_position: 2
title: Chat Scenario
---

# Using OpenVINO GenAI in Chat Scenario

For chat applications, OpenVINO GenAI provides special optimizations to maintain conversation context and improve performance using KV-cache.

Refer to the [Stateful Models vs Stateless Models](/docs/concepts/stateful-vs-stateless-models) page for more information about KV-cache.

:::tip
Use `start_chat()` and `finish_chat()` to properly manage the chat session's KV-cache. This improves performance by reusing context between messages.
:::

:::info
Chat mode is supported for both `LLMPipeline` and `VLMPipeline`.
:::

A simple chat example (with grouped beam search decoding):

<LanguageTabs>
<TabItemPython>
```python showLineNumbers
import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(model_path, 'CPU')

config = {'max_new_tokens': 100, 'num_beam_groups': 3, 'num_beams': 15, 'diversity_penalty': 1.5}
pipe.set_generation_config(config)

# highlight-next-line
pipe.start_chat()
while True:
    try:
        prompt = input('question:\n')
    except EOFError:
        break
    answer = pipe.generate(prompt)
    print('answer:\n')
    print(answer)
    print('\n----------\n')
# highlight-next-line
pipe.finish_chat()
```
</TabItemPython>
<TabItemCpp>
```cpp showLineNumbers
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string prompt;

    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;
    config.num_beam_groups = 3;
    config.num_beams = 15;
    config.diversity_penalty = 1.0f;

    // highlight-next-line
    pipe.start_chat();
    std::cout << "question:\n";
    while (std::getline(std::cin, prompt)) {
        std::cout << "answer:\n";
        auto answer = pipe.generate(prompt, config);
        std::cout << answer << std::endl;
        std::cout << "\n----------\n"
                     "question:\n";
    }
    // highlight-next-line
    pipe.finish_chat();
}
```
</TabItemCpp>
</LanguageTabs>
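
As noted above, chat mode also works with `VLMPipeline`. A minimal Python sketch of a visual chat session is shown below; the model directory, image file, and the `images` keyword argument are illustrative assumptions based on the VLM samples rather than requirements stated on this page:

```python showLineNumbers
import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image

# Assumed inputs: a pre-converted VLM directory and a local image file (both illustrative).
model_path = "./InternVL2-1B-ov"
image = ov.Tensor(np.array(Image.open("cat.png").convert("RGB")))

pipe = ov_genai.VLMPipeline(model_path, "CPU")

config = ov_genai.GenerationConfig()
config.max_new_tokens = 100

# highlight-next-line
pipe.start_chat()
# The first message attaches the image; follow-up questions reuse the cached conversation context.
print(pipe.generate("Describe the image.", images=[image], generation_config=config))
print(pipe.generate("What colors stand out?", generation_config=config))
# highlight-next-line
pipe.finish_chat()
```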

:::info
For more information, refer to the [Python](https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/python/text_generation/chat_sample.py) and [C++](https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/cpp/text_generation/chat_sample.cpp) chat samples.
:::
3 changes: 2 additions & 1 deletion site/docs/guides/model-preparation/convert-to-openvino.mdx
@@ -8,7 +8,8 @@ import UseCasesNote from './_use_cases_note.mdx';

# Convert Models to OpenVINO Format

This page explains how to convert various generative AI models from Hugging Face and ModelScope to OpenVINO IR format. Refer to the [Supported Models](../../supported-models/index.mdx) for a list of available models.
This page explains how to convert various generative AI models from [Hugging Face](https://huggingface.co/) and [ModelScope](https://modelscope.cn/) to OpenVINO IR format.
Refer to the [Supported Models](../../supported-models/index.mdx) for a list of available models.

For downloading pre-converted models, see [Download Pre-Converted OpenVINO Models](./download-openvino-models.mdx).
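
As a rough sketch of what such a conversion can look like from Python with `optimum-intel` (the model ID and output directory are illustrative assumptions; the `optimum-cli` instructions below remain the authoritative reference):

```python showLineNumbers
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # illustrative Hugging Face model ID
output_dir = "TinyLlama-1.1B-Chat-v1.0-ov"       # illustrative output directory

# export=True converts the original weights to OpenVINO IR while loading.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
model.save_pretrained(output_dir)

# Save the tokenizer next to the exported model.
AutoTokenizer.from_pretrained(model_id).save_pretrained(output_dir)
```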

@@ -8,7 +8,7 @@ import UseCasesNote from './_use_cases_note.mdx';
# Download Pre-Converted OpenVINO Models

OpenVINO GenAI allows you to run different generative AI models (see [Supported Models](../../supported-models/index.mdx)).
While you can convert models from other frameworks (see [Convert Models to OpenVINO Format](./convert-to-openvino.mdx)), using pre-converted models can save time and effort.
While you can convert models from other frameworks (see [Convert Models to OpenVINO Format](./convert-to-openvino.mdx)), using pre-converted models from [Hugging Face](https://huggingface.co/) and [ModelScope](https://modelscope.cn/) can save time and effort.
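
For example, a pre-converted model can be pulled with the `huggingface_hub` package in a few lines; the repository ID below is an illustrative assumption, and the sections below give the authoritative steps:

```python showLineNumbers
import huggingface_hub as hf_hub

# Illustrative pre-converted model repository and local target directory.
model_id = "OpenVINO/Phi-3-mini-4k-instruct-int4-ov"
model_path = "Phi-3-mini-4k-instruct-int4-ov"

hf_hub.snapshot_download(model_id, local_dir=model_path)
```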

## Download from Hugging Face

@@ -1,76 +1,16 @@
### Using GenAI in Chat Scenario
---
sidebar_position: 3
---

For chat applications, OpenVINO GenAI provides special optimizations to maintain conversation context and improve performance using KV-cache.

:::tip
Use `start_chat()` and `finish_chat()` to properly manage the chat session's KV-cache. This improves performance by reusing context between messages.
:::

A simple chat example (with grouped beam search decoding):

<LanguageTabs>
<TabItemPython>
```python showLineNumbers
import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(model_path, 'CPU')

config = {'max_new_tokens': 100, 'num_beam_groups': 3, 'num_beams': 15, 'diversity_penalty': 1.5}
pipe.set_generation_config(config)

# highlight-next-line
pipe.start_chat()
while True:
    try:
        prompt = input('question:\n')
    except EOFError:
        break
    answer = pipe.generate(prompt)
    print('answer:\n')
    print(answer)
    print('\n----------\n')
# highlight-next-line
pipe.finish_chat()
```
</TabItemPython>
<TabItemCpp>
```cpp showLineNumbers
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string prompt;

    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;
    config.num_beam_groups = 3;
    config.num_beams = 15;
    config.diversity_penalty = 1.0f;

    // highlight-next-line
    pipe.start_chat();
    std::cout << "question:\n";
    while (std::getline(std::cin, prompt)) {
        std::cout << "answer:\n";
        auto answer = pipe.generate(prompt, config);
        std::cout << answer << std::endl;
        std::cout << "\n----------\n"
                     "question:\n";
    }
    // highlight-next-line
    pipe.finish_chat();
}
```
</TabItemCpp>
</LanguageTabs>

#### Streaming the Output
# Streaming the Output

For more interactive UIs during generation, you can stream output tokens.

##### Streaming Function
:::info
Streaming is supported for both `LLMPipeline` and `VLMPipeline`.
:::

## Streaming Function

In this example, a function outputs words to the console immediately upon generation:

@@ -138,11 +78,7 @@ In this example, a function outputs words to the console immediately upon generation:
</TabItemCpp>
</LanguageTabs>
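
The collapsed example above is the reference implementation; as a compact sketch, a plain Python callable can also be passed as the streamer. The assumption here is that the callable receives decoded text chunks, and the model path is illustrative:

```python showLineNumbers
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0-ov", "CPU")  # illustrative model path

# Each decoded text chunk is printed as soon as it is generated.
def print_chunk(chunk: str):
    print(chunk, end="", flush=True)

pipe.generate("What is OpenVINO?", streamer=print_chunk, max_new_tokens=100)
```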

:::info
For more information, refer to the [chat sample](https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/chat_sample/).
:::

##### Custom Streamer Class
## Custom Streamer Class

You can also create your custom streamer for more sophisticated processing:

@@ -210,7 +146,7 @@ You can also create your custom streamer for more sophisticated processing:
int main(int argc, char* argv[]) {
std::string prompt;
// highlight-next-line
CustomStreamer custom_streamer;
std::shared_ptr<CustomStreamer> custom_streamer;

std::string model_path = argv[1];
ov::genai::LLMPipeline pipe(model_path, "CPU");
@@ -232,5 +168,5 @@ You can also create your custom streamer for more sophisticated processing:
</LanguageTabs>

:::info
For fully implemented iterable CustomStreamer refer to [multinomial_causal_lm](https://github.com/openvinotoolkit/openvino.genai/blob/releases/2025/0/samples/python/text_generation/multinomial_causal_lm.py) sample.
For fully implemented iterable `CustomStreamer` refer to [multinomial_causal_lm](https://github.com/openvinotoolkit/openvino.genai/blob/releases/2025/0/samples/python/text_generation/multinomial_causal_lm.py) sample.
:::
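
For a rough idea of the same pattern in Python, a minimal custom streamer sketch might subclass `StreamerBase`; the method names `put` and `end` are assumptions based on the referenced samples, and the model path is illustrative:

```python showLineNumbers
import openvino_genai as ov_genai

class CustomStreamer(ov_genai.StreamerBase):
    def __init__(self, tokenizer):
        super().__init__()
        self.tokenizer = tokenizer
        self.token_ids = []

    def put(self, token_id) -> bool:
        # Accumulate token ids and print the text decoded so far.
        # Returning True would stop generation early; False continues it.
        self.token_ids.append(token_id)
        print(self.tokenizer.decode(self.token_ids), end="\r", flush=True)
        return False

    def end(self):
        print()

pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0-ov", "CPU")  # illustrative model path
pipe.generate("What is OpenVINO?", streamer=CustomStreamer(pipe.get_tokenizer()), max_new_tokens=100)
```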

This file was deleted.

This file was deleted.

14 changes: 0 additions & 14 deletions site/docs/use-cases/1-LLM-pipeline/index.mdx

This file was deleted.

14 changes: 0 additions & 14 deletions site/docs/use-cases/2-Image-Generation/index.mdx

This file was deleted.

5 changes: 0 additions & 5 deletions site/docs/use-cases/3-Processing-speech-whisper.md

This file was deleted.
