
Commit 5d986b7

yatarkan and Wovchena authored
1 parent 831e0f0 commit 5d986b7


41 files changed: +502, -323 lines

site/docs/guides/chat-scenario.mdx

+82
@@ -0,0 +1,82 @@
---
sidebar_position: 2
title: Chat Scenario
---

# Using OpenVINO GenAI in Chat Scenario

For chat applications, OpenVINO GenAI provides special optimizations to maintain conversation context and improve performance using KV-cache.

Refer to the [Stateful Models vs Stateless Models](/docs/concepts/stateful-vs-stateless-models) for more information about KV-cache.

:::tip
Use `start_chat()` and `finish_chat()` to properly manage the chat session's KV-cache. This improves performance by reusing context between messages.
:::

:::info
Chat mode is supported for both `LLMPipeline` and `VLMPipeline`.
:::

A simple chat example (with grouped beam search decoding):

<LanguageTabs>
<TabItemPython>
```python showLineNumbers
import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(model_path, 'CPU')

config = {'max_new_tokens': 100, 'num_beam_groups': 3, 'num_beams': 15, 'diversity_penalty': 1.5}
pipe.set_generation_config(config)

# highlight-next-line
pipe.start_chat()
while True:
    try:
        prompt = input('question:\n')
    except EOFError:
        break
    answer = pipe.generate(prompt)
    print('answer:\n')
    print(answer)
    print('\n----------\n')
# highlight-next-line
pipe.finish_chat()
```
</TabItemPython>
<TabItemCpp>
```cpp showLineNumbers
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string prompt;

    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;
    config.num_beam_groups = 3;
    config.num_beams = 15;
    config.diversity_penalty = 1.0f;

    // highlight-next-line
    pipe.start_chat();
    std::cout << "question:\n";
    while (std::getline(std::cin, prompt)) {
        std::cout << "answer:\n";
        auto answer = pipe.generate(prompt, config);
        std::cout << answer << std::endl;
        std::cout << "\n----------\n"
            "question:\n";
    }
    // highlight-next-line
    pipe.finish_chat();
}
```
</TabItemCpp>
</LanguageTabs>

:::info
For more information, refer to the [Python](https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/python/text_generation/chat_sample.py) and [C++](https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/cpp/text_generation/chat_sample.cpp) chat samples.
:::
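The new page states that chat mode also works with `VLMPipeline`, but the file only shows an LLM example. As a rough sketch of a VLM chat session, assuming the Python `VLMPipeline` accepts an `images=` argument with a uint8 NHWC `ov.Tensor` (the model directory and image file below are illustrative, not taken from this commit):

```python
import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image

# Illustrative paths; substitute a real VLM model directory and image.
pipe = ov_genai.VLMPipeline('vlm_model_dir', 'CPU')

config = ov_genai.GenerationConfig()
config.max_new_tokens = 100

# Load the image as a uint8 tensor with a leading batch dimension (1 x H x W x 3).
image = Image.open('cat.jpg').convert('RGB')
image_tensor = ov.Tensor(np.array(image, dtype=np.uint8)[np.newaxis, ...])

pipe.start_chat()
print(pipe.generate('Describe the image.', images=[image_tensor], generation_config=config))
# Follow-up questions reuse the conversation context kept in the KV-cache.
print(pipe.generate('What colors stand out?', generation_config=config))
pipe.finish_chat()
```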

site/docs/guides/model-preparation/convert-to-openvino.mdx

+2-1
@@ -8,7 +8,8 @@ import UseCasesNote from './_use_cases_note.mdx';

 # Convert Models to OpenVINO Format

-This page explains how to convert various generative AI models from Hugging Face and ModelScope to OpenVINO IR format. Refer to the [Supported Models](../../supported-models/index.mdx) for a list of available models.
+This page explains how to convert various generative AI models from [Hugging Face](https://huggingface.co/) and [ModelScope](https://modelscope.cn/) to OpenVINO IR format.
+Refer to the [Supported Models](../../supported-models/index.mdx) for a list of available models.

 For downloading pre-converted models, see [Download Pre-Converted OpenVINO Models](./download-openvino-models.mdx).
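For orientation, conversion to OpenVINO IR is typically driven through Optimum Intel. A minimal sketch (not part of this commit; the model ID and output directory are placeholders):

```python
# Sketch: export a Hugging Face model to OpenVINO IR with Optimum Intel.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # illustrative model ID
output_dir = "TinyLlama-1.1B-Chat-v1.0-ov"        # illustrative output directory

# export=True converts the original weights to OpenVINO IR during loading.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
model.save_pretrained(output_dir)

# Save the Hugging Face tokenizer files alongside the exported model.
AutoTokenizer.from_pretrained(model_id).save_pretrained(output_dir)
```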

site/docs/guides/model-preparation/download-openvino-models.mdx

+1-1
@@ -8,7 +8,7 @@ import UseCasesNote from './_use_cases_note.mdx';
 # Download Pre-Converted OpenVINO Models

 OpenVINO GenAI allows to run different generative AI models (see [Supported Models](../../supported-models/index.mdx)).
-While you can convert models from other frameworks (see [Convert Models to OpenVINO Format](./convert-to-openvino.mdx)), using pre-converted models can save time and effort.
+While you can convert models from other frameworks (see [Convert Models to OpenVINO Format](./convert-to-openvino.mdx)), using pre-converted models from [Hugging Face](https://huggingface.co/) and [ModelScope](https://modelscope.cn/) can save time and effort.

 ## Download from Hugging Face
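For reference, pre-converted models are usually fetched with the `huggingface_hub` client. A minimal sketch (the repository ID below is illustrative, not taken from this commit):

```python
# Sketch: download a pre-converted OpenVINO model repository from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="OpenVINO/Phi-3-mini-4k-instruct-int4-ov",  # placeholder repo ID with IR files
    local_dir="Phi-3-mini-4k-instruct-int4-ov",
)
print(f"Model files downloaded to: {local_dir}")
```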

+12-76
@@ -1,76 +1,16 @@
-### Using GenAI in Chat Scenario
+---
+sidebar_position: 3
+---

-For chat applications, OpenVINO GenAI provides special optimizations to maintain conversation context and improve performance using KV-cache.
-
-:::tip
-Use `start_chat()` and `finish_chat()` to properly manage the chat session's KV-cache. This improves performance by reusing context between messages.
-:::
-
-A simple chat example (with grouped beam search decoding):
-
-<LanguageTabs>
-<TabItemPython>
-```python showLineNumbers
-import openvino_genai as ov_genai
-pipe = ov_genai.LLMPipeline(model_path, 'CPU')
-
-config = {'max_new_tokens': 100, 'num_beam_groups': 3, 'num_beams': 15, 'diversity_penalty': 1.5}
-pipe.set_generation_config(config)
-
-# highlight-next-line
-pipe.start_chat()
-while True:
-    try:
-        prompt = input('question:\n')
-    except EOFError:
-        break
-    answer = pipe.generate(prompt)
-    print('answer:\n')
-    print(answer)
-    print('\n----------\n')
-# highlight-next-line
-pipe.finish_chat()
-```
-</TabItemPython>
-<TabItemCpp>
-```cpp showLineNumbers
-#include "openvino/genai/llm_pipeline.hpp"
-#include <iostream>
-
-int main(int argc, char* argv[]) {
-    std::string prompt;
-
-    std::string model_path = argv[1];
-    ov::genai::LLMPipeline pipe(model_path, "CPU");
-
-    ov::genai::GenerationConfig config;
-    config.max_new_tokens = 100;
-    config.num_beam_groups = 3;
-    config.num_beams = 15;
-    config.diversity_penalty = 1.0f;
-
-    // highlight-next-line
-    pipe.start_chat();
-    std::cout << "question:\n";
-    while (std::getline(std::cin, prompt)) {
-        std::cout << "answer:\n";
-        auto answer = pipe.generate(prompt, config);
-        std::cout << answer << std::endl;
-        std::cout << "\n----------\n"
-            "question:\n";
-    }
-    // highlight-next-line
-    pipe.finish_chat();
-}
-```
-</TabItemCpp>
-</LanguageTabs>
-
-#### Streaming the Output
+# Streaming the Output

 For more interactive UIs during generation, you can stream output tokens.

-##### Streaming Function
+:::info
+Streaming is supported for both `LLMPipeline` and `VLMPipeline`.
+:::
+
+## Streaming Function

 In this example, a function outputs words to the console immediately upon generation:
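The streaming-function example itself is unchanged by this commit, so it does not appear in the diff. For orientation, a minimal sketch of such a streamer callback with the openvino_genai Python API (the prompt and `model_path` below are illustrative) might look like:

```python
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline(model_path, 'CPU')  # model_path as in the examples above

def streamer(subword):
    # Print each decoded chunk as soon as it is generated.
    print(subword, end='', flush=True)
    # Returning False lets generation continue; returning True stops it early.
    return False

pipe.generate('What is OpenVINO?', streamer=streamer, max_new_tokens=100)
```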

@@ -138,11 +78,7 @@ In this example, a function outputs words to the console immediately upon generation:
 </TabItemCpp>
 </LanguageTabs>

-:::info
-For more information, refer to the [chat sample](https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/chat_sample/).
-:::
-
-##### Custom Streamer Class
+## Custom Streamer Class

 You can also create your custom streamer for more sophisticated processing:

@@ -210,7 +146,7 @@ You can also create your custom streamer for more sophisticated processing:
 int main(int argc, char* argv[]) {
     std::string prompt;
     // highlight-next-line
-    CustomStreamer custom_streamer;
+    std::shared_ptr<CustomStreamer> custom_streamer;

     std::string model_path = argv[1];
     ov::genai::LLMPipeline pipe(model_path, "CPU");
@@ -232,5 +168,5 @@
 </LanguageTabs>

 :::info
-For fully implemented iterable CustomStreamer refer to [multinomial_causal_lm](https://github.com/openvinotoolkit/openvino.genai/blob/releases/2025/0/samples/python/text_generation/multinomial_causal_lm.py) sample.
+For fully implemented iterable `CustomStreamer` refer to [multinomial_causal_lm](https://github.com/openvinotoolkit/openvino.genai/blob/releases/2025/0/samples/python/text_generation/multinomial_causal_lm.py) sample.
 :::
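The custom-streamer example is only partially visible in the diff. As a rough sketch of the Python side, assuming the `ov_genai.StreamerBase` interface with `put()` and `end()` methods (the prompt and `model_path` are illustrative):

```python
import openvino_genai as ov_genai

class CustomStreamer(ov_genai.StreamerBase):
    def __init__(self, tokenizer):
        super().__init__()
        self.tokenizer = tokenizer
        self.token_ids = []

    def put(self, token_id) -> bool:
        # Called for every generated token; a real streamer would buffer tokens
        # until they decode to complete words before printing.
        self.token_ids.append(token_id)
        print(self.tokenizer.decode([token_id]), end='', flush=True)
        return False  # False continues generation, True stops it early

    def end(self):
        # Called once generation has finished.
        print()

pipe = ov_genai.LLMPipeline(model_path, 'CPU')  # model_path as in the examples above
custom_streamer = CustomStreamer(pipe.get_tokenizer())
pipe.generate('Why is the sky blue?', streamer=custom_streamer, max_new_tokens=100)
```

The C++ change from `CustomStreamer custom_streamer;` to `std::shared_ptr<CustomStreamer> custom_streamer;` in the diff is consistent with passing the streamer object to `generate()` as a shared pointer.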

site/docs/use-cases/1-LLM-pipeline/_sections/_usage_options/_generation_parameters.mdx

-127
This file was deleted.

site/docs/use-cases/1-LLM-pipeline/_sections/_usage_options/index.mdx

-18
This file was deleted.

site/docs/use-cases/1-LLM-pipeline/index.mdx

-14
This file was deleted.

site/docs/use-cases/2-Image-Generation/index.mdx

-14
This file was deleted.

site/docs/use-cases/3-Processing-speech-whisper.md

-5
This file was deleted.
