[Docs] Add VLM use case #1907

Merged
38 commits merged on Mar 20, 2025
Commits (38)
6e3d7c5
Add short description of image generation pipelines
yatarkan Mar 7, 2025
09d975c
Add model preparation nav menu, add page for download pre-converted m…
yatarkan Mar 7, 2025
c02e124
Fix position
yatarkan Mar 7, 2025
e17b4c9
Add link to model preparation page in introduction
yatarkan Mar 7, 2025
bb91823
Add optimum cli command component
yatarkan Mar 10, 2025
a1eb33e
Add content for convert to openvino page
yatarkan Mar 10, 2025
dd480fd
Disable separate guides page
yatarkan Mar 10, 2025
89fb5b5
Remove tm
yatarkan Mar 10, 2025
3bb188e
Add link to dedicated model preparation guide page, reuse in llm and …
yatarkan Mar 10, 2025
5602e88
Add code examples for all image generation pipelines
yatarkan Mar 10, 2025
86639f7
Move lora adapters to separate page, add references
yatarkan Mar 10, 2025
fc99109
Fix links
yatarkan Mar 10, 2025
3261fa8
Merge branch 'master' into docs-pages-image-generation
yatarkan Mar 11, 2025
fdc38b1
Update site/docs/guides/lora-adapters.mdx
yatarkan Mar 11, 2025
d14e98f
Reuse use cases note
yatarkan Mar 11, 2025
b6a0b02
Move convert model section to shared
yatarkan Mar 11, 2025
94de0f3
Rename text generation directory
yatarkan Mar 11, 2025
cc836a9
Rename image generation directory
yatarkan Mar 11, 2025
e21d48f
Remove digits from use cases directory names
yatarkan Mar 11, 2025
d2a6fa1
Update use cases links
yatarkan Mar 11, 2025
c115768
Add link to supported llm models
yatarkan Mar 11, 2025
4b2aac0
Add initial page for VLM use case
yatarkan Mar 11, 2025
7714109
Reuse vlm code examples
yatarkan Mar 11, 2025
0e51bbb
Move basic generation configuration to separate file and reuse in llm…
yatarkan Mar 12, 2025
ed1001d
Move chat scenario and streaming to separate page, reuse in LLM and V…
yatarkan Mar 12, 2025
feb6b90
Fix link
yatarkan Mar 12, 2025
360323e
Merge branch 'master' into docs-pages-vlm
yatarkan Mar 18, 2025
c557f7d
Fix VLM run model section
yatarkan Mar 18, 2025
09baa88
Add links to model sources
yatarkan Mar 18, 2025
61899d6
Remove mentions of models
yatarkan Mar 18, 2025
d880d3e
Add links to pipeline classes api reference
yatarkan Mar 18, 2025
f4dafd7
Add example model export commands for use cases
yatarkan Mar 18, 2025
c6b50c1
Add link to KV cache page
yatarkan Mar 18, 2025
fb592eb
Add link to chat samples
yatarkan Mar 18, 2025
a463de0
Fix c++ custom streamer type
yatarkan Mar 18, 2025
6bbe8b9
Add comment
yatarkan Mar 18, 2025
58e6287
Add direct links to samples for use cases
yatarkan Mar 18, 2025
0a8c967
Merge branch 'master' into docs-pages-vlm
yatarkan Mar 19, 2025
82 changes: 82 additions & 0 deletions site/docs/guides/chat-scenario.mdx
@@ -0,0 +1,82 @@
---
sidebar_position: 2
title: Chat Scenario
---

# Using OpenVINO GenAI in Chat Scenario

For chat applications, OpenVINO GenAI provides special optimizations to maintain conversation context and improve performance using KV-cache.

Refer to the [Stateful Models vs Stateless Models](/docs/concepts/stateful-vs-stateless-models) page for more information about KV-cache.

:::tip
Use `start_chat()` and `finish_chat()` to properly manage the chat session's KV-cache. This improves performance by reusing context between messages.
:::

:::info
Chat mode is supported for both `LLMPipeline` and `VLMPipeline`.
:::

A simple chat example (with grouped beam search decoding):

<LanguageTabs>
<TabItemPython>
```python showLineNumbers
import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(model_path, 'CPU')

config = {'max_new_tokens': 100, 'num_beam_groups': 3, 'num_beams': 15, 'diversity_penalty': 1.5}
pipe.set_generation_config(config)

# highlight-next-line
pipe.start_chat()
while True:
    try:
        prompt = input('question:\n')
    except EOFError:
        break
    answer = pipe.generate(prompt)
    print('answer:\n')
    print(answer)
    print('\n----------\n')
# highlight-next-line
pipe.finish_chat()
```
</TabItemPython>
<TabItemCpp>
```cpp showLineNumbers
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string prompt;

    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;
    config.num_beam_groups = 3;
    config.num_beams = 15;
    config.diversity_penalty = 1.0f;

    // highlight-next-line
    pipe.start_chat();
    std::cout << "question:\n";
    while (std::getline(std::cin, prompt)) {
        std::cout << "answer:\n";
        auto answer = pipe.generate(prompt, config);
        std::cout << answer << std::endl;
        std::cout << "\n----------\n"
                     "question:\n";
    }
    // highlight-next-line
    pipe.finish_chat();
}
```
</TabItemCpp>
</LanguageTabs>
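
As noted above, chat mode also works with `VLMPipeline`. A minimal Python sketch of a visual chat session is shown below; the model directory, image file, and the `images` keyword argument are illustrative assumptions based on the VLM samples rather than requirements stated on this page:

```python showLineNumbers
import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image

# Assumed inputs: a pre-converted VLM directory and a local image file (both illustrative).
model_path = "./InternVL2-1B-ov"
image = ov.Tensor(np.array(Image.open("cat.png").convert("RGB")))

pipe = ov_genai.VLMPipeline(model_path, "CPU")

config = ov_genai.GenerationConfig()
config.max_new_tokens = 100

# highlight-next-line
pipe.start_chat()
# The first message attaches the image; follow-up questions reuse the cached conversation context.
print(pipe.generate("Describe the image.", images=[image], generation_config=config))
print(pipe.generate("What colors stand out?", generation_config=config))
# highlight-next-line
pipe.finish_chat()
```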

:::info
For more information, refer to the [Python](https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/python/text_generation/chat_sample.py) and [C++](https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/cpp/text_generation/chat_sample.cpp) chat samples.
:::
3 changes: 2 additions & 1 deletion site/docs/guides/model-preparation/convert-to-openvino.mdx
@@ -8,7 +8,8 @@ import UseCasesNote from './_use_cases_note.mdx';

# Convert Models to OpenVINO Format

This page explains how to convert various generative AI models from Hugging Face and ModelScope to OpenVINO IR format. Refer to the [Supported Models](../../supported-models/index.mdx) for a list of available models.
This page explains how to convert various generative AI models from [Hugging Face](https://huggingface.co/) and [ModelScope](https://modelscope.cn/) to OpenVINO IR format.
Refer to the [Supported Models](../../supported-models/index.mdx) for a list of available models.

For downloading pre-converted models, see [Download Pre-Converted OpenVINO Models](./download-openvino-models.mdx).
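
As a rough sketch of what such a conversion can look like from Python with `optimum-intel` (the model ID and output directory are illustrative assumptions; the `optimum-cli` instructions below remain the authoritative reference):

```python showLineNumbers
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # illustrative Hugging Face model ID
output_dir = "TinyLlama-1.1B-Chat-v1.0-ov"       # illustrative output directory

# export=True converts the original weights to OpenVINO IR while loading.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
model.save_pretrained(output_dir)

# Save the tokenizer next to the exported model.
AutoTokenizer.from_pretrained(model_id).save_pretrained(output_dir)
```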

@@ -8,7 +8,7 @@ import UseCasesNote from './_use_cases_note.mdx';
# Download Pre-Converted OpenVINO Models

OpenVINO GenAI allows you to run different generative AI models (see [Supported Models](../../supported-models/index.mdx)).
While you can convert models from other frameworks (see [Convert Models to OpenVINO Format](./convert-to-openvino.mdx)), using pre-converted models can save time and effort.
While you can convert models from other frameworks (see [Convert Models to OpenVINO Format](./convert-to-openvino.mdx)), using pre-converted models from [Hugging Face](https://huggingface.co/) and [ModelScope](https://modelscope.cn/) can save time and effort.
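
For example, a pre-converted model can be pulled with the `huggingface_hub` package in a few lines; the repository ID below is an illustrative assumption, and the sections below give the authoritative steps:

```python showLineNumbers
import huggingface_hub as hf_hub

# Illustrative pre-converted model repository and local target directory.
model_id = "OpenVINO/Phi-3-mini-4k-instruct-int4-ov"
model_path = "Phi-3-mini-4k-instruct-int4-ov"

hf_hub.snapshot_download(model_id, local_dir=model_path)
```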

## Download from Hugging Face

@@ -1,76 +1,16 @@
### Using GenAI in Chat Scenario
---
sidebar_position: 3
---

For chat applications, OpenVINO GenAI provides special optimizations to maintain conversation context and improve performance using KV-cache.

:::tip
Use `start_chat()` and `finish_chat()` to properly manage the chat session's KV-cache. This improves performance by reusing context between messages.
:::

A simple chat example (with grouped beam search decoding):

<LanguageTabs>
<TabItemPython>
```python showLineNumbers
import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(model_path, 'CPU')

config = {'max_new_tokens': 100, 'num_beam_groups': 3, 'num_beams': 15, 'diversity_penalty': 1.5}
pipe.set_generation_config(config)

# highlight-next-line
pipe.start_chat()
while True:
    try:
        prompt = input('question:\n')
    except EOFError:
        break
    answer = pipe.generate(prompt)
    print('answer:\n')
    print(answer)
    print('\n----------\n')
# highlight-next-line
pipe.finish_chat()
```
</TabItemPython>
<TabItemCpp>
```cpp showLineNumbers
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string prompt;

    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;
    config.num_beam_groups = 3;
    config.num_beams = 15;
    config.diversity_penalty = 1.0f;

    // highlight-next-line
    pipe.start_chat();
    std::cout << "question:\n";
    while (std::getline(std::cin, prompt)) {
        std::cout << "answer:\n";
        auto answer = pipe.generate(prompt, config);
        std::cout << answer << std::endl;
        std::cout << "\n----------\n"
                     "question:\n";
    }
    // highlight-next-line
    pipe.finish_chat();
}
```
</TabItemCpp>
</LanguageTabs>

#### Streaming the Output
# Streaming the Output

For more interactive UIs during generation, you can stream output tokens.

##### Streaming Function
:::info
Streaming is supported for both `LLMPipeline` and `VLMPipeline`.
:::

## Streaming Function

In this example, a function outputs words to the console immediately upon generation:

@@ -138,11 +78,7 @@ In this example, a function outputs words to the console immediately upon generation:
</TabItemCpp>
</LanguageTabs>
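
The collapsed example above is the reference implementation; as a compact sketch, a plain Python callable can also be passed as the streamer. The assumption here is that the callable receives decoded text chunks, and the model path is illustrative:

```python showLineNumbers
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0-ov", "CPU")  # illustrative model path

# Each decoded text chunk is printed as soon as it is generated.
def print_chunk(chunk: str):
    print(chunk, end="", flush=True)

pipe.generate("What is OpenVINO?", streamer=print_chunk, max_new_tokens=100)
```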

:::info
For more information, refer to the [chat sample](https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/chat_sample/).
:::

##### Custom Streamer Class
## Custom Streamer Class

You can also create your custom streamer for more sophisticated processing:

@@ -210,7 +146,7 @@ You can also create your custom streamer for more sophisticated processing:
int main(int argc, char* argv[]) {
std::string prompt;
// highlight-next-line
CustomStreamer custom_streamer;
std::shared_ptr<CustomStreamer> custom_streamer;

std::string model_path = argv[1];
ov::genai::LLMPipeline pipe(model_path, "CPU");
@@ -232,5 +168,5 @@ You can also create your custom streamer for more sophisticated processing:
</LanguageTabs>

:::info
For fully implemented iterable CustomStreamer refer to [multinomial_causal_lm](https://github.com/openvinotoolkit/openvino.genai/blob/releases/2025/0/samples/python/text_generation/multinomial_causal_lm.py) sample.
For fully implemented iterable `CustomStreamer` refer to [multinomial_causal_lm](https://github.com/openvinotoolkit/openvino.genai/blob/releases/2025/0/samples/python/text_generation/multinomial_causal_lm.py) sample.
:::
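
For a rough idea of the same pattern in Python, a minimal custom streamer sketch might subclass `StreamerBase`; the method names `put` and `end` are assumptions based on the referenced samples, and the model path is illustrative:

```python showLineNumbers
import openvino_genai as ov_genai

class CustomStreamer(ov_genai.StreamerBase):
    def __init__(self, tokenizer):
        super().__init__()
        self.tokenizer = tokenizer
        self.token_ids = []

    def put(self, token_id) -> bool:
        # Accumulate token ids and print the text decoded so far.
        # Returning True would stop generation early; False continues it.
        self.token_ids.append(token_id)
        print(self.tokenizer.decode(self.token_ids), end="\r", flush=True)
        return False

    def end(self):
        print()

pipe = ov_genai.LLMPipeline("./TinyLlama-1.1B-Chat-v1.0-ov", "CPU")  # illustrative model path
pipe.generate("What is OpenVINO?", streamer=CustomStreamer(pipe.get_tokenizer()), max_new_tokens=100)
```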

This file was deleted.

This file was deleted.

14 changes: 0 additions & 14 deletions site/docs/use-cases/1-LLM-pipeline/index.mdx

This file was deleted.

14 changes: 0 additions & 14 deletions site/docs/use-cases/2-Image-Generation/index.mdx

This file was deleted.

5 changes: 0 additions & 5 deletions site/docs/use-cases/3-Processing-speech-whisper.md

This file was deleted.
