---
sidebar_position: 3
---
For more interactive UIs during generation, you can stream output tokens.
:::info
Streaming is supported for both `LLMPipeline` and `VLMPipeline`.
:::
In this example, a streamer function prints each word to the console as soon as it is generated:
<TabItemPython>

```python showLineNumbers
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline(model_path, "CPU")

# highlight-start
# Create a streamer function
def streamer(subword):
    print(subword, end='', flush=True)
    # The returned status indicates whether generation should continue or stop.
    return ov_genai.StreamingStatus.RUNNING
# highlight-end
# highlight-next-line
pipe.start_chat()
while True:
    try:
        prompt = input('question:\n')
    except EOFError:
        break
    # highlight-next-line
    pipe.generate(prompt, streamer=streamer, max_new_tokens=100)
    print('\n----------\n')
# highlight-next-line
pipe.finish_chat()
```
</TabItemPython>
<TabItemCpp>
```cpp showLineNumbers
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>
int main(int argc, char* argv[]) {
    std::string prompt;

    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");

    // highlight-start
    // Create a streamer function
    auto streamer = [](std::string word) {
        std::cout << word << std::flush;
        // The returned status indicates whether generation should continue or stop.
        return ov::genai::StreamingStatus::RUNNING;
    };
    // highlight-end

    // highlight-next-line
    pipe.start_chat();
    std::cout << "question:\n";
    while (std::getline(std::cin, prompt)) {
        // highlight-next-line
        pipe.generate(prompt, ov::genai::streamer(streamer), ov::genai::max_new_tokens(100));
        std::cout << "\n----------\n"
                     "question:\n";
    }
    // highlight-next-line
    pipe.finish_chat();
}
```
</TabItemCpp>
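As noted above, streaming also works with `VLMPipeline`. The sketch below is an illustration only and rests on a few assumptions: `vlm_model_path` and the image file name are placeholders, and the `images`, `streamer`, and `max_new_tokens` keyword arguments of `VLMPipeline.generate` are used the same way as in the LLM example.

```python showLineNumbers
import numpy as np
import openvino
import openvino_genai as ov_genai
from PIL import Image

# The same callback-style streamer can be reused for a visual-language model.
def streamer(subword):
    print(subword, end='', flush=True)
    return ov_genai.StreamingStatus.RUNNING

# vlm_model_path and "image.png" are placeholders for illustration.
vlm_pipe = ov_genai.VLMPipeline(vlm_model_path, "CPU")
image = openvino.Tensor(np.array(Image.open("image.png").convert("RGB"), dtype=np.uint8))
vlm_pipe.generate("Describe this image.", images=[image], streamer=streamer, max_new_tokens=100)
```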
You can also create your own custom streamer for more sophisticated processing:
<TabItemPython>

```python showLineNumbers
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline(model_path, "CPU")

# highlight-start
# Create custom streamer class
class CustomStreamer(ov_genai.StreamerBase):
    def __init__(self):
        super().__init__()
        # Initialization logic.

    def write(self, token_id) -> ov_genai.StreamingStatus:
        # Custom decoding/tokens processing logic.
        # The returned status indicates whether generation should continue or stop.
        return ov_genai.StreamingStatus.RUNNING

    def end(self):
        # Custom finalization logic.
        pass
# highlight-end
# highlight-next-line
pipe.start_chat()
while True:
    try:
        prompt = input('question:\n')
    except EOFError:
        break
    # highlight-next-line
    pipe.generate(prompt, streamer=CustomStreamer(), max_new_tokens=100)
    print('\n----------\n')
# highlight-next-line
pipe.finish_chat()
```
</TabItemPython>
<TabItemCpp>
```cpp showLineNumbers
#include "openvino/genai/streamer_base.hpp"
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>
// highlight-start
// Create custom streamer class
class CustomStreamer: public ov::genai::StreamerBase {
public:
    ov::genai::StreamingStatus write(int64_t token) override {
        // Custom decoding/tokens processing logic.
        // The returned status indicates whether generation should continue or stop.
        return ov::genai::StreamingStatus::RUNNING;
    }

    void end() override {
        // Custom finalization logic.
    }
};
// highlight-end

int main(int argc, char* argv[]) {
    std::string prompt;
    // highlight-next-line
    auto custom_streamer = std::make_shared<CustomStreamer>();

    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");

    // highlight-next-line
    pipe.start_chat();
    std::cout << "question:\n";
    while (std::getline(std::cin, prompt)) {
        // highlight-next-line
        pipe.generate(prompt, ov::genai::streamer(custom_streamer), ov::genai::max_new_tokens(100));
        std::cout << "\n----------\n"
                     "question:\n";
    }
    // highlight-next-line
    pipe.finish_chat();
}
```
</TabItemCpp>
:::info
For a fully implemented iterable `CustomStreamer`, refer to the multinomial_causal_lm sample.
:::
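As a rough sketch of the idea (simplified, not the sample's actual implementation), an iterable streamer can push decoded chunks into a queue from `write()`, run generation in a background thread, and let the main thread consume the text with a plain `for` loop. Per-token decoding with `Tokenizer.decode` is a simplification, and `model_path` is assumed to be defined as in the examples above.

```python showLineNumbers
import queue
import threading

import openvino_genai as ov_genai


class IterableStreamer(ov_genai.StreamerBase):
    """Buffers decoded text chunks so they can be consumed with a for-loop."""

    def __init__(self, tokenizer):
        super().__init__()
        self.tokenizer = tokenizer
        self.text_queue = queue.Queue()

    def __iter__(self):
        return self

    def __next__(self):
        chunk = self.text_queue.get()  # blocks until a chunk or the end marker arrives
        if chunk is None:
            raise StopIteration
        return chunk

    def write(self, token_id) -> ov_genai.StreamingStatus:
        # Naive per-token decoding; the real sample merges tokens before decoding.
        self.text_queue.put(self.tokenizer.decode([token_id]))
        return ov_genai.StreamingStatus.RUNNING

    def end(self):
        self.text_queue.put(None)  # end marker


pipe = ov_genai.LLMPipeline(model_path, "CPU")
streamer = IterableStreamer(pipe.get_tokenizer())

# Run generation in a background thread and consume the stream in the main thread.
threading.Thread(
    target=lambda: pipe.generate("What is OpenVINO?", streamer=streamer, max_new_tokens=100),
).start()

for chunk in streamer:
    print(chunk, end='', flush=True)
```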