35 files changed: +445 −212 lines

site/docs/getting-started/introduction.mdx (+3 −1)

```diff
@@ -28,12 +28,14 @@ This library is friendly to PC and laptop execution, and optimized for resource
 Using OpenVINO GenAI typically involves three main steps:
 
 1. **Model Preparation:**
-    - Convert model from other frameworks to the OpenVINO IR format (e.g. using `optimum-intel`), optionally applying weights compression.
     - Download pre-converted model in OpenVINO IR format (e.g. from [OpenVINO Toolkit](https://huggingface.co/OpenVINO) organization on Hugging Face).
+    - Convert model from other frameworks to the OpenVINO IR format (e.g. using `optimum-intel`), optionally applying weights compression.
 :::info
 
 You can use models from [Hugging Face](https://huggingface.co/) and [ModelScope](https://modelscope.cn/home)
 
+Refer to [Model Preparation](/docs/category/model-preparation) for more details.
+
 :::
 2. **Pipeline Setup:** Initialize the appropriate pipeline for your task (`LLMPipeline`, `Text2ImagePipeline`, `WhisperPipeline`, `VLMPipeline`, etc.) with the converted model.
 3. **Inference:** Run the model with your inputs using the pipeline's simple API.
```
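
The three steps above map to very little code. A minimal Python sketch using the `LLMPipeline` API that appears later in this diff (the model directory is a placeholder for a model already in OpenVINO IR format):

```python
import openvino_genai as ov_genai

# 2. Pipeline setup: load a model already converted to OpenVINO IR
pipe = ov_genai.LLMPipeline("path/to/ov_model_dir", "CPU")

# 3. Inference: run generation through the pipeline's simple API
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```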

site/docs/guides/_category_.json (+5)

```json
{
  "label": "Guides",
  "position": 3,
  "link": null
}
```

site/docs/guides/lora-adapters.mdx (+101)

---
sidebar_position: 4
---

# LoRA Adapters

## Overview

Low-Rank Adaptation (LoRA) is a technique for efficiently fine-tuning large models without changing the base model's weights.
LoRA adapters enable customization of model outputs for specific tasks, styles, or domains while requiring significantly fewer computational resources than full fine-tuning.

OpenVINO GenAI provides built-in support for LoRA adapters in text generation and image generation pipelines.
This capability allows you to dynamically switch between or combine multiple adapters without recompiling the model.

:::info
See [Supported Models](/docs/supported-models/) for the list of models that support LoRA adapters.
:::

## Key Features

- **Dynamic Adapter Application:** Apply LoRA adapters at runtime without model recompilation.
- **Multiple Adapter Support:** Blend effects from multiple adapters with different weights.
- **Adapter Switching:** Change adapters between generation calls without pipeline reconstruction.
- **Safetensors Format:** Support for the industry-standard `safetensors` format for adapter files.

## Using LoRA Adapters

<LanguageTabs>
    <TabItemPython>
        ```python
        import openvino_genai as ov_genai

        # Initialize pipeline with adapters
        adapter_config = ov_genai.AdapterConfig()

        # Add multiple adapters with different weights
        adapter1 = ov_genai.Adapter("path/to/lora1.safetensors")
        adapter2 = ov_genai.Adapter("path/to/lora2.safetensors")

        adapter_config.add(adapter1, alpha=0.5)
        adapter_config.add(adapter2, alpha=0.5)

        pipe = ov_genai.LLMPipeline(
            model_path,
            "CPU",
            adapters=adapter_config
        )

        # Generate with current adapters
        output1 = pipe.generate("Generate story about", max_new_tokens=100)

        # Switch to a different adapter configuration
        new_config = ov_genai.AdapterConfig()
        new_config.add(adapter1, alpha=1.0)
        output2 = pipe.generate(
            "Generate story about",
            max_new_tokens=100,
            adapters=new_config
        )
        ```
    </TabItemPython>
    <TabItemCpp>
        ```cpp
        #include "openvino/genai/llm_pipeline.hpp"

        int main() {
            ov::genai::AdapterConfig adapter_config;

            // Add multiple adapters with different weights
            ov::genai::Adapter adapter1("path/to/lora1.safetensors");
            ov::genai::Adapter adapter2("path/to/lora2.safetensors");

            adapter_config.add(adapter1, 0.5f);
            adapter_config.add(adapter2, 0.5f);

            ov::genai::LLMPipeline pipe(
                model_path,
                "CPU",
                ov::genai::adapters(adapter_config)
            );

            // Generate with current adapters
            auto output1 = pipe.generate("Generate story about", ov::genai::max_new_tokens(100));

            // Switch to a different adapter configuration
            ov::genai::AdapterConfig new_config;
            new_config.add(adapter1, 1.0f);
            auto output2 = pipe.generate(
                "Generate story about",
                ov::genai::adapters(new_config),
                ov::genai::max_new_tokens(100)
            );
        }
        ```
    </TabItemCpp>
</LanguageTabs>
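
LoRA works the same way with image generation pipelines. A minimal Python sketch, assuming a Stable Diffusion model already exported to OpenVINO IR in `model_path` and a hypothetical style adapter file; the `Text2ImagePipeline` adapter arguments follow the same pattern as above:

```python
import openvino_genai as ov_genai
from PIL import Image

# Attach a single style adapter to an image generation pipeline
adapter_config = ov_genai.AdapterConfig()
adapter_config.add(ov_genai.Adapter("path/to/style_lora.safetensors"), alpha=0.8)

# model_path points to a Stable Diffusion model in OpenVINO IR format (assumption)
pipe = ov_genai.Text2ImagePipeline(model_path, "CPU", adapters=adapter_config)

# Generate; the adapter biases the output toward the LoRA's style
image_tensor = pipe.generate(
    "a portrait photo of a cat, studio lighting",
    width=512,
    height=512,
    num_inference_steps=20,
)
Image.fromarray(image_tensor.data[0]).save("result.png")
```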

## LoRA Adapter Sources

1. **Hugging Face:** Browse adapters for various models at [huggingface.co/models](https://huggingface.co/models?other=lora) using the "LoRA" filter.
2. **Civitai:** For Stable Diffusion models, [Civitai](https://civitai.com/) offers a wide range of LoRA adapters for various styles and subjects.

site/docs/model-preparation/_category_.json (+8)

```json
{
  "label": "Model Preparation",
  "position": 1,
  "link": {
    "type": "generated-index",
    "description": "Prepare generative models for inference with OpenVINO GenAI."
  }
}
```

site/docs/model-preparation/_use_cases_note.mdx (+5)

:::info

Refer to the [Use Cases](/docs/category/use-cases) section for detailed instructions on using models with OpenVINO GenAI.

:::

site/docs/model-preparation/convert-to-openvino.mdx (+66)

---
sidebar_position: 2
description: How to convert models to OpenVINO format
---

import OptimumCLI from '@site/src/components/OptimumCLI';
import UseCasesNote from './_use_cases_note.mdx';

# Convert Models to OpenVINO Format

This page explains how to convert various generative AI models from Hugging Face and ModelScope to the OpenVINO IR format. Refer to [Supported Models](../../supported-models/index.mdx) for a list of available models.

For downloading pre-converted models, see [Download Pre-Converted OpenVINO Models](./download-openvino-models.mdx).

## Converting Models from Hugging Face

1. Install the `optimum-intel` package to download, convert and optimize models:
    ```bash
    pip install optimum-intel@git+https://github.com/huggingface/optimum-intel.git
    ```
2. Download a model from Hugging Face and convert it to the OpenVINO IR format using the `optimum-cli` tool (a fully spelled-out example command is shown after the notes below):
    <OptimumCLI />

:::tip

For better performance with minimal accuracy impact, convert the model to a lower precision by using the `--weight-format` argument:

<Tabs groupId="export-precision">
    <TabItem label="INT4" value="int4">
        <OptimumCLI weightFormat='int4' />
    </TabItem>
    <TabItem label="INT8" value="int8">
        <OptimumCLI weightFormat='int8' />
    </TabItem>
    <TabItem label="FP16" value="fp16">
        <OptimumCLI weightFormat='fp16' />
    </TabItem>
</Tabs>

:::

:::info

The `--trust-remote-code` flag is required for some models that use custom code.

Check the full list of conversion options [here](https://huggingface.co/docs/optimum/en/intel/openvino/export).

:::
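
For reference, a fully spelled-out export command with INT4 weight compression looks like this (using `meta-llama/Llama-2-7b-chat-hf` as an example model and `ov_llama_2_7b_int4` as an example output directory):

```bash
optimum-cli export openvino --model meta-llama/Llama-2-7b-chat-hf --weight-format int4 ov_llama_2_7b_int4 --trust-remote-code
```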

## Converting Models from ModelScope

ModelScope models need to be downloaded first, then converted to the OpenVINO IR format.

1. Install the `modelscope` and `optimum-intel` packages to download, convert and optimize models:
    ```bash
    pip install modelscope
    pip install optimum-intel@git+https://github.com/huggingface/optimum-intel.git
    ```
2. Download the required model (e.g. `Qwen/Qwen2-7b`) to a local directory using the `modelscope` tool:
    ```bash
    modelscope download --model 'Qwen/Qwen2-7b' --local_dir <model_path>
    ```
3. Convert the model (and optionally compress its weights) using the `optimum-cli` tool, as in the example command after this list:
    <OptimumCLI model='<model_path>' weightFormat='int4' />
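
For a model downloaded to a local directory, the export command takes the local path with `-m` and, for text generation models, an explicit task; for example with INT4 compression (the output directory name is an example):

```bash
optimum-cli export openvino -m <model_path> --weight-format int4 ov_qwen2_7b_int4 --task text-generation-with-past
```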
<UseCasesNote />

site/docs/model-preparation/download-openvino-models.mdx (+53)

---
sidebar_position: 1
description: How to get pre-converted OpenVINO models
---

import UseCasesNote from './_use_cases_note.mdx';

# Download Pre-Converted OpenVINO Models

OpenVINO GenAI can run many different generative AI models (see [Supported Models](../../supported-models/index.mdx)).
While you can convert models from other frameworks (see [Convert Models to OpenVINO Format](./convert-to-openvino.mdx)), using pre-converted models can save time and effort.

## Download from Hugging Face

The simplest way to download models is with the `huggingface_hub` package:
1. Install the package:
    ```bash
    pip install huggingface_hub
    ```
2. Download the model, specifying the model id (e.g. [`OpenVINO/phi-2-fp16-ov`](https://huggingface.co/OpenVINO/phi-2-fp16-ov)) and the output directory `model_path`:
    ```bash
    huggingface-cli download "OpenVINO/phi-2-fp16-ov" --local-dir model_path
    ```
    :::info
    The `-ov` suffix in a model id usually indicates a pre-converted OpenVINO model.
    :::
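
The same download also works from Python via `huggingface_hub`; a minimal sketch, using the same model id and target directory as above:

```python
from huggingface_hub import snapshot_download

# Download the pre-converted model repository into ./model_path
snapshot_download(repo_id="OpenVINO/phi-2-fp16-ov", local_dir="model_path")
```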

:::tip Available Model Collections
OpenVINO offers collections of pre-converted and pre-optimized models, available on Hugging Face under the [OpenVINO Toolkit](https://huggingface.co/OpenVINO) organization:

- [Large Language Models](https://huggingface.co/collections/OpenVINO/llm-6687aaa2abca3bbcec71a9bd)
- [Image Generation](https://huggingface.co/collections/OpenVINO/image-generation-67697d9952fb1eee4a252aa8)
- [Speech-to-Text](https://huggingface.co/collections/OpenVINO/speech-to-text-672321d5c070537a178a8aeb)
- [Visual Language Models](https://huggingface.co/collections/OpenVINO/visual-language-models-6792248a0eed57085d2b094b)
- [Speculative Decoding Draft Models](https://huggingface.co/collections/OpenVINO/speculative-decoding-draft-models-673f5d944d58b29ba6e94161)
- and others.

These models are ready to use with OpenVINO GenAI.
:::

## Download from ModelScope

1. Install the package:
    ```bash
    pip install modelscope
    ```
2. Download the model, specifying the model id (e.g. [`OpenVINO/phi-2-fp16-ov`](https://modelscope.cn/models/OpenVINO/phi-2-fp16-ov)) and the output directory `model_path`:
    ```bash
    modelscope download --model "OpenVINO/phi-2-fp16-ov" --local_dir model_path
    ```
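
ModelScope also exposes a Python API. A minimal sketch, assuming a recent `modelscope` version in which `snapshot_download` accepts a `local_dir` argument:

```python
from modelscope import snapshot_download

# Download the pre-converted model repository into ./model_path
# (local_dir support is assumed; older versions only accept cache_dir)
snapshot_download("OpenVINO/phi-2-fp16-ov", local_dir="model_path")
```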
<UseCasesNote />

Deleted files:

- site/docs/how-to-guides/_category_.json (−8)
- site/docs/how-to-guides/build-chat-agent.md (−5)
- site/docs/how-to-guides/hugging-face-to-openvino.md (−5)
- site/docs/how-to-guides/image-generation.md (−5)
- site/docs/how-to-guides/llm.md (−5)
- site/docs/how-to-guides/lora-adapters.md (−5)
- site/docs/how-to-guides/model-download-hugging-face.md (−5)
- site/docs/how-to-guides/model-scope-to-openvino.md (−5)
- site/docs/how-to-guides/speech-to-text.md (−5)
- site/docs/how-to-guides/vlm.md (−5)

```diff
@@ -1,71 +1,3 @@
 ## Convert and Optimize Model
 
-<Tabs groupId="model-source">
-    <TabItem label="From Hugging Face" value="huggingface">
-        Use `optimum-intel` package to convert and optimize models:
-        ```bash
-        pip install optimum-intel[openvino]
-        ```
-
-        Download and convert a model to the OpenVINO IR format:
-        <Tabs groupId="export-precision">
-            <TabItem label="Compress weights to the int4 precision" value="int4">
-                ```bash
-                optimum-cli export openvino --model meta-llama/Llama-2-7b-chat-hf --weight-format int4 ov_llama_2_7b_int4 --trust-remote-code
-                ```
-            </TabItem>
-            <TabItem label="Keep full model precision" value="fp16">
-                ```bash
-                optimum-cli export openvino --model meta-llama/Llama-2-7b-chat-hf --weight-format fp16 ov_llama_2_7b_fp16 --trust-remote-code
-                ```
-            </TabItem>
-        </Tabs>
-
-        :::info
-
-        Check a full list of conversion options [here](https://huggingface.co/docs/optimum/en/intel/openvino/export).
-
-        :::
-
-        :::tip
-
-        You can also use [pre-converted LLMs](https://huggingface.co/collections/OpenVINO/llm-6687aaa2abca3bbcec71a9bd).
-
-        :::
-    </TabItem>
-    <TabItem label="From Model Scope" value="modelscope">
-        Use `modelscope` and `optimum-intel` packages to convert and optimize models:
-        ```bash
-        pip install modelscope optimum-intel[openvino]
-        ```
-
-        Download the required model to a local folder:
-        ```bash
-        modelscope download --model 'Qwen/Qwen2-7b' --local_dir model_path
-        ```
-
-        :::tip
-
-        Convert the model and compress weights:
-
-        <Tabs groupId="export-precision">
-            <TabItem label="INT4" value="int4">
-                ```bash
-                optimum-cli export openvino -m model_path --weight-format int4 ov_qwen2_7b_int4 --task text-generation-with-past
-                ```
-            </TabItem>
-            <TabItem label="INT8" value="int8">
-                ```bash
-                optimum-cli export openvino -m model_path --weight-format int8 ov_qwen2_7b_int8 --task text-generation-with-past
-                ```
-            </TabItem>
-            <TabItem label="FP16" value="fp16">
-                ```bash
-                optimum-cli export openvino -m model_path --weight-format fp16 ov_qwen2_7b_fp16 --task text-generation-with-past
-                ```
-            </TabItem>
-        </Tabs>
-
-        :::
-    </TabItem>
-</Tabs>
+Refer to the [Model Preparation](/docs/category/model-preparation) guide for detailed instructions on how to download, convert and optimize models for OpenVINO GenAI.
```
