- run: set "PYTHONPATH=./src/python;" && call w_openvino_toolkit_windows_2024.2.0.dev20240515_x86_64\setupvars.bat && python -c "from openvino_genai.py_generate_pipeline import LLMPipeline" # cmd evaluates variables in a different way. Setting PYTHONPATH before setupvars.bat instead of doing that after solves that.
- run: set "PYTHONPATH=./src/python;" && call w_openvino_toolkit_windows_2024.2.0.dev20240515_x86_64\setupvars.bat && python -c "from openvino_genai import LLMPipeline" # cmd evaluates variables in a different way. Setting PYTHONPATH before setupvars.bat instead of doing that after solves that.
LLMPipeline is the main object used for decoding. You can initialize it straight away from the folder with the converted model. It will automatically load the main model, tokenizer, detokenizer and default generation configuration.
-### In Python
+### Python

 A minimalist example:
 ```python
-import py_generate_pipeline as genai # set more friendly module name
-pipe = genai.LLMPipeline(model_path, "CPU")
+import openvino_genai as ov_genai
+pipe = ov_genai.LLMPipeline(model_path, "CPU")
 print(pipe.generate("The Sun is yellow bacause"))
 ```

+Calling generate with custom generation config parameters, e.g. config for grouped beam search
+```python
+import openvino_genai as ov_genai
+pipe = ov_genai.LLMPipeline(model_path, "CPU")
+
+res = pipe.generate("The Sun is yellow bacause", max_new_tokens=30, num_groups=3, group_size=5)
+print(res)
+```
+
+output:
+```
+'it is made up of carbon atoms. The carbon atoms are arranged in a linear pattern, which gives the yellow color. The arrangement of carbon atoms in'
src/cpp/include/openvino/genai/generation_config.hpp (+9 -8)
@@ -14,9 +14,10 @@
 namespace ov {

 /**
-* @brief controls the stopping condition for grouped beam search. The following values are possible:
-* "early", where the generation stops as soon as there are `num_beams` complete candidates; "heuristic", where an
-* heuristic is applied and the generation stops when is it very unlikely to find better candidates;
+* @brief controls the stopping condition for grouped beam search. The following values are possible:
+* "early" stops as soon as there are `num_beams` complete candidates.
+"heuristic" stops when is it unlikely to find better candidates.
+"never" stops when there cannot be better candidates.
 */
 enum class StopCriteria { early, heuristic, never };

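Review note: the three `StopCriteria` values are easier to compare side by side in code. The sketch below is a minimal illustration in Python, modeled on the commonly used Hugging Face-style formulation of the check. It is an assumption about how such a check can be written, not the implementation in this repository. The function name `beam_group_is_done` and all of its parameters are hypothetical.

```python
from enum import Enum


class StopCriteria(Enum):
    EARLY = "early"
    HEURISTIC = "heuristic"
    NEVER = "never"


def beam_group_is_done(finished_scores, num_beams, best_running_sum_logprobs,
                       cur_len, max_length, length_penalty, stop_criteria):
    """Decide whether beam search can stop (hypothetical helper, illustration only).

    finished_scores: length-normalized scores of completed candidates.
    best_running_sum_logprobs: best cumulative log-probability among unfinished beams.
    """
    # No criterion can trigger before `num_beams` candidates are complete.
    if len(finished_scores) < num_beams:
        return False
    if stop_criteria is StopCriteria.EARLY:
        return True

    worst_finished = min(finished_scores)
    if stop_criteria is StopCriteria.HEURISTIC:
        # Estimate: pretend the best running beam finished at the current length.
        best_possible = best_running_sum_logprobs / (cur_len ** length_penalty)
        return worst_finished >= best_possible

    # NEVER: use an upper bound on what a running beam could still reach.
    if length_penalty > 0.0:
        best_possible = best_running_sum_logprobs / (max_length ** length_penalty)
    else:
        best_possible = best_running_sum_logprobs / (cur_len ** length_penalty)
    return worst_finished >= best_possible
```

With the same search state, "early" stops first, "heuristic" stops once the running beams look unlikely to win, and "never" keeps going until no running beam can beat the worst finished candidate.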
@@ -25,11 +26,11 @@ enum class StopCriteria { early, heuristic, never };
 *
 * @param max_length the maximum length the generated tokens can have. Corresponds to the length of the input prompt +
 * `max_new_tokens`. Its effect is overridden by `max_new_tokens`, if also set.
-* @param max_new_tokens the maximum numbers of tokens to generate, ignoring the number of tokens in the prompt.
+* @param max_new_tokens the maximum numbers of tokens to generate, excluding the number of tokens in the prompt. max_new_tokens has priority over max_length.
 * @param ignore_eos if set to true, then generation will not stop even if <eos> token is met.
-* @param num_beams number of beams for beam search. 1 means no beam search.
+* @param num_beams number of beams for beam search. 1 disables beam search.
 * @param num_beam_groups number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams.
-* @param diversity_penalty this value is subtracted from a beam's score if it generates a token same as any beam from other group at a
+* @param diversity_penalty this value is subtracted from a beam's score if it generates the same token as any beam from other group at a
 * particular time. Note that `diversity_penalty` is only effective if `group beam search` is enabled.
 * @param length_penalty exponential penalty to the length that is used with beam-based generation. It is applied as an exponent to
 * the sequence length, which in turn is used to divide the score of the sequence. Since the score is the log
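Review note: two rules in this hunk are arithmetic and can be checked with a small sketch. The Python below illustrates the documented precedence of `max_new_tokens` over `max_length`, and the documented `length_penalty` formula (the score divided by the sequence length raised to `length_penalty`). Both function names are hypothetical and exist only for this illustration; this is not the library's code.

```python
def resolve_max_total_length(prompt_len, max_length=None, max_new_tokens=None):
    """Return the total token budget (prompt plus generated tokens).

    Hypothetical helper: `max_new_tokens` wins when both limits are set,
    as the doc comment states.
    """
    if max_new_tokens is not None:
        return prompt_len + max_new_tokens
    if max_length is not None:
        return max_length
    raise ValueError("set max_length or max_new_tokens")


def length_penalized_score(sum_logprobs, seq_len, length_penalty=1.0):
    """Divide the cumulative log-probability by seq_len ** length_penalty."""
    return sum_logprobs / (seq_len ** length_penalty)


# A 10-token prompt with both limits set: max_new_tokens takes priority.
print(resolve_max_total_length(10, max_length=50, max_new_tokens=30))
# The same cumulative log-probability under different penalties.
print(length_penalized_score(-6.0, 4, length_penalty=0.0))
print(length_penalized_score(-6.0, 4, length_penalty=1.0))
print(length_penalized_score(-6.0, 4, length_penalty=2.0))
```

Because the cumulative log-probability is negative, a larger divisor moves the score toward zero, so a larger `length_penalty` makes longer sequences score relatively better.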
@@ -42,11 +43,11 @@ enum class StopCriteria { early, heuristic, never };
 * heuristic is applied and the generation stops when is it very unlikely to find better candidates;
 * "never", where the beam search procedure only stops when there cannot be better candidates (canonical beam search algorithm).
 * @param temperature the value used to modulate token probabilities for random sampling
-* @param top_p if set to float < 1, only the smallest set of most probable tokens with probabilities
+* @param top_p - if set to float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
 * @param top_k the number of highest probability vocabulary tokens to keep for top-k-filtering.
 * @param do_sample whether or not to use multinomial random sampling
 * that add up to `top_p` or higher are kept.
-* @param repetition_penalty the parameter for repetition penalty. 1.0 means no penalty.
+* @param repetition_penalty the parameter for repetition penalty. 1.0 means no penalty. See https://arxiv.org/pdf/1909.05858.
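Review note: the `top_p` and `repetition_penalty` descriptions may be easier to follow with a concrete sketch. The Python below shows one common way to implement both, using plain lists instead of tensors. The function names are hypothetical, and the sign handling in the repetition penalty follows a widely used formulation rather than anything confirmed by this diff.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose probabilities add up to >= top_p.

    Returns a dict {token_id: renormalized probability} for the kept tokens.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for token_id in order:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def apply_repetition_penalty(logits, generated_ids, penalty):
    """Make already generated tokens less likely; a penalty of 1.0 changes nothing.

    Assumption: positive logits are divided by the penalty and negative
    logits are multiplied by it, so the logit always moves downward.
    """
    out = list(logits)
    for token_id in set(generated_ids):
        if out[token_id] > 0:
            out[token_id] /= penalty
        else:
            out[token_id] *= penalty
    return out


# Tokens 0 and 1 already cover 0.75 >= 0.7, so tokens 2 and 3 are dropped.
print(top_p_filter([0.5, 0.25, 0.125, 0.125], top_p=0.7))
# Tokens 0 and 1 were generated before; token 2 is left untouched.
print(apply_repetition_penalty([2.0, -1.0, 0.5], generated_ids=[0, 1, 1], penalty=2.0))
```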