- run: set "PYTHONPATH=./src/python;" && call w_openvino_toolkit_windows_2024.2.0.dev20240515_x86_64\setupvars.bat && python -c "from openvino_genai import LLMPipeline" # cmd evaluates variables in a different way. Setting PYTHONPATH before setupvars.bat instead of doing that after solves that.
-LLMPipeline is the main object used for decoding. You can initiliza it straigh away from the folder with the converted model. It will automanically load the main model, tokenizer, detokenizer and default generation configuration.
+`LLMPipeline` is the main object used for decoding. You can construct it straight away from the folder with the converted model. It will automatically load the main model, tokenizer, detokenizer and default generation configuration.
 
 ### Python
@@ -24,8 +24,8 @@ Calling generate with custom generation config parameters, e.g. config for group
 import openvino_genai as ov_genai
 pipe = ov_genai.LLMPipeline(model_path, "CPU")
 
-res= pipe.generate("The Sun is yellow bacause", max_new_tokens=30, num_groups=3, group_size=5)
-print(res)
+result = pipe.generate("The Sun is yellow bacause", max_new_tokens=30, num_groups=3, group_size=5, diversity_penalty=1.5)
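
For readers following along outside the diff, here is a minimal, self-contained sketch of the updated call. It assumes a model already converted into a local folder (`model_path` below is a placeholder name) and uses the `num_groups`/`group_size`/`diversity_penalty` parameter names from this revision of the API:

```python
import openvino_genai as ov_genai

model_path = "TinyLlama-1.1B-Chat-v1.0"  # placeholder: folder with a converted model
pipe = ov_genai.LLMPipeline(model_path, "CPU")

# Grouped beam search: 3 groups of 5 beams each; a non-zero diversity_penalty
# pushes the groups toward different continuations.
result = pipe.generate(
    "The Sun is yellow bacause",  # prompt kept verbatim from the sample
    max_new_tokens=30,
    num_groups=3,
    group_size=5,
    diversity_penalty=1.5,
)
print(result)
```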
src/cpp/include/openvino/genai/generation_config.hpp (+28 −15)
@@ -12,6 +12,7 @@
 #include "openvino/genai/tokenizer.hpp"
 
 namespace ov {
+namespace genai {
 
 /**
  * @brief controls the stopping condition for grouped beam search. The following values are possible:
@@ -22,43 +23,48 @@ namespace ov {
 enum class StopCriteria { early, heuristic, never };
 
 /**
- * @brief structure to keep generation config parameters.
+ * @brief Structure to keep generation config parameters. For a selected method of decoding, only parameters from that group
+ * and generic parameters are used. For example, if do_sample is set to true, then only generic parameters and random sampling parameters will
+ * be used while greedy and beam search parameters will not affect decoding at all.
  *
+ * Generic parameters:
  * @param max_length the maximum length the generated tokens can have. Corresponds to the length of the input prompt +
  * `max_new_tokens`. Its effect is overridden by `max_new_tokens`, if also set.
  * @param max_new_tokens the maximum number of tokens to generate, excluding the number of tokens in the prompt. max_new_tokens has priority over max_length.
  * @param ignore_eos if set to true, then generation will not stop even if <eos> token is met.
+ * @param pad_token_id token_id of <pad> (padding)
+ * @param bos_token_id token_id of <bos> (beginning of sentence)
+ * @param eos_token_id token_id of <eos> (end of sentence)
- * @param num_return_sequences the number of sequences to return for grouped beam search decoding
+ * @param num_return_sequences the number of sequences to return for grouped beam search decoding.
  * @param no_repeat_ngram_size if set to int > 0, all ngrams of that size can only occur once.
  * @param stop_criteria controls the stopping condition for grouped beam search. It accepts the following values:
  * "early", where the generation stops as soon as there are `num_beams` complete candidates; "heuristic", where a
  * heuristic is applied and the generation stops when it is very unlikely to find better candidates;
  * "never", where the beam search procedure only stops when there cannot be better candidates (canonical beam search algorithm).
- * @param temperature the value used to modulate token probabilities for random sampling
+ *
+ * Random sampling parameters:
+ * @param temperature the value used to modulate token probabilities for random sampling.
  * @param top_p - if set to float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
  * @param top_k the number of highest probability vocabulary tokens to keep for top-k-filtering.
- * @param do_sample whether or not to use multinomial random sampling
- * that add up to `top_p` or higher are kept.
- * @param repetition_penalty the parameter for repetition penalty. 1.0 means no penalty. See https://arxiv.org/pdf/1909.05858.
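
To make the grouping rule above concrete, here is a hedged Python sketch (parameter names are taken from this header; `model_path` is a placeholder): with `do_sample=True`, only the generic and random sampling parameters influence decoding, while leaving it at its default keeps decoding greedy.

```python
import openvino_genai as ov_genai

model_path = "TinyLlama-1.1B-Chat-v1.0"  # placeholder: folder with a converted model
pipe = ov_genai.LLMPipeline(model_path, "CPU")

# Random sampling group: do_sample=True activates temperature/top_p/top_k;
# beam search parameters (num_groups, group_size, ...) would not apply here.
sampled = pipe.generate(
    "The Sun is yellow bacause",
    max_new_tokens=30,  # generic parameter, honored for every decoding method
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    top_k=50,
)

# With do_sample left at its default and no beam parameters set, decoding is
# greedy and only the generic parameters matter.
greedy = pipe.generate("The Sun is yellow bacause", max_new_tokens=30)
print(sampled, greedy, sep="\n")
```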