Commit d6d4f00

Merge pull request #306 from pavel-esir/update_readme
remove disable-statefull from speculative decoding Readme
2 parents 1b8f68b + a3b6b39 commit d6d4f00

File tree

1 file changed (+2 -5 lines)

text_generation/causal_lm/cpp/README.md (+2 -5)
````diff
@@ -44,11 +44,8 @@ Speculative decoding works the following way. The draft model predicts the next
 
 This approach reduces the need for multiple infer requests to the main model, enhancing performance. For instance, in more predictable parts of text generation, the draft model can, in best-case scenarios, generate the next K tokens that exactly match the target. In that case they are validated in a single inference request to the main model (which is bigger, more accurate, but slower) instead of running K subsequent requests. More details can be found in the original papers https://arxiv.org/pdf/2211.17192.pdf and https://arxiv.org/pdf/2302.01318.pdf.
 
-Important note: models should belong to the same familiy and have same tokenizers, and they both should be converted with `--disable-stateful`, e.g.:
-
-```sh
-python3 ../../../llm_bench/python/convert.py --model_id TinyLlama/TinyLlama-1.1B-Chat-v1.0 --output_dir ./TinyLlama-1.1B-Chat-v1.0/ --precision FP16
-```
+> [!NOTE]
+> Models should belong to the same family and have same tokenizers.
 
 ## Install OpenVINO
 
````
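The draft/validate loop described in the README text above can be sketched in a few lines. This is a minimal illustration, not the repository's C++ implementation: the two "models" are hypothetical deterministic stand-ins (not OpenVINO infer requests), each mapping a token sequence to the next token id, and `speculative_decode`, `draft_model`, and `main_model` are names invented for this sketch.

```python
def draft_model(tokens):
    """Toy cheap draft model: next id is previous id + 1 (mod 50)."""
    return (tokens[-1] + 1) % 50

def main_model(tokens):
    """Toy accurate main model: agrees with the draft except when the
    next id would be a multiple of 10, where it emits 0 instead."""
    t = (tokens[-1] + 1) % 50
    return 0 if t % 10 == 0 else t

def speculative_decode(prompt, num_tokens, k=4):
    """Generate num_tokens ids, validating up to K draft tokens per
    main-model pass instead of one main-model call per token."""
    tokens = list(prompt)
    main_calls = 0
    while len(tokens) - len(prompt) < num_tokens:
        # 1) Autoregressively draft K candidate tokens with the cheap model.
        draft = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))
        # 2) Validate the candidates left to right. A real implementation
        #    scores all K positions in one batched infer request; here we
        #    replay them but still count it as a single main-model pass.
        main_calls += 1
        for t in draft:
            expected = main_model(tokens)
            if t == expected:
                tokens.append(t)          # draft token accepted
            else:
                tokens.append(expected)   # first mismatch: keep the main
                break                     # model's token, discard the rest
            if len(tokens) - len(prompt) >= num_tokens:
                break
    return tokens[len(prompt):], main_calls

out, calls = speculative_decode([7], num_tokens=8, k=4)
```

In this toy run the draft mispredicts once, so 8 tokens cost only 3 main-model passes rather than 8, which is the saving the paragraph above describes.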
