Text generation C++ samples that support most popular models like LLaMA 2

These examples showcase inference of text-generation Large Language Models (LLMs): chatglm, LLaMA, Qwen, and other models with the same signature. The applications deliberately expose few configuration options, to encourage the reader to explore and modify the source code. Tokenization is enabled by loading the user_ov_extensions library, provided by openvino-tokenizers, into ov::Core (a minimal sketch follows). Run convert_tokenizer to generate tokenizer and detokenizer IRs for the samples. group_beam_searcher.hpp implements the algorithm of the same name, which is used by beam_search_causal_lm. There is also a Jupyter notebook which provides an example of an LLM-powered chatbot in Python.
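The snippet below sketches how the extension library could be registered with ov::Core so that the tokenizer and detokenizer IRs become loadable; the library path and file names are assumptions and depend on your platform and build layout.

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Assumed path to the built extension; the file name differs per platform
    // (e.g. .dll on Windows, .dylib on macOS).
    core.add_extension("./build/user_ov_extensions/libuser_ov_extensions.so");
    // With the extension loaded, tokenizer IRs produced by convert_tokenizer
    // compile like any other model.
    ov::InferRequest tokenizer =
        core.compile_model("openvino_tokenizer.xml", "CPU").create_infer_request();
}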

How it works

greedy_causal_lm

The program loads a tokenizer, a detokenizer, and a model (.xml and .bin) into OpenVINO. A prompt is tokenized and passed to the model. The model greedily generates tokens one by one until the special end-of-sequence (EOS) token is produced. The predicted tokens are converted to text and printed in a streaming fashion.
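The following is a minimal sketch of such a greedy loop, not the sample's exact code: it assumes a stateful model with an input_ids input and a logits output of shape [batch, seq_len, vocab_size], and it omits the attention_mask/position_ids bookkeeping the real sample performs.

#include <algorithm>
#include <openvino/openvino.hpp>

// Pick the argmax token from the logits of the last generated position.
int64_t next_token(ov::InferRequest& lm) {
    ov::Tensor logits = lm.get_tensor("logits");
    size_t vocab_size = logits.get_shape().back();
    const float* last = logits.data<float>() + (logits.get_size() - vocab_size);
    return std::max_element(last, last + vocab_size) - last;
}

void greedy_loop(ov::InferRequest& lm, int64_t token, int64_t eos_token) {
    while (token != eos_token) {
        // Feed only the newest token; the KV-cache lives in the model state.
        lm.get_tensor("input_ids").set_shape({1, 1});
        lm.get_tensor("input_ids").data<int64_t>()[0] = token;
        lm.infer();
        token = next_token(lm);
        // The real sample detokenizes and streams each new token here.
    }
}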

beam_search_causal_lm

The program loads a tokenizer, a detokenizer, and a model (.xml and .bin) into OpenVINO. A prompt is tokenized and passed to the model. The model predicts a distribution over the next tokens, and group beam search keeps several of the highest-scoring candidate sequences (beams), organized into groups to encourage diverse continuations. The resulting sequences are converted to text and printed.
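Below is a toy illustration of the core beam-search scoring step under the usual formulation; it is not the group_beam_searcher.hpp implementation. Each beam accumulates the log-probability of its tokens, and only the best-scoring candidates survive each step.

#include <algorithm>
#include <cstdint>
#include <vector>

struct Beam {
    std::vector<int64_t> tokens;
    float score = 0.0f;  // accumulated log-probability
};

// Extend every beam by every token, then keep the beam_width best candidates.
std::vector<Beam> beam_step(const std::vector<Beam>& beams,
                            const std::vector<std::vector<float>>& log_probs,
                            size_t beam_width) {
    std::vector<Beam> candidates;
    for (size_t b = 0; b < beams.size(); ++b) {
        for (size_t t = 0; t < log_probs[b].size(); ++t) {
            Beam c = beams[b];
            c.tokens.push_back(int64_t(t));
            c.score += log_probs[b][t];
            candidates.push_back(std::move(c));
        }
    }
    size_t keep = std::min(beam_width, candidates.size());
    std::partial_sort(candidates.begin(), candidates.begin() + keep, candidates.end(),
                      [](const Beam& a, const Beam& b) { return a.score > b.score; });
    candidates.resize(keep);
    return candidates;
}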

Install OpenVINO

Install an OpenVINO archive distribution, version 2023.3 or newer. <INSTALL_DIR> below refers to the extraction location.

Build greedy_causal_lm, beam_search_causal_lm and user_ov_extensions

Linux/macOS

git submodule update --init
source <INSTALL_DIR>/setupvars.sh
cmake -DCMAKE_BUILD_TYPE=Release -S ./ -B ./build/ && cmake --build ./build/ -j

Windows

git submodule update --init
<INSTALL_DIR>\setupvars.bat
cmake -S .\ -B .\build\ && cmake --build .\build\ --config Release -j

Download and convert the model and tokenizers

The --upgrade-strategy eager option is needed to ensure optimum-intel is upgraded to the latest version.

Linux/macOS

source <INSTALL_DIR>/setupvars.sh
python3 -m pip install --upgrade-strategy eager "optimum>=1.14" -r ../../../llm_bench/python/requirements.txt ../../../thirdparty/openvino_contrib/modules/custom_operations/[transformers] --extra-index-url https://download.pytorch.org/whl/cpu
python3 ../../../llm_bench/python/convert.py --model_id TinyLlama/TinyLlama-1.1B-Chat-v1.0 --output_dir ./TinyLlama-1.1B-Chat-v1.0/ --precision FP16 --stateful
convert_tokenizer ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16/ --output ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16/ --with-detokenizer --trust-remote-code

Windows

<INSTALL_DIR>\setupvars.bat
python -m pip install --upgrade-strategy eager "optimum>=1.14" -r ..\..\..\llm_bench\python\requirements.txt ..\..\..\thirdparty\openvino_contrib\modules\custom_operations\[transformers] --extra-index-url https://download.pytorch.org/whl/cpu
python ..\..\..\llm_bench\python\convert.py --model_id TinyLlama/TinyLlama-1.1B-Chat-v1.0 --output_dir .\TinyLlama-1.1B-Chat-v1.0\ --precision FP16 --stateful
convert_tokenizer .\TinyLlama-1.1B-Chat-v1.0\pytorch\dldt\FP16\ --output .\TinyLlama-1.1B-Chat-v1.0\pytorch\dldt\FP16\ --with-detokenizer --trust-remote-code

Run

Usage:

  1. greedy_causal_lm <MODEL_DIR> "<PROMPT>"
  2. beam_search_causal_lm <MODEL_DIR> "<PROMPT>"

Examples:

  1. ./build/greedy_causal_lm ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16/ "Why is the Sun yellow?"
  2. ./build/beam_search_causal_lm ./TinyLlama-1.1B-Chat-v1.0/pytorch/dldt/FP16/ "Why is the Sun yellow?"

To enable Unicode characters in the Windows cmd console, open Region settings from Control Panel: Administrative -> Change system locale -> Beta: Use Unicode UTF-8 for worldwide language support -> OK. Reboot.

Supported models

  1. chatglm
    1. https://huggingface.co/THUDM/chatglm2-6b - if an AttributeError occurs, refer to the issue "chatglm2-6b - AttributeError: can't set attribute"
    2. https://huggingface.co/THUDM/chatglm3-6b
  2. LLaMA 2
    1. https://huggingface.co/meta-llama/Llama-2-13b-chat-hf
    2. https://huggingface.co/meta-llama/Llama-2-13b-hf
    3. https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
    4. https://huggingface.co/meta-llama/Llama-2-7b-hf
    5. https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
    6. https://huggingface.co/meta-llama/Llama-2-70b-hf
  3. Llama2-7b-WhoIsHarryPotter
  4. OpenLLaMA
    1. https://huggingface.co/openlm-research/open_llama_13b
    2. https://huggingface.co/openlm-research/open_llama_3b
    3. https://huggingface.co/openlm-research/open_llama_3b_v2
    4. https://huggingface.co/openlm-research/open_llama_7b
    5. https://huggingface.co/openlm-research/open_llama_7b_v2
  5. TinyLlama
  6. Qwen
    1. https://huggingface.co/Qwen/Qwen-7B-Chat
    2. https://huggingface.co/Qwen/Qwen-7B-Chat-Int4 - if an AssertionError occurs, refer to the issue "Qwen-7B-Chat-Int4 - Torch not compiled with CUDA enabled"

This pipeline can work with other similar topologies produced by optimum-intel with the same model signature.
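A quick way to check whether another optimum-intel export matches the expected signature is to list the converted model's input and output names, for example with a small helper like the sketch below (not part of the samples):

#include <iostream>
#include <openvino/openvino.hpp>

int main(int argc, char* argv[]) {
    if (argc != 2) return 1;
    ov::Core core;
    // argv[1] is the path to the exported openvino_model.xml.
    std::shared_ptr<ov::Model> model = core.read_model(argv[1]);
    for (const ov::Output<ov::Node>& input : model->inputs())
        std::cout << "input:  " << input.get_any_name() << '\n';
    for (const ov::Output<ov::Node>& output : model->outputs())
        std::cout << "output: " << output.get_any_name() << '\n';
}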