
Commit 3e745f7

Committed Mar 24, 2025
Start the release process for 6.1.0
1 parent e4c5845 commit 3e745f7

18 files changed: +527 −314 lines changed

‎README.md

+3 −3

@@ -5,11 +5,11 @@
 [![OS - Windows | Linux](https://img.shields.io/badge/OS-windows%20%7C%20linux-blue)](https://github.com/onnx/turnkeyml/blob/main/docs/install.md "Check out our instructions")
 [![Made with Python](https://img.shields.io/badge/Python-3.8,3.10-blue?logo=python&logoColor=white)](https://github.com/onnx/turnkeyml/blob/main/docs/install.md "Check out our instructions")
 
-We are on a mission to make it easy to use the most important tools in the ONNX ecosystem. TurnkeyML accomplishes this by providing no-code CLIs and low-code APIs for both general ONNX workflows with `turnkey` as well as LLMs with `lemonade`.
+We are on a mission to make it easy to use the most important tools in the ONNX ecosystem. TurnkeyML accomplishes this by providing a full SDK for LLMs with the Lemonade SDK, as well as a no-code CLI for general ONNX workflows with `turnkey`.
 
-| [**Lemonade SDK**](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md) | [**Turnkey**](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
+| [**Lemonade SDK**](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) | [**Turnkey**](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
 |:----------------------------------------------: |:-----------------------------------------------------------------: |
-| Serve and benchmark LLMs on CPU, GPU, and NPU. <br/> [Click here to get started with `lemonade`.](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md) | Export and optimize ONNX models for CNNs and Transformers. <br/> [Click here to get started with `turnkey`.](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
+| Serve and benchmark LLMs on CPU, GPU, and NPU. <br/> [Click here to get started with Lemonade.](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) | Export and optimize ONNX models for CNNs and Transformers. <br/> [Click here to get started with `turnkey`.](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
 | <img src="https://github.com/onnx/turnkeyml/blob/main/img/llm_demo.png?raw=true"/> | <img src="https://github.com/onnx/turnkeyml/blob/main/img/classic_demo.png?raw=true"/> |
‎docs/lemonade/getting_started.md ‎docs/lemonade/README.md

+31 −35 (renamed)

@@ -1,6 +1,10 @@
-# Lemonade SDK
+# 🍋 Lemonade SDK
 
-The `lemonade` SDK provides everything needed to get up and running quickly with LLMs on OnnxRuntime GenAI (OGA).
+*The long-term objective of the Lemonade SDK is to provide the ONNX ecosystem with the same kind of tools available in the GGUF ecosystem.*
+
+The Lemonade SDK is built on top of [OnnxRuntime GenAI (OGA)](https://github.com/microsoft/onnxruntime-genai), an ONNX LLM inference engine developed by Microsoft to improve the LLM experience on AI PCs, especially those with accelerator hardware such as Neural Processing Units (NPUs).
+
+The Lemonade SDK provides everything needed to get up and running quickly with LLMs on OGA:
 
 - [Quick installation from PyPI](#install).
 - [CLI with tools for prompting, benchmarking, and accuracy tests](#cli-commands).
@@ -9,56 +13,48 @@ The `lemonade` SDK provides everything needed to get up and running quickly with
913

1014
# Install
1115

12-
You can quickly get started with `lemonade` by installing the `turnkeyml` [PyPI package](#from-pypi) with the appropriate extras for your backend, or you can [install from source](#from-source-code) by cloning and installing this repository.
16+
You can quickly get started with Lemonade by installing the `turnkeyml` [PyPI package](#installing-from-pypi) with the appropriate extras for your backend, [install from source](#installing-from-source) by cloning and installing this repository, or [with GUI installation for Lemonade Server](#installing-from-lemonade_server_installerexe).
1317

14-
## From PyPI
18+
## Installing From PyPI
1519

16-
To install `lemonade` from PyPI:
20+
To install the Lemonade SDK from PyPI:
1721

1822
1. Create and activate a [miniconda](https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe) environment.
1923
```bash
2024
conda create -n lemon python=3.10
25+
```
26+
27+
```bash
2128
conda activate lemon
2229
```
2330

24-
3. Install lemonade for you backend of choice:
31+
3. Install Lemonade for your backend of choice:
2532
- [OnnxRuntime GenAI with CPU backend](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/ort_genai_igpu.md):
2633
```bash
27-
pip install -e turnkeyml[llm-oga-cpu]
34+
pip install turnkeyml[llm-oga-cpu]
2835
```
2936
- [OnnxRuntime GenAI with Integrated GPU (iGPU, DirectML) backend](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/ort_genai_igpu.md):
3037
> Note: Requires Windows and a DirectML-compatible iGPU.
3138
```bash
32-
pip install -e turnkeyml[llm-oga-igpu]
39+
pip install turnkeyml[llm-oga-igpu]
3340
```
3441
- OnnxRuntime GenAI with Ryzen AI Hybrid (NPU + iGPU) backend:
35-
> Note: Ryzen AI Hybrid requires a Windows 11 PC with a AMD Ryzen™ AI 9 HX375, Ryzen AI 9 HX370, or Ryzen AI 9 365 processor.
36-
> - Install the [Ryzen AI driver >= 32.0.203.237](https://ryzenai.docs.amd.com/en/latest/inst.html#install-npu-drivers) (you can check your driver version under Device Manager > Neural Processors).
37-
> - Visit the [AMD Hugging Face page](https://huggingface.co/collections/amd/quark-awq-g128-int4-asym-fp16-onnx-hybrid-13-674b307d2ffa21dd68fa41d5) for supported checkpoints.
38-
```bash
39-
pip install -e turnkeyml[llm-oga-hybrid]
40-
lemonade-install --ryzenai hybrid
41-
```
42+
> Note: Ryzen AI Hybrid requires a Windows 11 PC with an AMD Ryzen™ AI 300-series processor.
43+
44+
- Follow the environment setup instructions [here](https://ryzenai.docs.amd.com/en/latest/llm/high_level_python.html)
4245
- Hugging Face (PyTorch) LLMs for CPU backend:
4346
```bash
44-
pip install -e turnkeyml[llm]
47+
pip install turnkeyml[llm]
4548
```
4649
- llama.cpp: see [instructions](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/llamacpp.md).
4750

4851
4. Use `lemonade -h` to explore the LLM tools, and see the [command](#cli-commands) and [API](#api) examples below.
4952

53+
## Installing From Source
5054

51-
## From Source Code
52-
53-
To install `lemonade` from source code:
54-
55-
1. Clone: `git clone https://github.com/onnx/turnkeyml.git`
56-
1. `cd turnkeyml` (where `turnkeyml` is the repo root of your clone)
57-
- Note: be sure to run these installation instructions from the repo root.
58-
1. Follow the same instructions as in the [PyPI installation](#from-pypi), except replace the `turnkeyml` with a `.`.
59-
- For example: `pip install -e .[llm-oga-igpu]`
55+
The Lemonade SDK can be installed from source code by cloning this repository and following the instructions [here](source_installation_inst.md).
6056

61-
## From Lemonade_Server_Installer.exe
57+
## Installing From Lemonade_Server_Installer.exe
6258

6359
The Lemonade Server is available as a standalone tool with a one-click Windows installer `.exe`. Check out the [Lemonade_Server_Installer.exe guide](lemonade_server_exe.md) for installation instructions and the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the functionality.
6460

@@ -76,14 +72,14 @@ lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4
 
 Can be read like this:
 
-> Run `lemonade` on the input `(-i)` checkpoint `microsoft/Phi-3-mini-4k-instruct`. First, load it in the OnnxRuntime GenAI framework (`oga-load`), on to the integrated GPU device (`--device igpu`) in the int4 data type (`--dtype int4`). Then, pass the OGA model to the prompting tool (`llm-prompt`) with the prompt (`-p`) "Hello, my thoughts are" and print the response.
+> Run `lemonade` on the input `(-i)` checkpoint `microsoft/Phi-3-mini-4k-instruct`. First, load it in the OnnxRuntime GenAI framework (`oga-load`), onto the integrated GPU device (`--device igpu`) in the int4 data type (`--dtype int4`). Then, pass the OGA model to the prompting tool (`llm-prompt`) with the prompt (`-p`) "Hello, my thoughts are" and print the response.
 
 The `lemonade -h` command will show you which options and Tools are available, and `lemonade TOOL -h` will tell you more about that specific Tool.
 
 
 ## Prompting
 
-To prompt your LLM try:
+To prompt your LLM, try one of the following:
 
 OGA iGPU:
 ```bash
@@ -101,11 +97,11 @@ You can also replace the `facebook/opt-125m` with any Hugging Face checkpoint yo
 
 You can also set the `--device` argument in `oga-load` and `huggingface-load` to load your LLM on a different device.
 
-Run `lemonade huggingface-load -h` and `lemonade llm-prompt -h` to learn more about those tools.
+Run `lemonade huggingface-load -h` and `lemonade llm-prompt -h` to learn more about these tools.
 
 ## Accuracy
 
-To measure the accuracy of an LLM using MMLU, try this:
+To measure the accuracy of an LLM using MMLU (Measuring Massive Multitask Language Understanding), try the following:
 
 OGA iGPU:
 ```bash
@@ -117,13 +113,13 @@ Hugging Face:
 lemonade -i facebook/opt-125m huggingface-load accuracy-mmlu --tests management
 ```
 
-That command will run just the management test from MMLU on your LLM and save the score to the lemonade cache at `~/.cache/lemonade`.
+This command will run just the management test from MMLU on your LLM and save the score to the lemonade cache at `~/.cache/lemonade`. You can also run other subject tests by replacing `management` with another subject name. For the full list of supported subjects, see the [MMLU Accuracy Read Me](mmlu_accuracy.md).
 
 You can run the full suite of MMLU subjects by omitting the `--test` argument. You can learn more about this with `lemonade accuracy-mmlu -h`.
 
 ## Benchmarking
 
-To measure the time-to-first-token and tokens/second of an LLM, try this:
+To measure the time-to-first-token and tokens/second of an LLM, try the following:
 
 OGA iGPU:
 ```bash
@@ -135,7 +131,7 @@ Hugging Face:
 lemonade -i facebook/opt-125m huggingface-load huggingface-bench
 ```
 
-That command will run a few warmup iterations, then a few generation iterations where performance data is collected.
+This command will run a few warm-up iterations, then a few generation iterations where performance data is collected.
 
 The prompt size, number of output tokens, and number of iterations are all parameters. Learn more by running `lemonade oga-bench -h` or `lemonade huggingface-bench -h`.
@@ -173,15 +169,15 @@ You can launch an OpenAI-compatible server with:
 lemonade serve
 ```
 
-Visit the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the endpoints provided.
+Visit the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the endpoints provided, as well as how to launch the server with more detailed informational messages enabled.
 
 # API
 
 Lemonade is also available via API.
 
 ## High-Level APIs
 
-The high-level lemonade API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid) using the popular `from_pretrained()` function. This makes it easy to integrate lemonade LLMs into Python applications.
+The high-level Lemonade API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid) using the popular `from_pretrained()` function. This makes it easy to integrate Lemonade LLMs into Python applications.
 
 OGA iGPU:
 ```python
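The rendered hunk cuts off at the opening of the Python example above. For orientation, here is a minimal sketch of what high-level API usage looks like, based on the `from_pretrained` import that appears in this commit's `examples/lemonade/api_oga_*.py` files. The `recipe` argument name, its `"oga-igpu"` value, and the generate/decode calls are assumptions for illustration, not confirmed by this diff:

```python
# Hypothetical sketch of the high-level API described above.
# `from_pretrained` is imported by this commit's examples; the
# `recipe` argument and the generation calls are assumed here.
from lemonade.api import from_pretrained

# Load an OGA model onto the integrated GPU (assumed recipe name).
model, tokenizer = from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct", recipe="oga-igpu"
)

# Tokenize a prompt and generate a short completion.
input_ids = tokenizer("Hello, my thoughts are", return_tensors="pt").input_ids
response = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(response[0]))
```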

‎docs/lemonade/ort_genai_igpu.md

+10 −10

@@ -4,20 +4,20 @@
 
 ## Installation
 
-See [lemonade installation](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md#install) for the OGA iGPU backend.
+See [Lemonade Installation](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md#install) for the OGA iGPU backend.
 
 ## Get models
 
-- The oga-load tool can download models from Hugging Face and build ONNX files using OGA's `model_builder`, which can quantized and optimize models for both igpu and cpu.
+- The oga-load tool can download models from Hugging Face and build ONNX files using OGA's `model_builder`, which can quantize and optimize models for both iGPU and CPU.
 - Download and build ONNX model files:
 - `lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4`
 - `lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device cpu --dtype int4`
 - The ONNX model files will be stored in the respective subfolder of the lemonade cache folder and will be reused in future oga-load calls:
 - `oga_models\microsoft_phi-3-mini-4k-instruct\dml-int4`
 - `oga_models\microsoft_phi-3-mini-4k-instruct\cpu-int4`
-- The ONNX model build process can be forced to run again, overwriting the above cache, by using the --force flag:
+- The ONNX model build process can be forced to run again, overwriting the above cache, by using the `--force` flag:
 - `lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 --force`
-- Transformer model architectures supported by the model_builder tool include many popular state-of-the-art models:
+- Transformer model architectures supported by the model_builder tool include many popular state-of-the-art models, such as:
 - Gemma
 - LLaMa
 - Mistral
@@ -26,16 +26,16 @@ See [lemonade installation](https://github.com/onnx/turnkeyml/blob/main/docs/lem
 - Nemotron
 - For the full list of supported models, please see the [model_builder documentation](https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/README.md).
 - The following quantizations are supported for automatically building ONNXRuntime GenAI model files from the Hugging Face repository:
-- cpu: fp32, int4
-- igpu: fp16, int4
+- `cpu`: `fp32`, `int4`
+- `igpu`: `fp16`, `int4`
 
 ## Directory structure:
 - The model_builder tool caches Hugging Face files and temporary ONNX external data files in `<LEMONADE CACHE>\model_builder`
 - The output from model_builder is stored in `<LEMONADE_CACHE>\oga_models\<MODELNAME>\<SUBFOLDER>`
 - `MODELNAME` is the Hugging Face checkpoint name where any '/' is mapped to an '_' and everything is lower case.
-- `SUBFOLDER` is `<EP>-<DTYPE>`, where `EP` is the execution provider (`dml` for igpu, `cpu` for cpu, and `npu` for npu) and `DTYPE` is the datatype.
-- If the --int4-block-size flag is used then `SUBFOLDER` is` <EP>-<DTYPE>-block-<SIZE>` where `SIZE` is the specified block size.
-- Other ONNX models in the format required by onnxruntime-genai can be loaded in lemonade if placed in the `<LEMONADE_CACHE>\oga_models` folder.
-- Use the -i and --subfolder flags to specify the folder and subfolder:
+- `SUBFOLDER` is `<EP>-<DTYPE>`, where `EP` is the execution provider (`dml` for `igpu`, `cpu` for `cpu`, and `npu` for `npu`) and `DTYPE` is the datatype.
+- If the `--int4-block-size` flag is used then `SUBFOLDER` is `<EP>-<DTYPE>-block-<SIZE>`, where `SIZE` is the specified block size.
+- Other ONNX models in the format required by onnxruntime-genai can be loaded by Lemonade if placed in the `<LEMONADE_CACHE>\oga_models` folder.
+- Use the `-i` and `--subfolder` flags to specify the folder and subfolder, for example:
 - `lemonade -i my_model_name --subfolder my_subfolder --device igpu --dtype int4 oga-load`
 - Lemonade will expect the ONNX model files to be located in `<LEMONADE_CACHE>\oga_models\my_model_name\my_subfolder`
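To make the naming rules in this hunk concrete, here is a small, hypothetical Python helper that computes the expected cache location for a checkpoint. The function name and `lemonade_cache` parameter are illustrative only; the mapping rules ('/' to '_', lowercase, and the `<EP>-<DTYPE>` subfolder) come from the list above:

```python
import os

# Illustrative helper, not part of Lemonade. It only encodes the
# documented rules: '/' -> '_', lowercase, and an <EP>-<DTYPE> subfolder.
def oga_model_dir(lemonade_cache: str, checkpoint: str, ep: str, dtype: str) -> str:
    model_name = checkpoint.replace("/", "_").lower()
    subfolder = f"{ep}-{dtype}"
    return os.path.join(lemonade_cache, "oga_models", model_name, subfolder)

# e.g. <cache>/oga_models/microsoft_phi-3-mini-4k-instruct/dml-int4
print(oga_model_dir(os.path.expanduser("~/.cache/lemonade"),
                    "microsoft/Phi-3-mini-4k-instruct", "dml", "int4"))
```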

‎docs/lemonade/server_spec.md

+1 −1

@@ -41,7 +41,7 @@ See the [Lemonade_Server_Installer.exe instructions](lemonade_server_exe.md) to
 
 ### Python Environment
 
-If you have `lemonade` [installed in a Python environment](getting_started.md#from-pypi), simply activate it and run the following command to start the server:
+If you have Lemonade [installed in a Python environment](README.md#install), simply activate it and run the following command to start the server:
 
 ```bash
 lemonade serve
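Since the server is OpenAI-compatible, any OpenAI-style HTTP client can talk to it once `lemonade serve` is running. The sketch below is an assumption-laden illustration: the base URL, port, route, and model name are placeholders, not values confirmed by this diff; consult the server spec for the actual endpoints.

```python
import requests

# Hypothetical client for the OpenAI-compatible endpoint exposed by
# `lemonade serve`. The URL and model name below are placeholders;
# check docs/lemonade/server_spec.md for the real host, port, route,
# and model identifiers.
BASE_URL = "http://localhost:8000/api/v0"  # assumed address

payload = {
    "model": "Phi-3-mini-4k-instruct",  # assumed model identifier
    "messages": [{"role": "user", "content": "Hello, my thoughts are"}],
}

response = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=60)
print(response.json()["choices"][0]["message"]["content"])
```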
docs/lemonade/source_installation_inst.md (new file)

+47

@@ -0,0 +1,47 @@
+# Installing the Lemonade SDK From Source Code
+
+The following provides the steps to install Lemonade from source code. We also provide two alternative ways to install Lemonade:
+
+* To install Lemonade via PyPI, see the [Lemonade README](README.md).
+* To install Lemonade Server using the standalone GUI installer, see the [Lemonade Server Installer README](lemonade_server_exe.md).
+
+1. Clone: `git clone https://github.com/onnx/turnkeyml.git`
+1. `cd turnkeyml` (where `turnkeyml` is the repo root of your clone)
+- Note: be sure to run these installation instructions from the repo root.
+1. Create and activate a [Miniconda](https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe) environment.
+```bash
+conda create -n lemon python=3.10
+```
+
+```bash
+conda activate lemon
+```
+
+3. Install Lemonade for your backend of choice:
+- [OnnxRuntime GenAI with CPU backend](ort_genai_igpu.md):
+```bash
+pip install -e .[llm-oga-cpu]
+```
+- [OnnxRuntime GenAI with Integrated GPU (iGPU, DirectML) backend](ort_genai_igpu.md):
+> Note: Requires Windows and a DirectML-compatible iGPU.
+```bash
+pip install -e .[llm-oga-igpu]
+```
+- OnnxRuntime GenAI with Ryzen AI Hybrid (NPU + iGPU) backend:
+> Note: Ryzen AI Hybrid requires a Windows 11 PC with an AMD Ryzen™ AI 300-series processor.
+> - Ensure you have the correct driver version installed by checking [here](https://ryzenai.docs.amd.com/en/latest/inst.html#install-npu-drivers).
+> - Visit the [AMD Hugging Face OGA Hybrid collection](https://huggingface.co/collections/amd/ryzenai-14-llm-hybrid-models-67da31231bba0f733750a99c) for supported checkpoints.
+```bash
+pip install -e .[llm-oga-hybrid]
+```
+
+```bash
+lemonade-install --ryzenai hybrid
+```
+- Hugging Face (PyTorch) LLMs for CPU backend:
+```bash
+pip install -e .[llm]
+```
+- llama.cpp: see [instructions](llamacpp.md).
+
+4. Use `lemonade -h` to explore the LLM tools, and see the [commands](README.md#cli-commands) and [APIs](README.md#api) in the [Lemonade SDK README](README.md).

‎docs/readme.md

+1 −1

@@ -3,7 +3,7 @@
 ## LLMs: `lemonade` tooling
 
 The `docs/lemonade` directory has documentation for the LLM-focused `lemonade` tooling:
-- [Getting Started](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md): start here for LLMs.
+- [Getting Started](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md): start here for LLMs.
 - Accuracy tests (task performance):
 - [HumanEval](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/humaneval_accuracy.md): details of the HumanEval coding task test.
 - [MMLU](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/mmlu_accuracy.md): details of the MMLU general reasoning test.

‎examples/lemonade/api_oga_cpu.py

+1 −1

@@ -5,7 +5,7 @@
 
 Make sure you have set up your OGA device in your Python environment.
 See for details:
-https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md#install-onnxruntime-genai
+https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md#install
 """
 
 from lemonade.api import from_pretrained

‎examples/lemonade/api_oga_cpu_streaming.py

+1 −1

@@ -8,7 +8,7 @@
 
 Make sure you have set up your OGA device in your Python environment.
 See for details:
-https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md#install-onnxruntime-genai
+https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md#install
 """
 
 from threading import Thread

‎examples/lemonade/api_oga_hybrid.py

+1 −1

@@ -5,7 +5,7 @@
 
 Make sure you have set up your OGA device in your Python environment.
 See for details:
-https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md#install-onnxruntime-genai
+https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md#install
 """
 
 from lemonade.api import from_pretrained

‎examples/lemonade/api_oga_hybrid_streaming.py

+2 −2 (the first hunk's change only strips trailing whitespace)

@@ -1,6 +1,6 @@
 """
 This example demonstrates how to use the lemonade API to load a model for
-inference on Ryzen AI hybrid mode (NPU and iGPU together) via OnnxRuntime-GenAI
+inference on Ryzen AI hybrid mode (NPU and iGPU together) via OnnxRuntime-GenAI
 using the oga-cpu recipe, and then use a thread to generate a streaming
 response to a prompt.
 
@@ -9,7 +9,7 @@
 
 Make sure you have set up your OGA device in your Python environment.
 See for details:
-https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md#install-onnxruntime-genai
+https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md#install
 """
 
 from threading import Thread

‎examples/lemonade/api_oga_igpu.py

+1 −1

@@ -5,7 +5,7 @@
 
 Make sure you have set up your OGA device in your Python environment.
 See for details:
-https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md#install-onnxruntime-genai
+https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md#install
 """
 
 from lemonade.api import from_pretrained

‎examples/lemonade/api_oga_igpu_streaming.py

+1 −1

@@ -8,7 +8,7 @@
 
 Make sure you have set up your OGA device in your Python environment.
 See for details:
-https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md#install-onnxruntime-genai
+https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md#install
 """
 
 from threading import Thread

‎examples/lemonade/api_oga_npu.py

+1 −1

@@ -5,7 +5,7 @@
 
 Make sure you have set up your OGA device in your Python environment.
 See for details:
-https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md#install-onnxruntime-genai
+https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md#install
 """
 
 from lemonade.api import from_pretrained

‎examples/lemonade/api_oga_npu_streaming.py

+1 −1

@@ -8,7 +8,7 @@
 
 Make sure you have set up your OGA device in your Python environment.
 See for details:
-https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md#install-onnxruntime-genai
+https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md#install
 """
 
 from threading import Thread

‎src/lemonade/cli.py

+1 −1

@@ -64,7 +64,7 @@ def main():
         description=f"""Tools for evaluating and deploying LLMs (v{version_number}).
 
 Read this to learn the command syntax:
-https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md""",
+https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md""",
         formatter_class=NiceHelpFormatter,
     )
 

‎src/lemonade/tools/ort_genai/oga.py

+266 −198
Large diffs are not rendered by default.

‎src/lemonade_install/install.py

+157 −55

@@ -13,26 +13,85 @@
 import pkg_resources
 from pathlib import Path
 from typing import Optional
+import glob
 import zipfile
 import requests
 import huggingface_hub
 
+DEFAULT_RYZEN_AI_VERSION = "1.4.0"
+
 lemonade_install_dir = Path(__file__).parent.parent.parent
 DEFAULT_AMD_OGA_NPU_DIR = os.path.join(
     lemonade_install_dir, "install", "ryzen_ai", "npu"
 )
 DEFAULT_AMD_OGA_HYBRID_DIR = os.path.join(
     lemonade_install_dir, "install", "ryzen_ai", "hybrid"
 )
-DEFAULT_AMD_OGA_HYBRID_ARTIFACTS_PARENT_DIR = os.path.join(
-    DEFAULT_AMD_OGA_HYBRID_DIR,
-    "hybrid-llm-artifacts_1.3.0_lounge",
-)
 DEFAULT_QUARK_VERSION = "quark-0.6.0"
 DEFAULT_QUARK_DIR = os.path.join(
     lemonade_install_dir, "install", "quark", DEFAULT_QUARK_VERSION
 )
 
+npu_install_data = {
+    "1.3.0": {
+        "artifacts_zipfile": "ryzen_ai_13_ga/npu-llm-artifacts_1.3.0.zip",
+        "license_file": (
+            "https://account.amd.com/content/dam/account/en/licenses/download/"
+            "amd-end-user-license-agreement.pdf"
+        ),
+        "license_tag": "Beta ",
+    },
+    "1.4.0": {
+        "artifacts_zipfile": "ryzen_ai_14_4571/npu-llm-artifacts_1.4.0.zip",
+        "license_file": (
+            "https://account.amd.com/content/dam/account/en/licenses/download/"
+            "amd-end-user-license-agreement.pdf"
+        ),
+        "license_tag": "Beta ",
+    },
+}
+
+hybrid_install_data = {
+    "1.3.0": {
+        "artifacts_parents_dir": os.path.join(
+            DEFAULT_AMD_OGA_HYBRID_DIR, "hybrid-llm-artifacts_1.3.0_lounge"
+        ),
+        "artifacts_zipfile": (
+            "https://www.xilinx.com/bin/public/openDownload?"
+            "filename=hybrid-llm-artifacts_1.3.0_012725.zip"
+        ),
+        "license_file": (
+            "https://www.xilinx.com/bin/public/openDownload?"
+            "filename=AMD%20End%20User%20License%20Agreement.pdf"
+        ),
+        "license_tag": "",
+    },
+    "1.4.0": {
+        "artifacts_parents_dir": os.path.join(
+            DEFAULT_AMD_OGA_HYBRID_DIR, "hybrid-llm-artifacts_1.4.0"
+        ),
+        "artifacts_zipfile": ("ryzen_ai_14_4571/hybrid-llm-artifacts_1.4.0.zip"),
+        "license_file": (
+            "https://account.amd.com/content/dam/account/en/licenses/download/"
+            "amd-end-user-license-agreement.pdf"
+        ),
+        "license_tag": "Beta ",
+    },
+}
+
+
+def get_oga_npu_dir():
+    return DEFAULT_AMD_OGA_NPU_DIR
+
+
+def get_oga_hybrid_artifacts_parent_dir():
+    candidates = [f.path for f in os.scandir(DEFAULT_AMD_OGA_HYBRID_DIR) if f.is_dir()]
+    if not len(candidates) == 1:
+        raise Exception(
+            f"Expecting exactly one set of hybrid artifacts at {DEFAULT_AMD_OGA_HYBRID_DIR}"
+        )
+    return candidates[0]
+
 
 class ModelManager:
 
@@ -294,8 +353,10 @@ def parser() -> argparse.ArgumentParser:
 
     parser.add_argument(
         "--ryzenai",
-        help="Install Ryzen AI software for LLMs. Requires an authentication token.",
-        choices=["npu", "hybrid", None],
+        help="Install Ryzen AI software for LLMs. Requires an authentication token. "
+        "The 'npu' and 'hybrid' choices install the default "
+        f"{DEFAULT_RYZEN_AI_VERSION} version.",
+        choices=["npu", "hybrid", "npu-1.3.0", "hybrid-1.3.0"],
     )
 
     parser.add_argument(

@@ -348,57 +409,77 @@ def run(
         model_manager.download_models(models)
 
     if ryzenai is not None:
+        version = DEFAULT_RYZEN_AI_VERSION
+        if ryzenai == "npu-1.3.0":
+            ryzenai = "npu"
+            version = "1.3.0"
+        if ryzenai == "hybrid-1.3.0":
+            ryzenai = "hybrid"
+            version = "1.3.0"
         if ryzenai == "npu":
-            file = "ryzen_ai_13_ga/npu-llm-artifacts_1.3.0.zip"
+            # Check version is valid
+            if version not in npu_install_data:
+                raise ValueError(
+                    "Invalid version for NPU. Valid options are "
+                    f"{list(npu_install_data.keys())}."
+                )
+            file = npu_install_data[version].get("artifacts_zipfile", None)
             install_dir = DEFAULT_AMD_OGA_NPU_DIR
             wheels_full_path = os.path.join(install_dir, "amd_oga/wheels")
-            license = "https://account.amd.com/content/dam/account/en/licenses/download/amd-end-user-license-agreement.pdf"
-            license_tag = "Beta "
+            license_file = npu_install_data[version].get("license_file", None)
+            license_tag = npu_install_data[version].get("license_tag", None)
         elif ryzenai == "hybrid":
-            file = "https://www.xilinx.com/bin/public/openDownload?filename=hybrid-llm-artifacts_1.3.0_012725.zip"
+            if version not in hybrid_install_data:
+                raise ValueError(
+                    "Invalid version for Hybrid. Valid options are "
+                    f"{hybrid_install_data.keys()}."
+                )
+            file = hybrid_install_data[version].get("artifacts_zipfile", None)
             install_dir = DEFAULT_AMD_OGA_HYBRID_DIR
             wheels_full_path = os.path.join(
-                DEFAULT_AMD_OGA_HYBRID_ARTIFACTS_PARENT_DIR,
+                hybrid_install_data[version]["artifacts_parents_dir"],
                 "hybrid-llm-artifacts",
                 "onnxruntime_genai",
                 "wheel",
             )
-            license = r"https://www.xilinx.com/bin/public/openDownload?filename=AMD%20End%20User%20License%20Agreement.pdf"
-            license_tag = ""
+            license_file = hybrid_install_data[version].get("license_file", None)
+            license_tag = hybrid_install_data[version].get("license_tag", None)
        else:
             raise ValueError(
                 f"Value passed to ryzenai argument is not supported: {ryzenai}"
             )
 
-        if yes:
-            print(
-                f"\nYou have accepted the AMD {license_tag}Software End User License Agreement for "
-                f"Ryzen AI {ryzenai} by providing the `--yes` option. "
-                "The license file is available for your review at "
-                # pylint: disable=line-too-long
-                f"{license}\n"
-            )
-        else:
-            print(
-                f"\nYou must accept the AMD {license_tag}Software End User License Agreement in "
-                "order to install this software. To continue, type the word yes "
-                "to assert that you agree and are authorized to agree "
-                "on behalf of your organization, to the terms and "
-                f"conditions, in the {license_tag}Software End User License Agreement, "
-                "which terms and conditions may be reviewed, downloaded and "
-                "printed from this link: "
-                # pylint: disable=line-too-long
-                f"{license}\n"
-            )
-
-            response = input("Would you like to accept the license (yes/No)? ")
-            if response.lower() == "yes" or response.lower() == "y":
-                pass
-            else:
-                raise LicenseRejected(
-                    "Exiting because the license was not accepted."
-                )
+        if license_file:
+            if yes:
+                print(
+                    f"\nYou have accepted the AMD {license_tag}Software End User License "
+                    f"Agreement for Ryzen AI {ryzenai} by providing the `--yes` option. "
+                    "The license file is available for your review at "
+                    f"{license_file}\n"
+                )
+            else:
+                print(
+                    f"\nYou must accept the AMD {license_tag}Software End User License "
+                    "Agreement in order to install this software. To continue, type the word "
+                    "yes to assert that you agree and are authorized to agree "
+                    "on behalf of your organization, to the terms and "
+                    f"conditions, in the {license_tag}Software End User License Agreement, "
+                    "which terms and conditions may be reviewed, downloaded and "
+                    "printed from this link: "
+                    f"{license_file}\n"
+                )
+
+                response = input("Would you like to accept the license (yes/No)? ")
+                if response.lower() == "yes" or response.lower() == "y":
+                    pass
+                else:
+                    raise LicenseRejected(
+                        "Exiting because the license was not accepted."
+                    )
+
+        print(f"Downloading {ryzenai} artifacts for Ryzen AI {version}.")
+        print("Wheels will be added to current activated environment.")
 
         archive_file_name = f"oga_{ryzenai}.zip"
         archive_file_path = os.path.join(install_dir, archive_file_name)
 
@@ -412,30 +493,48 @@ def run(
         # Remove any artifacts from a previous installation attempt
         shutil.rmtree(install_dir)
         os.makedirs(install_dir)
-        if ryzenai == "npu":
-            print(f"\nDownloading {file} from GitHub LFS to {install_dir}\n")
-            download_lfs_file(token_to_use, file, archive_file_path)
-        elif ryzenai == "hybrid":
+        if any(proto in file for proto in ["https:", "http:"]):
             print(f"\nDownloading {file}\n")
             download_file(file, archive_file_path)
+        elif "file:" in file:
+            local_file = file.replace("file://", "C:/")
+            print(f"\nCopying {local_file}\n")
+            shutil.copy(local_file, archive_file_path)
+        else:
+            print(f"\nDownloading {file} from GitHub LFS to {install_dir}\n")
+            download_lfs_file(token_to_use, file, archive_file_path)
 
         # Unzip the file
         print(f"\nUnzipping archive {archive_file_path}\n")
         unzip_file(archive_file_path, install_dir)
 
+        # Write artifacts filename to text file
+        with open(
+            os.path.join(install_dir, f"{version}.txt"), "w", encoding="utf-8"
+        ) as f:
+            f.write(f"Installed artifacts: {file}")
+
         # Install all whl files in the specified wheels folder
         print(f"\nInstalling wheels from {wheels_full_path}\n")
-        for file in os.listdir(wheels_full_path):
-            if file.endswith(".whl"):
-                install_cmd = f"{sys.executable} -m pip install {os.path.join(wheels_full_path, file)}"
-
-                print(f"\nInstalling {file} with command {install_cmd}\n")
-
-                subprocess.run(
-                    install_cmd,
-                    check=True,
-                    shell=True,
-                )
+        if version == "1.3.0":
+            # Install one wheel file at a time (1.3.0 npu build only works this way)
+            for file in os.listdir(wheels_full_path):
+                if file.endswith(".whl"):
+                    install_cmd = (
+                        f"{sys.executable} -m pip install "
+                        f"{os.path.join(wheels_full_path, file)}"
+                    )
+                    print(f"\nInstalling {file} with command {install_cmd}\n")
+                    subprocess.run(install_cmd, check=True, shell=True)
+        else:
+            # Install all the wheel files together, allowing pip to work out the dependencies
+            wheel_files = glob.glob(os.path.join(wheels_full_path, "*.whl"))
+            install_cmd = [sys.executable, "-m", "pip", "install"] + wheel_files
+            subprocess.run(
+                install_cmd,
+                check=True,
+                shell=True,
+            )
 
         # Delete the zip file
         print(f"\nCleaning up, removing {archive_file_path}\n")

@@ -454,7 +553,10 @@ def run(
             package_name="quark",
         )
         # Install Quark wheel
-        wheel_url = f"https://www.xilinx.com/bin/public/openDownload?filename=quark-{quark}-py3-none-any.whl"
+        wheel_url = (
+            "https://www.xilinx.com/bin/public/openDownload?"
+            f"filename=quark-{quark}-py3-none-any.whl"
+        )
         wheel_path = os.path.join(
             quark_install_dir, f"quark-{quark}-py3-none-any.whl"
        )
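For readers following the new version-selection flow above: a standalone sketch of the suffix-parsing logic this commit introduces, extracted for illustration. It mirrors the diff (only the `npu-1.3.0`/`hybrid-1.3.0` handling is reproduced; the install tables and download steps are omitted):

```python
# Minimal sketch of the version-selection logic added in this commit.
# Extracted from the diff above for illustration; not the full installer.
DEFAULT_RYZEN_AI_VERSION = "1.4.0"

def resolve_ryzenai_choice(choice: str) -> tuple[str, str]:
    """Map a --ryzenai CLI choice to a (backend, version) pair."""
    if choice == "npu-1.3.0":
        return "npu", "1.3.0"
    if choice == "hybrid-1.3.0":
        return "hybrid", "1.3.0"
    # Bare "npu" or "hybrid" installs the default version.
    return choice, DEFAULT_RYZEN_AI_VERSION

assert resolve_ryzenai_choice("hybrid") == ("hybrid", "1.4.0")
assert resolve_ryzenai_choice("npu-1.3.0") == ("npu", "1.3.0")
```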

‎src/turnkeyml/version.py

+1 −1

@@ -1 +1 @@
-__version__ = "6.0.3"
+__version__ = "6.1.0"
