README.md (+3 −3)
@@ -5,11 +5,11 @@
[](https://github.com/onnx/turnkeyml/blob/main/docs/install.md "Check out our instructions")
[](https://github.com/onnx/turnkeyml/blob/main/docs/install.md "Check out our instructions")
-We are on a mission to make it easy to use the most important tools in the ONNX ecosystem. TurnkeyML accomplishes this by providing no-code CLIs and low-code APIs for both general ONNX workflows with `turnkey` as well as LLMs with `lemonade`.
+We are on a mission to make it easy to use the most important tools in the ONNX ecosystem. TurnkeyML accomplishes this by providing a full SDK for LLMs with the Lemonade SDK, as well as no-code CLIs for general ONNX workflows with `turnkey`.
-| Serve and benchmark LLMs on CPU, GPU, and NPU. <br/> [Click here to get started with `lemonade`.](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md) | Export and optimize ONNX models for CNNs and Transformers. <br/> [Click here to get started with `turnkey`.](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
+| Serve and benchmark LLMs on CPU, GPU, and NPU. <br/> [Click here to get started with Lemonade.](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) | Export and optimize ONNX models for CNNs and Transformers. <br/> [Click here to get started with `turnkey`.](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
docs/lemonade/README.md (+31 −35)
@@ -1,6 +1,10 @@
-# Lemonade SDK
+# 🍋 Lemonade SDK
-The `lemonade` SDK provides everything needed to get up and running quickly with LLMs on OnnxRuntime GenAI (OGA).
+*The long-term objective of the Lemonade SDK is to provide the ONNX ecosystem with the same kind of tools available in the GGUF ecosystem.*
+Lemonade SDK is built on top of [OnnxRuntime GenAI (OGA)](https://github.com/microsoft/onnxruntime-genai), an ONNX LLM inference engine developed by Microsoft to improve the LLM experience on AI PCs, especially those with accelerator hardware such as Neural Processing Units (NPUs).
+
+The Lemonade SDK provides everything needed to get up and running quickly with LLMs on OGA:

- [Quick installation from PyPI](#install).
- [CLI with tools for prompting, benchmarking, and accuracy tests](#cli-commands).
@@ -9,56 +13,48 @@ The `lemonade` SDK provides everything needed to get up and running quickly with
# Install
-You can quickly get started with `lemonade` by installing the `turnkeyml` [PyPI package](#from-pypi) with the appropriate extras for your backend, or you can [install from source](#from-source-code) by cloning and installing this repository.
+You can quickly get started with Lemonade by installing the `turnkeyml` [PyPI package](#installing-from-pypi) with the appropriate extras for your backend, by [installing from source](#installing-from-source) after cloning this repository, or by [using the GUI installer for Lemonade Server](#installing-from-lemonade_server_installerexe).
-## From PyPI
+## Installing From PyPI
-To install `lemonade` from PyPI:
+To install the Lemonade SDK from PyPI:
1. Create and activate a [miniconda](https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe) environment.
   ```bash
   conda create -n lemon python=3.10
+  ```
+
+  ```bash
   conda activate lemon
   ```
-3. Install lemonade for your backend of choice:
+3. Install Lemonade for your backend of choice:
- [OnnxRuntime GenAI with CPU backend](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/ort_genai_igpu.md):
  ```bash
-  pip install -e turnkeyml[llm-oga-cpu]
+  pip install turnkeyml[llm-oga-cpu]
  ```
- [OnnxRuntime GenAI with Integrated GPU (iGPU, DirectML) backend](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/ort_genai_igpu.md):
  > Note: Requires Windows and a DirectML-compatible iGPU.
  ```bash
-  pip install -e turnkeyml[llm-oga-igpu]
+  pip install turnkeyml[llm-oga-igpu]
  ```
- OnnxRuntime GenAI with Ryzen AI Hybrid (NPU + iGPU) backend:
-  > Note: Ryzen AI Hybrid requires a Windows 11 PC with an AMD Ryzen™ AI 9 HX375, Ryzen AI 9 HX370, or Ryzen AI 9 365 processor.
-  > - Install the [Ryzen AI driver >= 32.0.203.237](https://ryzenai.docs.amd.com/en/latest/inst.html#install-npu-drivers) (you can check your driver version under Device Manager > Neural Processors).
-  > - Visit the [AMD Hugging Face page](https://huggingface.co/collections/amd/quark-awq-g128-int4-asym-fp16-onnx-hybrid-13-674b307d2ffa21dd68fa41d5) for supported checkpoints.
-  ```bash
-  pip install -e turnkeyml[llm-oga-hybrid]
-  lemonade-install --ryzenai hybrid
-  ```
+  > Note: Ryzen AI Hybrid requires a Windows 11 PC with an AMD Ryzen™ AI 300-series processor.
+
+  - Follow the environment setup instructions [here](https://ryzenai.docs.amd.com/en/latest/llm/high_level_python.html).
- Hugging Face (PyTorch) LLMs for CPU backend:
  ```bash
-  pip install -e turnkeyml[llm]
+  pip install turnkeyml[llm]
  ```
- llama.cpp: see [instructions](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/llamacpp.md).

4. Use `lemonade -h` to explore the LLM tools, and see the [command](#cli-commands) and [API](#api) examples below.
-1. `cd turnkeyml` (where `turnkeyml` is the repo root of your clone)
-   - Note: be sure to run these installation instructions from the repo root.
-1. Follow the same instructions as in the [PyPI installation](#from-pypi), except replace the `turnkeyml` with a `.`.
-   - For example: `pip install -e .[llm-oga-igpu]`
+The Lemonade SDK can be installed from source code by cloning this repository and following the instructions [here](source_installation_inst.md).
-## From Lemonade_Server_Installer.exe
+## Installing From Lemonade_Server_Installer.exe
The Lemonade Server is available as a standalone tool with a one-click Windows installer `.exe`. Check out the [Lemonade_Server_Installer.exe guide](lemonade_server_exe.md) for installation instructions and the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the functionality.
-> Run `lemonade` on the input `(-i)` checkpoint `microsoft/Phi-3-mini-4k-instruct`. First, load it in the OnnxRuntime GenAI framework (`oga-load`), on to the integrated GPU device (`--device igpu`) in the int4 data type (`--dtype int4`). Then, pass the OGA model to the prompting tool (`llm-prompt`) with the prompt (`-p`) "Hello, my thoughts are" and print the response.
+> Run `lemonade` on the input `(-i)` checkpoint `microsoft/Phi-3-mini-4k-instruct`. First, load it in the OnnxRuntime GenAI framework (`oga-load`), onto the integrated GPU device (`--device igpu`) in the int4 data type (`--dtype int4`). Then, pass the OGA model to the prompting tool (`llm-prompt`) with the prompt (`-p`) "Hello, my thoughts are" and print the response.
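> The command being described here was elided by this diff view; reconstructed from the description above, it is presumably of the form:

```bash
lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 llm-prompt -p "Hello, my thoughts are"
```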
The `lemonade -h` command will show you which options and Tools are available, and `lemonade TOOL -h` will tell you more about that specific Tool.
## Prompting
-To prompt your LLM try:
+To prompt your LLM, try one of the following:
OGA iGPU:
```bash
@@ -101,11 +97,11 @@ You can also replace the `facebook/opt-125m` with any Hugging Face checkpoint yo
You can also set the `--device` argument in `oga-load` and `huggingface-load` to load your LLM on a different device.
104
-
Run `lemonade huggingface-load -h` and `lemonade llm-prompt -h` to learn more about those tools.
100
+
Run `lemonade huggingface-load -h` and `lemonade llm-prompt -h` to learn more about these tools.
## Accuracy
-To measure the accuracy of an LLM using MMLU, try this:
+To measure the accuracy of an LLM using MMLU (Measuring Massive Multitask Language Understanding), try the following:
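> The command being referenced was elided by this diff view; it is presumably of the form below (run `lemonade accuracy-mmlu -h` for the exact flag names):

```bash
lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 accuracy-mmlu --tests management
```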
-That command will run just the management test from MMLU on your LLM and save the score to the lemonade cache at `~/.cache/lemonade`.
+This command will run just the management test from MMLU on your LLM and save the score to the lemonade cache at `~/.cache/lemonade`. You can also run other subject tests by replacing `management` with the desired subject name. For the full list of supported subjects, see the [MMLU Accuracy Read Me](mmlu_accuracy.md).
You can run the full suite of MMLU subjects by omitting the `--test` argument. You can learn more about this with `lemonade accuracy-mmlu -h`.
## Benchmarking
-To measure the time-to-first-token and tokens/second of an LLM, try this:
+To measure the time-to-first-token and tokens/second of an LLM, try the following:
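> The command being referenced was elided by this diff view; given the `oga-bench` tool named below, it is presumably of the form:

```bash
lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 oga-bench
```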
-That command will run a few warmup iterations, then a few generation iterations where performance data is collected.
+This command will run a few warm-up iterations, then a few generation iterations where performance data is collected.
The prompt size, number of output tokens, and number of iterations are all parameters. Learn more by running `lemonade oga-bench -h` or `lemonade huggingface-bench -h`.
@@ -173,15 +169,15 @@ You can launch an OpenAI-compatible server with:
lemonade serve
```
-Visit the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the endpoints provided.
+Visit the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the endpoints provided as well as how to launch the server with more detailed informational messages enabled.
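> Once the server is running, a quick sanity check might look like the following sketch; the `/api/v0` route prefix, port, and model name are assumptions, so consult the server spec above for the exact paths:

```bash
# Hypothetical request against the OpenAI-compatible chat completions endpoint.
curl http://localhost:8000/api/v0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "microsoft/Phi-3-mini-4k-instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
```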
# API

Lemonade is also available via API.

## High-Level APIs
-The high-level lemonade API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid) using the popular `from_pretrained()` function. This makes it easy to integrate lemonade LLMs into Python applications.
+The high-level Lemonade API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid) using the popular `from_pretrained()` function. This makes it easy to integrate Lemonade LLMs into Python applications.
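> A minimal sketch of this API; the import path and the `recipe` argument/value are assumptions based on the description above, so check the repo's examples for the exact signatures:

```python
from lemonade.api import from_pretrained  # hypothetical import path

# Load an OGA int4 model onto the integrated GPU (hypothetical recipe name).
model, tokenizer = from_pretrained("microsoft/Phi-3-mini-4k-instruct", recipe="oga-igpu")

# Prompt the model through the familiar generate() interface.
input_ids = tokenizer("Hello, my thoughts are", return_tensors="pt").input_ids
response = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(response[0]))
```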
docs/lemonade/ort_genai_igpu.md (+10 −10)
@@ -4,20 +4,20 @@
## Installation
-See [lemonade installation](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md#install) for the OGA iGPU backend.
+See [Lemonade Installation](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md#install) for the OGA iGPU backend.
## Get models

-- The oga-load tool can download models from Hugging Face and build ONNX files using OGA's `model_builder`, which can quantized and optimize models for both igpu and cpu.
+- The oga-load tool can download models from Hugging Face and build ONNX files using OGA's `model_builder`, which can quantize and optimize models for both iGPU and CPU.
-- Transformer model architectures supported by the model_builder tool include many popular state-of-the-art models:
+- Transformer model architectures supported by the model_builder tool include many popular state-of-the-art models, such as:
  - Gemma
  - LLaMa
  - Mistral
@@ -26,16 +26,16 @@ See [lemonade installation](https://github.com/onnx/turnkeyml/blob/main/docs/lem
  - Nemotron
- For the full list of supported models, please see the [model_builder documentation](https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/README.md).
- The following quantizations are supported for automatically building ONNXRuntime GenAI model files from the Hugging Face repository:
-  - cpu: fp32, int4
-  - igpu: fp16, int4
+  - `cpu`: `fp32`, `int4`
+  - `igpu`: `fp16`, `int4`
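> For instance, the data type is selected at load time via `--dtype` (a sketch, following the `oga-load` syntax used elsewhere in these docs):

```bash
lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4
```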
## Directory structure:
- The model_builder tool caches Hugging Face files and temporary ONNX external data files in `<LEMONADE_CACHE>\model_builder`
- The output from model_builder is stored in `<LEMONADE_CACHE>\oga_models\<MODELNAME>\<SUBFOLDER>`
  - `MODELNAME` is the Hugging Face checkpoint name where any '/' is mapped to an '_' and everything is lower case.
-  - `SUBFOLDER` is `<EP>-<DTYPE>`, where `EP` is the execution provider (`dml` for igpu, `cpu` for cpu, and `npu` for npu) and `DTYPE` is the datatype.
-  - If the --int4-block-size flag is used then `SUBFOLDER` is `<EP>-<DTYPE>-block-<SIZE>` where `SIZE` is the specified block size.
-  - Other ONNX models in the format required by onnxruntime-genai can be loaded in lemonade if placed in the `<LEMONADE_CACHE>\oga_models` folder.
-  - Use the -i and --subfolder flags to specify the folder and subfolder:
+  - `SUBFOLDER` is `<EP>-<DTYPE>`, where `EP` is the execution provider (`dml` for `igpu`, `cpu` for `cpu`, and `npu` for `npu`) and `DTYPE` is the datatype.
+  - If the `--int4-block-size` flag is used then `SUBFOLDER` is `<EP>-<DTYPE>-block-<SIZE>` where `SIZE` is the specified block size.
+  - Other ONNX models in the format required by onnxruntime-genai can be loaded by Lemonade if placed in the `<LEMONADE_CACHE>\oga_models` folder.
+  - Use the `-i` and `--subfolder` flags to specify the folder and subfolder, for example:
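> The example command was elided by this diff view; a hypothetical sketch follows, where `my_model_name` and `my_subfolder` are placeholders and the exact flag placement may differ (see `lemonade oga-load -h`):

```bash
lemonade -i my_model_name oga-load --device igpu --dtype int4 --subfolder my_subfolder llm-prompt -p "Hello, my thoughts are"
```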
docs/lemonade/server_spec.md (+1 −1)
@@ -41,7 +41,7 @@ See the [Lemonade_Server_Installer.exe instructions](lemonade_server_exe.md) to
### Python Environment
-If you have `lemonade` [installed in a Python environment](getting_started.md#from-pypi), simply activate it and run the following command to start the server:
+If you have Lemonade [installed in a Python environment](README.md#install), simply activate it and run the following command to start the server:
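> The command that follows was elided by this diff view; per the README's server section, it is presumably:

```bash
lemonade serve
```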
docs/lemonade/source_installation_inst.md (new file)

1. `cd turnkeyml` (where `turnkeyml` is the repo root of your clone)
   - Note: be sure to run these installation instructions from the repo root.
2. Create and activate a [Miniconda](https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe) environment.
   ```bash
   conda create -n lemon python=3.10
   ```

   ```bash
   conda activate lemon
   ```

3. Install Lemonade for your backend of choice:
   - [OnnxRuntime GenAI with CPU backend](ort_genai_igpu.md):
     ```bash
     pip install -e .[llm-oga-cpu]
     ```
   - [OnnxRuntime GenAI with Integrated GPU (iGPU, DirectML) backend](ort_genai_igpu.md):
     > Note: Requires Windows and a DirectML-compatible iGPU.
     ```bash
     pip install -e .[llm-oga-igpu]
     ```
   - OnnxRuntime GenAI with Ryzen AI Hybrid (NPU + iGPU) backend:
     > Note: Ryzen AI Hybrid requires a Windows 11 PC with an AMD Ryzen™ AI 300-series processor.
     > - Ensure you have the correct driver version installed by checking [here](https://ryzenai.docs.amd.com/en/latest/inst.html#install-npu-drivers).
     > - Visit the [AMD Hugging Face OGA Hybrid collection](https://huggingface.co/collections/amd/ryzenai-14-llm-hybrid-models-67da31231bba0f733750a99c) for supported checkpoints.
     ```bash
     pip install -e .[llm-oga-hybrid]
     ```

     ```bash
     lemonade-install --ryzenai hybrid
     ```
   - Hugging Face (PyTorch) LLMs for CPU backend:
     ```bash
     pip install -e .[llm]
     ```
   - llama.cpp: see [instructions](llamacpp.md).

4. Use `lemonade -h` to explore the LLM tools, and see the [commands](README.md#cli-commands) and [APIs](README.md#api) in the [Lemonade SDK README](README.md).