Commit ad5bba1

copy editing
1 parent db079e4 commit ad5bba1

2 files changed, +18 -15 lines changed

docs/lemonade/getting_started.md

+14 -11
@@ -69,7 +69,7 @@ The `lemonade` CLI uses the same style of syntax as `turnkey`, but with a new se
 
 To chat with your LLM try:
 
-OGA:
+OGA iGPU:
 ```bash
 lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 llm-prompt -p "Hello, my thoughts are"
 ```
@@ -79,19 +79,19 @@ Hugging Face:
 lemonade -i facebook/opt-125m huggingface-load llm-prompt -p "Hello, my thoughts are"
 ```
 
-The LLM will run on CPU with your provided prompt, and the LLM's response to your prompt will be printed to the screen. You can replace the `"Hello, my thoughts are"` with any prompt you like.
+The LLM will run with your provided prompt, and the LLM's response to your prompt will be printed to the screen. You can replace `"Hello, my thoughts are"` with any prompt you like.
 
 You can also replace the `facebook/opt-125m` with any Hugging Face checkpoint you like, including LLaMA-2, Phi-2, Qwen, Mamba, etc.
 
-You can also set the `--device` argument in `huggingface-load` to load your LLM on a different device.
+You can also set the `--device` argument in `oga-load` and `huggingface-load` to load your LLM on a different device.
 
 Run `lemonade huggingface-load -h` and `lemonade llm-prompt -h` to learn more about those tools.
 
 ## Accuracy
 
 To measure the accuracy of an LLM using MMLU, try this:
 
-OGA:
+OGA iGPU:
 ```bash
 lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 accuracy-mmlu --tests management
 ```
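Editorial note: the `accuracy-mmlu` tool reports a multiple-choice accuracy score per subject. As a rough, self-contained sketch of the metric itself (not lemonade's implementation), accuracy is simply the fraction of questions where the model's predicted choice matches the answer key:

```python
# Illustrative sketch only -- not lemonade's code. MMLU-style scoring
# reduces to: correct predictions divided by total questions.
def multiple_choice_accuracy(predictions, answer_key):
    """Return the fraction of predicted choices that match the key."""
    if len(predictions) != len(answer_key):
        raise ValueError("prediction/answer length mismatch")
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return correct / len(answer_key)

score = multiple_choice_accuracy(["A", "C", "B", "D"], ["A", "C", "D", "D"])
print(score)  # 0.75
```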
@@ -109,7 +109,7 @@ You can run the full suite of MMLU subjects by omitting the `--test` argument. Y
 
 To measure the time-to-first-token and tokens/second of an LLM, try this:
 
-OGA:
+OGA iGPU:
 ```bash
 lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 oga-bench
 ```
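Editorial note: the bench tools report time-to-first-token and tokens/second. As a generic illustration of how such metrics are derived from wall-clock timestamps (a hypothetical helper, not lemonade's actual code):

```python
# Hypothetical helper, not lemonade's code: derive the two standard
# LLM benchmark metrics from wall-clock timestamps.
def llm_metrics(start_s, first_token_s, end_s, generated_tokens):
    """Return (time-to-first-token in seconds, tokens per second)."""
    ttft = first_token_s - start_s
    tokens_per_second = generated_tokens / (end_s - start_s)
    return ttft, tokens_per_second

ttft, tps = llm_metrics(start_s=10.0, first_token_s=10.4, end_s=12.0, generated_tokens=64)
# ttft is about 0.4 s; throughput is 32.0 tokens/s
```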
@@ -121,14 +121,14 @@ Hugging Face:
 
 That command will run a few warmup iterations, then a few generation iterations where performance data is collected.
 
-The prompt size, number of output tokens, and number iterations are all parameters. Learn more by running `lemonade huggingface-bench -h`.
+The prompt size, number of output tokens, and number of iterations are all parameters. Learn more by running `lemonade oga-bench -h` and `lemonade huggingface-bench -h`.
 
 ## Memory Usage
 
-The peak memory used by the lemonade build is captured in the build output. To capture more granular
+The peak memory used by the `lemonade` build is captured in the build output. To capture more granular
 memory usage information, use the `--memory` flag. For example:
 
-OGA:
+OGA iGPU:
 ```bash
 lemonade --memory -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 oga-bench
 ```
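Editorial note: the distinction drawn above, a single peak figure versus more granular tracking, can be illustrated with Python's standard-library `tracemalloc` (a generic sketch; this is not how lemonade's `--memory` flag is implemented):

```python
import tracemalloc

# Generic sketch, not lemonade's mechanism: tracemalloc reports both
# current and peak traced allocations.
tracemalloc.start()
buffers = [bytearray(1024) for _ in range(1024)]  # allocate ~1 MiB
current, peak = tracemalloc.get_traced_memory()
del buffers  # release the memory
current_after, peak_after = tracemalloc.get_traced_memory()
tracemalloc.stop()

# The peak figure persists after the memory is freed, while the
# current figure drops -- which is why a single peak number can hide
# *when* during a build the memory was actually used.
```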
@@ -145,7 +145,7 @@ contains a figure plotting the memory usage over the build time. Learn more by
 
 You can launch a WebSocket server for your LLM with:
 
-OGA:
+OGA iGPU:
 ```bash
 lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 serve
 ```
@@ -163,8 +163,9 @@ Lemonade is also available via API.
 
 ## LEAP APIs
 
-The lemonade enablement platform (LEAP) API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) or backend (e.g., CPU, iGPU, Hybrid):
+The lemonade enablement platform (LEAP) API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid). This makes it easy to integrate lemonade LLMs into Python applications.
 
+OGA iGPU:
 ```python
 from lemonade import leap
@@ -180,7 +181,9 @@ You can learn more about the LEAP APIs [here](https://github.com/onnx/turnkeyml/
 
 ## Low-Level API
 
-Here's a quick example of how to benchmark an LLM using the low-level API, which calls tools one by one:
+The low-level API is useful for designing custom experiments, for example to sweep over many checkpoints, devices, and/or tools.
+
+Here's a quick example of how to prompt a Hugging Face LLM using the low-level API, which calls the load and prompt tools one by one:
 
 ```python
 import lemonade.tools.torch_llm as tl

docs/lemonade/ort_genai_igpu.md

+4 -4
@@ -1,6 +1,6 @@
 # OnnxRuntime GenAI (OGA) for iGPU and CPU
 
-[onnxruntime-genai (aka OGA)](https://github.com/microsoft/onnxruntime-genai/tree/main?tab=readme-ov-file) is a new framework created by Microsoft for running ONNX LLMs
+[onnxruntime-genai (aka OGA)](https://github.com/microsoft/onnxruntime-genai/tree/main?tab=readme-ov-file) is a new framework created by Microsoft for running ONNX LLMs.
 
 ## Installation
 
@@ -32,9 +32,9 @@ See [lemonade installation](https://github.com/onnx/turnkeyml/blob/main/docs/lem
 ## Directory structure:
 - The model_builder tool caches Hugging Face files and temporary ONNX external data files in `<LEMONADE CACHE>\model_builder`
 - The output from model_builder is stored in `<LEMONADE_CACHE>\oga_models\<MODELNAME>\<SUBFOLDER>`
-- `MODELNAME` is the Hugging Face checkpoint name where any '/' is mapped to an '_' and everything is lower case
-- `SUBFOLDER` is `<EP>-<DTYPE>`, where `EP` is the execution provider (`dml` for igpu, `cpu` for cpu, and `npu` for npu) and `DTYPE` is the datatype
-- If the --int4-block-size flag is used then `SUBFOLDER` is` <EP>-<DTYPE>-block-<SIZE>` where `SIZE` is the specified block size
+- `MODELNAME` is the Hugging Face checkpoint name where any '/' is mapped to an '_' and everything is lower case.
+- `SUBFOLDER` is `<EP>-<DTYPE>`, where `EP` is the execution provider (`dml` for igpu, `cpu` for cpu, and `npu` for npu) and `DTYPE` is the datatype.
+- If the `--int4-block-size` flag is used then `SUBFOLDER` is `<EP>-<DTYPE>-block-<SIZE>`, where `SIZE` is the specified block size.
 - Other ONNX models in the format required by onnxruntime-genai can be loaded in lemonade if placed in the `<LEMONADE_CACHE>\oga_models` folder.
 - Use the -i and --subfolder flags to specify the folder and subfolder:
 - `lemonade -i my_model_name --subfolder my_subfolder --device igpu --dtype int4 oga-load`
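Editorial note: the naming rules in the bullets above can be sketched as a small path helper (hypothetical code for illustration; lemonade constructs these paths internally):

```python
from pathlib import PureWindowsPath

# Hypothetical sketch of the cache layout described above; not lemonade's code.
def oga_model_dir(cache, checkpoint, ep, dtype, block_size=None):
    """Build <LEMONADE_CACHE>\\oga_models\\<MODELNAME>\\<SUBFOLDER>."""
    # MODELNAME: every '/' maps to '_' and everything is lower case.
    model_name = checkpoint.replace("/", "_").lower()
    # SUBFOLDER: <EP>-<DTYPE>, with -block-<SIZE> appended when a block size is set.
    subfolder = f"{ep}-{dtype}" if block_size is None else f"{ep}-{dtype}-block-{block_size}"
    return str(PureWindowsPath(cache, "oga_models", model_name, subfolder))

print(oga_model_dir(r"C:\cache", "microsoft/Phi-3-mini-4k-instruct", "dml", "int4"))
# C:\cache\oga_models\microsoft_phi-3-mini-4k-instruct\dml-int4
```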
