```bash
lemonade -i facebook/opt-125m huggingface-load llm-prompt -p "Hello, my thoughts are"
```
The LLM will run with your provided prompt, and the LLM's response to your prompt will be printed to the screen. You can replace the `"Hello, my thoughts are"` with any prompt you like.
You can also replace the `facebook/opt-125m` with any Hugging Face checkpoint you like, including LLaMA-2, Phi-2, Qwen, Mamba, etc.
You can also set the `--device` argument in `oga-load` and `huggingface-load` to load your LLM on a different device.
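For example, a sketch of what that can look like (the `--device cpu` value here is illustrative; run the `-h` commands below to see the devices your install actually supports):

```bash
# Illustrative: pick the device at load time, then prompt as before.
# Available --device values depend on your installation.
lemonade -i facebook/opt-125m huggingface-load --device cpu llm-prompt -p "Hello, my thoughts are"
```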
Run `lemonade huggingface-load -h` and `lemonade llm-prompt -h` to learn more about those tools.
## Accuracy
To measure the accuracy of an LLM using MMLU, try this:
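A sketch of what that command can look like (the `accuracy-mmlu` tool name and `--tests` flag are assumptions here; check `lemonade -h` for the exact accuracy tools your version provides):

```bash
# Illustrative: load the model, then score it on an MMLU subject
lemonade -i facebook/opt-125m huggingface-load accuracy-mmlu --tests management
```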
That command will run a few warmup iterations, then a few generation iterations where performance data is collected.
The prompt size, number of output tokens, and number of iterations are all parameters. Learn more by running `lemonade oga-bench -h` and `lemonade huggingface-bench -h`.
## Memory Usage
The peak memory used by the `lemonade` build is captured in the build output. To capture more granular
memory usage information, use the `--memory` flag. For example:
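One possible shape for that command (a sketch; the placement of the global `--memory` flag before the tool sequence is an assumption):

```bash
# Illustrative: add --memory to a build to log granular memory
# usage while the load and bench tools run
lemonade -i facebook/opt-125m --memory huggingface-load huggingface-bench
```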
Lemonade is also available via API.
## LEAP APIs
The lemonade enablement platform (LEAP) API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid). This makes it easy to integrate lemonade LLMs into Python applications.
OGA iGPU:
```python
from lemonade import leap

# The checkpoint and recipe values below are illustrative
model, tokenizer = leap.from_pretrained("facebook/opt-125m", recipe="oga-igpu")

input_ids = tokenizer("Hello, my thoughts are", return_tensors="pt").input_ids
response = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(response[0]))
```
## Low-Level API
The low-level API is useful for designing custom experiments, for example to sweep over many checkpoints, devices, and/or tools.
Here's a quick example of how to prompt a Hugging Face LLM using the low-level API, which calls the load and prompt tools one by one:
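One possible shape for that example, sketched under the assumption that the tool classes live where the repo layout suggests (the module paths, `HuggingfaceLoad`, `Prompt`, and the `State` constructor arguments are assumptions, not a verified API):

```python
# Module paths and class names here are assumptions based on the repo layout
import lemonade.tools.torch_llm as tl
import lemonade.tools.chat as cl
from turnkeyml.state import State

# Create a State to hold build artifacts, then run the load and prompt
# tools one by one, each consuming and returning the State
state = State(cache_dir="cache", build_name="test")
state = tl.HuggingfaceLoad().run(state, input="facebook/opt-125m")
state = cl.Prompt().run(state, prompt="Hello, my thoughts are", max_new_tokens=15)

print("Response:", state.response)
```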
# OnnxRuntime GenAI (OGA) for iGPU and CPU
[onnxruntime-genai (aka OGA)](https://github.com/microsoft/onnxruntime-genai/tree/main?tab=readme-ov-file) is a new framework created by Microsoft for running ONNX LLMs.
## Installation
## Directory structure
- The `model_builder` tool caches Hugging Face files and temporary ONNX external data files in `<LEMONADE_CACHE>\model_builder`
- The output from `model_builder` is stored in `<LEMONADE_CACHE>\oga_models\<MODELNAME>\<SUBFOLDER>`
- `MODELNAME` is the Hugging Face checkpoint name where any '/' is mapped to an '_' and everything is lower case.
- `SUBFOLDER` is `<EP>-<DTYPE>`, where `EP` is the execution provider (`dml` for iGPU, `cpu` for CPU, and `npu` for NPU) and `DTYPE` is the datatype.
- If the `--int4-block-size` flag is used, then `SUBFOLDER` is `<EP>-<DTYPE>-block-<SIZE>`, where `SIZE` is the specified block size.
- Other ONNX models in the format required by onnxruntime-genai can be loaded in lemonade if placed in the `<LEMONADE_CACHE>\oga_models` folder.
- Use the `-i` and `--subfolder` flags to specify the folder and subfolder:
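A sketch of such an invocation (the folder and subfolder names are placeholders, and the exact position of `--subfolder` relative to the tool name is an assumption; check `lemonade oga-load -h`):

```bash
# Illustrative: load an ONNX model placed in
# <LEMONADE_CACHE>\oga_models\my_model_name\my_subfolder
lemonade -i my_model_name oga-load --subfolder my_subfolder
```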