README.md (+3 −3)
@@ -5,11 +5,11 @@
[](https://github.com/onnx/turnkeyml/blob/main/docs/install.md "Check out our instructions")
[](https://github.com/onnx/turnkeyml/blob/main/docs/install.md "Check out our instructions")
-We are on a mission to make it easy to use the most important tools in the ONNX ecosystem. TurnkeyML accomplishes this by providing no-code CLIs and low-code APIs for both general ONNX workflows with `turnkey` as well as LLMs with `lemonade`.
+We are on a mission to make it easy to use the most important tools in the ONNX ecosystem. TurnkeyML accomplishes this by providing a full SDK for LLMs with the Lemonade SDK, as well as no-code CLIs for general ONNX workflows with `turnkey`.
-| Serve and benchmark LLMs on CPU, GPU, and NPU. <br/> [Click here to get started with `lemonade`.](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md) | Export and optimize ONNX models for CNNs and Transformers. <br/> [Click here to get started with `turnkey`.](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
+| Serve and benchmark LLMs on CPU, GPU, and NPU. <br/> [Click here to get started with Lemonade.](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) | Export and optimize ONNX models for CNNs and Transformers. <br/> [Click here to get started with `turnkey`.](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
docs/lemonade/README.md (+31 −35)
@@ -1,6 +1,10 @@
-# Lemonade SDK
+# 🍋 Lemonade SDK
-The `lemonade` SDK provides everything needed to get up and running quickly with LLMs on OnnxRuntime GenAI (OGA).
+*The long-term objective of the Lemonade SDK is to provide the ONNX ecosystem with the same kind of tools available in the GGUF ecosystem.*
+Lemonade SDK is built on top of [OnnxRuntime GenAI (OGA)](https://github.com/microsoft/onnxruntime-genai), an ONNX LLM inference engine developed by Microsoft to improve the LLM experience on AI PCs, especially those with accelerator hardware such as Neural Processing Units (NPUs).
+
+The Lemonade SDK provides everything needed to get up and running quickly with LLMs on OGA:

- [Quick installation from PyPI](#install).
- [CLI with tools for prompting, benchmarking, and accuracy tests](#cli-commands).
@@ -9,56 +13,48 @@ The `lemonade` SDK provides everything needed to get up and running quickly with
# Install
-You can quickly get started with `lemonade` by installing the `turnkeyml` [PyPI package](#from-pypi) with the appropriate extras for your backend, or you can [install from source](#from-source-code) by cloning and installing this repository.
+You can quickly get started with Lemonade by installing the `turnkeyml` [PyPI package](#installing-from-pypi) with the appropriate extras for your backend, by [installing from source](#installing-from-source) after cloning this repository, or by [using the GUI installer for Lemonade Server](#installing-from-lemonade_server_installerexe).
-## From PyPI
+## Installing From PyPI
-To install `lemonade` from PyPI:
+To install the Lemonade SDK from PyPI:
1. Create and activate a [miniconda](https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe) environment.
   ```bash
   conda create -n lemon python=3.10
+  ```
+
+  ```bash
   conda activate lemon
   ```
-3. Install lemonade for your backend of choice:
+3. Install Lemonade for your backend of choice:
- [OnnxRuntime GenAI with CPU backend](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/ort_genai_igpu.md):
  ```bash
-  pip install -e turnkeyml[llm-oga-cpu]
+  pip install turnkeyml[llm-oga-cpu]
  ```
- [OnnxRuntime GenAI with Integrated GPU (iGPU, DirectML) backend](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/ort_genai_igpu.md):
  > Note: Requires Windows and a DirectML-compatible iGPU.
  ```bash
-  pip install -e turnkeyml[llm-oga-igpu]
+  pip install turnkeyml[llm-oga-igpu]
  ```
- OnnxRuntime GenAI with Ryzen AI Hybrid (NPU + iGPU) backend:
-  > Note: Ryzen AI Hybrid requires a Windows 11 PC with an AMD Ryzen™ AI 9 HX375, Ryzen AI 9 HX370, or Ryzen AI 9 365 processor.
-  > - Install the [Ryzen AI driver >= 32.0.203.237](https://ryzenai.docs.amd.com/en/latest/inst.html#install-npu-drivers) (you can check your driver version under Device Manager > Neural Processors).
-  > - Visit the [AMD Hugging Face page](https://huggingface.co/collections/amd/quark-awq-g128-int4-asym-fp16-onnx-hybrid-13-674b307d2ffa21dd68fa41d5) for supported checkpoints.
-  ```bash
-  pip install -e turnkeyml[llm-oga-hybrid]
-  lemonade-install --ryzenai hybrid
-  ```
+  > Note: Ryzen AI Hybrid requires a Windows 11 PC with an AMD Ryzen™ AI 300-series processor.
+
+  - Follow the environment setup instructions [here](https://ryzenai.docs.amd.com/en/latest/llm/high_level_python.html).
- Hugging Face (PyTorch) LLMs for CPU backend:
  ```bash
-  pip install -e turnkeyml[llm]
+  pip install turnkeyml[llm]
  ```
- llama.cpp: see [instructions](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/llamacpp.md).

4. Use `lemonade -h` to explore the LLM tools, and see the [command](#cli-commands) and [API](#api) examples below.
-1. `cd turnkeyml` (where `turnkeyml` is the repo root of your clone)
-   - Note: be sure to run these installation instructions from the repo root.
-1. Follow the same instructions as in the [PyPI installation](#from-pypi), except replace the `turnkeyml` with a `.`.
-   - For example: `pip install -e .[llm-oga-igpu]`
+The Lemonade SDK can be installed from source code by cloning this repository and following the instructions [here](source_installation_inst.md).
-## From Lemonade_Server_Installer.exe
+## Installing From Lemonade_Server_Installer.exe
The Lemonade Server is available as a standalone tool with a one-click Windows installer `.exe`. Check out the [Lemonade_Server_Installer.exe guide](lemonade_server_exe.md) for installation instructions and the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the functionality.
-> Run `lemonade` on the input `(-i)` checkpoint `microsoft/Phi-3-mini-4k-instruct`. First, load it in the OnnxRuntime GenAI framework (`oga-load`), on to the integrated GPU device (`--device igpu`) in the int4 data type (`--dtype int4`). Then, pass the OGA model to the prompting tool (`llm-prompt`) with the prompt (`-p`) "Hello, my thoughts are" and print the response.
+> Run `lemonade` on the input `(-i)` checkpoint `microsoft/Phi-3-mini-4k-instruct`. First, load it in the OnnxRuntime GenAI framework (`oga-load`), onto the integrated GPU device (`--device igpu`) in the int4 data type (`--dtype int4`). Then, pass the OGA model to the prompting tool (`llm-prompt`) with the prompt (`-p`) "Hello, my thoughts are" and print the response.
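> The command being described here was elided by this diff view; reconstructed from the description above, it is presumably of the form:

```bash
lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 llm-prompt -p "Hello, my thoughts are"
```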
The `lemonade -h` command will show you which options and Tools are available, and `lemonade TOOL -h` will tell you more about that specific Tool.
## Prompting
-To prompt your LLM try:
+To prompt your LLM, try one of the following:
OGA iGPU:
```bash
@@ -101,11 +97,11 @@ You can also replace the `facebook/opt-125m` with any Hugging Face checkpoint yo
You can also set the `--device` argument in `oga-load` and `huggingface-load` to load your LLM on a different device.
104
-
Run `lemonade huggingface-load -h` and `lemonade llm-prompt -h` to learn more about those tools.
100
+
Run `lemonade huggingface-load -h` and `lemonade llm-prompt -h` to learn more about these tools.
## Accuracy
-To measure the accuracy of an LLM using MMLU, try this:
+To measure the accuracy of an LLM using MMLU (Measuring Massive Multitask Language Understanding), try the following:
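> The command being referenced was elided by this diff view; it is presumably of the form below (run `lemonade accuracy-mmlu -h` for the exact flag names):

```bash
lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 accuracy-mmlu --tests management
```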
-That command will run just the management test from MMLU on your LLM and save the score to the lemonade cache at `~/.cache/lemonade`.
+This command will run just the management test from MMLU on your LLM and save the score to the lemonade cache at `~/.cache/lemonade`. You can also run other subject tests by replacing `management` with the desired subject name. For the full list of supported subjects, see the [MMLU Accuracy Read Me](mmlu_accuracy.md).
You can run the full suite of MMLU subjects by omitting the `--test` argument. You can learn more about this with `lemonade accuracy-mmlu -h`.
## Benchmarking
-To measure the time-to-first-token and tokens/second of an LLM, try this:
+To measure the time-to-first-token and tokens/second of an LLM, try the following:
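> The command being referenced was elided by this diff view; given the `oga-bench` tool named below, it is presumably of the form:

```bash
lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 oga-bench
```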
-That command will run a few warmup iterations, then a few generation iterations where performance data is collected.
+This command will run a few warm-up iterations, then a few generation iterations where performance data is collected.
The prompt size, number of output tokens, and number of iterations are all parameters. Learn more by running `lemonade oga-bench -h` or `lemonade huggingface-bench -h`.
@@ -173,15 +169,15 @@ You can launch an OpenAI-compatible server with:
lemonade serve
```
-Visit the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the endpoints provided.
+Visit the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the endpoints provided as well as how to launch the server with more detailed informational messages enabled.
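> Once the server is running, a quick sanity check might look like the following sketch; the `/api/v0` route prefix, port, and model name are assumptions, so consult the server spec above for the exact paths:

```bash
# Hypothetical request against the OpenAI-compatible chat completions endpoint.
curl http://localhost:8000/api/v0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "microsoft/Phi-3-mini-4k-instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
```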
# API

Lemonade is also available via API.

## High-Level APIs
-The high-level lemonade API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid) using the popular `from_pretrained()` function. This makes it easy to integrate lemonade LLMs into Python applications.
+The high-level Lemonade API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid) using the popular `from_pretrained()` function. This makes it easy to integrate Lemonade LLMs into Python applications.
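> A minimal sketch of this API; the import path and the `recipe` argument/value are assumptions based on the description above, so check the repo's examples for the exact signatures:

```python
from lemonade.api import from_pretrained  # hypothetical import path

# Load an OGA int4 model onto the integrated GPU (hypothetical recipe name).
model, tokenizer = from_pretrained("microsoft/Phi-3-mini-4k-instruct", recipe="oga-igpu")

# Prompt the model through the familiar generate() interface.
input_ids = tokenizer("Hello, my thoughts are", return_tensors="pt").input_ids
response = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(response[0]))
```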
docs/lemonade/ort_genai_igpu.md (+10 −10)
@@ -4,20 +4,20 @@
## Installation
-See [lemonade installation](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md#install) for the OGA iGPU backend.
+See [Lemonade Installation](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md#install) for the OGA iGPU backend.
## Get models

-- The oga-load tool can download models from Hugging Face and build ONNX files using OGA's `model_builder`, which can quantized and optimize models for both igpu and cpu.
+- The oga-load tool can download models from Hugging Face and build ONNX files using OGA's `model_builder`, which can quantize and optimize models for both iGPU and CPU.
-- Transformer model architectures supported by the model_builder tool include many popular state-of-the-art models:
+- Transformer model architectures supported by the model_builder tool include many popular state-of-the-art models, such as:
  - Gemma
  - LLaMa
  - Mistral
@@ -26,16 +26,16 @@ See [lemonade installation](https://github.com/onnx/turnkeyml/blob/main/docs/lem
  - Nemotron
- For the full list of supported models, please see the [model_builder documentation](https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/README.md).
- The following quantizations are supported for automatically building ONNXRuntime GenAI model files from the Hugging Face repository:
-  - cpu: fp32, int4
-  - igpu: fp16, int4
+  - `cpu`: `fp32`, `int4`
+  - `igpu`: `fp16`, `int4`
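> For instance, the data type is selected at load time via `--dtype` (a sketch, following the `oga-load` syntax used elsewhere in these docs):

```bash
lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4
```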
## Directory structure:
- The model_builder tool caches Hugging Face files and temporary ONNX external data files in `<LEMONADE_CACHE>\model_builder`
- The output from model_builder is stored in `<LEMONADE_CACHE>\oga_models\<MODELNAME>\<SUBFOLDER>`
  - `MODELNAME` is the Hugging Face checkpoint name where any '/' is mapped to an '_' and everything is lower case.
-  - `SUBFOLDER` is `<EP>-<DTYPE>`, where `EP` is the execution provider (`dml` for igpu, `cpu` for cpu, and `npu` for npu) and `DTYPE` is the datatype.
-  - If the --int4-block-size flag is used then `SUBFOLDER` is `<EP>-<DTYPE>-block-<SIZE>` where `SIZE` is the specified block size.
-  - Other ONNX models in the format required by onnxruntime-genai can be loaded in lemonade if placed in the `<LEMONADE_CACHE>\oga_models` folder.
-  - Use the -i and --subfolder flags to specify the folder and subfolder:
+  - `SUBFOLDER` is `<EP>-<DTYPE>`, where `EP` is the execution provider (`dml` for `igpu`, `cpu` for `cpu`, and `npu` for `npu`) and `DTYPE` is the datatype.
+  - If the `--int4-block-size` flag is used then `SUBFOLDER` is `<EP>-<DTYPE>-block-<SIZE>` where `SIZE` is the specified block size.
+  - Other ONNX models in the format required by onnxruntime-genai can be loaded by Lemonade if placed in the `<LEMONADE_CACHE>\oga_models` folder.
+  - Use the `-i` and `--subfolder` flags to specify the folder and subfolder, for example:
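> The example command was elided by this diff view; a hypothetical sketch follows, where `my_model_name` and `my_subfolder` are placeholders and the exact flag placement may differ (see `lemonade oga-load -h`):

```bash
lemonade -i my_model_name oga-load --device igpu --dtype int4 --subfolder my_subfolder llm-prompt -p "Hello, my thoughts are"
```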
docs/lemonade/server_spec.md (+1 −1)
@@ -41,7 +41,7 @@ See the [Lemonade_Server_Installer.exe instructions](lemonade_server_exe.md) to
### Python Environment
-If you have `lemonade` [installed in a Python environment](getting_started.md#from-pypi), simply activate it and run the following command to start the server:
+If you have Lemonade [installed in a Python environment](README.md#install), simply activate it and run the following command to start the server:
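> The command that follows was elided by this diff view; per the README's server section, it is presumably:

```bash
lemonade serve
```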
docs/lemonade/source_installation_inst.md (new file)

1. `cd turnkeyml` (where `turnkeyml` is the repo root of your clone)
   - Note: be sure to run these installation instructions from the repo root.
2. Create and activate a [Miniconda](https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe) environment.
   ```bash
   conda create -n lemon python=3.10
   ```

   ```bash
   conda activate lemon
   ```

3. Install Lemonade for your backend of choice:
   - [OnnxRuntime GenAI with CPU backend](ort_genai_igpu.md):
     ```bash
     pip install -e .[llm-oga-cpu]
     ```
   - [OnnxRuntime GenAI with Integrated GPU (iGPU, DirectML) backend](ort_genai_igpu.md):
     > Note: Requires Windows and a DirectML-compatible iGPU.
     ```bash
     pip install -e .[llm-oga-igpu]
     ```
   - OnnxRuntime GenAI with Ryzen AI Hybrid (NPU + iGPU) backend:
     > Note: Ryzen AI Hybrid requires a Windows 11 PC with an AMD Ryzen™ AI 300-series processor.
     > - Ensure you have the correct driver version installed by checking [here](https://ryzenai.docs.amd.com/en/latest/inst.html#install-npu-drivers).
     > - Visit the [AMD Hugging Face OGA Hybrid collection](https://huggingface.co/collections/amd/ryzenai-14-llm-hybrid-models-67da31231bba0f733750a99c) for supported checkpoints.
     ```bash
     pip install -e .[llm-oga-hybrid]
     ```

     ```bash
     lemonade-install --ryzenai hybrid
     ```
   - Hugging Face (PyTorch) LLMs for CPU backend:
     ```bash
     pip install -e .[llm]
     ```
   - llama.cpp: see [instructions](llamacpp.md).

4. Use `lemonade -h` to explore the LLM tools, and see the [commands](README.md#cli-commands) and [APIs](README.md#api) in the [Lemonade SDK README](README.md).