
Commit a6cdf62

Authored by jatinwadhwa921, sushraja-msft, skottmckay, seungtaek94, and co63oc
Rebasing with msft commits (#607)
* **Fix flash attention for GQA (Phi4) (microsoft#23850)** — Fixes GQA for Flash Attention on Nvidia GPUs. The root cause appears to be the `k_start + capped_sg_id < seq_causal_length` check. Either (a) `seq_causal_length` varies per lane, so the check becomes non-uniform control flow that interacts badly with `subgroupShuffle`, or (b) the check itself is incorrect and wipes out values of v based on the source lane's `seq_causal_length`, when in fact v needs to be causal with respect to the lane that will multiply it with qkt. qkt is already causal because earlier values of qk for out-of-bounds k are set to `min_value`, and exp of such large negative scores is 0. The fix removes that causal check and relies on qk being wiped out earlier; the documentation of GQA's causality behavior is missing, so it is unclear which of the two reasons is the true one. Previously, prompts with sequence length greater than 16 but less than 32, or around 1k, would break with Phi 4, while smaller prompts worked. Tested on Intel Alderlake and Nvidia 4070. (A minimal sketch of the masking argument follows the commit message.)
* **Model Builder API (microsoft#23223)** — Supports creating a model programmatically using the ORT C or C++ API, and augmenting an existing model to add nodes.
* **Fix typo: change `Upample` to `Upsample` (microsoft#23838)** — Fixed misspelled function names for the Upsample CUDA kernel, changing `Upample` to `Upsample` across the relevant functions to keep names consistent and prevent confusion.
* **[doc] Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/ (microsoft#23848)**
* **Quant tool: Consistent `get_qdq_config` and `get_qnn_qdq_config` behavior (microsoft#23856)**
* **Change the logic to generate the default ep context file name (microsoft#23788)** — Applies to all EPs: replace `.onnx` with `_ctx.onnx` instead of directly appending the extra string `_ctx.onnx` to the existing model path. In QNN EP, also make the context binary `.bin` file name shorter by removing `QNNExecutionProvider_` from it. (See the filename sketch after the commit message.)
* **Make Nuget QNN package pipeline 1ES compliant (microsoft#23805)** — Makes [QNN_Nuget_Windows](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1234) 1ES compliant.
* **[js/common] allows using Uint16Array as data for float16 tensor (microsoft#23827)** — Resolves microsoft#23817.
* **[js/webgpu] Reland the optimization of ConvTranspose (microsoft#23858)** — Fixes the errors in the ConvTranspose optimization and adds tests to ensure the correctness of the implementation.
* **[OpenVINO] Fix a build warning (microsoft#23877)** — Fixes a warning with `std::move` usage; possibly allows building without the `--compile_no_warning_as_error` flag.
* **Change gsl::byte to std::byte (microsoft#23872)** — Needed for compatibility with the latest GSL library; without this fix the build fails with:

  ```
  onnxruntime\core\providers\cpu\controlflow\loop.cc(247): error C4996: 'gsl::byte': Use std::byte instead.
  ```
* **Allow using extended minimal build for several EPs (microsoft#23834)** — Background: from code search, the CANN, CUDA, DML, JS, ROCM, and WebGPU EPs all use `onnxruntime::GetCpuPreferredNodes()` in their `GetCapability()` methods. However, the source file that implements it is excluded when minimal build is ON (https://github.com/microsoft/onnxruntime/blob/6df0973e58ba5399fcaa98686f70ed9a9e59aaef/cmake/onnxruntime_framework.cmake#L38-L42), which means none of those EPs can compile in a minimal build. Solution: the excluded file `core/framework/fallback_cpu_capability.cc` cannot build in a basic minimal build because some of its dependencies are missing, but in extended minimal build mode all of its dependencies are available. This PR loosens the restriction and allows the file to compile in an extended minimal build, after which those EPs compile there as well.
* **Add dawn to ThirdPartyNotices (microsoft#23876)** — Adds `dawn` to ThirdPartyNotices.
* **Enable QNN EP weight sharing generation using public API (microsoft#23702)** — Uses the public API instead of internal interfaces, so that users can integrate weight-sharing generation into their own toolchains. The change shares the QnnBackendManager across ORT sessions when `ep.share_ep_contexts` is enabled, and adds an extra option to end the sharing so that the shared QnnBackendManager can be removed from the singleton at the right time. Also renames the tool from `onnxruntime_qnn_ctx_gen` to `ep_weight_sharing_ctx_gen`, so that it can be shared by other EPs.
* **[QNN-EP]: Fix inference failures while running with htp_shared_memory (microsoft#23892)** — When the `enable_htp_shared_memory` feature is used, the address of the buffer passed to `rpcmem_free` is incorrect, so the RPC buffers are never freed, leading to memory exhaustion. With the `enable_htp_shared_memory_allocator` feature for QNN in GenAI extensions, this caused inference failures during the second prompt; since GenAI memory demands are higher, it surfaces sooner in GenAI use cases. Co-authored-by: Ashish Garg <ashigarg@qti.qualcomm.com>
* **Fix enable_pix_capture build for WebGPU (microsoft#23857)** — The build option `--enable_pix_capture` was broken; this fixes the problem. Co-authored-by: wp <webgraphics@intel.com>
* **[WebGPU-EP Native] Add ReduceMean (microsoft#23860)**
* **[WebGPU EP] introduce BiasAdd contrib op (microsoft#23861)** — Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* **Dynamo export and improve benchmark script for SAM2 encoder (microsoft#23887)** — Adds dynamo export for the SAM2 image encoder and verifies the fp32 ONNX model with the CPU EP (to avoid error messages from the TRT EP). Updates the benchmark script to output ORT profiling, output the torch-compiled code and a unique kernel name for each compiled kernel, add an option for nightly package installation, and uninstall existing ort packages before installing. The node metadata of the dynamo-exported model can help map nodes in the ONNX model back to the PyTorch modeling script. Graph optimization is not yet done on dynamo-exported models, so this is experimental for now. Motivation: support profiling of torch-compiled CUDA kernels.
* **[js/web] improve workaround for bundlers (microsoft#23902)** — Improves the workaround for bundlers in onnxruntime-web. Specifically: uses the workaround (xenova@9c50aa2) suggested by @xenova in huggingface/transformers.js#1161 (comment), and uses `url > "file:" && url < "file;"` instead of `url.startsWith("file:")` so that minifiers can remove dead code correctly. This removes unnecessary dependencies of files parsed from `new URL("ort.bundle.min.js", import.meta.url)` in Vite, and lets webpack/terser optimize code like `if("file://filepath.js".startsWith("file:")) {do_sth1(); } else {do_sth2();}` into `do_sth1()`. Resolves huggingface/transformers.js#1161. (A sketch of the string-ordering trick follows the commit message.)
* **[webgpu] Restore MatMulNBits workgroup size for Phi-3.5 (microsoft#23349)** — Restores the MatMulNBits workgroup size from (8, 8, 1) back to (16, 8, 1) to resolve a performance regression observed on Intel iGPUs during token generation (M=1). Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
* **[webgpu] support Pad operator (microsoft#23141)**
* **[WebNN] Accept Float16Array for float16 data type if it is available (microsoft#23894)** — Float16Array is now shipping and the WebNN Chromium implementation has accepted it, so the WebNN EP should allow it as well.
* **Ensure that the 'cmake_minimum_required' is version 3.5 or greater (microsoft#23888)** — CMake 4.0 release candidate 2 is available, and it cannot compile all of OnnxRuntime out of the box: portions of the codebase specify a `cmake_minimum_required` version of 3.0, and CMake 4.0 has removed support for compatibility with CMake < 3.5, so the following error is reported:

  ```
  CMake Error at winml_sdk_helpers.cmake:4 (cmake_minimum_required):
    Compatibility with CMake < 3.5 has been removed from CMake.

    Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
    to tell CMake that the project requires at least <min> but has been updated
    to work with policies introduced by <max> or earlier.

    Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway.
  ```

  Since CMake 3.5 appears to have shipped in 2016, it seems reasonable to set that as a minimum version to fix the error.
  The root CMakeLists.txt does ask for a minimum version of 3.28, so we could snap to that, but this proposes a minimally sufficient fix. Motivation: being able to build with the latest CMake, when it ships, reduces the barrier to entry to building OnnxRuntime and allows it to leverage the latest tooling.
* **WebGPU: Remove deprecated subgroups-f16 from WebGPU native and JS EP (microsoft#23898)** — Removes the deprecated subgroups-f16 from the WebGPU native and JS EPs, and also removes the unused deviceInfo in the WebGPU JS EP.
* **[JSEP/WebGPU] Fixed error in softmax dispatch. (microsoft#23906)** — Fixed an error in the softmax dispatch so that the LlaMA model produces expected results.
* **enable WebGPU EP in WebAssembly build (microsoft#23913)** — First step in migrating the webgpu backend of onnxruntime-web from JSEP-based to WebGPU EP-based: enables building the WebGPU EP in a wasm build (i.e. `--build_wasm` `--use_webgpu` `--use_jsep`). The old build flags keep their previous behavior.
* **Adding OpenVINO Windows CI Pipeline (microsoft#23919)** — Enables an OpenVINO Windows CI pipeline. This includes downloading the OpenVINO toolkit for Windows from an external source, setting up OpenVINO environment variables, building the ONNX Runtime OpenVINO Execution Provider, and running unit tests. This is required to run checks on precommit and commit in the ONNX Runtime project, ensuring the code is tested with the OpenVINO toolkit on Windows and improving the reliability and compatibility of the project.
* **[WebGPU EP] SoftMax Implementation (microsoft#23538)** — Increases coverage for WebGPU ops.
* **Exclude MAUI projects from GPU C# packaging builds (microsoft#23923)** — Uses the 'desktop only' solution in GPU C# packaging builds; MAUI support is not needed for those builds.
* **Support all block sizes that are multiples of 32 for DP4A (microsoft#23907)** — The DP4A shader actually supports all block sizes that are multiples of 32; this relaxes the restriction and makes a small tweak to support sizes other than 32. Also moves the shader to a separate file for maintainability. Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* **Example custom op with output type inferencing (microsoft#23916)** — Adds an example of a custom op that is required to do type inference for the output type for the model load to work; also acts as an example of how to override an ONNX op with a custom implementation. See microsoft#23891.
* **Enabling L2+ Optimizations for EPs (microsoft#23517)** — Some requirements to modify the graph are specific to the EP/hardware. ORT has a hardcoded EP list for optimizations, but that can't scale and is hard to extend to enable EP custom optimizations.
  This is a prototype enabling L2+ optimizations for EPs (the original overview was provided by @skottmckay), along with the TRT EP implementation of the ConstantFoldingDQ optimization. Signatures for the selection and optimization functions:

  ````
  - Selection:    std::function<std::vector<std::unique_ptr<ComputeCapability>>(const GraphViewer&, const KeyValueConfig&)>
  - Optimization: std::function<Status(const Graph&, const ComputeCapability& this_optimization, ComputeCapability& cc_to_update)>
  ````

  In GetCapability, the EP calls a (new) provider bridge API to look up a pre-defined optimizer by name and get its selection function; `ComputeCapability.optimize_func` (the optimization function) is set by the optimizer to the function that does the optimization; the EP has to update the returned ComputeCapability to include the optimization ComputeCapability in `nodes_to_optimize`, so that ORT can later perform the optimization/transformation accordingly. In the GraphPartitioner, after assigning the ComputeCapability to the EP and prior to Compile, if the ComputeCapability has `nodes_to_optimize`, ORT iterates that list; the optimization function is called with a mutable Graph instance, the ComputeCapability for the individual optimization, and the overall ComputeCapability so it can be updated.
* **fix binplace file in web pipeline (microsoft#23930)**
* **Updated run_CIs_for_external_pr.py to support the Windows OpenVINO CI pipeline (microsoft#23931)**
* **Fix ConvInteger handling of optional inputs. (microsoft#23935)** — Need to check `Exists()` and not just the number of inputs. See microsoft#23927.
* **Updated ov version in pipeline (#595) (microsoft#23882)** — Updates the OpenVINO version used in the pipeline from 2024.5.0 to 2025.0.0. Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
* **[AIX] External data handling (microsoft#23859)** — On big-endian systems, model tensor data coming from an external file was not handled properly; this was found during the debugging of microsoft/onnxruntime-genai#1104. This PR performs the endianness conversion of data loaded from an external file on big-endian systems. (A byte-swap sketch follows the commit message.)
* **Create a packaging pipeline for a custom nuget package (microsoft#23918)**
* **Fix license in example test code. (microsoft#23936)**
* **replace usage of gsl::narrow and gsl::narrow_cast in WebGPU EP (microsoft#23926)** — `gsl::narrow` does not work in a no-exception build: use `onnxruntime::narrow` if necessary, or change to `static_cast` if it is obviously safe. The changes also apply to usage of `gsl::narrow_cast`, which does not apply checks. (See the narrowing sketch after the commit message.)
* **VCPKG improvement: set VCPKG_OSX_DEPLOYMENT_TARGET (microsoft#23933)** — Sets VCPKG_OSX_DEPLOYMENT_TARGET for macOS targets and enables VCPKG in more pipelines.
* **Allow using a different version of flatbuffers when building with vcpkg (microsoft#23946)** — Users no longer need to pin flatbuffers' version, which provides more flexibility in the build process. Also deletes utf8_range from the dependencies, because it is an indirect dependency of protobuf, which is already included in the build process.
* **Make python package pipeline 1ES compliant (microsoft#23800)** — Makes the [Python packaging pipeline](https://aiinfra.visualstudio.com/530acbc4-21bc-487d-8cd8-348ff451d2ff/_build?definitionId=841) 1ES compliant. Checklist: make Onnxruntime-QNNEP-Windows-2022-CPU stateless.
* **Delete ROCM Nuget Publishing Pipeline (microsoft#23948)**
* **Bump SixLabors.ImageSharp from 2.1.9 to 2.1.10 in /csharp/sample/Microsoft.ML.OnnxRuntime.FasterRcnnSample (microsoft#23924)** — Dependabot bump of [SixLabors.ImageSharp](https://github.com/SixLabors/ImageSharp) from 2.1.9 to 2.1.10. The v2.1.10 release backports SixLabors/ImageSharp#2859 (via SixLabors/ImageSharp#2890) and SixLabors/ImageSharp#2701 (via SixLabors/ImageSharp#2891) to the 2.1.x line; full changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.9...v2.1.10. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Sushanth Rajasankar <44513542+sushraja-msft@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Seungtaek Kim <seungtaek.kim.94@gmail.com>
Co-authored-by: co63oc <co63oc@users.noreply.github.com>
Co-authored-by: Jambay Kinley <jambaykinley@microsoft.com>
Co-authored-by: Hector Li <hecli@microsoft.com>
Co-authored-by: Jian Chen <cjian@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>
Co-authored-by: Alessio Soldano <services@soldano.it>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Ashish Garg <quic_ashigarg@quicinc.com>
Co-authored-by: Ashish Garg <ashigarg@qti.qualcomm.com>
Co-authored-by: Jie Chen <jie.a.chen@intel.com>
Co-authored-by: wp <webgraphics@intel.com>
Co-authored-by: Satya Kumar Jandhyala <satya.k.jandhyala@gmail.com>
Co-authored-by: Prathik Rao <prathik.rao@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Jianhui Dai <jianhui.j.dai@intel.com>
Co-authored-by: xhcao <xinghua.cao@intel.com>
Co-authored-by: Wanming Lin <wanming.lin@intel.com>
Co-authored-by: Mark Schofield <mschofie@microsoft.com>
Co-authored-by: jiangzhaoming <zhaoming.jiang@microsoft.com>
Co-authored-by: Yi-Hong Lyu <yilyu@microsoft.com>
Co-authored-by: vraspar <vrajang@outlook.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: saurabh <saurabh1.kale@intel.com>
Co-authored-by: Ranjit Ranjan <165394499+ranjitshs@users.noreply.github.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
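A minimal C++ sketch of the masking argument behind the GQA flash-attention fix (microsoft#23850). This is illustrative only, with made-up values, not the actual WGSL kernel: once out-of-range qk scores are forced to a large negative value, softmax assigns them zero weight, so the matching v values drop out of the weighted sum without any separate causal check on v.

```cpp
// Illustrative sketch (hypothetical values, not the WGSL shader from the PR):
// masking qk alone is enough to make the output causal.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  const int seq_len = 8;
  const int query_pos = 3;  // causal: keys at positions > 3 must not contribute
  std::vector<float> qk(seq_len), v(seq_len);
  for (int k = 0; k < seq_len; ++k) {
    qk[k] = 0.1f * k;  // stand-in attention scores
    v[k] = 1.0f + k;   // stand-in values
  }
  // Apply causality once, on qk only (mirrors setting qk to min_value).
  for (int k = query_pos + 1; k < seq_len; ++k) qk[k] = -1e30f;

  // Softmax over qk; masked positions get weight exp(-huge) == 0.
  float max_qk = qk[0], sum = 0.0f, out = 0.0f;
  for (float s : qk) max_qk = std::fmax(max_qk, s);
  std::vector<float> w(seq_len);
  for (int k = 0; k < seq_len; ++k) { w[k] = std::exp(qk[k] - max_qk); sum += w[k]; }
  for (int k = 0; k < seq_len; ++k) out += (w[k] / sum) * v[k];

  // out mixes only v[0..query_pos]; no second causal check on v is needed.
  std::printf("output = %f\n", out);
  return 0;
}
```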
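The default EP context naming change (microsoft#23788) boils down to replacing the `.onnx` extension rather than appending to the full path. A hypothetical helper sketching the idea (not ORT's actual code):

```cpp
// Sketch of the naming rule: "model.onnx" -> "model_ctx.onnx" (replace the
// extension) instead of "model.onnx_ctx.onnx" (append to the whole path).
#include <cassert>
#include <string>

std::string default_ctx_name(const std::string& model_path) {
  const std::string ext = ".onnx";
  if (model_path.size() >= ext.size() &&
      model_path.compare(model_path.size() - ext.size(), ext.size(), ext) == 0) {
    return model_path.substr(0, model_path.size() - ext.size()) + "_ctx.onnx";
  }
  return model_path + "_ctx.onnx";  // fallback for other extensions
}

int main() {
  assert(default_ctx_name("model.onnx") == "model_ctx.onnx");
  assert(default_ctx_name("dir/phi4.onnx") == "dir/phi4_ctx.onnx");
  return 0;
}
```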
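The `url > "file:" && url < "file;"` condition from microsoft#23902 relies only on lexicographic string ordering, so the property can be checked in any language; here is a C++ sketch (the real check is JavaScript in onnxruntime-web). Because `;` is the ASCII character immediately after `:`, any longer string starting with `"file:"` sorts strictly between `"file:"` and `"file;"`.

```cpp
// C++ demonstration of the string-ordering trick; the point of the JS version
// is that, unlike startsWith(), plain comparisons stay statically analyzable
// for minifiers doing dead-code elimination.
#include <cassert>
#include <string>

bool is_file_url(const std::string& url) {
  return url > "file:" && url < "file;";
}

int main() {
  assert(is_file_url("file:///home/user/model.onnx"));
  assert(!is_file_url("https://example.com/ort.bundle.min.js"));
  assert(!is_file_url("file:"));  // bare scheme compares equal, not greater
  return 0;
}
```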
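On the `gsl::narrow` replacement (microsoft#23926): `gsl::narrow` throws `gsl::narrowing_error` on a lossy cast, which is unusable when exceptions are disabled. A simplified stand-in in the spirit of `onnxruntime::narrow` (assumption: the real helper is more thorough, e.g. about signed/unsigned edge cases) aborts instead of throwing:

```cpp
// Simplified sketch: detect lossy narrowing via a round-trip cast and abort,
// which works in builds compiled without exception support.
#include <cstdlib>

template <typename To, typename From>
To narrow(From value) {
  const To result = static_cast<To>(value);
  if (static_cast<From>(result) != value) std::abort();  // lossy cast
  return result;
}

int main() {
  const int ok = narrow<int>(42LL);  // fits, returns 42
  (void)ok;
  // narrow<int>(1LL << 40);         // would abort: 2^40 does not fit in int
  return 0;
}
```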
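The AIX fix (microsoft#23859) is a byte-order issue: ONNX external tensor data is stored little-endian, so a big-endian host has to byte-swap each element after loading. An illustrative sketch of the idea (not ORT's implementation):

```cpp
// Reverse the bytes of each element in place: little-endian file bytes become
// host-order values on a big-endian machine.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <cstring>

void swap_elements(unsigned char* data, std::size_t count, std::size_t elem_size) {
  for (std::size_t i = 0; i < count; ++i) {
    unsigned char* elem = data + i * elem_size;
    std::reverse(elem, elem + elem_size);
  }
}

int main() {
  // 1.0f is stored little-endian on disk as 00 00 80 3F.
  unsigned char raw[4] = {0x00, 0x00, 0x80, 0x3F};
  swap_elements(raw, 1, sizeof(float));  // on a BE host: now 3F 80 00 00
  float value;
  std::memcpy(&value, raw, sizeof(value));
  std::printf("%f\n", value);  // prints 1.000000 on a big-endian host
  return 0;
}
```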
1 parent bd32f51 · commit a6cdf62

274 files changed: +10,047 −3,285 lines

ThirdPartyNotices.txt (+35)

```diff
@@ -6045,3 +6045,38 @@ https://github.com/intel/neural-speed
 terms, and open source software license terms. These separate license terms
 govern your use of the third party programs as set forth in the
 "THIRD-PARTY-PROGRAMS" file.
+
+_____
+
+dawn
+
+https://dawn.googlesource.com/dawn
+
+BSD 3-Clause License
+
+Copyright 2017-2023 The Dawn & Tint Authors
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its
+   contributors may be used to endorse or promote products derived from
+   this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
```

cmake/deps.txt (-1)

```diff
@@ -53,7 +53,6 @@ re2;https://github.com/google/re2/archive/refs/tags/2024-07-02.zip;646e1728269cd
 safeint;https://github.com/dcleblanc/SafeInt/archive/refs/tags/3.0.28.zip;23f252040ff6cb9f1fd18575b32fa8fb5928daac
 tensorboard;https://github.com/tensorflow/tensorboard/archive/373eb09e4c5d2b3cc2493f0949dc4be6b6a45e81.zip;67b833913605a4f3f499894ab11528a702c2b381
 cutlass;https://github.com/NVIDIA/cutlass/archive/refs/tags/v3.5.1.zip;e49b2b964163d27765a5002d210a2f3c73771835
-utf8_range;https://github.com/protocolbuffers/utf8_range/archive/72c943dea2b9240cd09efde15191e144bc7c7d38.zip;9925739c9debc0efa2adcb194d371a35b6a03156
 extensions;https://github.com/microsoft/onnxruntime-extensions/archive/c24b7bab0c12f53da76d0c31b03b9f0f8ec8f3b4.zip;239063aee4946a9af147b473a4c3da78ba7413b4
 composable_kernel;https://github.com/ROCmSoftwarePlatform/composable_kernel/archive/204da9c522cebec5220bba52cd3542ebcaf99e7a.zip;1827348efd47831c13074245274d41b7cae8a557
 directx_headers;https://github.com/microsoft/DirectX-Headers/archive/refs/tags/v1.613.1.zip;47653509a3371eabb156360f42faf582f314bf2e
```

cmake/external/onnxruntime_external_deps.cmake (+28, -26)

```diff
@@ -107,23 +107,6 @@ if(onnxruntime_USE_MIMALLOC)
   FetchContent_MakeAvailable(mimalloc)
 endif()
 
-#Protobuf depends on utf8_range
-onnxruntime_fetchcontent_declare(
-    utf8_range
-    URL ${DEP_URL_utf8_range}
-    URL_HASH SHA1=${DEP_SHA1_utf8_range}
-    EXCLUDE_FROM_ALL
-    FIND_PACKAGE_ARGS NAMES utf8_range
-)
-
-set(utf8_range_ENABLE_TESTS OFF CACHE BOOL "Build test suite" FORCE)
-set(utf8_range_ENABLE_INSTALL OFF CACHE BOOL "Configure installation" FORCE)
-
-# The next line will generate an error message "fatal: not a git repository", but it is ok. It is from flatbuffers
-onnxruntime_fetchcontent_makeavailable(utf8_range)
-# protobuf's cmake/utf8_range.cmake has the following line
-include_directories(${utf8_range_SOURCE_DIR})
-
 # Download a protoc binary from Internet if needed
 if(NOT ONNX_CUSTOM_PROTOC_EXECUTABLE AND NOT onnxruntime_USE_VCPKG)
   # This part of code is only for users' convenience. The code couldn't handle all cases. Users always can manually
@@ -304,7 +287,7 @@ if(NOT TARGET Boost::mp11)
     EXCLUDE_FROM_ALL
     FIND_PACKAGE_ARGS NAMES Boost
   )
-onnxruntime_fetchcontent_makeavailable(mp11)
+  onnxruntime_fetchcontent_makeavailable(mp11)
   if(NOT TARGET Boost::mp11)
     add_library(Boost::mp11 ALIAS Boost::headers)
   endif()
@@ -442,6 +425,9 @@ target_include_directories(safeint_interface INTERFACE ${safeint_SOURCE_DIR})
 
 
 # Flatbuffers
+if(onnxruntime_USE_VCPKG)
+  find_package(flatbuffers REQUIRED)
+else()
 # We do not need to build flatc for iOS or Android Cross Compile
 if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR CMAKE_SYSTEM_NAME STREQUAL "Emscripten")
   set(FLATBUFFERS_BUILD_FLATC OFF CACHE BOOL "FLATBUFFERS_BUILD_FLATC" FORCE)
@@ -492,6 +478,7 @@ namespace std { using ::getenv; }
   endif()
 endif()
 endif()
+endif()
 
 # ONNX
 if (NOT onnxruntime_USE_FULL_PROTOBUF)
@@ -672,17 +659,10 @@ if (onnxruntime_USE_WEBGPU)
 
   # disable things we don't use
   set(DAWN_DXC_ENABLE_ASSERTS_IN_NDEBUG OFF)
-  set(DAWN_ENABLE_DESKTOP_GL OFF CACHE BOOL "" FORCE)
-  set(DAWN_ENABLE_OPENGLES OFF CACHE BOOL "" FORCE)
-  set(DAWN_SUPPORTS_GLFW_FOR_WINDOWING OFF CACHE BOOL "" FORCE)
-  set(DAWN_USE_GLFW OFF CACHE BOOL "" FORCE)
-  set(DAWN_USE_WINDOWS_UI OFF CACHE BOOL "" FORCE)
   set(DAWN_USE_X11 OFF CACHE BOOL "" FORCE)
 
   set(TINT_BUILD_TESTS OFF CACHE BOOL "" FORCE)
   set(TINT_BUILD_CMD_TOOLS OFF CACHE BOOL "" FORCE)
-  set(TINT_BUILD_GLSL_WRITER OFF CACHE BOOL "" FORCE)
-  set(TINT_BUILD_GLSL_VALIDATOR OFF CACHE BOOL "" FORCE)
   set(TINT_BUILD_IR_BINARY OFF CACHE BOOL "" FORCE)
   set(TINT_BUILD_SPV_READER OFF CACHE BOOL "" FORCE) # don't need. disabling is a large binary size saving
   set(TINT_BUILD_WGSL_WRITER ON CACHE BOOL "" FORCE) # needed to create cache key. runtime error if not enabled.
@@ -732,7 +712,29 @@ if (onnxruntime_USE_WEBGPU)
     # # if we need to apply patches in the future, we can uncomment the following line.
     #
     # The dawn.patch contains the following changes:
-    # - https://dawn-review.googlesource.com/c/dawn/+/225514
+    #
+    # - (public) CMake fix to support Emscripten v4.0.3+
+    #   This change allows Dawn to find the file "gen_struct_info.py" in the correct location.
+    #   https://dawn-review.googlesource.com/c/dawn/+/225514
+    #
+    # - (public) Fix emwgpu C++ implementation for buffer destroy
+    #   In native implementation, wgpuBufferRelease will trigger the buffer destroy (if refcount decreased to 0). But
+    #   in emwgpu implementation, the buffer destroy won't happen. This change fixes the bug.
+    #   https://dawn-review.googlesource.com/c/dawn/+/226315
+    #
+    # - (private) Allow "external" buffer in emwgpu C++ implementation
+    #   This change allows WGPUBufferImpl to destroy the buffer when the refcount decreased to 0 only for non-external
+    #   buffer.
+    #   "external buffer" means the GPUBuffer instance created in JavaScript and imported to C++ by `importJsBuffer`.
+    #
+    # - (private) Remove hard-coded CMAKE_OSX_DEPLOYMENT_TARGET in Dawn's CMake files
+    #   https://github.com/microsoft/onnxruntime/pull/23729
+    #
+    # - (private) Fix external ref count for "external" device in emwgpu C++ implementation
+    #   This change fixes the incorrect external ref count for class WGPUDeviceImpl when used with "external" device.
+    #   "external device" means the GPUDevice instance created in JavaScript and imported to C++ by `importJsDevice`.
+    #
+    #
     PATCH_COMMAND ${Patch_EXECUTABLE} --binary --ignore-whitespace -p1 < ${PROJECT_SOURCE_DIR}/patches/dawn/dawn.patch
     EXCLUDE_FROM_ALL
   )
```

cmake/nuget_helpers.cmake (+1, -1)

```diff
@@ -1,7 +1,7 @@
 # Copyright (c) Microsoft Corporation. All rights reserved.
 # Licensed under the MIT License.
 
-cmake_minimum_required(VERSION 3.0)
+cmake_minimum_required(VERSION 3.5)
 
 # Determines the version of a native nuget package from the root packages.config.
 #
```

cmake/onnxruntime_framework.cmake (+1, -4)

```diff
@@ -36,10 +36,7 @@ elseif(onnxruntime_ENABLE_TRITON)
 endif()
 
 if (onnxruntime_MINIMAL_BUILD)
-  set(onnxruntime_framework_src_exclude
-    "${ONNXRUNTIME_ROOT}/core/framework/fallback_cpu_capability.h"
-    "${ONNXRUNTIME_ROOT}/core/framework/fallback_cpu_capability.cc"
-  )
+  set(onnxruntime_framework_src_exclude)
 
   # custom ops support must be explicitly enabled in a minimal build. exclude if not.
   if (NOT onnxruntime_MINIMAL_BUILD_CUSTOM_OPS)
```

cmake/onnxruntime_optimizer.cmake (+1)

```diff
@@ -9,6 +9,7 @@ if (onnxruntime_MINIMAL_BUILD)
   list(APPEND onnxruntime_optimizer_src_patterns
     "${ONNXRUNTIME_INCLUDE_DIR}/core/optimizer/graph_transformer.h"
     "${ONNXRUNTIME_ROOT}/core/optimizer/graph_transformer.cc"
+    "${ONNXRUNTIME_ROOT}/core/optimizer/graph_optimizer_registry.cc"
   )
 
   if (onnxruntime_EXTENDED_MINIMAL_BUILD)
```

cmake/onnxruntime_providers_js.cmake (+5, -1)

```diff
@@ -1,6 +1,10 @@
 # Copyright (c) Microsoft Corporation. All rights reserved.
 # Licensed under the MIT License.
 
+if (onnxruntime_MINIMAL_BUILD AND NOT onnxruntime_EXTENDED_MINIMAL_BUILD)
+  message(FATAL_ERROR "JSEP can not be used in a basic minimal build. Please build with '--minimal_build extended'")
+endif()
+
 add_compile_definitions(USE_JSEP=1)
 
 file(GLOB_RECURSE onnxruntime_providers_js_cc_srcs
@@ -18,4 +22,4 @@
   onnxruntime_common onnxruntime_framework onnx onnx_proto ${PROTOBUF_LIB} flatbuffers Boost::mp11 Eigen3::Eigen
 )
 
-add_dependencies(onnxruntime_providers_js ${onnxruntime_EXTERNAL_DEPENDENCIES})
+add_dependencies(onnxruntime_providers_js ${onnxruntime_EXTERNAL_DEPENDENCIES})
```

cmake/onnxruntime_python.cmake (+1, -1)

```diff
@@ -1029,7 +1029,7 @@ if (onnxruntime_USE_QNN)
     add_custom_command(
       TARGET onnxruntime_pybind11_state POST_BUILD
       COMMAND ${CMAKE_COMMAND} -E copy
-        $<TARGET_FILE:onnxruntime_qnn_ctx_gen>
+        $<TARGET_FILE:ep_weight_sharing_ctx_gen>
         $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/capi/
     )
     if (EXISTS "${onnxruntime_QNN_HOME}/Qualcomm AI Hub Proprietary License.pdf")
```

cmake/onnxruntime_session.cmake (+1)

```diff
@@ -22,6 +22,7 @@ endif()
 if (onnxruntime_MINIMAL_BUILD)
   set(onnxruntime_session_src_exclude
     "${ONNXRUNTIME_ROOT}/core/session/provider_bridge_ort.cc"
+    "${ONNXRUNTIME_ROOT}/core/session/model_builder_c_api.cc"
   )
 
   list(REMOVE_ITEM onnxruntime_session_srcs ${onnxruntime_session_src_exclude})
```

cmake/onnxruntime_unittests.cmake (+26, -17)

```diff
@@ -236,14 +236,14 @@ function(AddTest)
       )
     endif()
     # Set test timeout to 3 hours.
-    set_tests_properties(${_UT_TARGET} PROPERTIES TIMEOUT 7200)
+    set_tests_properties(${_UT_TARGET} PROPERTIES TIMEOUT 10800)
   else()
     add_test(NAME ${_UT_TARGET}
       COMMAND ${_UT_TARGET} ${TEST_ARGS}
       WORKING_DIRECTORY $<TARGET_FILE_DIR:${_UT_TARGET}>
     )
    # Set test timeout to 3 hours.
-    set_tests_properties(${_UT_TARGET} PROPERTIES TIMEOUT 7200)
+    set_tests_properties(${_UT_TARGET} PROPERTIES TIMEOUT 10800)
   endif()
 endif()
 endfunction(AddTest)
@@ -503,6 +503,7 @@ set (onnxruntime_shared_lib_test_SRC
 
 if (NOT onnxruntime_MINIMAL_BUILD)
   list(APPEND onnxruntime_shared_lib_test_SRC ${ONNXRUNTIME_SHARED_LIB_TEST_SRC_DIR}/test_inference.cc)
+  list(APPEND onnxruntime_shared_lib_test_SRC ${ONNXRUNTIME_SHARED_LIB_TEST_SRC_DIR}/test_model_builder_api.cc)
 endif()
 
 if(onnxruntime_RUN_ONNX_TESTS)
@@ -1288,31 +1289,34 @@ if (NOT onnxruntime_ENABLE_TRAINING_TORCH_INTEROP)
 
 if(onnxruntime_USE_QNN)
   #qnn ctx generator
-  set(onnxruntime_qnn_ctx_gen_src_dir ${TEST_SRC_DIR}/qnn_ctx_gen)
-  set(onnxruntime_qnn_ctx_gen_src_patterns
-    "${onnxruntime_qnn_ctx_gen_src_dir}/*.cc"
-    "${onnxruntime_qnn_ctx_gen_src_dir}/*.h")
+  set(ep_weight_sharing_ctx_gen_src_dir ${TEST_SRC_DIR}/ep_weight_sharing_ctx_gen)
+  set(ep_weight_sharing_ctx_gen_src_patterns
+    "${ep_weight_sharing_ctx_gen_src_dir}/*.cc"
+    "${ep_weight_sharing_ctx_gen_src_dir}/*.h")
 
-  file(GLOB onnxruntime_qnn_ctx_gen_src CONFIGURE_DEPENDS
-    ${onnxruntime_qnn_ctx_gen_src_patterns}
+  file(GLOB ep_weight_sharing_ctx_gen_src CONFIGURE_DEPENDS
+    ${ep_weight_sharing_ctx_gen_src_patterns}
   )
-  onnxruntime_add_executable(onnxruntime_qnn_ctx_gen ${onnxruntime_qnn_ctx_gen_src})
-  target_include_directories(onnxruntime_qnn_ctx_gen PRIVATE ${onnx_test_runner_src_dir} ${ONNXRUNTIME_ROOT}
-    ${onnxruntime_graph_header} ${onnxruntime_exec_src_dir}
-    ${CMAKE_CURRENT_BINARY_DIR})
+  onnxruntime_add_executable(ep_weight_sharing_ctx_gen ${ep_weight_sharing_ctx_gen_src})
+  target_include_directories(ep_weight_sharing_ctx_gen PRIVATE ${ONNXRUNTIME_ROOT} ${CMAKE_CURRENT_BINARY_DIR})
   if (WIN32)
-    target_compile_options(onnxruntime_qnn_ctx_gen PRIVATE ${disabled_warnings})
+    target_compile_options(ep_weight_sharing_ctx_gen PRIVATE ${disabled_warnings})
     if (NOT DEFINED SYS_PATH_LIB)
       set(SYS_PATH_LIB shlwapi)
     endif()
   endif()
 
-  if(WIN32)
-    target_link_libraries(onnxruntime_qnn_ctx_gen PRIVATE debug dbghelp advapi32)
+  if (onnxruntime_BUILD_SHARED_LIB)
+    set(ep_weight_sharing_ctx_gen_libs onnxruntime_common onnxruntime ${onnxruntime_EXTERNAL_LIBRARIES} ${GETOPT_LIB_WIDE})
+    target_link_libraries(ep_weight_sharing_ctx_gen PRIVATE ${ep_weight_sharing_ctx_gen_libs})
+    if (WIN32)
+      target_link_libraries(ep_weight_sharing_ctx_gen PRIVATE debug dbghelp advapi32)
+    endif()
+  else()
+    target_link_libraries(ep_weight_sharing_ctx_gen PRIVATE onnxruntime_session ${onnxruntime_test_providers_libs} ${onnxruntime_EXTERNAL_LIBRARIES} ${GETOPT_LIB_WIDE})
   endif()
-  target_link_libraries(onnxruntime_qnn_ctx_gen PRIVATE onnx_test_runner_common onnxruntime_test_utils onnxruntime_common onnxruntime_graph onnxruntime_session onnxruntime_providers onnxruntime_framework onnxruntime_util onnxruntime_mlas onnxruntime_optimizer onnxruntime_flatbuffers onnx_test_data_proto ${onnxruntime_test_providers_libs} ${onnxruntime_EXTERNAL_LIBRARIES} ${GETOPT_LIB_WIDE} ${SYS_PATH_LIB} ${CMAKE_DL_LIBS})
 
-  set_target_properties(onnxruntime_qnn_ctx_gen PROPERTIES FOLDER "ONNXRuntimeTest")
+  set_target_properties(ep_weight_sharing_ctx_gen PROPERTIES FOLDER "ONNXRuntimeTest")
 endif()
 
 # shared lib
@@ -1359,14 +1363,19 @@ if (NOT onnxruntime_ENABLE_TRAINING_TORCH_INTEROP)
     LIBS ${onnxruntime_shared_lib_test_LIBS}
     DEPENDS ${all_dependencies}
   )
+
+  target_include_directories(onnxruntime_shared_lib_test PRIVATE ${ONNXRUNTIME_ROOT})
+
   if (onnxruntime_USE_CUDA)
     target_include_directories(onnxruntime_shared_lib_test PRIVATE ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})
    target_sources(onnxruntime_shared_lib_test PRIVATE ${ONNXRUNTIME_SHARED_LIB_TEST_SRC_DIR}/cuda_ops.cu)
   endif()
+
   if (onnxruntime_USE_ROCM)
     target_include_directories(onnxruntime_shared_lib_test PRIVATE ${onnxruntime_ROCM_HOME}/include)
     target_compile_definitions(onnxruntime_shared_lib_test PRIVATE __HIP_PLATFORM_AMD__)
   endif()
+
   if (CMAKE_SYSTEM_NAME STREQUAL "Android")
     target_sources(onnxruntime_shared_lib_test PRIVATE
       "${ONNXRUNTIME_ROOT}/core/platform/android/cxa_demangle.cc"
```

cmake/onnxruntime_webassembly.cmake (+28, -9)

```diff
@@ -211,10 +211,14 @@ else()
   target_link_libraries(onnxruntime_webassembly PRIVATE tensorboard)
 endif()
 
+set(onnxruntime_webassembly_script_deps "${ONNXRUNTIME_ROOT}/wasm/pre.js")
+
+set(EXPORTED_FUNCTIONS "_malloc,_free")
 if (onnxruntime_USE_JSEP)
-  set(EXPORTED_FUNCTIONS "_malloc,_free,_JsepOutput,_JsepGetNodeName")
-else()
-  set(EXPORTED_FUNCTIONS "_malloc,_free")
+  string(APPEND EXPORTED_FUNCTIONS ",_JsepOutput,_JsepGetNodeName")
+endif()
+if (onnxruntime_USE_WEBGPU)
+  string(APPEND EXPORTED_FUNCTIONS ",_wgpuBufferRelease,_wgpuCreateInstance")
 endif()
 
 if (onnxruntime_ENABLE_WEBASSEMBLY_MEMORY64)
@@ -312,13 +316,15 @@ else()
     target_compile_options(noexcep_operators PRIVATE ${SMEMORY_FLAG} -Wno-experimental)
   endif()
   target_link_options(onnxruntime_webassembly PRIVATE
-    --post-js "${ONNXRUNTIME_ROOT}/wasm/js_post_js_64.js"
+    "SHELL:--post-js \"${ONNXRUNTIME_ROOT}/wasm/js_post_js_64.js\""
   )
+  list(APPEND onnxruntime_webassembly_script_deps "${ONNXRUNTIME_ROOT}/wasm/js_post_js_64.js")
 else ()
   set(MAXIMUM_MEMORY "4294967296")
   target_link_options(onnxruntime_webassembly PRIVATE
-    --post-js "${ONNXRUNTIME_ROOT}/wasm/js_post_js.js"
+    "SHELL:--post-js \"${ONNXRUNTIME_ROOT}/wasm/js_post_js.js\""
   )
+  list(APPEND onnxruntime_webassembly_script_deps "${ONNXRUNTIME_ROOT}/wasm/js_post_js.js")
 endif ()
 
 target_link_options(onnxruntime_webassembly PRIVATE
@@ -372,7 +378,6 @@ jsepDownload:_pp_")
     "SHELL:-s SIGNATURE_CONVERSIONS='${SIGNATURE_CONVERSIONS}'"
   )
 endif ()
-set_target_properties(onnxruntime_webassembly PROPERTIES LINK_DEPENDS ${ONNXRUNTIME_ROOT}/wasm/pre.js)
 
 if (onnxruntime_USE_JSEP)
   # NOTE: "-s ASYNCIFY=1" is required for JSEP to work with WebGPU
@@ -382,10 +387,8 @@ jsepDownload:_pp_")
   target_compile_definitions(onnxruntime_webassembly PRIVATE USE_JSEP=1)
   target_link_options(onnxruntime_webassembly PRIVATE
     "SHELL:--pre-js \"${ONNXRUNTIME_ROOT}/wasm/pre-jsep.js\""
-    "SHELL:-s ASYNCIFY=1"
-    "SHELL:-s ASYNCIFY_STACK_SIZE=65536"
   )
-  set_target_properties(onnxruntime_webassembly PROPERTIES LINK_DEPENDS ${ONNXRUNTIME_ROOT}/wasm/pre-jsep.js)
+  list(APPEND onnxruntime_webassembly_script_deps "${ONNXRUNTIME_ROOT}/wasm/pre-jsep.js")
 
   if (onnxruntime_ENABLE_WEBASSEMBLY_MEMORY64)
     target_link_options(onnxruntime_webassembly PRIVATE
@@ -397,6 +400,20 @@ jsepDownload:_pp_")
 
 if (onnxruntime_USE_WEBGPU)
   target_compile_definitions(onnxruntime_webassembly PRIVATE USE_WEBGPU=1)
+  target_link_options(onnxruntime_webassembly PRIVATE
+    "SHELL:--post-js \"${ONNXRUNTIME_ROOT}/wasm/post-webgpu.js\""
+  )
+  list(APPEND onnxruntime_webassembly_script_deps "${ONNXRUNTIME_ROOT}/wasm/post-webgpu.js")
+endif()
+
+if (onnxruntime_USE_JSEP OR onnxruntime_USE_WEBGPU OR onnxruntime_USE_WEBNN)
+  # if any of the above is enabled, we need to use the asyncify library
+  target_link_options(onnxruntime_webassembly PRIVATE
+    "SHELL:--pre-js \"${ONNXRUNTIME_ROOT}/wasm/pre-async.js\""
+    "SHELL:-s ASYNCIFY=1"
+    "SHELL:-s ASYNCIFY_STACK_SIZE=65536"
+  )
+  list(APPEND onnxruntime_webassembly_script_deps "${ONNXRUNTIME_ROOT}/wasm/pre-async.js")
 endif()
 
 if (onnxruntime_EMSCRIPTEN_SETTINGS)
@@ -458,6 +475,8 @@ jsepDownload:_pp_")
   )
 endif()
 
+set_target_properties(onnxruntime_webassembly PROPERTIES LINK_DEPENDS "${onnxruntime_webassembly_script_deps}")
+
 set(target_name_list ort)
 
 if (onnxruntime_ENABLE_TRAINING_APIS)
```
