[Executorch][llama] Change runner to decouple prompt length from sequence length #9594

kimishpatel · 2025-03-25T19:26:44Z

Stack from ghstack (oldest at bottom):

-> [Executorch][llama] Change runner to decouple prompt length from sequence length #9594

Following the previous diff, the runner can now use the entire KV cache to
generate more tokens than the maximum prompt length allows.
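To make the effect of the decoupling concrete, here is a minimal sketch of the token-budget arithmetic. It is not the runner's code: the function names and the parameters `max_prompt_len` and `kv_cache_len` are hypothetical, and the "coupled" variant is only an assumption about how the budget behaved before this change, inferred from the description above.

```python
def coupled_new_token_budget(num_prompt_tokens: int, max_prompt_len: int) -> int:
    """Assumed earlier behavior: prompt and generated tokens share one limit."""
    if not 1 <= num_prompt_tokens <= max_prompt_len:
        raise ValueError("prompt length must be in [1, max_prompt_len]")
    # Total sequence (prompt + generated) is capped by the prompt limit.
    return max_prompt_len - num_prompt_tokens


def decoupled_new_token_budget(
    num_prompt_tokens: int, max_prompt_len: int, kv_cache_len: int
) -> int:
    """Sketch of the decoupled behavior: generation is bounded by the KV cache."""
    if not 1 <= num_prompt_tokens <= max_prompt_len:
        raise ValueError("prompt length must be in [1, max_prompt_len]")
    if num_prompt_tokens > kv_cache_len:
        raise ValueError("prompt does not fit in the KV cache")
    # The prompt is still limited by max_prompt_len, but generation may
    # continue until the KV cache (hypothetical size kv_cache_len) is full.
    return kv_cache_len - num_prompt_tokens


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the PR.
    print(coupled_new_token_budget(100, 128))
    print(decoupled_new_token_budget(100, 128, 2048))
```

With these illustrative numbers, the prompt is still checked against the prompt limit, while the number of tokens that can be generated afterwards depends only on how much of the KV cache remains.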

Differential Revision: D69073908

pytorch-bot · 2025-03-25T19:26:48Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9594

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit f4f089e with merge base 644b7dd:

NEW FAILURE - The following job has failed:

pull / unittest / macos / macos-job (gh)
backends/xnnpack/test/passes/test_convert_to_linear.py::TestConvertToLinear::test_fp32_convert_to_linear

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…ence length Following previous diff now we can utilize entire kv cache to generate more tokens than max prompt length allowed. Differential Revision: [D69073908](https://our.internmc.facebook.com/intern/diff/D69073908/) ghstack-source-id: 273982703 Pull Request resolved: #9594

facebook-github-bot · 2025-03-25T19:26:55Z

This pull request was exported from Phabricator. Differential Revision: D69073908

…ence length Pull Request resolved: #9594 Following previous diff now we can utilize entire kv cache to generate more tokens than max prompt length allowed. Differential Revision: [D69073908](https://our.internmc.facebook.com/intern/diff/D69073908/) ghstack-source-id: 274018812

facebook-github-bot · 2025-03-25T21:12:49Z

This pull request was exported from Phabricator. Differential Revision: D69073908

kimishpatel requested review from lucylq and jackzhxng as code owners March 25, 2025 19:26

facebook-github-bot added the CLA Signed label Mar 25, 2025

facebook-github-bot added the fb-exported label Mar 25, 2025

kimishpatel added the release notes: examples label Mar 25, 2025

jackzhxng approved these changes Mar 25, 2025

facebook-github-bot merged commit dcfa538 into gh/kimishpatel/162/base Mar 26, 2025
81 of 83 checks passed

facebook-github-bot deleted the gh/kimishpatel/162/head branch March 26, 2025 16:29

facebook-github-bot temporarily deployed to cherry-pick-bot March 26, 2025 16:29 — with GitHub Actions Inactive

pytorchbot mentioned this pull request Mar 26, 2025

[Executorch][llama] Change runner to decouple prompt length from sequence length #9650

Merged
