Pull requests: HabanaAI/vllm-fork


#407 [PoC] Add max padding ratio to padding aware scheduler (Draft) · label: habana (Issues or PRs submitted by Habana Labs) · opened Oct 18, 2024 by kzawora-intel
#440 Add models-tiny CI step with Llama3.2-1B (Draft) · label: habana · opened Oct 28, 2024 by kzawora-intel
#487 [WIP] Add HPU support to vLLM v1 (Draft) · opened Nov 12, 2024 by kzawora-intel · 19 of 23 tasks
#602 Add in Dockerfile.hpu.ubi · label: external (Issues or PRs submitted by external users) · opened Dec 9, 2024 by Xaenalt
#609 [WIP] Add HPU support to vLLM v1 - cont. · label: stale · opened Dec 10, 2024 by kzawora-intel · 21 of 23 tasks
#642 Add exponential bucketing integration · opened Dec 17, 2024 by kzawora-intel
#786 Enable roberta embedding · opened Feb 5, 2025 by yeonsily
#823 Resolve Speculative Decode RTE · opened Feb 13, 2025 by tannervoas742
#866 [CI] Add APC tests · opened Feb 25, 2025 by kzawora-intel
#906 Fix spec decoding warmup · opened Mar 11, 2025 by yangw1234