[GPU][POC] clDNN gemv optimization for LLM second token #28976
base: master
Do we still need this logic? The optimized static impl should be selected at step 1), before this point.
It seems we need it to switch to the gemv kernel for the second token; let's double-check the details.
Confirmed it works well when running the LLM model without this logic, but in the dynamic shape case it will choose the fc_bf_tiled kernel rather than the gemv kernel for single-batch input. @sshlyapn Is there a better solution to this problem?
Please try to set the priority value in GetKernelsPriority() lower than for the bf_tiled kernel, something like FORCE_PRIORITY_3.
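For context, a minimal self-contained sketch of what that suggestion amounts to is below. It is not the PR's code: the stand-in types, the FullyConnected_GEMV class name, and the assumption that bf_tiled returns FORCE_PRIORITY_2 are illustrative only, and the real GetKernelsPriority() signature in kernel_selector can differ between OpenVINO branches.

```cpp
// Minimal self-contained sketch of the suggestion above, not the actual
// kernel_selector code. The stand-in types, the FullyConnected_GEMV class name,
// and the bf_tiled priority value are assumptions for illustration only.
#include <cstdint>

using KernelsPriority = uint32_t;                // stand-in for the kernel_selector typedef
constexpr KernelsPriority FORCE_PRIORITY_2 = 2;  // assumed bf_tiled priority
constexpr KernelsPriority FORCE_PRIORITY_3 = 3;  // suggested priority for the gemv kernel

struct Params {};                                // stand-in for kernel_selector::Params

// Hypothetical gemv kernel class, shown only to illustrate where the override would live.
struct FullyConnected_GEMV {
    KernelsPriority GetKernelsPriority(const Params& /*params*/) const {
        // A larger FORCE_PRIORITY_* value ranks a kernel below one returning a
        // smaller value, so this keeps gemv below bf_tiled in the selection order.
        return FORCE_PRIORITY_3;
    }
};
```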
It seems that doesn't work. The gemv impl only supports single-batch input, and in the dynamic shape case the input batch is not known before the FC impl is chosen, so fc_bf_tiled is selected first. Once the input shape is set, there is no chance to re-select a new FC impl, so we have to add the logic above to allow re-selecting the FC impl.
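A rough sketch of the re-selection logic being described, using hypothetical names (select_fc_impl, FcImpl) rather than the PR's actual types:

```cpp
// Illustrative sketch only, not the PR's implementation. It captures the idea
// that the gemv impl can only be chosen once the runtime batch is known.
#include <cstdint>

enum class FcImpl { GemvSingleBatch, BfTiled };

FcImpl select_fc_impl(int64_t batch, bool shape_is_static) {
    // At compilation time a dynamic-shaped FC has an unknown batch, so the
    // generic bf_tiled impl is selected by default.
    if (!shape_is_static)
        return FcImpl::BfTiled;
    // Once the actual shape is set (e.g. second-token generation with batch == 1),
    // re-select gemv instead of keeping the previously chosen bf_tiled impl.
    return (batch == 1) ? FcImpl::GemvSingleBatch : FcImpl::BfTiled;
}
```

Without a re-selection step after the shape becomes static, the first branch is the only one ever taken for dynamic models, which is why the extra logic is needed.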
Thanks @sshlyapn, great help in solving the dynamic shape issue!