
Remove tokens after EOS for draft model for speculative decoding #1951

Merged · 4 commits · Mar 25, 2025

Conversation

@sbalandi (Contributor) commented Mar 20, 2025

@sbalandi sbalandi requested a review from iefode March 20, 2025 18:21
@github-actions github-actions bot added category: sampling Sampling / Decoding algorithms category: speculative decoding Speculative decoding labels Mar 20, 2025
@@ -337,5 +337,16 @@ void ContinuousBatchingPipeline::ContinuousBatchingForSpeculativeDecodingImpl::m
to_generate |= request->can_generate_tokens();
}
}

for (auto& request : m_requests) {
Contributor
I suppose we intentionally ignore EOS tokens for draft models, so why are they removed here? It should not affect the results of the main model, should it?

Contributor Author
It was decided not to keep the part after EOS for the draft model, per ticket https://jira.devtools.intel.com/browse/CVS-164477. It does affect the results of the main model: previously, if a stop_token was hit, the generation result could contain the stop token plus some tokens after it; with these changes, nothing follows the stop token.
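As a rough illustration of the behavior described above (not the PR's actual code — the helper name and signature are assumptions for this sketch), trimming everything after the first stop token from a generated sequence could look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper, not the repository's real implementation: drop every
// token that follows the first stop token, keeping the stop token itself.
std::vector<int64_t> truncate_after_stop(const std::vector<int64_t>& generated_ids,
                                         const std::set<int64_t>& stop_token_ids) {
    auto it = std::find_if(generated_ids.begin(), generated_ids.end(),
                           [&](int64_t id) { return stop_token_ids.count(id) > 0; });
    if (it == generated_ids.end())
        return generated_ids;                // no stop token: keep everything
    return {generated_ids.begin(), it + 1}; // keep up to and including the stop token
}
```

For example, with stop_token_ids = {2}, the sequence {5, 7, 2, 9, 4} becomes {5, 7, 2}, so the draft model no longer proposes tokens past the stop token to the main model.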

@iefode
Contributor

iefode commented Mar 21, 2025

We discussed offline how best to implement this.

@shira-g

shira-g commented Mar 23, 2025

@sbalandi
I tried this fix and I am getting the following error:
Check 'content_length <= prompt_ids.size() + m_generated_ids.size()' failed at C:\Users\sdp\openvino.genai\src\cpp\src\sequence_group.cpp:32

@iefode iefode self-assigned this Mar 24, 2025
@iefode iefode marked this pull request as ready for review March 24, 2025 15:46
Contributor

@iefode iefode left a comment

LGTM. Could you please address the one remaining comment?

@ilya-lavrenov ilya-lavrenov added this to the 2025.2 milestone Mar 25, 2025
@ilya-lavrenov ilya-lavrenov added the bug Something isn't working label Mar 25, 2025
@@ -851,6 +853,12 @@ SequenceGroupSamplingInfo Sampler::sample_from_sequence_group(SequenceGroup::Ptr
// to exit from sampling in case of failed token validation
if (!is_validation_passed) {
break;
} else {
auto sampling_params = sequence_group->get_sampling_parameters();
if (is_stop_token_id_hit(sampled_token.m_index, sampling_params.stop_token_ids) && !sampling_params.ignore_eos) {
Contributor
Minor: it looks like is_stop_token_id_hit is equivalent to a simple find :D
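The reviewer's observation can be sketched as follows; the helper name and parameter types are taken from the diff above, but this body is an assumption, not the repository's actual implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Assumed sketch: a membership test over the configured stop token ids,
// which indeed reduces to a plain find on the set.
bool is_stop_token_id_hit(int64_t token_id, const std::set<int64_t>& stop_token_ids) {
    return stop_token_ids.find(token_id) != stop_token_ids.end();
}
```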

@iefode iefode added this pull request to the merge queue Mar 25, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 25, 2025
@sbalandi sbalandi added this pull request to the merge queue Mar 25, 2025
Merged via the queue into openvinotoolkit:master with commit bcdf67b Mar 25, 2025
54 checks passed
@sbalandi sbalandi deleted the sd_eos branch March 25, 2025 15:26