
Remove tokens after EOS for draft model for speculative decoding #1951

Merged · 4 commits · Mar 25, 2025

Conversation

@sbalandi (Contributor) commented Mar 20, 2025

@sbalandi sbalandi requested a review from iefode March 20, 2025 18:21
@github-actions github-actions bot added category: sampling Sampling / Decoding algorithms category: speculative decoding Speculative decoding labels Mar 20, 2025
@@ -337,5 +337,16 @@ void ContinuousBatchingPipeline::ContinuousBatchingForSpeculativeDecodingImpl::m
to_generate |= request->can_generate_tokens();
}
}

for (auto& request : m_requests) {
Contributor
I suppose we intentionally ignore EOS tokens for draft models, so why are they removed here? It should not affect the results of the main model, should it?

Contributor Author
It was decided not to keep the part after EOS for the draft model, per ticket https://jira.devtools.intel.com/browse/CVS-164477. It does affect the results of the main model: previously, if a stop_token was hit, the generation result could contain the stop token plus some tokens after it; with these changes, nothing follows the stop token.
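As a rough illustration of the behavior described above (not the PR's actual code — the helper name and signature are assumptions for this sketch), trimming everything after the first stop token from a generated sequence could look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper, not the repository's real implementation: drop every
// token that follows the first stop token, keeping the stop token itself.
std::vector<int64_t> truncate_after_stop(const std::vector<int64_t>& generated_ids,
                                         const std::set<int64_t>& stop_token_ids) {
    auto it = std::find_if(generated_ids.begin(), generated_ids.end(),
                           [&](int64_t id) { return stop_token_ids.count(id) > 0; });
    if (it == generated_ids.end())
        return generated_ids;                // no stop token: keep everything
    return {generated_ids.begin(), it + 1}; // keep up to and including the stop token
}
```

For example, with stop_token_ids = {2}, the sequence {5, 7, 2, 9, 4} becomes {5, 7, 2}, so the draft model no longer proposes tokens past the stop token to the main model.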

@iefode
Contributor

iefode commented Mar 21, 2025

We discussed offline how best to implement this.

@shira-g

shira-g commented Mar 23, 2025

@sbalandi
I tried this fix and I am getting the following error:
Check 'content_length <= prompt_ids.size() + m_generated_ids.size()' failed at C:\Users\sdp\openvino.genai\src\cpp\src\sequence_group.cpp:32

@iefode iefode self-assigned this Mar 24, 2025
@iefode iefode marked this pull request as ready for review March 24, 2025 15:46
Contributor

@iefode iefode left a comment

LGTM. Could you please address the one remaining comment?

@ilya-lavrenov ilya-lavrenov added this to the 2025.2 milestone Mar 25, 2025
@ilya-lavrenov ilya-lavrenov added the bug Something isn't working label Mar 25, 2025
@@ -851,6 +853,12 @@ SequenceGroupSamplingInfo Sampler::sample_from_sequence_group(SequenceGroup::Ptr
// to exit from sampling in case of failed token validation
if (!is_validation_passed) {
break;
} else {
auto sampling_params = sequence_group->get_sampling_parameters();
if (is_stop_token_id_hit(sampled_token.m_index, sampling_params.stop_token_ids) && !sampling_params.ignore_eos) {
Contributor
Minor: it looks like is_stop_token_id_hit is equivalent to a simple find :D
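The reviewer's observation can be sketched as follows; the helper name and parameter types are taken from the diff above, but this body is an assumption, not the repository's actual implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Assumed sketch: a membership test over the configured stop token ids,
// which indeed reduces to a plain find on the set.
bool is_stop_token_id_hit(int64_t token_id, const std::set<int64_t>& stop_token_ids) {
    return stop_token_ids.find(token_id) != stop_token_ids.end();
}
```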

@iefode iefode added this pull request to the merge queue Mar 25, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 25, 2025
@sbalandi sbalandi added this pull request to the merge queue Mar 25, 2025
Merged via the queue into openvinotoolkit:master with commit bcdf67b Mar 25, 2025
54 checks passed
@sbalandi sbalandi deleted the sd_eos branch March 25, 2025 15:26