Use get_max_new_tokens() instead of max_new_tokens field when stopping… #1417

michalkulakowski · 2024-12-20T09:11:55Z

… generation

src/cpp/src/sampler.hpp

ilya-lavrenov

Not all places inside src/cpp have been changed.

mzegla · 2024-12-20T09:48:11Z

One more thing: I believe max_length is now loaded from the generation config. Isn't it a model property rather than a per-generation setting? @michalkulakowski I know you already have logic to read that value in OVMS. Maybe we could move it here and make it a pipeline member, so that it could be used in both OVMS and a standalone GenAI app.

michalkulakowski · 2025-01-02T09:52:21Z

That makes sense to me. @ilya-lavrenov what do you think?

ilya-lavrenov · 2025-01-04T08:23:33Z

I suppose it depends on the model:

For example, meta-llama/Meta-Llama-3-8B-Instruct sets this value in its generation config: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/generation_config.json#L6
For other models, the default value of max_length is used, which is 20. See https://github.com/huggingface/transformers/blob/e5fd865ebae062b7cf03a81b8c6affeb39f30bec/src/transformers/generation/configuration_utils.py#L127-L129

It looks like max_model_length (which is config.max_position_embeddings, for example https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/config.json#L13) and max_length from generation_config.json are different things, aren't they?

Maybe we can have similar behavior in GenAI and add some defaults similar to HF?

@Wovchena @pavel-esir @as-suvorov what is your opinion?
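
For readers following the thread, the distinction between the two limits can be sketched roughly as below. This is only an illustration: `ModelLimits`, `GenerationLimits`, and `effective_max_length` are hypothetical names, not GenAI or OVMS APIs, and clamping the requested length to the model's context size is an assumption rather than something decided in this discussion. The fallback of 20 mirrors the HF default linked above.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>

// Fallback used when generation_config.json has no max_length
// (mirrors the HF default referenced in the comment above).
constexpr std::size_t kDefaultMaxLength = 20;

// Hypothetical: a property of the model itself, read from config.json.
struct ModelLimits {
    std::size_t max_position_embeddings;
};

// Hypothetical: a per-generation setting, read from generation_config.json.
struct GenerationLimits {
    std::optional<std::size_t> max_length;  // absent for many models
};

// Assumed policy: use the requested (or default) max_length, but never
// exceed what the model can physically attend to.
std::size_t effective_max_length(const ModelLimits& model,
                                 const GenerationLimits& gen) {
    const std::size_t requested = gen.max_length.value_or(kDefaultMaxLength);
    return std::min(requested, model.max_position_embeddings);
}
```

With illustrative values, a model limit of 8192 and a generation-config `max_length` of 4096 would yield 4096, while a missing `max_length` would fall back to 20.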

michalkulakowski · 2025-01-14T06:28:22Z

@Wovchena @pavel-esir @as-suvorov please share your opinion

ilya-lavrenov · 2025-01-14T06:45:18Z

I think that even without a default value for max new tokens, we can go ahead with the other changes, which will respect max_length.
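
As a rough illustration of what "respecting max_length" could mean for a stop check, here is a minimal sketch. `GenerationConfigSketch` and `reached_token_limit` are simplified stand-ins invented for this example, not the library's actual classes, and the precedence rule (an explicit `max_new_tokens` wins, otherwise the budget is derived from `max_length` minus the prompt length) is an assumption about the intended behavior.

```cpp
#include <cstddef>
#include <limits>

// Sentinel meaning "max_new_tokens was not set by the user" (assumption).
constexpr std::size_t kUnset = std::numeric_limits<std::size_t>::max();

// Simplified stand-in for a generation config; not the real class.
struct GenerationConfigSketch {
    std::size_t max_new_tokens = kUnset;  // budget for generated tokens only
    std::size_t max_length = 20;          // budget for prompt + generated tokens

    // Assumed rule: an explicit max_new_tokens takes priority; otherwise
    // derive the budget from max_length, guarding against underflow.
    std::size_t get_max_new_tokens(std::size_t prompt_length) const {
        if (max_new_tokens != kUnset) {
            return max_new_tokens;
        }
        return max_length > prompt_length ? max_length - prompt_length : 0;
    }
};

// Stop check that goes through the getter rather than reading the raw
// max_new_tokens field, so max_length is honored when the field is unset.
bool reached_token_limit(const GenerationConfigSketch& cfg,
                         std::size_t prompt_length,
                         std::size_t generated_tokens) {
    return generated_tokens >= cfg.get_max_new_tokens(prompt_length);
}
```

Comparing against the raw field instead would never stop a request that only sets `max_length`, because the unset sentinel is effectively unbounded.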

src/cpp/src/continuous_batching_impl.cpp

src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp

src/cpp/src/sampler.cpp

src/cpp/src/sequence_group.hpp

src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp

ilya-lavrenov · 2025-03-04T11:48:15Z

build_jenkins

ilya-lavrenov · 2025-03-04T11:56:28Z

Please fix the compilation.

ilya-lavrenov · 2025-03-05T13:54:09Z

build_jenkins

github-actions bot added the category: sampling label Dec 20, 2024

michalkulakowski force-pushed the mkulakow/max_length branch from 402bba1 to 5d527f0 on December 20, 2024 09:14

mzegla reviewed Dec 20, 2024

src/cpp/src/sampler.hpp (outdated, resolved)

ilya-lavrenov requested changes Dec 20, 2024

ilya-lavrenov assigned ilya-lavrenov, Wovchena and mzegla Dec 20, 2024

michalkulakowski force-pushed the mkulakow/max_length branch from 9efa361 to c1d1672 on January 23, 2025 08:03

github-actions bot added the category: continuous batching, category: speculative decoding, and category: prompt lookup labels Jan 23, 2025

as-suvorov mentioned this pull request Jan 23, 2025

Whisper pipeline: use Sampler #1615

Merged

Use get_max_new_tokens() instead of max_new_tokens field when stopping…

4a9ca61

… generation

michalkulakowski force-pushed the mkulakow/max_length branch from caa5675 to 4a9ca61 on February 14, 2025 16:31

michalkulakowski requested a review from ilya-lavrenov February 14, 2025 16:31

fix

02c86bc

as-suvorov reviewed Feb 14, 2025

src/cpp/src/continuous_batching_impl.cpp (outdated, resolved)

Addressing review comment

4862030

michalkulakowski requested review from as-suvorov and mzegla February 25, 2025 07:45

mzegla approved these changes Feb 28, 2025

as-suvorov reviewed Feb 28, 2025

fix

fa350ad

michalkulakowski requested a review from as-suvorov March 4, 2025 08:08

as-suvorov reviewed Mar 4, 2025

src/cpp/src/sampler.cpp (outdated, resolved)

fix

4b153d9

github-actions bot removed the category: sampling label Mar 4, 2025

michalkulakowski requested a review from as-suvorov March 4, 2025 08:45

as-suvorov approved these changes Mar 4, 2025

ilya-lavrenov reviewed Mar 4, 2025

src/cpp/src/prompt_lookup/continuous_batching_for_prompt_lookup.cpp (outdated, resolved)

iefode reviewed Mar 4, 2025

src/cpp/src/speculative_decoding/continuous_batching_for_speculative_decoding_impl.cpp (outdated, resolved)

address review comments

060783e

iefode approved these changes Mar 4, 2025

fix

b72b59b

ilya-lavrenov added this to the 2025.1 milestone Mar 4, 2025

ilya-lavrenov approved these changes Mar 4, 2025

michalkulakowski added 2 commits March 4, 2025 16:03

fix

4cd4fe6

fix

5075eb7

ilya-lavrenov enabled auto-merge March 5, 2025 13:54

ilya-lavrenov added this pull request to the merge queue Mar 5, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 5, 2025

ilya-lavrenov merged commit 0214ba8 into openvinotoolkit:master Mar 5, 2025
61 checks passed
