[Tokenizers] Tokenizer fix decode #767

Merged: 162 commits into openvinotoolkit:master from tokenizer-fix-decode, Nov 29, 2023

ilya-lavrenov
Contributor

No description provided.

slyalin and others added 30 commits April 26, 2023 03:07
…h and without string support in OV core. Moved StringTensorUnpack and reworked it to be aligned with the new approach. Reworked sentence piece op and translation code to be compatible with several variants of string tensor representation and the plugin wrapping hack.
…ranch to contrib in form compatible with both master and the branch with string tensors support. Added CaseFoldUTF8 from that branch.
…pty constants, register StringTensorPack and StringTensorUnpack as OV operations to be able to read IRs with those operations
…den Const translator for TF to intercept string constants
…r conditional compilation based on available features in OpenVINO
…combination of WordpieceTokenizeWithOffsets and LookupTableFindV2 from TensorFlow
…ute initialization optional (needed for core.make_node)
…n and RegexSplit based on paddle fast_tokenizer lib. Limited implementation, not all of the features of ops and TF translated ops are implemented.
… necessary steps to complete HF bert preprocessing conversion (not validated)
…kenizer and main model is fixed partially (still produces topologically incorrect model)
…uts, now Bert and its tokenizer are connected together correctly
@ilya-lavrenov
Contributor Author

/azp run openvino_contrib-mac

Azure Pipelines successfully started running 1 pipeline(s).

ilya-lavrenov force-pushed the tokenizer-fix-decode branch 6 times, most recently from 13fdedd to f738fcb, November 28, 2023 07:03
ilya-lavrenov merged commit dcc05cb into openvinotoolkit:master, Nov 29, 2023
ilya-lavrenov deleted the tokenizer-fix-decode branch, November 29, 2023 17:43
ilya-lavrenov mentioned this pull request, Nov 29, 2023 (2 tasks)
ilya-lavrenov changed the title from "Tokenizer fix decode" to "[Tokenizers] Tokenizer fix decode", Dec 12, 2023
Labels
category: build (OpenVINO cmake script / infra)
category: CI (OpenVINO public CI)
category: custom operations (OpenVINO Runtime Extension with custom operations)
7 participants