Releases: AI21Labs/ai21-tokenizer

v0.8.0

03 Jan 09:26

v0.8.0 (2024-01-03)

Chore

  • chore(deps-dev): bump pytest from 7.2.1 to 7.4.4 (#75)

Bumps pytest from 7.2.1 to 7.4.4.


updated-dependencies:

  • dependency-name: pytest
    dependency-type: direct:development
    update-type: version-update:semver-minor
    ...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: asafgardin <147075902+asafgardin@users.noreply.github.com> (081dda3)

Feature

  • feat: Add start_of_line to decode (#77)

  • feat: Add start_of_line param to decode

  • test: added unittest with start_of_line=True and False (182a8d1)

v0.7.0

02 Jan 15:46

v0.7.0 (2024-01-02)

Feature

  • feat: Init tokenizer from filehandle (#76)

  • feat: allow creating JurassicTokenizer from model file handle

  • fix: Add default for model_path and model_file_handle

  • feat: Add JurassicTokenizer.from_file_path classmethod

  • fix: remove model_path=None in JurassicTokenizer.from_file_handle

  • fix: rename _assert_exactly_one to _validate_init and make it not static

  • refactor: semantics

  • test: Added tests


Co-authored-by: Asaf Gardin <asafg@ai21.com> (dcb73a7)
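
The commit notes above describe two ways to construct the tokenizer (from a path or from an open file handle) plus a _validate_init check. The pattern can be sketched as follows. This is a minimal illustration and not the library's code: SketchTokenizer and model_bytes are hypothetical names, the rule that exactly one source must be given is an assumption inferred from the old helper name _assert_exactly_one, and no real model file is parsed.

```python
import io
from typing import BinaryIO, Optional


class SketchTokenizer:
    """Hypothetical stand-in for illustration; not the real JurassicTokenizer."""

    def __init__(
        self,
        model_path: Optional[str] = None,
        model_file_handle: Optional[BinaryIO] = None,
    ):
        # Both arguments default to None, so the check below has to reject
        # the "neither given" and "both given" cases.
        self._validate_init(model_path, model_file_handle)
        if model_file_handle is not None:
            self.model_bytes = model_file_handle.read()
        else:
            with open(model_path, "rb") as f:
                self.model_bytes = f.read()

    def _validate_init(self, model_path, model_file_handle) -> None:
        # Assumed rule: exactly one of the two sources must be provided.
        if (model_path is None) == (model_file_handle is None):
            raise ValueError(
                "Provide exactly one of model_path or model_file_handle"
            )

    @classmethod
    def from_file_handle(cls, model_file_handle: BinaryIO) -> "SketchTokenizer":
        return cls(model_file_handle=model_file_handle)

    @classmethod
    def from_file_path(cls, model_path: str) -> "SketchTokenizer":
        return cls(model_path=model_path)


# Usage: an in-memory buffer stands in for an opened model file.
tok = SketchTokenizer.from_file_handle(io.BytesIO(b"fake-model"))
```

Named classmethod constructors keep call sites explicit about which source is used, while the shared check in __init__ guards direct construction.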

v0.6.0

28 Dec 14:28

v0.6.0 (2023-12-28)

Chore

  • chore: add test case for encode with is_start=False (#74)

  • chore: add test case for encode with is_start=False

  • fix: split is_start=False to a different testcase (77c0a39)

Feature

  • feat: Add decode with offsets (#73)

  • feat: Add decode_with_offsets() to JurassicTokenizer

  • refactor: remove kwargs from decode_with_offsets since it's not used

  • chore: Add unittest for decode and for offsets

  • fix: test only decode_with_offsets

  • fix: dummy for returned offsets in decode_with_offsets (a5a7bb4)

  • feat: Add the is_start parameter to JurassicTokenizer.encode() (#72)

  • feat: Add the is_start parameter to JurassicTokenizer.encode()

  • refactor: take 'is_start' from kwargs (296bda5)

v0.5.0

28 Dec 11:31

v0.5.0 (2023-12-28)

Feature

  • feat: Add more special tokens (#71)

  • fix: commitizen tag starts with "v"

  • feat: add eos_id

  • feat: Add newline_id

  • fix: typo "_newline_piece" instead of "newline_piece"

  • fix: newline_id already existed as "private". Just make it "public"

  • fix: forgot to rename everywhere (9a9e1a8)

Fix

  • fix: commitizen tag starts with "v" (#70) (cf495ad)

v0.4.0

28 Dec 07:45

v0.4.0 (2023-12-28)

Feature

  • feat: add pad_id and bos_id to jurassic_tokenizer (#69) (ffb2ce3)

v0.3.11

27 Dec 14:34

v0.3.11 (2023-12-27)

Fix

v0.3.10

27 Dec 12:39

v0.3.10 (2023-12-27)

Chore

  • chore(deps-dev): bump safety from 2.3.4 to 2.3.5 (#64)

Bumps safety from 2.3.4 to 2.3.5.


updated-dependencies:

  • dependency-name: safety
    dependency-type: direct:development
    update-type: version-update:semver-patch
    ...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (95696bb)

  • chore(deps-dev): bump ruff from 0.0.285 to 0.1.8 (#63)

Bumps ruff from 0.0.285 to 0.1.8.


updated-dependencies:

  • dependency-name: ruff
    dependency-type: direct:development
    update-type: version-update:semver-minor
    ...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (81123d3)

  • chore(deps-dev): bump black from 22.12.0 to 23.3.0 (#61)

Bumps black from 22.12.0 to 23.3.0.


updated-dependencies:

  • dependency-name: black
    dependency-type: direct:development
    update-type: version-update:semver-major
    ...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (7190d28)

  • chore(deps-dev): bump safety from 2.3.4 to 2.3.5 (#60)

Bumps safety from 2.3.4 to 2.3.5.


updated-dependencies:

  • dependency-name: safety
    dependency-type: direct:development
    update-type: version-update:semver-patch
    ...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (2fa7bef)

Fix

Refactor

  • refactor: Added all in init (#65)

  • refactor: Added all in init

  • fix: tests

  • refactor: added version to all (c0d9286)

  • refactor: sentencepiece version to support all patch versions (#66) (845008c)

v0.3.9

27 Nov 09:08

v0.3.9 (2023-11-27)

Chore

  • chore: add github badges (#58) (821455c)

  • chore(deps-dev): bump urllib3 from 2.0.4 to 2.0.7 (#57)

Bumps urllib3 from 2.0.4 to 2.0.7.


updated-dependencies:

  • dependency-name: urllib3
    dependency-type: indirect
    ...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (93ef6d6)

Fix

  • fix: Modify badges (#59)

  • docs: fixed url

  • fix: inline

  • fix: README.md (20b7090)

v0.3.8

26 Nov 08:08

v0.3.8 (2023-11-26)

Fix

v0.3.7

23 Nov 11:54

v0.3.7 (2023-11-23)

CI

  • ci: workflow dispatch for release (#54) (dbf5609)

  • ci: Automate pypi publish (#53)

  • ci: Automate pypi publish on new release

  • fix: Remove comment

  • fix: title of action

  • fix: title of action (7c04fda)

Fix

  • fix: Examples in readme (#55)

  • ci: workflow dispatch for release

  • docs: Updated readme with more examples

  • docs: Added docs to base class (94f3a3c)