CNDB-13074: Reject analysis options on frozen collections #1610

Open, wants to merge 1 commit into main

adelapena

It's not allowed to create an index with non-Lucene analysis options on a frozen collection. For example:

CREATE TABLE t(k int PRIMARY KEY, v frozen<set<text>>);
CREATE CUSTOM INDEX ON t(FULL(v)) 
   USING 'StorageAttachedIndex' 
   WITH OPTIONS = {'case_sensitive': false}; -- InvalidRequestException: CQL type frozen<set<text>> cannot be analyzed

This makes sense IMO, because such an index wouldn't support any meaningful operator. However, we don't get that rejection if we specify an index_analyzer instead. For example:

CREATE TABLE t(k int PRIMARY KEY, v frozen<set<text>>);
INSERT INTO t (k, v) VALUES (0, {'apples'});
CREATE CUSTOM INDEX ON t(FULL(v)) 
   USING 'StorageAttachedIndex' 
   WITH OPTIONS = {'index_analyzer':'STANDARD'}; -- Accepted
SELECT k FROM t WHERE v CONTAINS 'ABC'; -- Column 'v' has an index but does not support the operators specified in the query.
SELECT k FROM t WHERE v = {'apple'}; -- Accepted, but results are erratic

In this case, the entire serialized collection is treated as a single string and analyzed. That string includes the metadata at the beginning of the serialized form and has no token separators between fields, so it ends up as a bunch of non-printable characters.
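
As a rough illustration of the problem, here is a minimal Python sketch of what an analyzer might see. It assumes the frozen set is serialized as a big-endian 4-byte element count followed by each element prefixed with its 4-byte length. That layout is an assumption made for the example rather than something stated in this PR, and the function names are hypothetical.

```python
import struct


def serialize_text_set(values):
    """Pack strings as [int32 count]([int32 length][utf-8 bytes])*, big-endian.

    Assumed layout for illustration only; not taken from the PR.
    """
    encoded = sorted(v.encode("utf-8") for v in values)
    out = struct.pack(">i", len(encoded))
    for item in encoded:
        out += struct.pack(">i", len(item)) + item
    return out


def as_naive_string(buf):
    """Decode the raw bytes as text, the way a string analyzer would read them."""
    return buf.decode("utf-8", errors="replace")


raw = serialize_text_set({"apples"})
text = as_naive_string(raw)

# The count and length prefixes show up as control characters before the value.
print(repr(text))
```

Under that assumption, the count and length prefixes become leading control characters in the analyzed string, and with several elements the values would run together with nothing an analyzer could recognize as a separator.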

This PR simply rejects creating indexes with analysis options on frozen collections, given that we don't have a way to correctly index them, and we don't support querying either.

@adelapena self-assigned this Feb 26, 2025

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-1610 rejected by Butler


1 new test failure(s) in 1 builds


Found 1 new test failures

Test | Explanation | Branch history | Upstream history
o.a.c.u.b.BinLogTest.testTruncationReleasesLogS... | regression | 🔴 | 🔵🔵🔵🔵🔵🔵🔵

Found 1 known test failures

@adelapena (Author)

PR for CNDB: https://github.com/riptano/cndb/pull/13152
