Update logic for LIKE operator for Case Insensitive collation #3439

Open
wants to merge 35 commits into
base: BABEL_5_X_DEV

ahmed-shameem
Contributor

@ahmed-shameem ahmed-shameem commented Jan 30, 2025

Description

We support LIKE on the TSQL side for case-insensitive collations by transforming LIKE to ILIKE (a PG operator that matches strings case-insensitively according to the active locale). We have also introduced an optimisation for the case where the rightop of the LIKE operator is a Const node and has a constant prefix. E.g.:

-- CASE 1

SELECT COL FROM TAB WHERE COL LIKE 'ab'

-- CASE 2

SELECT COL FROM TAB WHERE COL LIKE 'a%'

This is the optimisation that we follow:

expression LIKE pattern

becomes, if the pattern is an exact match (no wildcards):

expression = pattern

and becomes, if the pattern has a prefix match:

expression ILIKE pattern COLLATE cs_as AND
expression BETWEEN patternConstPrefix AND patternConstPrefix || E'\uFFFF'

E'\uFFFF' is the highest sort key. We add the BETWEEN condition because in PG we cannot use an index scan for the LIKE/ILIKE operator, whereas BETWEEN makes an index scan possible. The cs_as collation carries the locale information for ILIKE.
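The rewrite rule above can be sketched in a few lines of Python. This is only an illustration of the decision (exact match vs. prefix match), not the C code in this PR: the names `const_prefix` and `rewrite_like` are hypothetical, escape characters are ignored, and the fallback for a pattern with no constant prefix is an assumption, since the description does not cover that case.

```python
# Hypothetical sketch of the LIKE rewrite described above; not the actual
# Babelfish implementation. Escape handling is deliberately omitted.

WILDCARDS = set("%_[")  # T-SQL LIKE metacharacters that end a constant prefix


def const_prefix(pattern: str) -> str:
    """Return the literal part of the pattern before the first wildcard."""
    for i, ch in enumerate(pattern):
        if ch in WILDCARDS:
            return pattern[:i]
    return pattern


def quote(literal: str) -> str:
    """Render a string as a SQL literal, doubling embedded single quotes."""
    return "'" + literal.replace("'", "''") + "'"


def rewrite_like(expr: str, pattern: str) -> str:
    """Build the rewritten predicate text for `expr LIKE pattern`."""
    prefix = const_prefix(pattern)
    if prefix == pattern:
        # No wildcard at all: exact match becomes an equality test.
        return f"{expr} = {quote(pattern)}"
    ilike = f"{expr} ILIKE {quote(pattern)} COLLATE cs_as"
    if not prefix:
        # Assumption: with no constant prefix there is no range to add.
        return ilike
    low = quote(prefix)
    return f"{ilike} AND {expr} BETWEEN {low} AND {low} || E'\\uFFFF'"


if __name__ == "__main__":
    print(rewrite_like("COL", "ab"))  # CASE 1 from the description
    print(rewrite_like("COL", "a%"))  # CASE 2 from the description
```

Generating SQL text keeps the sketch short; the real transformation operates on parse-tree nodes rather than strings.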

To support case-insensitive, accent-insensitive collations such as latin1_general_ci_ai, we first transform the node by adding a FuncExpr (remove_accents_internal) that removes the accents, and then apply the logic above.
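To make the accent-removal step concrete, here is a minimal Python sketch of one common way to strip accents: decompose to NFD and drop combining marks. It is a stand-in for illustration only and is not claimed to match what remove_accents_internal does; `strip_accents` and `ci_ai_equal` are hypothetical names.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ci_ai_equal(left: str, right: str) -> bool:
    """Compare two strings ignoring both accents and case."""
    return strip_accents(left).casefold() == strip_accents(right).casefold()


if __name__ == "__main__":
    print(strip_accents("résumé"))
    print(ci_ai_equal("Résumé", "RESUME"))
```

Once both sides have had their accents removed, the remaining comparison only needs to be case-insensitive, which is why the existing LIKE logic can be reused afterwards.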

Currently we observe that the query plan does not match the transformation above: instead of COLLATE cs_as, it shows COLLATE "default". Consider the following query:

-- query (column a is collated with latin1_general_ci_as)

select a from varchar_tst where a collate chinese_prc_ci_as LIKE 'ab%'

-- query plan

Filter: (((a)::text ~~* 'ab%'::text COLLATE "default") 
AND ((a)::text >= 'ab'::text COLLATE "default") AND ((a)::text < 'ab?'::text))
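The range half of this filter can be illustrated with a small Python check. The sketch assumes the `?` in the plan is how the U+FFFF upper bound is displayed, and it compares by Unicode code point, which is not the same as a collation's sort order; `in_prefix_range` is a hypothetical helper.

```python
HIGHEST = "\uffff"  # upper-bound sentinel appended to the constant prefix


def in_prefix_range(value: str, prefix: str) -> bool:
    """Check prefix <= value < prefix + U+FFFF using code-point ordering."""
    return prefix <= value < prefix + HIGHEST


if __name__ == "__main__":
    for candidate in ["ab", "abc", "abz", "ac", "aB"]:
        print(candidate, in_prefix_range(candidate, "ab"))
```

Under code-point ordering the value "aB" falls outside the range for prefix "ab", so a range check of this kind only agrees with a case-insensitive match if the comparison runs under a suitable collation. That is why the collation attached to these bounds matters.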

Even though the query plan shows COLLATE "default", the correct collation oid is picked up during execution.
However, if we change the typeTypeCollation function to pick up the database collation oid for string literals in the parse tree, we may observe upgrade/restore failures.

We use this task as a first step towards modifying and improving the handling of such cases.

Issues Resolved

Task: BABEL-5608
Signed-off-by: Shameem Ahmed shmeeh@amazon.com

Test Scenarios Covered

  • Use case based -

  • Boundary conditions -

  • Arbitrary inputs -

  • Negative test cases -

  • Minor version upgrade tests -

  • Major version upgrade tests -

  • Performance tests -

  • Tooling impact -

  • Client tests -

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coveralls
Collaborator

coveralls commented Jan 30, 2025

Pull Request Test Coverage Report for Build 13695695834

Details

  • 25 of 25 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.009%) to 75.021%

Totals Coverage Status
Change from base Build 13673774583: 0.009%
Covered Lines: 47169
Relevant Lines: 62874

💛 - Coveralls

@@ -607,3 +607,4 @@ sys_sql_logins
sys-login-property
sys-fn-varbintohexsubstring
BABEL_OBJECT_RESOLUTION_IN_ROUTINES
BABEL-5608
Contributor


Why is BABEL-5608 scheduled only in the latest version upgrade test?

Contributor Author


Added to the other upgrade paths as well.


5 participants