Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snowflake 0.11 driver incorrectly wrapping table names in double quotes #1721

Closed
davlee1972 opened this issue Apr 15, 2024 · 7 comments · Fixed by #1738
Closed

Snowflake 0.11 driver incorrectly wrapping table names in double quotes #1721

davlee1972 opened this issue Apr 15, 2024 · 7 comments · Fixed by #1738
Labels
Type: bug Something isn't working

Comments

@davlee1972
Copy link

What happened?

This was working fine in 0.10 and the GO 1.72 driver.

Starting in 0.11 and the GO 1.80 driver table names are being incorrectly wrapped in double quotes.
This is breaking SQL selects now which can't find the table..

The ANSI standard below should be followed.

https://forum.knime.com/t/inquiry-regarding-double-quotes-in-table-name-creation/77438/2

Double quotes will be applied around the name of a table when the table name is not all uppercase, or contains spaces and possibly some other characters

If the table name is purely upper case and underscores, the DDL statement that snowflake shows won’t contain double quotes:

How can we reproduce the bug?

No response

Environment/Setup

No response

@davlee1972 davlee1972 added the Type: bug Something isn't working label Apr 15, 2024
@lidavidm
Copy link
Member

CC @zeroshade I thought we did something weird here because Snowflake has weird casing rules

@zeroshade
Copy link
Member

@lidavidm Snowflake's way of handling things is that if it's not in quotes, everything gets upper-cased automatically. Because our CI tests were using a lower-cased table name, we were running into an issue where some queries used quotes and others didn't in the validation suite. We then surrounded every query with quotes in the validation suite, but the table name itself was not surrounded with quotes for ingestion so we ended up with flaky CI tests where it might or might not find the table. #1561 fixed the flakiness of the tests by making all of the casing consistent - what you put in is what you get - by wrapping identifiers in quotes if they weren't already.

Double quotes will be applied around the name of a table when the table name is not all uppercase, or contains spaces and possibly some other characters

If the table name is purely upper case and underscores, the DDL statement that snowflake shows won’t contain double quotes

In theory, that's essentially how the current 0.11.0 version of the driver works. If you provide a string for a table name where the characters aren't all uppercase or contains spaces / possibly other characters, then because we wrap it in double quotes, the table name will be handled correctly as provided. If the table name is provided with all of the characters already uppercased and without any special characters, then the fact that we add quotes to it won't affect its usage at all.

So this pretty much comes down to a question of whether or not we force users to perform the upper-casing themselves (the current version) or force users to include their own double quotes like they would if they were writing the raw SQL query themselves. There isn't a programmatic way for us to be able to tell what the user wants before-hand, the only alternative would be to make the auto-quoting an option they can toggle (and then we need to decide on a default).

@davlee1972
Copy link
Author

Well my existing tables are all upper caps with some underscores and they are still getting created with double quotes when ADBC is set to create or create append mode..

We’re also using sqlglot to generate sql and it does not add quotes when using dialect=snowflake

@davlee1972
Copy link
Author

@davlee1972
Copy link
Author

davlee1972 commented Apr 16, 2024

https://github.com/tobymao/sqlglot/blob/main/sqlglot/dialects/snowflake.py

Sqlglot reference..

  1. UPPERCASE = auto()
    """Unquoted identifiers are uppercased."""

  2. Otherwise a quoted_identifier function is called.. Looks like there is a special case for “dual”

@davlee1972
Copy link
Author

davlee1972 commented Apr 16, 2024

Here are my testing results between 0.10 and 0.11. Will test dummy, DUMMY, "dummy" and "DUMMY" with and without fully qualified schema..

I think we should follow Snowflake and SQLGLOT expected behavior as odd as it is to avoid any type of confusion.

If the table has no quotes then auto UPPERCASE the name.
If the table has quotes then use it as is with quotes included.

Some additional work might be needed for apply the logic to components like db and schema independently. "MY_DB".my_schema,"my_table" example should resolve to MY_DB.MY_SCHEMA.my_table.

input table table > sql create statement > final target table name

adbc 0.10.0 results:

dummy > CREATE TABLE  IF NOT EXISTS dummy > Table DUMMY successfully created.
DUMMY > CREATE TABLE  IF NOT EXISTS DUMMY > Table DUMMY successfully created.
"dummy" > CREATE TABLE  IF NOT EXISTS "dummy" > Table dummy successfully created.
"DUMMY" > CREATE TABLE  IF NOT EXISTS "DUMMY" > Table DUMMY successfully created.
my_db.my_schema.dummy > CREATE TABLE  IF NOT EXISTS my_db.my_schema.dummy > Table DUMMY successfully created.
MY_DB.MY_SCHEMA.DUMMY > CREATE TABLE  IF NOT EXISTS MY_DB.MY_SCHEMA.DUMMY > Table DUMMY successfully created.
"my_db.my_schema.dummy"> CREATE TABLE  IF NOT EXISTS my_db.my_schema.dummy > Table my_db.my_schema.dummy successfully created.
"MY_DB.MY_SCHEMA.DUMMY" > CREATE TABLE  IF NOT EXISTS MY_DB.MY_SCHEMA.DUMMY > Table MY_DB.MY_SCHEMA.DUMMY successfully created.

*Note - The last two examples created tables using the literal name.
So MY_DB.MY_SCHEMA."my_db.my_schema.dummy" is the actual table..

adbc 0.11.0 results: Only two out of eight examples produces reasonable results. Half of them have hard errors.

dummy > CREATE TABLE  IF NOT EXISTS "dummy" > Table dummy successfully created.
DUMMY > CREATE TABLE  IF NOT EXISTS "DUMMY" > Table DUMMY successfully created.
"dummy" > CREATE TABLE  IF NOT EXISTS "\"dummy\"" >  SQL compilation error: syntax error line 1 at position ??? 
"DUMMY" > CREATE TABLE  IF NOT EXISTS "\"DUMMY\"" >  SQL compilation error: syntax error line 1 at position ??? 
my_db.my_schema.dummy > CREATE TABLE  IF NOT EXISTS "my_db.my_schema.dummy" > Table my_db.my_schema.dummy successfully created.
MY_DB.MY_SCHEMA.DUMMY > CREATE TABLE  IF NOT EXISTS "MY_DB.MY_SCHEMA.DUMMY" > Table MY_DB.MY_SCHEMA.DUMMY successfully created.
"my_db.my_schema.dummy"> CREATE TABLE  IF NOT EXISTS "\"my_db.my_schema.dummy\"" > SQL compilation error: syntax error line 1 at position ??? unexpected '.'
"MY_DB.MY_SCHEMA.DUMMY" > CREATE TABLE  IF NOT EXISTS "\"MY_DB.MY_SCHEMA.DUMMY\"" >  SQL compilation error: syntax error line 1 at position ??? unexpected '.'

*Note - wrapping table names in quotes breaks fully qualified names for examples 5 and 6.
The tables are created with the literal string value in examples 5 and 6.
Examples 3, 4, 7 and 8 result in hard errors.

In my use case I'm passing in a fully qualified name with db, schema and table all in UPPERCASE with some underscores.

@zeroshade
Copy link
Member

@davlee1972 Thanks for the examples here, it's definitely a problem if we're breaking when someone passes quotes in as part of the string. I'll take a look and see what our best solution here is, most likely the best choice is going to be to go back to putting the onus on the caller to add the quotes and make some modifications in our validation suite.

I'll do a first pass at this tomorrow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants