Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(java/driver/flight-sql): implement getObjects #1517

Merged
merged 18 commits into from
Feb 18, 2024

Conversation

jduo
Copy link
Member

@jduo jduo commented Feb 5, 2024

  • Implement getObjects and each depth (CATALOGS, DB_SCHEMAS, TABLES, and ALL).
  • Handle the cases where the catalog is empty or a schema within a catalog has no tables.
  • Unify readers for queries, getInfo(), and getObjects() so that all code paths can correctly get data when the data is not available on the same location as the original connection, can handle multiple roots from the same stream, and can handle multiple partitions.
  • Rework getInfo() request to lazily issue an RPC call only if a request code was used that needs information from the Flight server.

Fixes #745

@jduo
Copy link
Member Author

jduo commented Feb 5, 2024

This incorporates the work from #605 . However I'm examining the Go Flight SQL ADBC driver to see if I can port some of its implementation here to cover the cases missing (empty catalogs or schemas).

@tokoko
Copy link
Contributor

tokoko commented Feb 6, 2024

@jduo yeah, either that or you will have to preprocess to get those missing cases and add them in correctly sorted order during normal processing. Good thing is that you won't have to keep much of it in some local object in-memory unlike go implementation but the code might get really ugly.

@jduo jduo self-assigned this Feb 7, 2024
@jduo
Copy link
Member Author

jduo commented Feb 8, 2024

For this iteration I'm going to assume the server returns data in the correct order and doesn't do anything crazy such as split the columns from one table across separate partitions.

@jduo jduo force-pushed the java-flight-sql-objects branch from bc44112 to dfff096 Compare February 9, 2024 20:53
@jduo
Copy link
Member Author

jduo commented Feb 14, 2024

I'm almost done writing the implementation of this. It turned out to be a fairly complicated ordeal to compress/denormalize the flattened data from Flight SQL to ADBC's nested lists of structs.

A couple of things:

  • I'm assuming we should not use SqlQuirks to map Arrow types to type names for Flight SQL, since Flight SQL is already Arrow-designed and this should probably have happened in the Flight SQL Producer already.
  • If the user uses Depth ALL to get column metadata, and supplies a column pattern that filters out all columns, I am assuming the result should be that the table is still listed but the columns list is non-null but empty (as opposed to the table being filtered out entirely or the table being listed but the columns list being null).

tokoko and others added 11 commits February 17, 2024 06:20
* Change GetSqlInfo and GetObjects to use the same reader as
FlightInfoReader. This fixes bugs where locations aren't used
for these metadata calls and also makes them use resources
lazily.
* Fix a bug in the FlightInfoReader path where the first stream
is assumed to be from the current client rather than using
a location in the first endpoint.
* Fix handling of empty schemas or empty collections when
using GetObjects.
Fix NPEs and not propagating credentials for some calls
@jduo jduo force-pushed the java-flight-sql-objects branch from 50051ba to c90881d Compare February 17, 2024 14:20
@jduo jduo marked this pull request as ready for review February 17, 2024 14:47
@jduo jduo requested a review from lidavidm as a code owner February 17, 2024 14:47
@jduo
Copy link
Member Author

jduo commented Feb 17, 2024

This PR supports getting catalogs, schemas, tables, and columns. Tests include:

  • catalogs with multiple schemas.
  • schemas with multiple tables
  • empty catalogs
  • catalogs with empty schemas
  • filtering by catalog name
  • filtering by column name
  • filtering by catalog such that an empty result is returned
  • filtering by column name such that no columns are returned
  • skipping filtering by catalog name and column name.

jduo added 3 commits February 17, 2024 07:14
Also call root.allocateNew() in GetInfoReader
<!-- Helpers for mapping Arrow types to ANSI SQL types and building test servers -->
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>flight-sql-jdbc-core</artifactId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a heavy dependency to have to pull in, unfortunately

Comment on lines +66 to +68
// Checker framework is considering Arrow functions such as FlightStream.next() as potentially
// altering the state
// and able to change currentStream or schema fields to null.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, usually I've gotten around this by doing the equivalent of

if (field == null) {
  throw;
}
T localField = field;  // never null now

Comment on lines +169 to +170
// TODO: actual version
setStringValue(dstIndex++, "0.0.1".getBytes(StandardCharsets.UTF_8));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we file an issue for this? (I see it was there originally)

Comment on lines +785 to +787
if (JAVA_REGEX_SPECIALS.indexOf(currentChar) >= 0) {
javaPattern.append('\\');
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you only want to do this if it's not a _ or %?

Also, you could just always apply Pattern.quote instead of hardcoding the potential special characters?

Comment on lines +629 to +631
// This is taken from the JDBC driver, but seems wrong that all three branches write the same
// value.
// Float should probably be 2.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we file an issue?

Comment on lines +132 to +133
@SuppressWarnings(
"method.invocation") // Checker Framework does not like the ensureInitialized call
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be properly annotated upstream (CC @davisusanibar)

Comment on lines +61 to +64
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.ClassRule;
import org.junit.Test;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been using JUnit5, can we stick to that? Or does the TestRule force us onto JUnit 4? (If so, can we file issues to fix this?)

@lidavidm lidavidm merged commit adfb70a into apache:main Feb 18, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

java: bring Flight SQL driver on par with Go Flight SQL driver
3 participants