read_database_uri fails with "ValueError: arrow2" because of recent connectorx update #21274

Closed
2 tasks done
NicolasLacroix opened this issue Feb 14, 2025 · 2 comments
Labels: bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@NicolasLacroix

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
df = pl.read_database_uri(query="SELECT * FROM mytable", uri="mysql://...")

Log output

ValueError                                Traceback (most recent call last)
File ~/.local/share/virtualenvs/profilingPipelines-dT1tPwgB/lib/python3.12/site-packages/polars/io/database/_utils.py:65, in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol, schema_overrides)
     64 try:
---> 65     tbl = cx.read_sql(
     66         conn=connection_uri,
     67         query=query,
     68         return_type="arrow2",
     69         partition_on=partition_on,
     70         partition_range=partition_range,
     71         partition_num=partition_num,
     72         protocol=protocol,
     73     )
     74 except BaseException as err:
     75     # basic sanitisation of /user:pass/ credentials exposed in connectorx errs

File ~/.local/share/virtualenvs/profilingPipelines-dT1tPwgB/lib/python3.12/site-packages/connectorx/__init__.py:426, in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col, strategy, pre_execution_query)
    425 else:
--> 426     raise ValueError(return_type)
    428 return df

ValueError: arrow2

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
Cell In[12], line 2
      1 res = (
----> 2     pl.read_database_uri(query=query, uri="mysql://user:password@localhost:3306/mydatabase")
      3         .filter((pl.col("s1_min") - pl.col("s0_max") == 1) &
      4                 (pl.col("s3_min") - pl.col("s2_max") == 1) &
      5                 (pl.col("s5_min") - pl.col("s4_max") == 1))
      6 )

File ~/.local/share/virtualenvs/profilingPipelines-dT1tPwgB/lib/python3.12/site-packages/polars/io/database/functions.py:434, in read_database_uri(query, uri, partition_on, partition_range, partition_num, protocol, engine, schema_overrides, execute_options)
    432         msg = "the 'connectorx' engine does not support use of `execute_options`"
    433         raise ValueError(msg)
--> 434     return _read_sql_connectorx(
    435         query,
    436         connection_uri=uri,
    437         partition_on=partition_on,
    438         partition_range=partition_range,
    439         partition_num=partition_num,
    440         protocol=protocol,
    441         schema_overrides=schema_overrides,
    442     )
    443 elif engine == "adbc":
    444     if not isinstance(query, str):

File ~/.local/share/virtualenvs/profilingPipelines-dT1tPwgB/lib/python3.12/site-packages/polars/io/database/_utils.py:77, in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol, schema_overrides)
     74 except BaseException as err:
     75     # basic sanitisation of /user:pass/ credentials exposed in connectorx errs
     76     errmsg = re.sub("://[^:]+:[^:]+@", "://***:***@", str(err))
---> 77     raise type(err)(errmsg) from err
     79 return from_arrow(tbl, schema_overrides=schema_overrides)

ValueError: arrow2

Issue description

Calling the read_database_uri function with default connectorx driver raises an exception.
This seems to be caused by a recent change in connectorx v0.4.2, which removed the "arrow2" value for the return_type parameter. Polars still passes that value here.
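
To illustrate the mismatch, here is a minimal sketch of how a caller could choose the return_type from the installed connectorx version. The helper names `parse_version` and `pick_return_type` are hypothetical, and the sketch assumes "arrow" is an accepted replacement for "arrow2" from v0.4.2 onwards. It is not necessarily how Polars resolves this.

```python
import re


def parse_version(version):
    """Extract up to three leading numeric components, e.g. '0.4.2' -> (0, 4, 2)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def pick_return_type(connectorx_version):
    """Pick a return_type string for cx.read_sql (hypothetical helper)."""
    # Assumption: "arrow" is accepted in place of "arrow2" from connectorx 0.4.2 on.
    if parse_version(connectorx_version) < (0, 4, 2):
        return "arrow2"
    return "arrow"


if __name__ == "__main__":
    for v in ("0.4.1", "0.4.2"):
        print(v, "->", pick_return_type(v))
```

The version string would normally come from `connectorx.__version__` or `importlib.metadata.version("connectorx")`.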

Expected behavior

Calling the read_database_uri function with default connectorx driver shouldn't raise an exception.

Installed versions

--------Version info---------
Polars:              1.22.0
Index type:          UInt32
Platform:            Linux-6.8.0-52-generic-x86_64-with-glibc2.39
Python:              3.12.6 (main, Sep 17 2024, 16:32:19) [GCC 11.4.0]
LTS CPU:             False

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           0.4.2
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           3.10.0
numpy                2.2.1
openpyxl             <not installed>
pandas               2.2.3
pyarrow              <not installed>
pydantic             2.10.6
pyiceberg            <not installed>
sqlalchemy           2.0.38
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@NicolasLacroix added the bug, needs triage, and python labels on Feb 14, 2025
@NicolasLacroix
Author

For a temporary workaround, downgrading connectorx to v0.4.1 works fine.
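
In practice the downgrade is a pin such as `pip install "connectorx==0.4.1"`. To confirm the pin took effect in the active environment, a small check along these lines may help. It is only a sketch, and the helper names `installed_connectorx_version` and `workaround_applied` are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported as working in this thread.
WORKAROUND_VERSION = "0.4.1"


def installed_connectorx_version():
    """Return the installed connectorx version string, or None if it is absent."""
    try:
        return version("connectorx")
    except PackageNotFoundError:
        return None


def workaround_applied(installed):
    """True only when the installed version matches the pinned workaround version."""
    return installed == WORKAROUND_VERSION


if __name__ == "__main__":
    found = installed_connectorx_version()
    if workaround_applied(found):
        print(f"connectorx {found}: workaround pin is active")
    else:
        print(f"connectorx {found}: not pinned to {WORKAROUND_VERSION}")
```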

@ritchie46
Member

fixed by #21277
