[C] Research Turbodbc/Arrowdantic for developing ODBC-wrapping driver #72

Open
lidavidm opened this issue Aug 19, 2022 · 14 comments


@lidavidm
Member

lidavidm commented Aug 19, 2022

Arrowdantic: https://github.com/jorgecarleitao/arrowdantic/
Turbodbc: https://github.com/blue-yonder/turbodbc/
arrow-odbc: https://github.com/pacman82/arrow-odbc

@pacman82

Hi there 👋. I am the author of arrow-odbc and a typo brought me here. A quick heads up:

Turbodbc:

  • Uses ODBC C Interface directly from C++ and fills arrow (C++ official implementation) arrays in C++.
  • pyarrow is backed by the C++ arrow implementation. The Python C API is used for interfacing.
  • For all things to work together, the C++ ABI, Python C API, Boost version, and arrow version must match. Somewhat fickle build process.
  • Scope: Complies with Python Database API Specification 2.0 (PEP 249)
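
For readers unfamiliar with PEP 249, the interface it prescribes looks roughly like the sketch below. It uses the standard library's sqlite3 module purely as a stand-in (an assumption for illustration; Turbodbc's own connect call takes ODBC connection parameters instead), and the table name, column names, and `fetch_in_chunks` helper are hypothetical.

```python
import sqlite3


def fetch_in_chunks(connection, query, chunk_size=2):
    """Yield lists of rows using only PEP 249 calls: cursor, execute, fetchmany."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:
                break
            yield rows
    finally:
        cursor.close()


# In-memory SQLite database standing in for an ODBC data source (assumption).
conn = sqlite3.connect(":memory:")
setup = conn.cursor()
setup.execute("CREATE TABLE measurements (id INTEGER, value REAL)")
setup.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(1, 0.5), (2, 1.5), (3, 2.5)],
)
conn.commit()
setup.close()

chunks = list(
    fetch_in_chunks(conn, "SELECT id, value FROM measurements ORDER BY id")
)
conn.close()
```

Any PEP 249 compliant module should accept the same `cursor()`, `execute()`, and `fetchmany()` calls, although the parameter placeholder style (`?` here) varies between modules.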

Arrowdantic (to the best of my knowledge):

  • Uses ODBC from a Rust crate (odbc-api) and fills arrow2 (Rust crate) arrays directly in Rust.
  • Provides Python bindings for arrow2.
  • Scope: More an alternative to pyarrow with built-in ODBC support

arrow-odbc:

  • Uses ODBC from a Rust crate (odbc-api) which talks to Python via C-Interface.
  • Uses arrow (Rust crate, official implementation) and Arrow-C Interface to interface with pyarrow
  • Scope: Read and write pyarrow arrays with ODBC from and to databases.

Cheers,
Markus

@lidavidm
Member Author

Hi, sorry for typosquatting 🙂

Thanks for the breakdown! The scope here would be lower level than any of these. I suppose I'm mostly curious about how each project achieves its speed objectives. Also, the plan would be to use nanoarrow to avoid bringing in dependencies on libarrow, Boost, or anything like that.

@pacman82

> Hi, sorry for typosquatting 🙂

I don't mind.

> Also, the plan would be to use nanoarrow to avoid bringing in dependencies on libarrow, Boost, or anything like that.

Yeah, building that is a pain. Personally I would recommend using one of the Rust implementations (either arrow or arrow2), since Rust links everything statically by default, and cargo is way more fun than any C/C++ based build system. You do you, though.

Cheers, Markus

@pacman82

> Yeah, building that is a pain

To clarify: I was referring to the dependencies. I've no experience with or knowledge of nanoarrow.

@lidavidm lidavidm added this to the 0.2.0 milestone Dec 13, 2022
@lidavidm lidavidm removed this from the ADBC Libraries 0.2.0 milestone Feb 2, 2023
@lidavidm
Member Author

An additional snag is that Unix platforms need unixodbc, which is LGPL, and so I'm not sure we can take a dependency on that from an Apache project.

@lidavidm
Member Author

Turbodbc indeed requires Boost, but it appears to have a C++ interface, which means we could pull it via ExternalProject/FetchContent. On the other hand, arrow-odbc + our relatively new Rust API definitions is tempting. For me it boils down to whether Turbodbc's optimizations put it ahead of arrow-odbc or not. I think both libraries use SQLBindCol?

@pacman82

> An additional snag is that Unix platforms need unixodbc, which is LGPL, and so I'm not sure we can take a dependency on that from an Apache project.

I would recommend linking dynamically against the ODBC driver manager used by the system. That way you would not package unixODBC; it would be installed separately, e.g. via the package manager of the Linux distribution.
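
As a rough illustration of relying on a system-provided driver manager instead of bundling one, the sketch below locates and loads the library at runtime with Python's ctypes. This is runtime loading rather than link-time dynamic linking, so treat it as an analogy only. The helper names are hypothetical, and the library names tried (`odbc` for unixODBC, `iodbc` for iODBC, `odbc32` on Windows) are assumptions about typical installations.

```python
import ctypes
import ctypes.util
import sys


def find_odbc_driver_manager():
    """Return the name or path of an installed ODBC driver manager, or None."""
    if sys.platform == "win32":
        candidates = ("odbc32",)
    else:
        # unixODBC typically installs libodbc; iODBC typically installs libiodbc.
        candidates = ("odbc", "iodbc")
    for name in candidates:
        found = ctypes.util.find_library(name)
        if found is not None:
            return found
    return None


def load_odbc_driver_manager():
    """Load the system driver manager; return None if none can be loaded."""
    library = find_odbc_driver_manager()
    if library is None:
        return None
    try:
        return ctypes.CDLL(library)
    except OSError:
        return None


# None simply means no driver manager is installed on this machine.
driver_manager = load_odbc_driver_manager()
```

Nothing from unixODBC is shipped in this arrangement: the code only works if the user has already installed a driver manager through their own package manager.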

> I think both libraries use SQLBindCol?

Both turbodbc and arrow-odbc use column wise block cursors via SQLBindCol.
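
To make the idea of a column-wise block cursor concrete, here is a pure-Python simulation. It does not call ODBC at all: `FakeBlockCursor` and its methods are hypothetical stand-ins, with `bind_col` playing the role of SQLBindCol, `fetch` the role of SQLFetch, and `block_size` the role of the SQL_ATTR_ROW_ARRAY_SIZE statement attribute. The point is only that one buffer per column is bound once and then refilled with a whole block of rows per fetch.

```python
from array import array


class FakeBlockCursor:
    """Hypothetical stand-in for an ODBC statement with column-wise binding."""

    def __init__(self, rows, block_size):
        self._rows = rows
        self._position = 0
        self.block_size = block_size  # role of SQL_ATTR_ROW_ARRAY_SIZE
        self._bound = {}  # column index -> caller-owned buffer

    def bind_col(self, index, buffer):
        # Role of SQLBindCol: register one buffer per column, once.
        if len(buffer) < self.block_size:
            raise ValueError("buffer must hold at least block_size values")
        self._bound[index] = buffer

    def fetch(self):
        # Role of SQLFetch: refill every bound buffer, report rows fetched.
        block = self._rows[self._position:self._position + self.block_size]
        for index, buffer in self._bound.items():
            for i, row in enumerate(block):
                buffer[i] = row[index]
        self._position += len(block)
        return len(block)


rows = [(1, 0.5), (2, 1.5), (3, 2.5), (4, 3.5), (5, 4.5)]
cursor = FakeBlockCursor(rows, block_size=2)

# One typed, reusable buffer per column, each sized for a full block.
ids = array("q", [0] * cursor.block_size)
values = array("d", [0.0] * cursor.block_size)
cursor.bind_col(0, ids)
cursor.bind_col(1, values)

id_column, value_column = [], []
while (fetched := cursor.fetch()) > 0:
    # Only the first `fetched` slots are valid; the last block may be short.
    id_column.extend(ids[:fetched])
    value_column.extend(values[:fetched])
```

Because each buffer already holds contiguous values of a single type, copying a block into a columnar format such as Arrow is a per-column operation rather than a per-row one.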

Best, Markus

@avhz

avhz commented Feb 26, 2025

Coming from #2542. An ODBC bridge would be nice for sure, and more generic than SAP HANA support (my original proposal), so I assume more people will benefit.

If there is some way I can help, I would be interested in learning about Arrow in depth.

@lidavidm
Member Author

It needs someone to do the work, broadly. Or someone to sponsor the work, possibly.

@lidavidm
Member Author

Oh, and for posterity, my current preference is to build on top of arrow-odbc and set up the infra to distribute Rust-based drivers for Python et al. (though I'm not looking forward to the CMake part of that). I'd still be curious whether there's a performance comparison between Turbodbc and arrow-odbc.

@WillAyd
Contributor

WillAyd commented Feb 27, 2025

If CMake is not a hard requirement, it might be easier to attempt that build with Meson, since it natively supports Python and Rust.

@lidavidm
Member Author

I think we'd want to ship CMake definitions, and the rest of the build infra is all still CMake based, unfortunately.

@WillAyd
Contributor

WillAyd commented Feb 27, 2025

Sounds good. I'm half committed, but I might just prototype with Meson first and come back to CMake later if I get to the point of something stable.

Do our Rust libraries support C/C++ as well or would this just be a Rust + Python driver?

@lidavidm
Member Author

All the drivers work via C interop so they should support C/C++.
