Feature request: allow user to use a boto3-native credential provider/session for s3 etc. #15838

Closed
jwhitaker-gridcog opened this issue Apr 22, 2024 · 9 comments
Labels: accepted (Ready for implementation), enhancement (New feature or an improvement of an existing feature)

Comments


jwhitaker-gridcog commented Apr 22, 2024

Description

It's nice that Polars has native support for s3, however we have issues with using it with less common credential setups.

In short, I would really like it if I could just rely on boto3 to manage credentials for me. There's a long tail of things that the Rust-side object_store seems to struggle with. (For context, when I talk about Polars I mean py-polars.)

Examples I have hit personally:

  • passing profile_name in storage_options (workaround: open a boto client and export with get_frozen_credentials)
  • handling assumed roles (workaround: open a boto client, assume the role, export the sts response; or, use an approach like https://github.com/boto/boto3/pull/3253/files to make the boto client assume the role for you + handle refreshing etc).
  • you have probably hit more - I guess "making aws creds work like users expect from boto" is a recurring thing that pops up.
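
For readers who want to see the first workaround spelled out, here is a minimal sketch. It assumes boto3 and polars are installed; the helper names `storage_options_from_session` and `scan_with_profile` are hypothetical, and the `aws_*` keys are the storage_options names I understand Polars to accept. Note that this only takes a snapshot of the credentials, so nothing here refreshes them once they expire.

```python
from typing import Any, Dict, Optional


def storage_options_from_session(session: Any, region: Optional[str] = None) -> Dict[str, str]:
    """Snapshot a boto3 session's credentials as a storage_options dict.

    `session` is expected to behave like boto3.Session. The result is a
    static copy: it is NOT refreshed when the credentials expire.
    """
    credentials = session.get_credentials()
    if credentials is None:
        raise RuntimeError("boto3 could not resolve any credentials")
    frozen = credentials.get_frozen_credentials()
    options = {
        "aws_access_key_id": frozen.access_key,
        "aws_secret_access_key": frozen.secret_key,
    }
    # Temporary credentials (SSO, assumed roles) also carry a session token.
    if frozen.token:
        options["aws_session_token"] = frozen.token
    if region:
        options["aws_region"] = region
    return options


def scan_with_profile(path: str, profile_name: str):
    """Hypothetical usage: lazily scan a Parquet file using a named profile."""
    import boto3
    import polars as pl

    session = boto3.Session(profile_name=profile_name)
    options = storage_options_from_session(session, session.region_name)
    return pl.scan_parquet(path, storage_options=options)
```

Something like `scan_with_profile("s3://my-bucket/data.parquet", "my-profile")` would then work until the snapshot expires (bucket and profile names are placeholders).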

In particular, the workarounds are a problem when credentials expire. A boto session/credential handles refreshing for you. However if I'm passing a session token to polars myself via the above hacks, then I have to handle credential expiration myself, which I can't really do in a foolproof way - what happens if I have a long-lived lazyframe loaded with a 15-minute aws session, for example?

If I could just set up a boto3 session myself, and then pass it to polars and say "hey use this", this would be great from a UX perspective.

This could be general enough to go into a "how to read from S3 101" doc, which could avoid the traps new users hit ("scan_parquet('s3://...') doesn't work, oh what's this, I have to go setting session tokens myself? ugh").

From your perspective, I wonder if it could simplify things a lot as well - instead of needing to provide an endless set of hooks to configure the rust side "kind of like a boto session", you could literally rely on boto itself.

On the engineering side though, this would require some glue from object_store back up to the python layer when it needs to fetch/refresh credentials - I don't know if this could be done cleanly/acceptably.

jwhitaker-gridcog added the enhancement label on Apr 22, 2024
@brianrienecker

+1

@cbartram

cbartram commented Jul 3, 2024

+1

@moorthy156

+1

@jacoblgoodman

This would be a huge help +1!

@fdosani

fdosani commented Jul 16, 2024

+1 too

@jkc1

jkc1 commented Jul 16, 2024

+1 from me as well

@tustvold

I've filed #18979, which should allow for this by exposing the API that object_store already provides.

@nameexhaustion
Collaborator

nameexhaustion commented Jan 30, 2025

Closing as the requested functionality is now possible using the credential_provider parameter to the I/O functions. This can accept arbitrary Python functions for retrieving / refreshing credentials.
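
As an illustration of what such a function could look like, here is a minimal sketch for the assumed-role case from the original post. It rests on some assumptions: that a credential provider is a zero-argument callable returning a pair of (credentials dict, expiry as a Unix timestamp in seconds or None), which is my reading of the Polars docs, and that the dict uses the same `aws_*` keys as storage_options. The helper names `make_assume_role_provider` and `scan_with_assumed_role` and the default session name are hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Assumed return shape: (credentials, expiry as Unix seconds or None).
CredentialResult = Tuple[Dict[str, str], Optional[int]]


def make_assume_role_provider(
    sts_client: Any,
    role_arn: str,
    session_name: str = "polars-session",
) -> Callable[[], CredentialResult]:
    """Build a callable that assumes `role_arn` each time it is invoked.

    `sts_client` is expected to behave like boto3.client("sts").
    """

    def provider() -> CredentialResult:
        response = sts_client.assume_role(
            RoleArn=role_arn,
            RoleSessionName=session_name,
        )
        creds = response["Credentials"]
        # STS reports the expiry as a timezone-aware datetime.
        expiry = int(creds["Expiration"].timestamp())
        return {
            "aws_access_key_id": creds["AccessKeyId"],
            "aws_secret_access_key": creds["SecretAccessKey"],
            "aws_session_token": creds["SessionToken"],
        }, expiry

    return provider


def scan_with_assumed_role(path: str, role_arn: str):
    """Hypothetical usage: pass the provider to an I/O function."""
    import boto3
    import polars as pl

    provider = make_assume_role_provider(boto3.client("sts"), role_arn)
    return pl.scan_parquet(path, credential_provider=provider)
```

Because the callable returns an expiry alongside the credentials, the caller has what it needs to invoke it again when the session runs out, which is the piece the static-token workarounds were missing.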

Available in the current release (1.21.0):

Available in the next release:

  • Passing aws_profile in storage_options - feat(python): Support passing aws_profile in storage_options #20965
    • Before then, you can set the AWS_PROFILE environment variable and ensure boto3 is installed. To verify this is working, set POLARS_VERBOSE=1 in the environment and check the logs for a message showing that CredentialProviderAWS gets initialized.
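
A small sketch of that interim setup done from Python rather than the shell. The helper name `use_aws_profile` is hypothetical, and this assumes the variables are set in the process before the first S3 read is made.

```python
import os
from typing import Dict


def use_aws_profile(profile_name: str, verbose: bool = True) -> Dict[str, str]:
    """Set the environment variables described above and return what was set."""
    os.environ["AWS_PROFILE"] = profile_name
    if verbose:
        # Verbose logging is what lets you check for CredentialProviderAWS in the logs.
        os.environ["POLARS_VERBOSE"] = "1"
    return {
        key: os.environ[key]
        for key in ("AWS_PROFILE", "POLARS_VERBOSE")
        if key in os.environ
    }
```

Call it before the first scan or read of an `s3://` path, then look through the log output for the CredentialProviderAWS initialization.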

@jwhitaker-gridcog
Author

You guys rock


9 participants