
caching of invalid profiles #938

Open
hutch3232 opened this issue Feb 9, 2025 · 9 comments

@hutch3232
Contributor

When I write functions that use s3fs, I find myself using skip_instance_cache a lot. Doing so seems to noticeably worsen performance, which makes sense. The reason I skip instance caching is that, with the cache, I have found that if I hit certain issues (like ProfileNotFound in ~/.aws/credentials, or PermissionError), then even after I subsequently generate valid credentials, s3fs continues to throw the same errors. Ultimately I have to restart my kernel (I think that is the only thing that resets it).

It would be nice if s3fs automatically evicted a profile from the cache when it hits an access or permission error like those above.

Alternatively, or perhaps in addition, it would be nice if s3fs.S3FileSystem() had a check argument that could (optionally) test the credentials during instantiation, for example by running the equivalent of aws sts get-caller-identity, per: https://stackoverflow.com/a/66874437/9244371

More context on the problem I'm trying to solve:
In my environment, I have to generate new keys on a regular basis that grant access to a particular bucket under a known profile name. I write functions that other users will run to interact with certain buckets (so I know the profile). I skip instance caching because I don't know in advance whether they have generated valid keys; that way I can catch the exception, generate keys, and try again (roughly the pattern sketched below).
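For illustration, a minimal sketch of that pattern, with instance caching skipped entirely. The bucket, prefix, and profile names are the placeholders used elsewhere in this thread; list_prefix and generate_keys are hypothetical names for the wrapper function and the environment-specific credential-refresh step.

import s3fs

def list_prefix(path, profile="my-profile"):
    # a fresh, uncached instance on every call, so an earlier failure is not replayed
    fs = s3fs.S3FileSystem(profile=profile, skip_instance_cache=True)
    try:
        return fs.ls(path)
    except PermissionError:
        generate_keys(profile)  # hypothetical credential-refresh helper
        fs = s3fs.S3FileSystem(profile=profile, skip_instance_cache=True)
        return fs.ls(path)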

@martindurant
Member

You bring up a valid case.

I would suggest that the correct thing to do is not to disable instance caching, but to catch the "token expired" exceptions and, in that case, remake the instance's session attribute. Assuming we can reliably catch those, I think that would solve everything. Doing this for any "unauthorised" exception is probably the wrong thing to do, since those can occur in many situations where the credentials are still valid.

You might find it useful that there is a .clear_instance_cache() class method, which you can use to control when you start over with fresh instances, rather than skipping the cache wholesale.

The cache itself could also be made smarter, so that instances only last for a certain time; that would be relatively easy to code.
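As a concrete sketch of that controlled alternative, using only the public calls named in this thread (bucket, prefix, and profile names are the thread's placeholders):

import s3fs

# keep instance caching on; reset only at a point you choose,
# e.g. right after regenerating credentials for the profile
s3fs.S3FileSystem.clear_instance_cache()        # forget every cached instance
fs = s3fs.S3FileSystem(profile="my-profile")    # constructed fresh, then cached again
fs.ls("s3://my-bucket/my-prefix")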

@hutch3232
Contributor Author

Thanks for this information!

Two follow-up questions: since .clear_instance_cache() is a classmethod, I think that means all instances in the cache will be cleared? I wonder whether that could be made instance-specific, or whether there is a reconnect method (or something similar).

Oh, maybe that is what connect(refresh=True) does?

@martindurant
Member

maybe that is what connect(refresh=True) does?

Yes! I had forgotten you can call it like this; it's exactly what I meant by remaking the session. Note that you might call _set_session if in async mode, and I don't suppose there's any thread-safety guarantee here.

that means all instances in the cache will be cleared

Yes. There is no documented way to kick out one specific instance.
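A short sketch contrasting the two options just discussed (the async variant is only noted in a comment, since the exact method name and signature may differ between versions):

import s3fs

fs = s3fs.S3FileSystem(profile="my-profile")

# per-instance: rebuild this instance's client from its original kwargs
fs.connect(refresh=True)
# in async code, the coroutine mentioned above (_set_session) would be awaited instead;
# note the thread-safety caveat either way

# class-wide: drop every cached instance (only affects future constructor calls)
s3fs.S3FileSystem.clear_instance_cache()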

@hutch3232
Contributor Author

I tried messing around with this a bit; I'm still not sure it's exactly what I was hoping for.

import s3fs

# i started out by connecting to a profile that did not yet exist
# my plan was to try "reconnecting" after creating the profile
s3 = s3fs.S3FileSystem(profile="my-profile")

s3.ls("s3://my-bucket/my-prefix")

Traceback of the ProfileNotFound error:


---------------------------------------------------------------------------
ProfileNotFound                           Traceback (most recent call last)
/mnt/code/python/test_s3fs.py in line 1
----> 8 s3.ls("s3://my-bucket/my-prefix")

File /mnt/code/python/.venv/lib/python3.9/site-packages/fsspec/asyn.py:118, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
    115 @functools.wraps(func)
    116 def wrapper(*args, **kwargs):
    117     self = obj or args[0]
--> 118     return sync(self.loop, func, *args, **kwargs)

File /mnt/code/python/.venv/lib/python3.9/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs)
    101     raise FSTimeoutError from return_result
    102 elif isinstance(return_result, BaseException):
--> 103     raise return_result
    104 else:
    105     return return_result

File /mnt/code/python/.venv/lib/python3.9/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout)
     54     coro = asyncio.wait_for(coro, timeout=timeout)
     55 try:
---> 56     result[0] = await coro
     57 except Exception as ex:
     58     result[0] = ex

File /mnt/code/python/.venv/lib/python3.9/site-packages/s3fs/core.py:1019, in S3FileSystem._ls(self, path, detail, refresh, versions)
   1017     files = await self._lsbuckets(refresh)
   1018 else:
-> 1019     files = await self._lsdir(path, refresh, versions=versions)
   1020     if not files and "/" in path:
   1021         try:

File /mnt/code/python/.venv/lib/python3.9/site-packages/s3fs/core.py:734, in S3FileSystem._lsdir(self, path, refresh, max_items, delimiter, prefix, versions)
    732 dirs = []
    733 files = []
--> 734 async for c in self._iterdir(
    735     bucket,
    736     max_items=max_items,
    737     delimiter=delimiter,
    738     prefix=prefix,
    739     versions=versions,
    740 ):
    741     if c["type"] == "directory":
    742         dirs.append(c)

File /mnt/code/python/.venv/lib/python3.9/site-packages/s3fs/core.py:765, in S3FileSystem._iterdir(self, bucket, max_items, delimiter, prefix, versions)
    761 if versions and not self.version_aware:
    762     raise ValueError(
    763         "versions cannot be specified if the filesystem is not version aware"
    764     )
--> 765 await self.set_session()
    766 s3 = await self.get_s3(bucket)
    767 if self.version_aware:

File /mnt/code/python/.venv/lib/python3.9/site-packages/s3fs/core.py:551, in S3FileSystem.set_session(self, refresh, kwargs)
    547 else:
    548     s3creator = self.session.create_client(
    549         "s3", config=conf, **init_kwargs, **client_kwargs
    550     )
--> 551     self._s3 = await s3creator.__aenter__()
    553 self._s3creator = s3creator
    554 # the following actually closes the aiohttp connection; use of privates
    555 # might break in the future, would cause exception at gc time

File /mnt/code/python/.venv/lib/python3.9/site-packages/aiobotocore/session.py:25, in ClientCreatorContext.__aenter__(self)
     24 async def __aenter__(self) -> AioBaseClient:
---> 25     self._client = await self._coro
     26     return await self._client.__aenter__()

File /mnt/code/python/.venv/lib/python3.9/site-packages/aiobotocore/session.py:144, in AioSession._create_client(self, service_name, region_name, api_version, use_ssl, verify, endpoint_url, aws_access_key_id, aws_secret_access_key, aws_session_token, config)
    141 elif default_client_config is not None:
    142     config = default_client_config
--> 144 region_name = self._resolve_region_name(region_name, config)
    146 # Figure out the verify value base on the various
    147 # configuration options.
    148 if verify is None:

File /mnt/code/python/.venv/lib/python3.9/site-packages/botocore/session.py:1021, in Session._resolve_region_name(self, region_name, config)
   1019         region_name = config.region_name
   1020     else:
-> 1021         region_name = self.get_config_variable('region')
   1023 validate_region_name(region_name)
   1024 # For any client that we create in retrieving credentials
   1025 # we want to create it using the same region as specified in
   1026 # creating this client. It is important to note though that the
   (...)
   1031 # the credentials returned at regional endpoints are valid across
   1032 # all regions in the partition.

File /mnt/code/python/.venv/lib/python3.9/site-packages/botocore/session.py:323, in Session.get_config_variable(self, logical_name, methods)
    319 if methods is not None:
    320     return self._get_config_variable_with_custom_methods(
    321         logical_name, methods
    322     )
--> 323 return self.get_component('config_store').get_config_variable(
    324     logical_name
    325 )

File /mnt/code/python/.venv/lib/python3.9/site-packages/botocore/configprovider.py:488, in ConfigValueStore.get_config_variable(self, logical_name)
    486     return None
    487 provider = self._mapping[logical_name]
--> 488 return provider.provide()

File /mnt/code/python/.venv/lib/python3.9/site-packages/botocore/configprovider.py:694, in ChainProvider.provide(self)
    687 """Provide the value from the first provider to return non-None.
    688
    689 Each provider in the chain has its provide method called. The first
    690 one in the chain to return a non-None value is the returned from the
    691 ChainProvider. When no non-None value is found, None is returned.
    692 """
    693 for provider in self._providers:
--> 694     value = provider.provide()
    695     if value is not None:
    696         return self._convert_type(value)

File /mnt/code/python/.venv/lib/python3.9/site-packages/botocore/configprovider.py:781, in ScopedConfigProvider.provide(self)
    779 def provide(self):
    780     """Provide a value from a config file property."""
--> 781     scoped_config = self._session.get_scoped_config()
    782     if isinstance(self._config_var_name, tuple):
    783         section_config = scoped_config.get(self._config_var_name[0])

File /mnt/code/python/.venv/lib/python3.9/site-packages/botocore/session.py:422, in Session.get_scoped_config(self)
    417     return profile_map.get('default', {})
    418 elif profile_name not in profile_map:
    419     # Otherwise if they specified a profile, it has to
    420     # exist (even if it's the default profile) otherwise
    421     # we complain.
--> 422     raise ProfileNotFound(profile=profile_name)
    423 else:
    424     return profile_map[profile_name]

ProfileNotFound: The config profile (my-profile) could not be found

# add the profile to ~/.aws/credentials and ~/.aws/config

s3.ls("s3://my-bucket/my-prefix")

# of course returns same error due to cached values

# so let's try reconnecting
s3.connect(refresh=True)

s3.ls("s3://my-bucket/my-prefix")

# unfortunately still same error

s3.clear_instance_cache()

s3.ls("s3://my-bucket/my-prefix")

# still same error


# restarting kernel
# now the profile exists (but intentionally gave it bad keys)

s3 = s3fs.S3FileSystem(profile="my-profile")

s3.ls("s3://my-bucket/my-prefix")

# PermissionError: The AWS access key Id you provided does not exist in our records.

s3.clear_instance_cache()
s3.connect(refresh=True)

# Did not fix it.

s3.clear_instance_cache()
s3 = s3fs.S3FileSystem(profile="my-profile")
s3.ls("s3://my-bucket/my-prefix")

# did fix it but seems less ideal because I break all my instances instead of reconnecting on just one
# and I need to reinitialize the instance

@martindurant
Member

I should clarify: the instance cache only makes a difference when making new instances, i.e., calling S3FileSystem() directly or via fsspec.open, fsspec.filesystem. It won't affect instances you have as local variables.

Within a session, the _s3 attribute is the client session, and all operations call .set_session() to make sure it exists. (connect is an alias for set_session)

s3.connect(refresh=True)

This should really fix it: it remakes all of the kwargs from the initial values and creates new client objects, just like a new instance would, so we should figure out why it isn't doing what you require.
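To illustrate the distinction about the instance cache made above, a small sketch of the behaviour as described in this thread:

import s3fs

a = s3fs.S3FileSystem(profile="my-profile")
b = s3fs.S3FileSystem(profile="my-profile")   # same kwargs, so the cached instance is returned
assert a is b

s3fs.S3FileSystem.clear_instance_cache()      # only affects future constructor calls
c = s3fs.S3FileSystem(profile="my-profile")   # a brand-new instance
assert c is not a                             # `a` still exists and keeps its old state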

@hutch3232
Contributor Author

Thanks for the tips!

I looked at set_session and saw this:

s3fs/s3fs/core.py

Lines 491 to 502 in 023aecf

client_kwargs = self.client_kwargs.copy()
init_kwargs = dict(
    aws_access_key_id=self.key,
    aws_secret_access_key=self.secret,
    aws_session_token=self.token,
    endpoint_url=self.endpoint_url,
)
init_kwargs = {
    key: value
    for key, value in init_kwargs.items()
    if value is not None and value != client_kwargs.get(key)
}

I am only instantiating my object with profile, so client_kwargs, .key, .secret, etc. are None.

Then I saw this line:

s3fs/s3fs/core.py

Lines 525 to 526 in 023aecf

if self.session is None:
    self.session = aiobotocore.session.AioSession(**self.kwargs)

In my code, after creating an instance that hit the ProfileNotFound error, I generated valid keys and then ran:

s3.session = None
s3.connect(refresh=True)

Then it worked!
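Putting that together, a complete sketch of the workaround that worked here (not a documented API; it relies on resetting the instance's session attribute by hand, and uses the thread's placeholder bucket/profile names):

import s3fs

fs = s3fs.S3FileSystem(profile="my-profile")
try:
    fs.ls("s3://my-bucket/my-prefix")
except Exception:
    # fix ~/.aws/credentials and ~/.aws/config for "my-profile" first, then:
    fs.session = None            # drop the stale aiobotocore session
    fs.connect(refresh=True)     # rebuild the client from the initial kwargs
    fs.ls("s3://my-bucket/my-prefix")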

Would it make sense for that check to also include or refresh, so that refresh=True forces a new session to be created?

@martindurant
Member

You are quite right, the session should not persist when refresh=True is set.

@hutch3232
Contributor Author

Awesome, may I submit a PR with that change? I might add a bit to the docstring as well if that is okay.

@martindurant
Member

Yes please!
