Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP adds new highly concurrent tablet location cache #5311

Draft
wants to merge 2 commits into
base: 2.1
Choose a base branch
from

Conversation

keith-turner
Copy link
Contributor

This is a draft change that contains the basic incomplete structure of a new highly concurrent tablet location cache. For 2.1 this new implementation could be added along side the existing battle tested implementation with an option to use the new one. The current structure with an abstract class for TabletLocator would make it easy to switch the implementation based on a client config property or java property.

The current tablet location cache has read/write locks that accomplish two goals.

  1. Protect the integrity of an in memory tree map from concurrent updates.
  2. Avoids N threads doing N metadata lookups on a cache miss. The write lock in the code will usually result in a single metadata lookup in the case where many threads show up and want info from the same area of the metadata table.

This changes replaces the tree map with a concurrent map, removing the need to have locks protecting the data structure. Then it proposes a mechanism in ConcurrentTabletLocator.requestLookup() to attempt to avoid many concurrent metadata lookups for the same information.

If this implementation were fleshed out, it seems like it may end up being much simpler than the existing cache. The existing code has a lot of complexity related to first trying with a read lock and then switching to a write lock on cache miss, these changes do not have any of that complexity.

This is a draft change that contains the basic incomplete structure of a
new highly concurrent tablet location cache.  For 2.1 this new
implementation could be added along side the existing battle tested
implementation with an option to use the new one.  The current structure
with an abstract class for TabletLocator would make it easy to switch the
implementation based on a client config property or java property.

The current tablet location cache has read/write locks that accomplish
two goals.

 1. Protect the integrity of an in memory tree map from concurrent
    updates.
 2. Avoids N threads doing N metadata lookups on a cache miss.
    The write lock in the code will usually result in a single metadata
    lookup in the case where many threads show up and want info from the
    same area of the metadata table.

This changes replaces the tree map with a concurrent map, removing the
need to have locks protecting the data structure.  Then it proposes a
mechanism in `ConcurrentTabletLocator.requestLookup()` to attempt to
avoid many concurrent metadata lookups for the same information.

If this implementation were fleshed out, it seems like it may end up
being much simpler than the existing cache.  The existing code has a lot
of complexity related to first trying with a read lock and then
switching to a write lock on cache miss, these changes do not have any of
that complexity.
@keith-turner keith-turner added this to the 2.1.4 milestone Feb 6, 2025
@keith-turner
Copy link
Contributor Author

Was able to run mvn verify -Psunny w/ the changes in e205ca7 which used the new cache impl for all test.

@keith-turner
Copy link
Contributor Author

Tried running all ITs w/ these changes and that passed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant