Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable auto merging of tablets #5353

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

cshannon
Copy link
Contributor

This change adds support for periodically scanning tables to find ranges of tables that can be automatically merged. The thread that runs uses the recently added TabletMergeabilty column as well as total file counts/sizes to compute if a range can be merged. By default, tablets can be merged if the total size is less than 25% of the split threshold so we don't immediately merge only to split again.

There is a new thread in the Manager class that performs the computation and will submit fate jobs for merging on tablet ranges. There is a new fate operation during merge that will validate the range is still ok to merge if the merge was submitted as a system merge.

This closes #5014

I am making this a draft because while it's feature complete, I still think there's some more work to do on the tests and tweaks to make, but wanted to get some feedback/suggestions.

Outstanding issues/note/questions:

  1. For user created tables, the default tablet was previously set to never merge, but I think we should change this to always so i switched the default tablet to be a mergeability of always in this PR. The reason is we can split the default tablet, so we should be able to merge other tablets back to it. It will never go away because the end row is null. One reason why I did this is there is no way in the API right now to change the mergeability for the default tablet (end row is null which you can't put into a map as a key) so I figure we should just make it always mergeable and any other splits would not be mergeable by default for user created tablets. This would be easy to set back to never by default if desired, but if keep the default tablet as never then we can never merge back to a single tablet again.
  2. The default tables (metadata, fate, scanref) are configured with their initital tablet(s) to have a mergeability setting of never. Should we change this? I am thinking I probably want to switch the default tablet to be mergeable just like I did for user tables for consistency. However, pre-splitting is ever done the tablets would need to be made sure to be marked as never merge so they don't get merged away due to a size of 0.
  3. What thread pool should we use for FindMergeableRangeTask? What interval by default? For now I put in a todo marker and just used the default context scheduler and set a run time of every 24 hours.
  4. As mentioned, the tests could definitely be improved with more edge cases and complexity. The TabletMergeabilityIT especially only has 4 tests for now to demonstrate but there's a lot more tests we could write to verify the checks work like max files, sizes, etc work and we get the exepcted merges done.

This change adds support for periodically scanning tables to find ranges
of tables that can be automatically merged. The thread that runs uses
the recently added TabletMergeabilty column as well as total file
counts/sizes to compute if a range can be merged. By default, tablets
can be merged if the total size is less than 25% of the split threshold
so we don't immediately merge only to split again.

There is a new thread in the Manager class that performs the computation
and will submit fate jobs for merging on tablet ranges. There is a new
fate operation during merge that will validate the range is still ok to
merge if the merge was submitted as a system merge.

This closes apache#5014
@cshannon cshannon force-pushed the accumulo-5014-merge-thread branch from 0e8e73b to c089d87 Compare February 23, 2025 20:32
@cshannon cshannon self-assigned this Feb 23, 2025
@cshannon cshannon added this to the 4.0.0 milestone Feb 23, 2025
@cshannon
Copy link
Contributor Author

@keith-turner and @dlmarion - I gave details in the description but even though this is marked as a draft it's still ready for review and feedback.

@cshannon
Copy link
Contributor Author

cshannon commented Feb 24, 2025

Looks like there's a couple tests to fix which I can do when I get a chance, they are boken cause the PR changes the default tablet mergeability setting to always instead of never as mentioned in the description.

org.apache.accumulo.test.functional.CreateInitialSplitsIT.testCreateInitialSplitsWithMergeability
org.apache.accumulo.test.functional.SplitIT.tabletShouldSplit

Edit: Fixed tests in fc0dbec

@@ -1292,6 +1293,13 @@ boolean canSuspendTablets() {
ThreadPools.watchCriticalScheduledTask(context.getScheduledExecutor()
.scheduleWithFixedDelay(() -> ScanServerMetadataEntries.clean(context), 10, 10, MINUTES));

// TODO - create new threadpool?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that this cannot go into the TabletGroupWatcher because it needs to look at more than one Tablet? I think using the general scheduled executor threadpool should be fine. However, we may want to consider bumping the default value of GENERAL_THREADPOOL_SIZE from 1 to something larger. Probably good for a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, TGW doesn't work in this case because because the task needs to look at multiple tablets together and not just one tablet at a time. Also, the task likely doesn't need to run very often to look for mergeable ranges where as TGW runs frequently

@dlmarion
Copy link
Contributor

Is there work to do in the upgrade code, or has that already been done?

@cshannon
Copy link
Contributor Author

Is there work to do in the upgrade code, or has that already been done?

There is still upgrade work to do, we should go through and mark all existing tablets as "never" merge on upgrade. If I recall the default will be never if the column is absent but I think it's better to be explicit and mark the tablets.

long totalFiles = 0;
int totalUnMergeable = 0;

for (var tabletMetadata : tablets) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first saw this code I though maybe it could use the MergeableRange code and try to add all tablets in the range to that. However that would not give the reason that a tablet could not be added, so it almost works but not quite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the same thing, I have/had been trying to figure out a good way to share the same logic between both but it's just different enough there wasn't really a good way to share the code but i'll keep thinking about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if MergeableRange.add() returned an enum instead of a boolean if that could work. The enum would indicate if added or if not added then why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored things to share MergeableRange between the task and verify in 3c84e85


manager.fate(type).seedTransaction(FateOperation.SYSTEM_MERGE, fateId,
manager.fate(type).seedTransaction(FateOperation.SYSTEM_MERGE, fateKey,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will wait for the data to be written to the walog of the fate table for each operation seeded. W/ the changes in #5160 this is another place where waiting on the walog per operation can be avoided.

@@ -683,8 +689,12 @@ protected void testListFateKeys(FateStore<TestEnv> store, ServerContext sctx) th
var fateKey3 = FateKey.forCompactionCommit(cid1);
var fateKey4 = FateKey.forCompactionCommit(cid2);

// use one overlapping extent and one different
var fateKey5 = FateKey.forMerge(extent1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a change for this PR. Was looking around to see if FateKey had any unit test that should be updated and did not see any. Could be useful to add some unit test to check equals, hashcode, serder, etc in another PR.

@cshannon cshannon marked this pull request as ready for review March 8, 2025 18:33
@cshannon
Copy link
Contributor Author

cshannon commented Mar 9, 2025

I ran the full IT build and it passed so this is ready for another review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add identifier to split points for automated vs user created split points
3 participants