Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FATE Threads Changes #5301

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Conversation

kevinrr888
Copy link
Member

This PR:

  • Changes FATE from having one pool of workers per FATE type (META and USER) to having a set of pools per FATE type which each work on their own subset of FATE operations:
    • This is accomplished through a new class FateExecutor. Each FATE has a set of FateExecutors. Each FateExecutor is assigned to one set of FATE operations, has a WorkFinder thread, and a pool of TransactionRunner threads. The WorkFinder finds transactions this FateExecutor can work on, and assigns them to the TransactionRunners which work on them.
    • Fate class now only handles the DeadReservationCleaner thread and the FatePoolsWatcher thread (previously FatePoolWatcher). The DeadReservationCleaner is unchanged. FatePoolsWatcher now oversees all the FateExecutors/pools. The thread periodically checks for changes to the config, resizing any pools as necessary, safely stopping (if a thread is working on a transaction, it is not terminated until it finishes that work) any FateExecutors that are invalidated by config changes (i.e., its assigned set of FATE operations is no longer present in the config), and starting any new/replacement FateExecutors needed. The FatePoolsWatcher also keeps track of the number of idle threads to warn the user if a pool is experiencing a high load and should have its pool size increased (this functionality is unchanged from before, only difference is now a slightly different message is logged when warning the user).
    • The set of pools is determined by the new MANAGER_USER_FATE_CONFIG/MANAGER_META_FATE_CONFIG properties (replaces MANAGER_FATE_THREADPOOL_SIZE).
    • Another difference from this change is now USER and META FATEs can be configured differently whereas previously they shared the same property (MANAGER_FATE_THREADPOOL_SIZE)
    • These properties are json where the keys are a set of FATE operations and values are a pool size.
    • The properties are ensured to be valid json, pool size > 0, contain all FATE operations, contain no invalid FATE operations, and have no duplicate FATE operations
    • The FATE operations can be split across any number of pools of any size.
  • Tests which used the MANAGER_FATE_THREADPOOL_SIZE have now been changed to MANAGER_USER_FATE_CONFIG/MANAGER_META_FATE_CONFIG properties. To keep testing functionality the same, the properties set are json of 1 key/value where the key is all FATE ops and the value is the value originally used in the test.
  • FatePoolResizeIT/MetaFatePoolResizeIT/UserFatePoolResizeIT have been renamed to FatePoolsWatcherIT/MetaFatePoolsWatcherIT/UserFatePoolsWatcherIT since they now test other functionality of the FatePoolsWatcher not just pool resizing. The tests have also been completely changed since resizing has completely changed.

I also ensured all FATE tests still pass and sunny day still passes. I have not yet run the full set of ITs for this.

closes #5130

This commit:
- Changes FATE from having one pool of workers per FATE type (META and USER) to having a set of pools per FATE type which each work on their own subset of FATE operations:
	- This is accomplished through a new class `FateExecutor`. Each FATE has a set of `FateExecutor`s. Each `FateExecutor` is assigned to one set of FATE operations, has a `WorkFinder` thread, and a pool of `TransactionRunner` threads. The `WorkFinder` finds transactions this `FateExecutor` can work on, and assigns them to the `TransactionRunner`s which work on them.
	- `Fate` class now only handles the `DeadReservationCleaner` thread and the `FatePoolsWatcher` thread (previously `FatePoolWatcher`). The `DeadReservationCleaner` is unchanged. `FatePoolsWatcher` now oversees all the `FateExecutor`s/pools. The thread periodically checks for changes to the config, resizing any pools as necessary, safely stopping (if a thread is working on a transaction, it is not terminated until it finishes that work) any `FateExecutor`s that are invalidated by config changes (i.e., its assigned set of FATE operations is no longer present in the config), and starting any new/replacement `FateExecutor`s needed. The `FatePoolsWatcher` also keeps track of the number of idle threads to warn the user if a pool is experiencing a high load and should have its pool size increased (this functionality is unchanged from before, only difference is now a slightly different message is logged when warning the user).
	- The set of pools is determined by the new `MANAGER_USER_FATE_CONFIG`/`MANAGER_META_FATE_CONFIG` properties (replaces `MANAGER_FATE_THREADPOOL_SIZE`).
	- Another difference from this change is now USER and META FATEs can be configured differently whereas previously they shared the same property (`MANAGER_FATE_THREADPOOL_SIZE`)
	- These properties are json where the keys are a set of FATE operations and values are a pool size.
	- The properties are ensured to be valid json, pool size > 0, contain all FATE operations, contain no invalid FATE operations, and have no duplicate FATE operations
	- The FATE operations can be split across any number of pools of any size.
- Tests which used the `MANAGER_FATE_THREADPOOL_SIZE` have now been changed to `MANAGER_USER_FATE_CONFIG`/`MANAGER_META_FATE_CONFIG` properties. To keep testing functionality the same, the properties set are json of 1 key/value where the key is all FATE ops and the value is the value originally used in the test.
- `FatePoolResizeIT`/`MetaFatePoolResizeIT`/`UserFatePoolResizeIT` have been renamed to `FatePoolsWatcherIT`/`MetaFatePoolsWatcherIT`/`UserFatePoolsWatcherIT` since they now test other functionality of the `FatePoolsWatcher` not just pool resizing. The tests have also been completely changed since resizing has completely changed.
@kevinrr888 kevinrr888 added this to the 4.0.0 milestone Feb 4, 2025
@kevinrr888 kevinrr888 self-assigned this Feb 4, 2025
Comment on lines 432 to 453
MANAGER_USER_FATE_CONFIG("manager.user.fate.config", "{"
+ "\"TABLE_CREATE,TABLE_DELETE,TABLE_RENAME,TABLE_ONLINE,TABLE_OFFLINE,NAMESPACE_CREATE,NAMESPACE_DELETE,NAMESPACE_RENAME,TABLE_TABLET_AVAILABILITY,SHUTDOWN_TSERVER\": 1,"
+ "\"TABLE_BULK_IMPORT2\": 2,"
+ "\"TABLE_COMPACT,TABLE_CANCEL_COMPACT,COMMIT_COMPACTION\": 4,"
+ "\"TABLE_MERGE,TABLE_DELETE_RANGE,TABLE_SPLIT,SYSTEM_SPLIT,TABLE_CLONE,TABLE_IMPORT,TABLE_EXPORT\": 2"
+ "}", PropertyType.USER_FATE_CONFIG,
"The number of threads used to run user-initiated fault-tolerant "
+ "executions (FATE). These are primarily table operations like merge. Each key/value "
+ "of the provided JSON corresponds to one thread pool. Each key is a list of one or "
+ "more FATE operations and each value is the number of threads that will be assigned "
+ "to the pool.",
"4.0.0"),
MANAGER_META_FATE_CONFIG("manager.meta.fate.config",
"{\"TABLE_COMPACT,TABLE_CANCEL_COMPACT,COMMIT_COMPACTION\": 4,"
+ "\"TABLE_MERGE,TABLE_DELETE_RANGE,TABLE_SPLIT,SYSTEM_SPLIT\": 2}",
PropertyType.META_FATE_CONFIG,
"The number of threads used to run system-initiated fault-tolerant "
+ "executions (FATE). These are primarily table operations like merge. Each key/value "
+ "of the provided JSON corresponds to one thread pool. Each key is a list of one or "
+ "more FATE operations and each value is the number of threads that will be assigned "
+ "to the pool.",
"4.0.0"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several comments here:

  • This is my best attempt as to what fate ops are valid META fate ops. I may have excluded some that should have been included or vice-versa.
  • What is the proper way to deprecate the old MANAGER_FATE_THREADPOOL_SIZE property
  • These default pool sizes and default pools probably aren't optimal. Input into how the FATE ops should be split and how many threads for each pool would be appreciated. This was just my best attempt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the proper way to deprecate the old MANAGER_FATE_THREADPOOL_SIZE property

I think you can just mark it deprecated in the 3.1 branch. I don't think you want to use the ReplacedBy annotation on it, but instead make a comment in the 3.1 code that it's going to be removed in 4.0 and replaced by these new properties.. In the 4.0 branch, remove the property.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure its good deprecate this prop in 3.1 because there is no replacement in 3.1. So if set it may generate warnings, but there is no action that could be taken to avoid the warning in 3.1.

Could just deprecate it in 4.0 and ignore it, maybe log a warning if its set that its no longer used. Or drop it in 4.0 w/o any deprecation. However its done, it would be really nice to log a message in 4.0 if the old prop is set that informs the user that this new property needs to be used instead. Providing the breadcrumbs that helps someone figure out what they should do instead is really helpful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I marked as deprecated in 4.0 and log a warning if the property is set mentioning the replacement props.

Copy link
Contributor

@keith-turner keith-turner Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a change for this PR. Wondering if we should change how we deal with property changes. In general I don't think the current ReplacedBy mechanism is working that well. The replacement cases are not well tested and introduce a lot of complexity in configuration. Maybe a new model would be better for property changes, something like the following.

  • Remove replaced by annotation
  • Add a new internal static set of dropped properties that has information on when dropped and guidance on actions to take.
  • The generated documentation can include the information about dropped properties.
  • Property validation will fail if a property is not in the current set of active properties. However if its seen in the set of dropped properties then some helpful guidance is logged prior to failure.

This would be cleaner and would remove some potential bugs w/ untested replacement property usage. However this model may further complicate upgrade re #5277.

final var poolName = fateExecutor.getPoolName();
final var runningTxRunners = fateExecutor.getRunningTxRunners();
final int configured = poolConfigs.get(fateExecutor.getFateOps());
ThreadPools.resizePool(pool, () -> configured, poolName);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not yet explored changing this into a cached thread pool (discussed in #5263 (comment) thread). The changes were already pretty large and didn't want to introduce more. Writing this as a reminder to do as a potential follow on.

Copy link
Contributor

@dlmarion dlmarion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Comment on lines 432 to 453
MANAGER_USER_FATE_CONFIG("manager.user.fate.config", "{"
+ "\"TABLE_CREATE,TABLE_DELETE,TABLE_RENAME,TABLE_ONLINE,TABLE_OFFLINE,NAMESPACE_CREATE,NAMESPACE_DELETE,NAMESPACE_RENAME,TABLE_TABLET_AVAILABILITY,SHUTDOWN_TSERVER\": 1,"
+ "\"TABLE_BULK_IMPORT2\": 2,"
+ "\"TABLE_COMPACT,TABLE_CANCEL_COMPACT,COMMIT_COMPACTION\": 4,"
+ "\"TABLE_MERGE,TABLE_DELETE_RANGE,TABLE_SPLIT,SYSTEM_SPLIT,TABLE_CLONE,TABLE_IMPORT,TABLE_EXPORT\": 2"
+ "}", PropertyType.USER_FATE_CONFIG,
"The number of threads used to run user-initiated fault-tolerant "
+ "executions (FATE). These are primarily table operations like merge. Each key/value "
+ "of the provided JSON corresponds to one thread pool. Each key is a list of one or "
+ "more FATE operations and each value is the number of threads that will be assigned "
+ "to the pool.",
"4.0.0"),
MANAGER_META_FATE_CONFIG("manager.meta.fate.config",
"{\"TABLE_COMPACT,TABLE_CANCEL_COMPACT,COMMIT_COMPACTION\": 4,"
+ "\"TABLE_MERGE,TABLE_DELETE_RANGE,TABLE_SPLIT,SYSTEM_SPLIT\": 2}",
PropertyType.META_FATE_CONFIG,
"The number of threads used to run system-initiated fault-tolerant "
+ "executions (FATE). These are primarily table operations like merge. Each key/value "
+ "of the provided JSON corresponds to one thread pool. Each key is a list of one or "
+ "more FATE operations and each value is the number of threads that will be assigned "
+ "to the pool.",
"4.0.0"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the proper way to deprecate the old MANAGER_FATE_THREADPOOL_SIZE property

I think you can just mark it deprecated in the 3.1 branch. I don't think you want to use the ReplacedBy annotation on it, but instead make a comment in the 3.1 code that it's going to be removed in 4.0 and replaced by these new properties.. In the 4.0 branch, remove the property.

Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only partially looked at this, will look more later.

Comment on lines 432 to 453
MANAGER_USER_FATE_CONFIG("manager.user.fate.config", "{"
+ "\"TABLE_CREATE,TABLE_DELETE,TABLE_RENAME,TABLE_ONLINE,TABLE_OFFLINE,NAMESPACE_CREATE,NAMESPACE_DELETE,NAMESPACE_RENAME,TABLE_TABLET_AVAILABILITY,SHUTDOWN_TSERVER\": 1,"
+ "\"TABLE_BULK_IMPORT2\": 2,"
+ "\"TABLE_COMPACT,TABLE_CANCEL_COMPACT,COMMIT_COMPACTION\": 4,"
+ "\"TABLE_MERGE,TABLE_DELETE_RANGE,TABLE_SPLIT,SYSTEM_SPLIT,TABLE_CLONE,TABLE_IMPORT,TABLE_EXPORT\": 2"
+ "}", PropertyType.USER_FATE_CONFIG,
"The number of threads used to run user-initiated fault-tolerant "
+ "executions (FATE). These are primarily table operations like merge. Each key/value "
+ "of the provided JSON corresponds to one thread pool. Each key is a list of one or "
+ "more FATE operations and each value is the number of threads that will be assigned "
+ "to the pool.",
"4.0.0"),
MANAGER_META_FATE_CONFIG("manager.meta.fate.config",
"{\"TABLE_COMPACT,TABLE_CANCEL_COMPACT,COMMIT_COMPACTION\": 4,"
+ "\"TABLE_MERGE,TABLE_DELETE_RANGE,TABLE_SPLIT,SYSTEM_SPLIT\": 2}",
PropertyType.META_FATE_CONFIG,
"The number of threads used to run system-initiated fault-tolerant "
+ "executions (FATE). These are primarily table operations like merge. Each key/value "
+ "of the provided JSON corresponds to one thread pool. Each key is a list of one or "
+ "more FATE operations and each value is the number of threads that will be assigned "
+ "to the pool.",
"4.0.0"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure its good deprecate this prop in 3.1 because there is no replacement in 3.1. So if set it may generate warnings, but there is no action that could be taken to avoid the warning in 3.1.

Could just deprecate it in 4.0 and ignore it, maybe log a warning if its set that its no longer used. Or drop it in 4.0 w/o any deprecation. However its done, it would be really nice to log a message in 4.0 if the old prop is set that informs the user that this new property needs to be used instead. Providing the breadcrumbs that helps someone figure out what they should do instead is really helpful.

@kevinrr888
Copy link
Member Author

Changes:

  • Moved the resizing task from Fate.FatePoolsWatcher into FateExecutor. This cleaned up the code a bit getting rid of some getters in FateExecutor
  • When looking at the new MANAGER_FATE_<META/USER>_CONFIG props again realized the description I wrote wasn't correct. Updated
  • When tinkering with MANAGER_FATE_META_CONFIG and running the sunny day build realized that the config could be blank and still pass the sunny day build (meaning no FATE operations would be able to run for the Accumulo system tables). I assumed the sunny day build would catch something like this, but it doesn't appear that is the case. Perhaps we should create a sunny day test which tests fate operations on Accumulo system tables or maybe this would be caught in the full build and that is sufficient (I have not run the full build to see if that's true)... Regardless, this made me less confident in the correctness of my original attempt to write the META config (and looking at it again, I don't think it's correct). I changed the config to allow and expect all FateOperations, instead of trying to determine what subset would be allowed and expected for Accumulo system tables.
  • Minor changes to PropertyTypeTest testTypeFATE_<META/USER>_CONFIG to make the test easier to understand
  • Added new test to FatePoolsWatcherIT as described

@keith-turner - Could you take a look at these changes? Most of the changes to Fate and FateExecutor were copy-paste so those can be ignored. Mostly curious what you think about the changes to Property and the new test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support dedicating threads to specific types of Fate transactions
3 participants