Feature: Specify a custom value for ErrorMaxFastDelay and ErrorMaxSlowDelay #4583

Open
antonioquintavalle1A opened this issue Feb 14, 2025 · 2 comments
Labels
new-feature waiting-on-user-response Waiting on more information from the original user before progressing.

Comments

@antonioquintavalle1A

antonioquintavalle1A commented Feb 14, 2025

Consider the following scenario:
1 - A CR is created in Azure Service Operator (ASO) to create a resource in Azure.
2 - A lock is added to the resource created in step 1.
3 - The CR is deleted.
4 - ASO attempts to reconcile by deleting the remote resource.
5 - The deletion of the CR and of the remote resource does not take effect until the lock is removed.

Currently, ASO uses a calculator object to generate the interval to wait before attempting the reconciliation again (RequeueAfter).

Upon failure, the interval is increased using a backoff approach, following this sequence:

result={"Requeue":false,"RequeueAfter":1000000000} - 0.016667 min
result={"Requeue":false,"RequeueAfter":2000000000} - 0.033333 min
result={"Requeue":false,"RequeueAfter":4000000000} - 0.066667 min
result={"Requeue":false,"RequeueAfter":8000000000} - 0.133333 min
result={"Requeue":false,"RequeueAfter":16000000000} - 0.266667 min
result={"Requeue":false,"RequeueAfter":32000000000} - 0.533333 min
result={"Requeue":false,"RequeueAfter":64000000000} - 1.066667 min
result={"Requeue":false,"RequeueAfter":128000000000} - 2.133333 min
result={"Requeue":false,"RequeueAfter":180000000000} - 3 min HARD LIMIT

Would it be possible to add a parameter for raising the maximum slow and fast delays (ErrorMaxSlowDelay and ErrorMaxFastDelay) defined in here?

@theunrepentantgeek
Member

The current value is a compromise between responsiveness and ARM throttling limits - we don't want to end up blocking removal of the lock by triggering ARM throttling.

Can you please expand on the actual issue you're seeing?

  • Do you want ASO to proceed with the delete more quickly once the lock is removed?
  • Do you want ASO to back off for a longer period while waiting for lock removal?
  • Are you seeing too many logs generated by ASO?
  • Something else?

We have an outstanding request (#3756) for a deep integration with Azure Locks, where ASO is aware of their semantics. Maybe this would cover your needs? Or is your request for something more general than just working well with locks?

@theunrepentantgeek theunrepentantgeek added waiting-on-user-response Waiting on more information from the original user before progressing. and removed needs-triage 🔍 labels Feb 24, 2025
@antonioquintavalle1A
Author

Hello @theunrepentantgeek, I was thinking of a longer back-off to avoid useless calls to ARM, which could also be a requirement when locks are not set on resources.
On the other hand, if we enhance ASO to deal with locks, we'll automatically avoid useless calls for resources that cannot be modified and/or deleted.


2 participants