-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IWF-208: Optimize executingStateIdMode for state transition #460
IWF-208: Optimize executingStateIdMode for state transition #460
Conversation
shouldSkipUpsertingSAs := true | ||
for _, s := range stateTransition.next { | ||
if !s.StateOptions.GetSkipWaitUntil() { | ||
shouldSkipUpsertingSAs = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think perhaps I'm missing some context or understanding of what we want to achieve here - what happens when the first StateMovement in the next array skips waitUntil, but subsequent ones do not? If I'm reading this correctly, with this change if any next StateMovement does not have a waitUntil, the SearchAttributeExecutingStateIds
will not be updated...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm... I have that wrong don't I... if any of the next states do have a waitUntil (i.e. they don't skip wait until) then do continue on to UpsertSearchAttributes??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, if any of the next states don't skip waitUntil, then the SAs will be upserted. Also, if I'm not mistaken, nextStates are the states that are executed in the state transition, example:
current = A
next = [B, C, D]
means that B, C, and D are executed at the same time after A is completed.
} | ||
} | ||
|
||
return e.updateStateIdSearchAttribute() | ||
} | ||
|
||
func determineIfShouldSkipUpsert(currentState iwfidl.StateMovement, nextStates []iwfidl.StateMovement) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the signature would have pass ExecutingStateIdMode
.
I think the currentState can be omitted.
func determineIfShouldSkipUpsert(mode ExecutingStateIdMode, nextStates ) bool{
// based on the mode
// For ENABLED_FOR_ALL, return true as long as any of next states is going to a next state(not closing state)
// For ENABLED_FOR_STATES_WITH_WAIT_UNTIL, in addition to above, also check the state has waitUntil.
}
StateMovement may not go to any state, because we are doing a hack in iwf (I really regret that) 😨
We use some special StateIDs to prepresent gracefulComplete, deadEnd etc. See https://github.com/indeedeng/iwf/blob/main/service/const.go#L68
And how it's being used in https://github.com/indeedeng/iwf/blob/main/service/interpreter/workflowImpl.go#L373
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the signature would have pass ExecutingStateIdMode.
I'm not passing it as I only invoke determineIfShouldSkipUpsert when needed based on ExecutingStateIdMode
currentState is needed if we want to determine when a state is looping back, I believe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currentState is needed if we want to determine when a state is looping back, I believe
for that case, I belive what you have in updateStateIdSearchAttribute
baesd on comparing with GetSearchAttributes will cover it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, if MarkStateExecutionCompleted
not delete the SA, next time when MarkStateExecutingIfNotYet
will callupdateStateIdSearchAttribute
which will then skip it because the SA is still there
0358c1b
to
f27d960
Compare
@@ -84,15 +97,17 @@ func (e *StateExecutionCounter) MarkStateIdExecutingIfNotYet(stateReqs []StateRe | |||
case iwfidl.DISABLED: | |||
// do nothing | |||
case iwfidl.ENABLED_FOR_ALL: | |||
if e.increaseStateIdCurrentlyExecutingCounts(s) { | |||
e.increaseStateIdCurrentlyExecutingCounts(s) | |||
if !slices.Contains(currentSAs.StringArrayValue, s.StateId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reflecting into this for a while, I found issues with this partial comparison using Contains
. I think just checking containing is probably not correct, it should be a full comparison.
Consider this case (assuming using ENABLED_FOR_ALL so that we don't care about the waitUntil):
- S1->S2+S1
- S2->S1
So there are 3 S1 executions. Assuming the first S1-1 completed, then goes to S2-1 and S1-2. But S1-2 didn't complete yet, then S2-1 completed and decided to go to S1-3.
Now S1-3 will call thisMarkStateIdExecutingIfNotYet
: - Because of the optimization in
MarkStateExecutionCompleted
,S2
is not deleted and still in the search attribute. So the SA contains both: [S1, S2]. (Assuming you are implementing the MarkStateExecutionCompleted optimization as we discussed today -- skip deleting the stateID from SA if there is a next state going to refresh the whole SA) - Now for S1-3 calling
MarkStateIdExecutingIfNotYet
, it will see the SA contains S1 already, but it also contains S2. Therefore, it should still refresh the SA based onstateIdCurrentlyExecutingCounts
(Note: it may be a lot better to renameupdateStateIdSearchAttribute
->refreshIwfExecutingStateIdSearchAttribute
to be more clear.
I remember you were doing the full comparison when you put it in the updateStateIdSearchAttribute
, not sure why you changed your mind 🤣
I think you may still run into NDE if putting it into that method, but instead, maybe you can create another method checkIfNeedToRefreshIwfExecutingStateIdSearchAttribute
before calling refreshIwfExecutingStateIdSearchAttribute
.
I will find sometime later to dig into the NDE again. It's very weird to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I see the point and it makes sense.
I pushed another commit that should address this issue. Let's discuss it later today
76f6f15
to
0c83e1a
Compare
4d0f7c5
to
3b7e065
Compare
if ok { | ||
currentSAsValues = currentSAs.StringArrayValue | ||
} else { | ||
e.provider.GetLogger(e.ctx).Error("search attribute IwfExecutingStateIds is not found", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it's epxected to have empty/nil value to start with, so we probably just ignore this case, or just assign a empty slice so that DeepEqual
can return true
when comparing with another empty slice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -135,25 +164,56 @@ func (e *StateExecutionCounter) MarkStateExecutionCompleted(state iwfidl.StateMo | |||
case iwfidl.DISABLED: | |||
return nil | |||
case iwfidl.ENABLED_FOR_ALL: | |||
e.decreaseStateIdCurrentlyExecutingCounts(state) | |||
e.decreaseStateIdCurrentlyExecutingCounts(currentState) | |||
shouldSkipUpsert := determineIfShouldSkipRefreshOnCompleted(nextStates, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: shouldSkipUpsert
-> shouldSkipRefresh
. I feel like one of the confusion is that this is not really upserting new stateID to the SA, but deleting statID by refreshing the whole map (keys of stateIdCurrentlyExecutingCounts)
} | ||
} else { | ||
for _, s := range nonClosingNextStates { | ||
if !s.StateOptions.GetSkipWaitUntil() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be !compatibility.GetSkipWaitUntilApi(&options)
like you did in #453
Btw, this double negative is hard to follow. maybe you can make it compatibility.HasWaitUntilApi(&options)
which will make it more clear in this statement -- when any of the next states has waitUntil, return true to skip refresh (because the next state will refresh immediately)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need !compatibility.GetSkipWaitUntilApi(&options)
? This check is only for the newest version so I thought using s.StateOptions.GetSkipWaitUntil()
was safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it, but would like to understand better why not use s.StateOptions.GetSkipWaitUntil()
in this case. Unless it's only a safety measure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is only for the newest version so I thought using s.StateOptions.GetSkipWaitUntil() was safe
Oh, you are right. It's not needed because of the versioning
Looks good now. Thanks for the hard work! |
daa3bd8
to
f831664
Compare
executingStateIds = append(executingStateIds, sid) | ||
} | ||
|
||
if reflect.DeepEqual(currentSAsValues, executingStateIds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think we should use something like elementMatch to compare ? Because the order could be different https://stackoverflow.com/questions/36000487/check-for-equality-on-slices-without-order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sort the slice (line 170)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I agree doing elementMatch would be easier. I learned about it after I added sorting... 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Sorry I didn’t read your latest code yet. I just thinking about it and wondering why you had NDE yesterday. This could be the reasons. Especially doing it in the update/refresh method for both MarkExecuting and MarkCompleted, it increases a lot the likelihood of NDE because of the randomness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you push the latest code? I don’t see the sorting ( do you mean sorting the keys of the map, right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see the confusion. I was focusing on the tests... my bad. Yes, I should add sorting or elementMatch here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now with this sorting, WDYT try move the whole optimization back to refreshIwfExecutingStateIdSearchAttribute
again? I think if this solves the NDE, it will be also help the issue of #460 (comment)
assertions.Equal([]string{"S3", "S6", "S7"}, historyEventSAs(upsertSAEvents[5])) | ||
assertions.Equal([]string{"S3", "S6"}, historyEventSAs(upsertSAEvents[6])) | ||
// TODO: This is unexpected; should not upsert the same SAs -- happens after "_SYS_GRACEFUL_COMPLETING_WORKFLOW" | ||
assertions.Equal([]string{"S3"}, historyEventSAs(upsertSAEvents[7])) | ||
assertions.Equal([]string{"S3"}, historyEventSAs(upsertSAEvents[8])) | ||
assertions.Equal([]string{"null"}, historyEventSAs(upsertSAEvents[9])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically we should not upsert "S3" twice. Seems that we always upsert on _SYS_GRACEFUL_COMPLETING_WORKFLOW
- SAs -> "S3", "S6", "S7"
- S7-1 -> _SYS_GRACEFUL_COMPLETING_WORKFLOW
- SAs -> "S3", "S6"
- S6-1 -> _SYS_GRACEFUL_COMPLETING_WORKFLOW
- SAs -> "S3"
- S3-1 -> _SYS_GRACEFUL_COMPLETING_WORKFLOW
- SAs -> "S3"
- S3-2 -> _SYS_GRACEFUL_COMPLETING_WORKFLOW
- SAs -> null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that because of we miss an existing optimization— on MarkCompletion, we should only refresh when any stateID reaches to zero (needs to delete)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this should be the last thing to tackle. I'll take care of it in the morning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest thought: If the NDE is actually because of sorting and we can fix it and then move the logic back to the refresh method, then I guess it should also solve this issue of Upsert duplicate S3 -- we don’t need to check if a StateId count reaches zero like previous version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this issue now and tests are passing as expected. We can try moving back to the refresh method
Description
Checklist
Related Issue
Closes #454