-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: extend API request queue's memory of prior requests #880
Conversation
Today the server's API request queue decides whether to accept or reject incoming requests by storing a fixed disposition for each kind of request and adjusting those dispositions as new requests are queued or as a VM's state changes. This is not always enough information to maintain the proper dispositions. Consider the following example: 1. A VM begins to start. Reboot requests are denied in this state (since a VM that hasn't booted yet can't be rebooted). 2. A caller queues a request to stop the VM after it starts. All subsequent reboot requests should now be denied because the VM will halt before they can be serviced. 3. The VM successfully starts. The correct reboot disposition is still "deny," because there's a pending stop request, but there's no way to discern this from the prior reboot disposition (it was also "deny" before the stop request was queued!). To address this sort of problem, and to pave the way for better handling of Crucible VCR replacements during instance start, refactor the queue as follows: - Remember what kinds of outstanding requests have not been processed and dispose of requests based on this state and the queue's notion of the instance's overall state. - Break external API requests into "state change" and "component change" requests. This will allow them to be dequeued independently when the state driver wants to handle component changes without changing the order of state change requests. - Tweak the language the state driver uses to communicate with the queue: instead of saying "the VM is now in state X," the driver says "I handled (or failed to handle) a request of type Y." Tests: cargo test, PHD.
/// Pops the next request off of the queue. If the queue contains both state | ||
/// change and component change requests, the next state change request is | ||
/// popped first (even if it arrived later in time than the next component | ||
/// change request). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also drafted a version of this code that pops the oldest request regardless of its kind, but maintains a separate index of component change requests so that the state driver can ask to dequeue the oldest such request while leaving the state change requests alone. (The point of the exercise is to give the driver a way to pop VCR replacement requests while starting the VM without having to buffer any state change requests; see #873 for more discussion.)
I ended up not going with this because (a) it's simpler not to have the separate change request index, and (b) it also seemed to me to make some sense to privilege external API requests (like requests to stop) over internal housekeeping operations like VCR replacement. But I'd like to hear if others feel strongly here (it's not that hard to bring that code back).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree with your rationale here. I don't know if the property of having one total ordering across state-change and component-change requests is actually important, given that the whole point of this change is to allow component change requests to be serviced whilst a state change is in progress, and the disposition to subsequent component change requests based on in-flight state changes is now tracked outside of the actual queue, rather than based on the order of requests in the queue. So I think the simpler solution of just having two queues is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks great to me! I left some fairly minor notes, but I'm down with the overall design here.
This was the request queue's previous behavior. It ensures that a request to stop drives the current Propolis to a halted state, but does not guarantee that the "logical" VM isn't running somewhere else (i.e. in a migration target).
Today the server's API request queue decides whether to accept or reject incoming requests by storing a fixed disposition for each kind of request and adjusting those dispositions as new requests are queued or as a VM's state changes. This is not always enough information to maintain the proper dispositions. Consider the following example:
To address this sort of problem, and to pave the way for better handling of Crucible VCR replacements during instance start, refactor the queue as follows:
Tests: cargo test, PHD.
Fixes #879.