-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rollback functionality of Unikernels #111
base: main
Are you sure you want to change the base?
Conversation
Great to see this PR. I struggle with the naming "current_build"and "latest_build". Can we have some more descriptive terms? Why should we decode and encode the config to/from json, if albatross already provides asn1 decoders and encoders for that data? You now store a list of "unikernel_updates" in the user configuration. Is this list ever getting smaller? I'm not sure what we did for cookies, but I'm a bit scared of ever-growing lists that are stored to persistent storage. Semantics wise: I'm not sure what is expected to happen when you update twice? I guess I don't understand when a rollback is triggered, and whether in the list of updates stored persistently a unikernel identified by its name can be present twice and if yes why. |
Thank you
Yes we can change the names But just to explain a bit, by
when storing the metadata for a unikernel before updating it, i figured its important to store the config so we can always rollback using the same configurations as before. I'll look more into using the asn1 decoders/encoders and update the pr :)
For this list, only one instance of a unikernel can be stored. so only one instance of a unikernel can exist in the list. If we update twice, the list will contain data related to the latest update only. Also a rollback is triggered automatically if an update fails because of an unavoidable error. By this I mean, for instance if the user provides a config which will mismatch the manifest, no rollback will be performed, since the |
`OK | ||
| Error (`Msg _err, _status) -> | ||
unikernel_rollback ~unikernel_name albatross store | ||
http_client reqd user) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having this here means that if anything in process_unikernel_change
fails, we do a rollback. Is this desired?
From reading the code, I have the feeling that process_unikernel_change
does what it tells: quite a lot, including "updating the store" (which may fail), a failed HTTP request to builds.robur.coop, manifest devices match, force_create, Albatross_json.res.
Now, only a failed force_create should potentially lead to a rollback, no? Because if there's an issue in updating the store, a failed HTTP request, when the manifest mismatches -- we should present the error to the user -- and since nothing has been done in respect to albatross, we should leave the existing unikernel running. Or am I getting this wrong?
Maybe it makes sense to separate the preconditions for deploying a unikernel from the actual deployment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you have it correctly.
I will separate the function into different purpose-focused functions.
The final rollback should only be triggered if force_create
fails.
What about
We can as well do that in a later PR. My main question is whether there's any reason to use json for everyhing.
Ok. But the list will never be cleared when the update was successful.
I don't find what you say here in the code. I commented on the code where I think that a rollback is triggered even though it shouldn't. |
These names are fine, much better than what I used :)
No we don't have to use json. I mainly used it since most of the other processes already rely on json so I wanted to reuse as much as I can
I will update the PR to clear the list :) |
Hmm, and now I struggle with: rollback being triggered when unikernel_force_create fails. But when does it fail, and why should a rollback then be beneficial? Isn't there a lot checking done before the unikernel_force_create, so that it is unlikely to fail? i.e. manifest, we could as well look into resource usage / policies. Wasn't the purpose of a rollback that either there's a service not responding or an operator figures shortly after the update that something is fishy? I currently think this "automatic rollback" based on the failure of "unikernel_force_create" is not super-useful. |
Another thing is that we need to ensure to never rollback while we do a rollback - to avoid (infinite) rollback loops. |
To be fair, I haven't had a single update fail that required a rollback. So in this case it's correct that rolling back during Indeed all the checks are done before the upgrade is done. |
Ok, cool. So what about in a first version to have a button next to "update" that is labelled "rollback" (in case there was an update within the last 15 minutes or so)? Let's do the automation than subsequently. |
This PR implements Rollback functionality for unikernel updates.
The methodology is that, if we try to update a unikernel, we store some metadata about the running unikernel to disk and then attempt the update. If this fails, we call the rollback function with the previous running build.