Rollback functionality of Unikernels #111

PizieDust · 2025-02-25T12:41:48Z

This PR implements Rollback functionality for unikernel updates.
The methodology is that, if we try to update a unikernel, we store some metadata about the running unikernel to disk and then attempt the update. If this fails, we call the rollback function with the previous running build.

hannesm · 2025-03-02T23:19:57Z

Great to see this PR.

I struggle with the naming "current_build"and "latest_build". Can we have some more descriptive terms?

Why should we decode and encode the config to/from json, if albatross already provides asn1 decoders and encoders for that data?

You now store a list of "unikernel_updates" in the user configuration. Is this list ever getting smaller? I'm not sure what we did for cookies, but I'm a bit scared of ever-growing lists that are stored to persistent storage.

Semantics wise: I'm not sure what is expected to happen when you update twice? I guess I don't understand when a rollback is triggered, and whether in the list of updates stored persistently a unikernel identified by its name can be present twice and if yes why.

PizieDust · 2025-03-03T09:15:54Z

Thank you

Great to see this PR.

I struggle with the naming "current_build"and "latest_build". Can we have some more descriptive terms?

Yes we can change the names But just to explain a bit, by current_build I mean the version of the unikernel currently running and by latest_build I mean the version that we want to upgrade to.

Why should we decode and encode the config to/from json, if albatross already provides asn1 decoders and encoders for that data?

when storing the metadata for a unikernel before updating it, i figured its important to store the config so we can always rollback using the same configurations as before. I'll look more into using the asn1 decoders/encoders and update the pr :)

You now store a list of "unikernel_updates" in the user configuration. Is this list ever getting smaller? I'm not sure what we did for cookies, but I'm a bit scared of ever-growing lists that are stored to persistent storage.

Semantics wise: I'm not sure what is expected to happen when you update twice? I guess I don't understand when a rollback is triggered, and whether in the list of updates stored persistently a unikernel identified by its name can be present twice and if yes why.

For this list, only one instance of a unikernel can be stored. so only one instance of a unikernel can exist in the list. If we update twice, the list will contain data related to the latest update only.

Also a rollback is triggered automatically if an update fails because of an unavoidable error. By this I mean, for instance if the user provides a config which will mismatch the manifest, no rollback will be performed, since the force-create-unikernel was not triggered. The rollback will mostly only be triggered in cases were force-create-unikernel fails.

storage.ml

hannesm · 2025-03-03T09:29:24Z

unikernel.ml

+                          `OK
+                    | Error (`Msg _err, _status) ->
+                        unikernel_rollback ~unikernel_name albatross store
+                          http_client reqd user)


Having this here means that if anything in process_unikernel_change fails, we do a rollback. Is this desired?

From reading the code, I have the feeling that process_unikernel_change does what it tells: quite a lot, including "updating the store" (which may fail), a failed HTTP request to builds.robur.coop, manifest devices match, force_create, Albatross_json.res.

Now, only a failed force_create should potentially lead to a rollback, no? Because if there's an issue in updating the store, a failed HTTP request, when the manifest mismatches -- we should present the error to the user -- and since nothing has been done in respect to albatross, we should leave the existing unikernel running. Or am I getting this wrong?

Maybe it makes sense to separate the preconditions for deploying a unikernel from the actual deployment?

Yes you have it correctly.
I will separate the function into different purpose-focused functions.

The final rollback should only be triggered if force_create fails.

hannesm · 2025-03-03T09:37:13Z

I struggle with the naming "current_build"and "latest_build". Can we have some more descriptive terms?

Yes we can change the names But just to explain a bit, by current_build I mean the version of the unikernel currently running and by latest_build I mean the version that we want to upgrade to.

What about currently_running_unikernel and to_be_updated_unikernel (I'm not too happy with these names).

Why should we decode and encode the config to/from json, if albatross already provides asn1 decoders and encoders for that data?

when storing the metadata for a unikernel before updating it, i figured its important to store the config so we can always rollback using the same configurations as before. I'll look more into using the asn1 decoders/encoders and update the pr :)

We can as well do that in a later PR. My main question is whether there's any reason to use json for everyhing.

You now store a list of "unikernel_updates" in the user configuration. Is this list ever getting smaller? I'm not sure what we did for cookies, but I'm a bit scared of ever-growing lists that are stored to persistent storage.
Semantics wise: I'm not sure what is expected to happen when you update twice? I guess I don't understand when a rollback is triggered, and whether in the list of updates stored persistently a unikernel identified by its name can be present twice and if yes why.

For this list, only one instance of a unikernel can be stored. so only one instance of a unikernel can exist in the list. If we update twice, the list will contain data related to the latest update only.

Ok. But the list will never be cleared when the update was successful.

Also a rollback is triggered automatically if an update fails because of an unavoidable error. By this I mean, for instance if the user provides a config which will mismatch the manifest, no rollback will be performed, since the force-create-unikernel was not triggered. The rollback will mostly only be triggered in cases were force-create-unikernel fails.

I don't find what you say here in the code. I commented on the code where I think that a rollback is triggered even though it shouldn't.

PizieDust · 2025-03-03T09:41:41Z

What about currently_running_unikernel and to_be_updated_unikernel (I'm not too happy with these names).

These names are fine, much better than what I used :)

We can as well do that in a later PR. My main question is whether there's any reason to use json for everyhing.

No we don't have to use json. I mainly used it since most of the other processes already rely on json so I wanted to reuse as much as I can

Ok. But the list will never be cleared when the update was successful.

I will update the PR to clear the list :)

hannesm · 2025-03-03T09:53:07Z

Hmm, and now I struggle with: rollback being triggered when unikernel_force_create fails. But when does it fail, and why should a rollback then be beneficial?

Isn't there a lot checking done before the unikernel_force_create, so that it is unlikely to fail? i.e. manifest, we could as well look into resource usage / policies.

Wasn't the purpose of a rollback that either there's a service not responding or an operator figures shortly after the update that something is fishy? I currently think this "automatic rollback" based on the failure of "unikernel_force_create" is not super-useful.

hannesm · 2025-03-03T09:54:10Z

Another thing is that we need to ensure to never rollback while we do a rollback - to avoid (infinite) rollback loops.

PizieDust · 2025-03-03T09:56:37Z

Hmm, and now I struggle with: rollback being triggered when unikernel_force_create fails. But when does it fail, and why should a rollback then be beneficial?

Isn't there a lot checking done before the unikernel_force_create, so that it is unlikely to fail? i.e. manifest, we could as well look into resource usage / policies.

Wasn't the purpose of a rollback that either there's a service not responding or an operator figures shortly after the update that something is fishy? I currently think this "automatic rollback" based on the failure of "unikernel_force_create" is not super-useful.

To be fair, I haven't had a single update fail that required a rollback. So in this case it's correct that rolling back during unikernel-force-create is not super useful.

Indeed all the checks are done before the upgrade is done.

hannesm · 2025-03-03T10:07:46Z

Ok, cool. So what about in a first version to have a button next to "update" that is labelled "rollback" (in case there was an update within the last 15 minutes or so)?

Let's do the automation than subsequently.

PizieDust requested review from hannesm and reynir February 25, 2025 12:41

PizieDust added 4 commits February 28, 2025 02:15

add unikernel_updates to store unikernel metadata before updates

0f1b427

add api version and add function to store updates to user list

abeca80

pass current build uuid in update post request

9d66da1

add rollback functionality

7d9c655

PizieDust force-pushed the rollback branch from 1d9e571 to 7d9c655 Compare February 28, 2025 01:24

PizieDust added 2 commits March 2, 2025 21:59

add functions for config_to_json

08cde7d

make config string before parsing

362f2a1

hannesm reviewed Mar 3, 2025

View reviewed changes

storage.ml Outdated Show resolved Hide resolved

use explicit comparison

75580e3

hannesm reviewed Mar 3, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rollback functionality of Unikernels #111

Rollback functionality of Unikernels #111

PizieDust commented Feb 25, 2025

hannesm commented Mar 2, 2025

PizieDust commented Mar 3, 2025 •

edited

Loading

hannesm Mar 3, 2025

PizieDust Mar 3, 2025

hannesm commented Mar 3, 2025

PizieDust commented Mar 3, 2025

hannesm commented Mar 3, 2025

hannesm commented Mar 3, 2025

PizieDust commented Mar 3, 2025

hannesm commented Mar 3, 2025

Rollback functionality of Unikernels #111

Are you sure you want to change the base?

Rollback functionality of Unikernels #111

Conversation

PizieDust commented Feb 25, 2025

hannesm commented Mar 2, 2025

PizieDust commented Mar 3, 2025 • edited Loading

hannesm Mar 3, 2025

Choose a reason for hiding this comment

PizieDust Mar 3, 2025

Choose a reason for hiding this comment

hannesm commented Mar 3, 2025

PizieDust commented Mar 3, 2025

hannesm commented Mar 3, 2025

hannesm commented Mar 3, 2025

PizieDust commented Mar 3, 2025

hannesm commented Mar 3, 2025

PizieDust commented Mar 3, 2025 •

edited

Loading