Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a caboose EPOC tag used by RoT update_server for rollback protection #33

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lzrd
Copy link
Contributor

@lzrd lzrd commented Jun 10, 2024

Issue #31

Comment on lines 686 to 700
// Hubris has an epoch in the app.toml but Bootleby does not.
// CLI arg overrides app.toml, value will default to zero.
let epoc: u32 = if let Some(param) = epoch {
param.parse::<u32>().map_err(|_| Error::InvalidEpoch(param.to_string()))?
} else if let Some(param) = manifest.as_table().ok_or(Error::BadTomlType)?.get("epoch") {
let toml_int = param.as_integer().ok_or(Error::BadTomlType)?;
u32::try_from(toml_int).map_err(|_| Error::InvalidEpoch(format!("epoch from toml = {}", toml_int)))?
} else {
// Default is zero if missing from app.toml
0u32
};
// EPOC in the caboose is a u32, zero-padded 10-digit ascii number.
// Missing EPOC or all-zeros is equivalent to EPOC zero.
// Anything else represents some future scheme or a typo in toml file.
let caboose_epoc = format!("{:010}", epoc).to_owned();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The epoch was added in the app.toml as a random thought. Is there value in keeping it there vs getting surprised later?

I'm also not a huge fan of the parsing then re-formatting. I'd rather require the user to specify the epoch exactly and then reject it if it isn't in the expected form

Copy link
Contributor Author

@lzrd lzrd Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the line "epoch = 0" in app.toml files which Toml returns as an i64.
I could remove the leading ASCII zeros. I had thought originally to compare EPOC byte by byte, but I wound up converting to u32 to stay consistent with ImageHeader's representation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: "The epoch was added in app.toml as a random thought..."

The ImageHeader version isn't expected to be used, but the epoch field has a very nice property for rollback protection: it is available to update_server before any flash is erased.

This allows us to reject updates early and keeps us from wearing out flash if some person or process keeps retrying because they don't understand that rollback protection is keeping the update from being applied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should think about our release flow. We manage version strings manually and epoch could be the same.
That is very helpful for SP hubris which is compiled differently for the different target boards. We might have some dangerous PSC SP release that we want to ban with rollback protection. We would increment epoch there but wouldn't have to for the other platforms. Using the app.toml to track epoch works there because of the different app.toml files per target.

We don't have an app.toml for bootleby. I would like to find a git-controlled place to put it though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we currently build a minimal fake app.toml under hubtools/bootleby.rs. How about adding an --epoch flag to package-elf and adding the epoch to the fake toml? We could commit the epoch to our release build scripts that way.

I'd actually like to see about committing the version string and tag to git to make releases even less prone (the only runtime check would be the expected hash of the output) so we could also do the epoch there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Managing the version string for releases needs more discussion.
We could put a json file in sprot-release that binds (repo, commit, version) or (repo, commit, version, epoch).
But that source of truth is redundant with artifacts that permslip stores. We could call it a release plan.
We could just store the version to be used.
We could aromatically query production permslip on the next available major or minor version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added optional --epoch u32 to --package-elf as you suggested.

@lzrd lzrd force-pushed the epoch branch 3 times, most recently from 38c7a22 to 0f8d3cf Compare August 15, 2024 22:22
@lzrd lzrd marked this pull request as ready for review August 19, 2024 18:05
@@ -93,6 +98,8 @@ pub enum Command {
board: String,
/// Git has for the hubris archive
gitc: String,
/// Epoch for rollback protection
epoch: Option<u32>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be optional here, or should it be a bare u32 (with responsibility for deciding on a default value given to the caller)?

@@ -33,6 +33,9 @@ pub enum Command {
#[clap(short, long)]
version: String,

#[clap(short, long)]
epoch: Option<String>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option<u32>?

Should this be optional at all, or should it be mandatory (like version)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to change this to mandatory after changes to sprot-release procedures implement reviewable release plans.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...that will probably be worked into sprot-release PR#11 or a follow-on.

@mkeeter
Copy link
Collaborator

mkeeter commented Aug 22, 2024

Thinking about it last night, I feel like there's a missing design doc / RFD here that to explain implementation choices. RFD 349 mentions using epochs for rollback protection, but doesn't dive into the details.

Not understanding the overall design means that it's hard to review the individual implementation PRs. Here's a set of questions that I've come up with, which would better be answered by a RFD describing the implementation plan:

  • Where is the epoch canonically stored? Does this differ between bootleby / RoT / SP images?
    • In some cases, it appears to be stored both in the image header and in the caboose. Storing it in both places means we have to think about what happens if they disagree; could we store it in a single place instead?
  • How is the epoch populated? Must it always be present, e.g. during initial build but before releasing through permission-slip?
    • Is there a default value (it seems to be 0), and if so, what are the semantics of the default value versus an intentional epoch of 0?
      • Should any explicitly defined epoch be non-zero, to avoid confusion with a default value?
      • It's possible for the epoch to be absent in the caboose, but it must always be present in the ImageHeader. How do we handle this difference (see above question about storing it in a single place)?
    • How does the epoch change during the Hubris packaging process?
      • Is the epoch specified in app.toml canonical, or is it a placeholder to be overwritten? If it's a placeholder, then can we should remove it from app.toml to avoid confusion?
  • How is the epoch used by various stages of the system?
    • Is it checked by bootleby?
    • Is it checked by the RoT image?
    • Does bootleby check the RoT image's epoch as well as its own?
    • Is it checked by the update system, to prevent bricking machines?
    • In all of these cases, how do they read it?
    • How is it used for SP images?

@labbott
Copy link
Contributor

labbott commented Aug 22, 2024

Thinking about it last night, I feel like there's a missing design doc / RFD here that to explain implementation choices. RFD 349 mentions using epochs for rollback protection, but doesn't dive into the details.

Not understanding the overall design means that it's hard to review the individual implementation PRs. Here's a set of questions that I've come up with, which would better be answered by a RFD describing the implementation plan:

* Where is the epoch canonically stored?  Does this differ between bootleby / RoT / SP images?
  
  * In some cases, it appears to be stored both in the image header and in the caboose.  Storing it in both places means we have to think about what happens if they disagree; could we store it in a single place instead?

* How is the epoch populated?  Must it always be present, e.g. during initial build but before releasing through `permission-slip`?
  
  * Is there a default value (it seems to be 0), and if so, what are the semantics of the default value versus an intentional epoch of 0?
    
    * Should any explicitly defined epoch be non-zero, to avoid confusion with a default value?
    * It's possible for the epoch to be absent in the caboose, but it must always be present in the `ImageHeader`.  How do we handle this difference (see above question about storing it in a single place)?
  * How does the epoch change during the Hubris packaging process?
    
    * Is the epoch specified in `app.toml` canonical, or is it a placeholder to be overwritten?  If it's a placeholder, then can we should remove it from `app.toml` to avoid confusion?

* How is the epoch used by various stages of the system?
  
  * Is it checked by bootleby?
  * Is it checked by the RoT image?
  * Does bootleby check the RoT image's epoch as well as its own?
  * Is it checked by the update system, to prevent bricking machines?
  * In all of these cases, how do they read it?
  * How is it used for SP images?

I think this capturing that much of the release engineering for hubris/bootleby up until now has been a bit ad-hoc and grown as needed and as someone who has done this perhaps a bit too ad hoc I think it's worth treating this as a good checkpoint to write this down and make sure we're doing things the correct way.

Comment on lines -75 to -78
let header = header::ImageHeader {
magic: 0x64_CE_D6_CA,
total_image_len: len as u32,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an intentional choice when this was added to only add the magic and length fields to save space. It feels like a step in the wrong direction to add the full header back.

Bump hubtools version for EPOC tag in caboose
Default epoch value is zero if not otherwise specified.
Command line --epoch option.
Work with the epoch value as a u32. Do not use ImageHeader epoch field.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants