`gimlet_sp` and RoT artifact kinds should probably be something different #7841

rmustacc · 2025-03-19T22:30:34Z

Today we have three different artifact kinds for SP images:

- gimlet_sp
- psc_sp
- switch_sp

These are single-unit artifacts, and the artifacts within a kind have different names, such as gimlet-b. When software performs an update today, it matches the board name in the Hubris caboose against the artifact name. We already use a generic class of hardware for the switch and PSC kinds, but a specific name for a server sled. With the forthcoming introduction of Cosmo SP images, this is only going to get more confusing. My rough guess, along with some thoughts from @jgallagher, is that the gimlet_sp kind was simply given a specific name, but should have been something more generic like sled_sp. This is somewhat reinforced by the fact that even Wicket's SpType calls this generically a Sled. There are generally four ways forward here:

1. We introduce a new cosmo_sp entry and say that server sled classes are specific. This seems like the opposite of the direction we want to go, as there's nothing really different about the kind here.
2. We say that the name gimlet_sp is baked in, and so we just use it as the kind for the Cosmo SP artifacts.
3. We rename the gimlet_sp artifact kind to something that indicates it's the generic class of sleds.
4. We take a step back and ask what the signifier of the kind is here and what properties should actually dictate a change. This would effectively combine all of these different kinds into a single one.

An artifact kind in the current glossary draft is a well-known categorization. So given that we are using the artifact name to distinguish these, how do we figure out when there should and shouldn't be a category for something like an STM32H753 Hubris image? That is, I would question why we even have different categories for these at all. The unifying properties here seem to be:

- The correct artifact and deployment unit can be determined by matching the caboose board property.
- We are assuming there is a single archive that we need to pick.
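The two properties above can be sketched as a lookup keyed on the caboose board. This is a minimal illustration only, not code from omicron: `SpArchive`, `index_by_board`, and `select_for_board` are hypothetical names, and "cosmo-a" is a made-up board name.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one single-unit SP archive in a repository.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SpArchive {
    /// Artifact name; assumed here to equal the caboose board it targets.
    name: String,
    version: String,
}

/// Index archives by name so a caboose board maps to at most one archive.
/// Returns the duplicated name as an error if two archives share a name.
fn index_by_board(
    archives: Vec<SpArchive>,
) -> Result<HashMap<String, SpArchive>, String> {
    let mut by_board = HashMap::new();
    for archive in archives {
        if let Some(prev) = by_board.insert(archive.name.clone(), archive) {
            return Err(prev.name);
        }
    }
    Ok(by_board)
}

/// Pick the single archive whose name matches the board the caboose reports.
fn select_for_board<'a>(
    by_board: &'a HashMap<String, SpArchive>,
    caboose_board: &str,
) -> Option<&'a SpArchive> {
    by_board.get(caboose_board)
}

fn main() {
    let archives = vec![
        SpArchive { name: "gimlet-b".into(), version: "1.0.0".into() },
        // "cosmo-a" is a made-up board name used only for illustration.
        SpArchive { name: "cosmo-a".into(), version: "1.0.0".into() },
    ];
    let by_board = index_by_board(archives).expect("board names are unique");
    let picked = select_for_board(&by_board, "cosmo-a");
    println!("{:?}", picked.map(|a| &a.name));
}
```

Nothing in this lookup depends on whether the board is a sled, a switch, or a PSC, which is the sense in which a single kind could cover all of them.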

Similarly, the RoT and RoT bootloader have their own unique properties that make it clear that they should be different artifact kinds. But it's not clear to me that every class of hardware should get its own kind until something about these properties changes. For example, say we had an RoT based on some other microcontroller that had the option of banked flash, and therefore a very different way of packaging and selecting the artifact; it follows that the artifact kind would have to be different. Conversely, say we had a different generation of SP hardware with the same single-unit nature; then there's probably no reason we couldn't reuse the same artifact kind, since the way we would select and extract it would be the same.

I think this all leads me towards (4), and that perhaps the general kind is broader than what we have used to date for the SP, RoT, and RoT bootloader. We should probably at least do (3) here, though I think we should consider (4) more strongly, along with what properties actually make sense here.

labbott · 2025-03-20T14:40:28Z

For transparency, I think if we decided to go with option 2 today everything would "just work".

That said, I agree about pursuing a mix of options 3 and 4. A more general form of "The correct artifact and deployment unit can be determined by matching the caboose board property." is that the correct artifact can be selected and updated in a consistent manner.

Consider if we didn't have a separate RoT bootloader type and modeled the bootloader as another entry in the composite type. There is update logic needed to select whether we apply the A or B artifact but that same logic does not apply to the bootloader. The sequence of MGS calls to update a bootloader is also different than updating hubris and in fact requires multiple reboots of the RoT. This would be a pain to keep under a single artifact kind.

How different would an SP have to be before we would really need/want a new artifact kind? The distinction of "select and extract it would be the same" is part of it, but consider an SP with a single image that requires an extra reset of the SP or an extra manual step because of some hardware feature, or even a hardware misfeature we don't discover until much later. If the application logic is sufficiently different, that could be an argument for another SP kind.
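One way to make "selected and updated in a consistent manner" concrete is to treat the update procedure itself as the thing that defines a kind. The sketch below is purely illustrative: `Selection`, `UpdateProcedure`, and `can_share_kind` are hypothetical names, the reset counts are placeholder values rather than the real MGS sequences, and modeling the bootloader as a single image is an assumption.

```rust
/// How the right image is chosen out of an artifact (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Selection {
    /// One image, matched on the caboose board.
    SingleImage,
    /// An A and a B image; update logic must pick which one to apply.
    AbSlots,
}

/// The properties that, in this sketch, define an artifact kind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UpdateProcedure {
    selection: Selection,
    /// Resets of the target needed to finish the update (placeholder values).
    resets: u8,
}

/// Two hardware classes can share a kind only if they update the same way.
fn can_share_kind(a: UpdateProcedure, b: UpdateProcedure) -> bool {
    a == b
}

fn main() {
    let gimlet_sp = UpdateProcedure { selection: Selection::SingleImage, resets: 1 };
    let cosmo_sp = UpdateProcedure { selection: Selection::SingleImage, resets: 1 };
    let rot = UpdateProcedure { selection: Selection::AbSlots, resets: 1 };
    let rot_bootloader = UpdateProcedure { selection: Selection::SingleImage, resets: 2 };
    // A single-image SP that turns out to need an extra reset.
    let quirky_sp = UpdateProcedure { selection: Selection::SingleImage, resets: 2 };

    println!("gimlet/cosmo share a kind: {}", can_share_kind(gimlet_sp, cosmo_sp));
    println!("rot/bootloader share a kind: {}", can_share_kind(rot, rot_bootloader));
    println!("gimlet/quirky share a kind: {}", can_share_kind(gimlet_sp, quirky_sp));
}
```

Under this model, a new hardware generation gets a new kind only when one of the fields changes, which is roughly the question being asked above.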

SP artifact types match 1-to-1 to control plane types (omicron/gateway-types/src/component.rs, lines 12 to 48 in 5c90e2b):

    #[derive(
        Debug,
        Clone,
        Copy,
        PartialEq,
        Eq,
        PartialOrd,
        Ord,
        Serialize,
        Deserialize,
        JsonSchema,
    )]
    #[serde(rename_all = "lowercase")]
    pub enum SpType {
        Sled,
        Power,
        Switch,
    }

    #[derive(
        Debug,
        Clone,
        Copy,
        PartialEq,
        Eq,
        PartialOrd,
        Ord,
        Serialize,
        Deserialize,
        JsonSchema,
    )]
    pub struct SpIdentifier {
        #[serde(rename = "type")]
        pub typ: SpType,
        #[serde(deserialize_with = "deserializer_u32_from_string")]
        pub slot: u32,
    }

So it's also worth asking if the control plane cares about any distinction.

There's also the implicit assumption that all artifacts of a single SP/RoT kind are the same version (omicron/update-common/src/artifacts/update_plan.rs, lines 1050 to 1097 in 5c90e2b):

    // Ensure that all A/B RoT images for each board kind and same
    // signing key have the same version. (i.e. allow gimlet_rot signed
    // with a staging key to be a different version from gimlet_rot signed
    // with a production key)
    for (entry, versions) in &self.rot_by_sign {
        let kind = entry.kind;
        // This unwrap is safe because we check above that each of the types
        // has at least one entry
        let version = &versions.first().unwrap().id.version;
        match versions.iter().find(|x| x.id.version != *version) {
            None => (),
            Some(v) => {
                return Err(RepositoryError::MultipleVersionsPresent {
                    kind,
                    v1: version.clone(),
                    v2: v.id.version.clone(),
                });
            }
        }
    }
    let mut rot_by_sign = HashMap::new();
    for (k, v) in self.rot_by_sign {
        for val in v {
            rot_by_sign.insert(val.id, k.sign.clone());
        }
    }
    // Repeat the same version check for all SP images. (This is a separate
    // loop because the types of the iterators don't match.)
    for (kind, mut single_board_sp_artifacts) in [
        (KnownArtifactKind::GimletSp, self.gimlet_sp.values()),
        (KnownArtifactKind::PscSp, self.psc_sp.values()),
        (KnownArtifactKind::SwitchSp, self.sidecar_sp.values()),
    ] {
        // We know each of these iterators has at least 1 element (checked
        // above) so we can safely unwrap the first.
        let version = &single_board_sp_artifacts.next().unwrap().id.version;
        for artifact in single_board_sp_artifacts {
            if artifact.id.version != *version {
                return Err(RepositoryError::MultipleVersionsPresent {
                    kind,
                    v1: version.clone(),
                    v2: artifact.id.version.clone(),
                });
            }
        }
    }

A careful reading of this code also shows that we do a separate version check for RoT images with different signatures. Our artifact kinds do not give any indication of how an artifact is signed. I wasn't around for the original design of the artifact kinds, but I'm guessing the reason we ended up with psc_rot, sidecar_rot, and gimlet_rot is for dealing with the signatures. I expect us to sign more things in the future.
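If the signing identity were part of the grouping key for every kind, the two version checks quoted above could collapse into one generic pass. The following is a rough sketch of that idea under stated assumptions, not a proposed patch: `GroupKey`, `Artifact`, `MultipleVersionsPresent`, and `check_uniform_versions` are hypothetical stand-ins, and using `None` for artifacts without a signing identity is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical grouping key: artifact kind plus signing identity.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct GroupKey {
    kind: String,
    /// Signing identity; `None` for artifacts without one (an assumption).
    sign: Option<String>,
}

/// Hypothetical stand-in for an artifact entry in a repository.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Artifact {
    name: String,
    version: String,
}

/// Error reported when one group mixes two versions.
#[derive(Debug, PartialEq, Eq)]
struct MultipleVersionsPresent {
    key: GroupKey,
    v1: String,
    v2: String,
}

/// Require every artifact within a (kind, sign) group to share one version.
fn check_uniform_versions(
    groups: &BTreeMap<GroupKey, Vec<Artifact>>,
) -> Result<(), MultipleVersionsPresent> {
    for (key, artifacts) in groups {
        let mut iter = artifacts.iter();
        // An empty group has nothing to compare.
        let Some(first) = iter.next() else { continue };
        if let Some(other) = iter.find(|a| a.version != first.version) {
            return Err(MultipleVersionsPresent {
                key: key.clone(),
                v1: first.version.clone(),
                v2: other.version.clone(),
            });
        }
    }
    Ok(())
}

fn main() {
    let mut groups = BTreeMap::new();
    // Staging-signed and production-signed images may differ in version.
    groups.insert(
        GroupKey { kind: "gimlet_rot".into(), sign: Some("staging".into()) },
        vec![Artifact { name: "a".into(), version: "1.0.1".into() }],
    );
    groups.insert(
        GroupKey { kind: "gimlet_rot".into(), sign: Some("production".into()) },
        vec![Artifact { name: "a".into(), version: "1.0.0".into() }],
    );
    println!("{:?}", check_uniform_versions(&groups));
}
```

With a single key type the iterators all match, so the separate SP loop would no longer be needed; whether the signing identity belongs in the kind or beside it is the open question.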

Summarizing some of these thoughts:

- What does an artifact kind tell us about how an artifact is updated?
- What does an artifact kind tell us about how an artifact is signed?
- If we allow multiple artifacts of the same kind, what properties do we expect all artifact kinds to have?

lzrd · 2025-03-20T16:58:00Z

(Not to be confused with similar names in other name spaces.) ~~This issue discusses being able to change the BORD value in the caboose or other means to adapt to similar hardware needing distinct images: We need to be able to change board names on update #1595.~~
