virt_mshv_vtl: Returning appropriate values for various CPUIDs. #749

Open
wants to merge 6 commits into
base: main
from

Conversation

vibhutet
Contributor

Addresses issue #556.

@vibhutet vibhutet requested a review from a team as a code owner January 30, 2025 22:41
if (result.level_number() != super::CPUID_LEAF_B_LEVEL_NUMBER_SMT)
    || (result.level_type() != super::CPUID_LEAF_B_LEVEL_TYPE_SMT)
{
    tracing::trace!("Topology not found!");
Contributor

Are these error scenarios? Is emitting a trace sufficient? Can the message at least contain more detail on what's missing and why that's a problem? Note that traces won't be emitted during normal operation, only during specific debug scenarios.

Contributor Author

Execution of the cpuid instruction is intercepted by the kernel, which returns all zeros for this leaf and some others, so we get an incorrect ecx value here. Ideally this behavior should be corrected in the kernel. For basic checking purposes, I only made a note of the error. I will change the trace to an error with more detail.
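
For illustration, a minimal sketch of what a more detailed check on leaf 0xB, subleaf 0 could look like. It is not the code in this PR: `check_smt_level`, `TopologyError`, and the two constants are hypothetical names, and the constant values (level number 0, level type 1 for SMT) are assumed to match `CPUID_LEAF_B_LEVEL_NUMBER_SMT` and `CPUID_LEAF_B_LEVEL_TYPE_SMT`. The sketch separates the all-zero case described above from any other mismatch, so the resulting message says what was missing.

```rust
use std::fmt;

// Assumed values: subleaf 0 of leaf 0xB reports level number 0 in ECX[7:0]
// and level type 1 (SMT) in ECX[15:8].
const LEVEL_NUMBER_SMT: u8 = 0;
const LEVEL_TYPE_SMT: u8 = 1;

/// Hypothetical error type carrying enough detail for an error-level log.
#[derive(Debug, PartialEq, Eq)]
pub enum TopologyError {
    /// ECX was entirely zero, e.g. because the leaf was zeroed on intercept.
    ZeroedLeaf,
    /// ECX was non-zero but did not describe the SMT level.
    UnexpectedLevel { level_number: u8, level_type: u8 },
}

impl fmt::Display for TopologyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ZeroedLeaf => write!(
                f,
                "CPUID leaf 0xB subleaf 0 returned ECX = 0; no SMT level reported"
            ),
            Self::UnexpectedLevel { level_number, level_type } => write!(
                f,
                "CPUID leaf 0xB subleaf 0: expected level number {} and type {}, got {} and {}",
                LEVEL_NUMBER_SMT, LEVEL_TYPE_SMT, level_number, level_type
            ),
        }
    }
}

/// Validates the ECX value returned for leaf 0xB, subleaf 0.
pub fn check_smt_level(ecx: u32) -> Result<(), TopologyError> {
    let level_number = (ecx & 0xff) as u8;
    let level_type = ((ecx >> 8) & 0xff) as u8;
    if level_number == LEVEL_NUMBER_SMT && level_type == LEVEL_TYPE_SMT {
        return Ok(());
    }
    if ecx == 0 {
        return Err(TopologyError::ZeroedLeaf);
    }
    Err(TopologyError::UnexpectedLevel { level_number, level_type })
}
```

A caller could log the `Display` output at error level and then decide whether to fail or fall back.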

Contributor

What are the implications for the guest if the ecx is incorrect? Could it cause the guest to behave unexpectedly? I'm wondering if we should fix the kernel first.

@@ -183,6 +183,7 @@ pub struct CpuidResults {
    max_extended_state: u64,
    arch_support: Box<dyn CpuidArchSupport>,
    vps_per_socket: u32,
    max_xfd: u32,
Contributor

Where is this value being used? If it's only going to be used in TDX scenarios can we do everything that needs to be done inside of process_extended_state_subleaves instead of returning it?

@@ -65,7 +65,7 @@ trait CpuidArchInitializer {
&self,
Contributor

If we're going to do this, update this comment to explain what the return value is.

Although, seeing the TODO on this method, should we make that change now to handle max_xfd instead of returning it?

Contributor Author

Given the TODO comment, it might be a good idea to pass CpuidResults and update the max_xfd value in it, instead of returning the value.
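
A rough sketch of that shape, under several assumptions: `Results` is a hypothetical stand-in for `CpuidResults` with only the field discussed here, `Subleaf` is a hypothetical summary of one extended-state subleaf, and `max_xfd` is assumed to be a bitmask of XFD-capable state components. The real method's parameters and the way XFD capability is derived are not shown in this thread.

```rust
/// Hypothetical stand-in for `CpuidResults`; only `max_xfd` is modelled.
#[derive(Debug, Default)]
pub struct Results {
    pub max_xfd: u32,
}

/// Hypothetical summary of one extended-state subleaf.
pub struct Subleaf {
    /// State component index described by this subleaf.
    pub component: u32,
    /// Whether the component is subject to XFD (how this is determined
    /// is outside the scope of this sketch).
    pub xfd_capable: bool,
}

/// Records the result in `results` instead of returning it to the caller.
pub fn process_extended_state_subleaves(results: &mut Results, subleaves: &[Subleaf]) {
    for subleaf in subleaves.iter().filter(|s| s.xfd_capable) {
        // checked_shl avoids a shift overflow for component indices >= 32.
        if let Some(bit) = 1u32.checked_shl(subleaf.component) {
            results.max_xfd |= bit;
        }
    }
}
```

With this shape the trait method keeps its original return type, and the value lives next to the other per-partition CPUID state.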

Contributor

Do you think there are any cpuid leaves that we currently aren't supporting that we should support, or that we are supporting and shouldn't support? Also, there's this comment:

// TODO TDX: The following aren't required from AMD. Need to double-check if
// they're required for TDX

I'm curious if these are required.

Contributor Author

The CPUID list needs to be evaluated. There is a different issue tracking this: #562

Contributor Author

Regarding the question about what happens if ecx is incorrect: leaf 0xB is used to read the topology, so receiving invalid data and erroring out here will lead to an incorrect evaluation of the vps and a TD boot failure. A similar issue was observed in the topology builder while calculating the vps for the TD. As a backup, leaf 0x4 is read there to figure out the correct number of threads and vps (issue #481).
Yes, this behavior for CPUID 0xB and other leaves should be fixed in the kernel.
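
A minimal sketch of that kind of fallback, assuming the caller already holds the raw register values for leaf 0xB subleaf 0 and leaf 0x4 subleaf 0. `CpuidRegs` and `threads_per_core` are hypothetical names, and the choice of leaf 0x4 field (EAX[25:14], the maximum number of addressable logical-processor IDs sharing the cache, minus one) is an assumption; the topology builder referenced in issue #481 may derive the count differently.

```rust
/// Raw register values from one CPUID query (hypothetical helper type).
#[derive(Clone, Copy, Default)]
pub struct CpuidRegs {
    pub eax: u32,
    pub ebx: u32,
    pub ecx: u32,
    pub edx: u32,
}

/// Returns an estimate of the threads per core, preferring leaf 0xB and
/// falling back to leaf 0x4 when leaf 0xB does not describe an SMT level.
pub fn threads_per_core(leaf_b_sub0: CpuidRegs, leaf_4_sub0: CpuidRegs) -> Option<u32> {
    // Leaf 0xB: ECX[15:8] is the level type (1 = SMT) and EBX[15:0] is the
    // number of logical processors at this level.
    let level_type = (leaf_b_sub0.ecx >> 8) & 0xff;
    let logical = leaf_b_sub0.ebx & 0xffff;
    if level_type == 1 && logical != 0 {
        return Some(logical);
    }

    // Leaf 0x4: EAX[4:0] is the cache type (0 = no cache described).
    let cache_type = leaf_4_sub0.eax & 0x1f;
    if cache_type == 0 {
        return None;
    }
    // EAX[25:14] holds the maximum addressable ID count minus one; this is
    // an upper bound rather than an exact thread count.
    Some(((leaf_4_sub0.eax >> 14) & 0xfff) + 1)
}
```

Returning `None` when both leaves are unusable lets the caller report the failure explicitly instead of computing the vps from zeros.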

Contributor

Should we just be fixing this in the kernel then instead of in usermode?

The calculated max_xfd value would have been used when accessing
MSR_XFD (0x1C4) and MSR_XFD_ERR (0x1C5). Since these two MSRs
are in the allowed MSR read list, the L1VMM won't receive an
intercept for them and the value won't be used. Hence, removing
this change.
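
The reasoning can be sketched as a simple allow-list check. The MSR numbers come from the message above; `ALLOWED_MSR_READS` and `read_is_intercepted` are hypothetical names, and the real allow list contains other entries not shown here.

```rust
pub const MSR_XFD: u32 = 0x1C4;
pub const MSR_XFD_ERR: u32 = 0x1C5;

/// Hypothetical excerpt of the read allow list; the real list is longer.
const ALLOWED_MSR_READS: &[u32] = &[MSR_XFD, MSR_XFD_ERR];

/// Reads of allow-listed MSRs are handled without an intercept, so any
/// value computed for emulating them would never be consulted.
pub fn read_is_intercepted(msr: u32) -> bool {
    !ALLOWED_MSR_READS.contains(&msr)
}
```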