Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: return better HTTP errors when not ensured #649

Merged
merged 4 commits into from
Apr 10, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
move error codes to an enum propolis-api-types
hawkw committed Apr 10, 2024

Verified

This commit was signed with the committer’s verified signature.
hawkw Eliza Weisman
commit 3ad59e910ce785bfbda9c45692de7bf78640a69e
29 changes: 15 additions & 14 deletions bin/propolis-server/src/lib/server.rs
Original file line number Diff line number Diff line change
@@ -571,19 +571,21 @@ async fn instance_ensure_common(
{
let existing_properties = existing.properties();
if existing_properties.id != properties.id {
return Err(HttpError::for_status(
Some(format!(
return Err(HttpError::for_client_error(
Some(api::ErrorCode::AlreadyInitialized.to_string()),
http::status::StatusCode::CONFLICT,
format!(
"Server already initialized with ID {}",
existing_properties.id
)),
http::status::StatusCode::CONFLICT,
),
));
}

if *existing_properties != properties {
return Err(HttpError::for_status(
Some("Cannot update running server".to_string()),
return Err(HttpError::for_client_error(
Some(api::ErrorCode::AlreadyRunning.to_string()),
http::status::StatusCode::CONFLICT,
"Cannot update running server".to_string(),
));
}

@@ -653,10 +655,11 @@ async fn instance_ensure_common(
vm_hdl.await.unwrap()
}
.map_err(|e| {
HttpError::for_internal_error(format!(
"failed to create instance: {}",
e
))
HttpError::for_client_error(
Some(api::ErrorCode::CreateFailed.to_string()),
http::status::StatusCode::INTERNAL_SERVER_ERROR,
format!("failed to create instance: {e}"),
)
})?;

if let Some(ramfb) = vm.framebuffer() {
@@ -875,7 +878,7 @@ async fn instance_state_monitor(
// Inform the client of this condition so it doesn't wait forever.
state_watcher.changed().await.map_err(|_| {
HttpError::for_client_error(
Some(NO_INSTANCE.to_string()),
Some(api::ErrorCode::NoInstance.to_string()),
http::status::StatusCode::GONE,
format!(
"No instance present; will never reach generation {}",
@@ -1193,11 +1196,9 @@ pub fn api() -> ApiDescription<Arc<DropshotEndpointContext>> {
api
}

const NO_INSTANCE: &str = "NO_INSTANCE";

fn not_created_error() -> HttpError {
HttpError::for_client_error(
Some(NO_INSTANCE.to_string()),
Some(api::ErrorCode::NoInstance.to_string()),
http::StatusCode::FAILED_DEPENDENCY,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Soapbox rant, not necessarily a problem with the change: I get what we're going for here, but I've had bad luck in the past with using slightly off-meaning status codes in this way--they have a tendency, IME, to get bubbled up to a user somehow, and then the user reads how the status was originally described in whatever document or header defined it, and then folks get confused about what's really happening. In this case, RFC 4918 defines 424 Failed Dependency to mean that

The 424 (Failed Dependency) status code means that the method could
not be performed on the resource because the requested action
depended on another action and that action failed. For example, if a
command in a PROPPATCH method fails, then, at minimum, the rest of
the commands will also fail with 424 (Failed Dependency).

That's definitely close, but in the scenario that led us here (Propolis panic) no earlier requested actions actually failed--the problem is that the server reset.

Anyway, this is a soapbox rant; I don't think 424's semantics are far enough off from what we intend for this to be a blocker. But if we are leaning toward using descriptive strings in the error_code field (at least for internal control plane communication) I might be inclined to lean on those and stick to the HTTP error codes in RFC 9110.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely close, but in the scenario that led us here (Propolis panic) no earlier requested actions actually failed--the problem is that the server reset.

My rationale was that in the context of this particular propolis-server process, we depended on an earlier action (InstanceEnsure) which has not been performed at all. So, I guess indicating that it failed is not quite correct, but, hmm...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this offline and decided to stick with 424 Failed Dependency for now, to distinguish this case from the 404 Not Found you get from requesting a nonexistent route and the 410 Gone you get from trying to query a VM's state after it has already shut down and been destroyed.

"Server not initialized (no instance)".to_string(),
)
46 changes: 45 additions & 1 deletion crates/propolis-api-types/src/lib.rs
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@

//! Definitions for types exposed by the propolis-server API

use std::net::SocketAddr;
use std::{fmt, net::SocketAddr};

use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
@@ -395,3 +395,47 @@ pub struct SnapshotRequestPathParams {
pub struct VCRRequestPathParams {
pub id: Uuid,
}

/// Error codes used to populate the `error_code` field of Dropshot API responses.
#[derive(
Clone, Copy, Debug, Deserialize, PartialEq, Eq, Serialize, JsonSchema,
)]
pub enum ErrorCode {
/// This `propolis-server` process has not received an `InstanceEnsure`
/// request yet.
NoInstance,
/// This `propolis-server` process has already received an `InstanceEnsure`
/// request with a different ID.
AlreadyInitialized,
/// Cannot update a running server.
AlreadyRunning,
/// Instance creation failed
CreateFailed,
}

impl fmt::Display for ErrorCode {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fmt::Debug::fmt(self, f)
}
}

impl std::str::FromStr for ErrorCode {
type Err = &'static str;
fn from_str(s: &str) -> Result<Self, Self::Err> {
match s.trim() {
s if s.eq_ignore_ascii_case("NoInstance") => Ok(Self::NoInstance),
s if s.eq_ignore_ascii_case("AlreadyInitialized") => {
Ok(ErrorCode::AlreadyInitialized)
}
s if s.eq_ignore_ascii_case("AlreadyRunning") => {
Ok(ErrorCode::AlreadyRunning)
}
s if s.eq_ignore_ascii_case("CreateFailed") => {
Ok(ErrorCode::CreateFailed)
}
_ => Err("unknown error code, expected one of: \
'NoInstance', 'AlreadyInitialized', 'AlreadyRunning', \
'CreateFailed'"),
}
}
}