add omdb commands for printing basic saga information #7695

Draft
wants to merge 2 commits into
base: main

davepacheco
Collaborator

@davepacheco davepacheco commented Feb 28, 2025

This adds two commands:

  • `omdb db saga running`: list running sagas
  • `omdb db saga details SAGA_ID`: print detailed information about one saga

This definitely overlaps with #4378 but I tried not to re-implement anything here that I remember being implemented there.

Related: #4157, #7623

@davepacheco
Collaborator Author

Running it against `cargo xtask omicron-dev run-all`, and having created one demo saga with `omdb nexus sagas demo-create`, it looks like this:

$ omdb db saga running
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:46709/omicron?sslmode=disable
note: database schema version matches expected (128.0.0)
ID                                   CREATED                  ELAP    NAME STATE   NDONE NRUN 
a3bbe42d-f40c-44bc-b9cc-6551159f8b1d 2025-02-28T20:31:36.818Z 36m 31s demo running 1/3?  1    
$ omdb db saga details a3bbe42d-f40c-44bc-b9cc-6551159f8b1d
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:46709/omicron?sslmode=disable
note: database schema version matches expected (128.0.0)
saga id: a3bbe42d-f40c-44bc-b9cc-6551159f8b1d
saga name: demo
saga state: running
unwinding: no 
created: 2025-02-28 20:31:36.818012 UTC (36m 37s ago)
creator SEC: e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c
current SEC: e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c
adopt generation: 1
last adopted: 2025-02-28T20:31:36.818Z

ACTION EXECUTION STATE
actions completed (includes failures): 1
estimated total nodes: 3
nodes completed: 1
           actions: 1 (includes failures)
      undo actions: 0 (includes failures)
nodes running: 1
           actions: 1
               node 0 started but not finished
      undo actions: 0

DAG INFORMATION
node    0: demo_wait (DemoWait)
note: start node, end node, and subsaga start nodes are not printed.
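
The "ACTION EXECUTION STATE" counts above can be read as a tally over the saga's per-node log events. The following is a minimal sketch of that idea, not the code in this PR: `EventKind`, `Summary`, and `summarize` are hypothetical names, and the sketch assumes events arrive in log order as `(node id, event kind)` pairs.

```rust
use std::collections::BTreeMap;

/// Hypothetical event kinds, loosely modeled on a saga log.
/// These are illustrative and not the types omdb actually uses.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EventKind {
    Started,
    Succeeded,
    Failed,
    UndoStarted,
    UndoFinished,
}

/// Hypothetical summary, mirroring the fields printed above.
#[derive(Debug, Default, PartialEq, Eq)]
struct Summary {
    /// Actions that finished, including failures.
    actions_done: usize,
    /// Undo actions that finished.
    undo_done: usize,
    /// Nodes whose action started but has not finished.
    actions_running: Vec<u32>,
    /// Nodes whose undo action started but has not finished.
    undo_running: Vec<u32>,
}

/// Tally node states from events given in log order (an assumption).
fn summarize(events: &[(u32, EventKind)]) -> Summary {
    // Keep only the most recent event for each node.
    let mut latest: BTreeMap<u32, EventKind> = BTreeMap::new();
    for &(node, kind) in events {
        latest.insert(node, kind);
    }

    let mut summary = Summary::default();
    for (&node, &kind) in &latest {
        match kind {
            EventKind::Started => summary.actions_running.push(node),
            EventKind::Succeeded | EventKind::Failed => summary.actions_done += 1,
            EventKind::UndoStarted => {
                // The action itself finished before its undo began.
                summary.actions_done += 1;
                summary.undo_running.push(node);
            }
            EventKind::UndoFinished => {
                summary.actions_done += 1;
                summary.undo_done += 1;
            }
        }
    }
    summary
}
```

For the demo saga shown above, one finished node plus node 0 started but not finished would correspond to an input like `[(1, Started), (1, Succeeded), (0, Started)]`; the id `1` for the finished node is an assumption made for illustration, since the output does not print it.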

@davepacheco
Collaborator Author

This is more duplicative with #4378 than I realized. I'm sorry about that! It may make sense to just close this in favor of that one, but maybe some of the stuff here can be useful for doing pagination, printing as much as possible when we fail to load stuff, etc. I'm also not sure that one's show prints a similarly concise summary of running actions, etc.

jmpesp added a commit that referenced this pull request Mar 5, 2025
Breaking apart #4378 and copying the structure of #7695, add `omdb db
saga` as a command and implement the following sub-commands:

    Usage: omdb db saga [OPTIONS] <COMMAND>

    Commands:
      running  List running sagas
      fault    Inject an error into a saga's currently running node

This covers part of the minimum functionality needed during a release
deployment:

1. after quiescing (#6804), omdb can query if there are any running
   sagas.

2. if those running sagas are stuck in a loop and cannot be drained
   (#7623), and the release contains a change to the DAG that causes
   Nexus to panic after an upgrade (#7730), then omdb can inject a fault
   into the database that would cause that saga to unwind when the
   affected Nexus is restarted.

Note for 2, unwinding a saga that is stuck in this way may not be valid
if there were significant changes between releases.
@jmpesp
Contributor

jmpesp commented Mar 5, 2025

> This is more duplicative with #4378 than I realized. I'm sorry about that! It may make sense to just close this in favor of that one, but maybe some of the stuff here can be useful for doing pagination, printing as much as possible when we fail to load stuff, etc. I'm also not sure that one's show prints a similarly concise summary of running actions, etc.

I'm going to be breaking apart #4378, and am taking your suggestion about using some of the structure of this PR for that. I opened #7732 for this. Thanks for all the reviews :)
