
[AN-356] When a cluster fails to start up, don't detach persistent disk #4821

Merged: 2 commits merged into develop on Jan 16, 2025

Conversation


@lucymcnatt lucymcnatt commented Jan 13, 2025

Jira ticket: https://broadworkbench.atlassian.net/browse/AN-356

Summary of changes

What

This PR moves the detach logic so that the disk is detached only when a runtime fails to create and the disk is not in a Creating or Failed state.

Why

When the startup script fails (due to a full disk, etc.), the persistent disk becomes 'detached' in the DB (the disk id is removed from the RUNTIME_CONFIG).

This means that a user with a full disk cannot even try to increase their disk size in the UI, because they will get a 'persistent disk not found for runtime' error.
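The new guard can be sketched as follows. This is an illustrative model, not the actual Leonardo code: the DiskStatus values and the shouldDetach name are hypothetical stand-ins for whatever the runtime monitor actually uses.

```scala
// Hypothetical sketch of the detach rule described above; not the real
// Leonardo implementation.
object DetachRule {
  sealed trait DiskStatus
  case object Creating extends DiskStatus
  case object Failed   extends DiskStatus
  case object Ready    extends DiskStatus

  // Detach only when runtime creation failed AND the disk is neither
  // Creating nor Failed. A disk that merely filled up during startup
  // therefore keeps its runtime association, so the user can still
  // resize it from the UI.
  def shouldDetach(runtimeCreateFailed: Boolean, disk: DiskStatus): Boolean =
    runtimeCreateFailed && (disk != Creating) && (disk != Failed)
}
```

Under this sketch, a Ready disk on a failed create is detached, while a Creating or Failed disk is left attached.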

Testing these changes

What to test

  • Create a Jupyter runtime with a small disk
  • Open the terminal and run df -Th to see how much space is available on sdb
  • fallocate a file to fill up the remaining space
  • Pause the runtime
  • Start the runtime --> it should fail
  • Increase the disk size and start the runtime again

Who tested and where

  • This change is covered by automated tests
    • NB: Rerun automation tests on this PR by commenting jenkins retest or jenkins multi-test.
  • I validated this change
  • Primary reviewer validated this change
  • I validated this change in the dev environment

@lucymcnatt lucymcnatt marked this pull request as ready for review January 13, 2025 21:57
@lucymcnatt lucymcnatt requested a review from a team as a code owner January 13, 2025 21:57

codecov bot commented Jan 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.62%. Comparing base (dce08ef) to head (055d80b).
Report is 2 commits behind head on develop.


@@             Coverage Diff             @@
##           develop    #4821      +/-   ##
===========================================
- Coverage    74.62%   74.62%   -0.01%     
===========================================
  Files          166      166              
  Lines        14692    14690       -2     
  Branches      1135     1158      +23     
===========================================
- Hits         10964    10962       -2     
  Misses        3728     3728              
Files with missing lines                              | Coverage Δ
...nardo/monitor/BaseCloudServiceRuntimeMonitor.scala | 89.35% <100.00%> (-0.09%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -101,6 +101,7 @@ class BaseCloudServiceRuntimeMonitorSpec extends AnyFlatSpec with Matchers with
disk <- makePersistentDisk().save()
start <- IO.realTimeInstant
tid <- traceId.ask[TraceId]
implicit0(ec: ExecutionContext) = scala.concurrent.ExecutionContext.Implicits.global
Collaborator


What does this do? Still wrapping my head around how to use implicits properly 😭

Collaborator Author


...honestly I'm not entirely sure myself, just that I needed the EC implicit to do the disk query
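For context on the exchange above: `implicit0(...)` comes from the better-monadic-for compiler plugin and lets a for-comprehension bind a value as an implicit, so later steps that need an implicit ExecutionContext (such as the disk query here) can resolve it. A minimal plain-Scala illustration of the same implicit resolution, without the plugin; the countDisks name and its body are hypothetical:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ImplicitEcDemo {
  // A query-like method that requires an ExecutionContext, similar to
  // how DB actions need one; the caller supplies it implicitly.
  def countDisks()(implicit ec: ExecutionContext): Future[Int] =
    Future(1)

  def run(): Int = {
    // Marking the value implicit puts it in scope, so countDisks()
    // compiles without passing the ExecutionContext explicitly.
    implicit val ec: ExecutionContext = ExecutionContext.global
    Await.result(countDisks(), Duration.Inf)
  }
}
```

In the test diff above, the `implicit0(ec: ExecutionContext) = ...` line plays the same role as the `implicit val` here, just inside a for-comprehension.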

@@ -303,6 +311,47 @@ class BaseCloudServiceRuntimeMonitorSpec extends AnyFlatSpec with Matchers with
res.unsafeRunSync()(cats.effect.unsafe.IORuntime.global)
}

it should "detach Ready disk on failed runtime create" in isolatedDbTest {
Collaborator


I am thinking: should we also detach the PD when the runtime is in Deleting status, not just Deleted?

Collaborator Author


That's done as part of the deletedRuntime function in the GceRuntimeMonitor (completed deletion detaches the disk).

@lucymcnatt lucymcnatt merged commit 3fd8cc2 into develop Jan 16, 2025
23 checks passed
@lucymcnatt lucymcnatt deleted the AN-356-disk-detachment branch January 16, 2025 14:47