Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IA-5053] Add retries to SamService, flesh out more authz methods #4779

Merged
merged 13 commits into from
Sep 6, 2024

Conversation

rtitle
Copy link
Collaborator

@rtitle rtitle commented Sep 5, 2024

https://broadworkbench.atlassian.net/browse/IA-5053

Summary of changes

Draft while I self-review and test. This PR fleshes out more authz logic in SamService/SamServiceInterp. It does NOT actually update Leo to use SamService for authorization yet (I think we will need to do some more testing and potential historical data migrations before that can happen).

What

  • Added pattern for retries in SamServiceInterp
  • Added access control methods (checkAuthz, create/deleteResource, etc)
  • Improved logging
  • Added unit tests

Why

  • I'm trying to prepare for when we can cut over to SamService for real. Next step will be understanding/migrating historical data to the desired state. Docs 1 and 2 will help with this.

Testing these changes

What to test

  • Unit tests
  • Automation tests

Who tested and where

  • This change is covered by automated tests
    • NB: Rerun automation tests on this PR by commenting jenkins retest or jenkins multi-test.
  • I validated this change
  • Primary reviewer validated this change
  • I validated this change in the dev environment

@@ -13,45 +19,45 @@ trait SamService[F[_]] {

/**
* Gets a user's pet GCP service account, using the user's token.
* @param userInfo the user info containing an access token
* @param bearerToken the user's access token
Copy link
Collaborator Author

@rtitle rtitle Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to have these methods take a token string instead of a UserInfo object. It was too tempting to use the userInfo.email field for logging/etc, but that field should not be relied upon. We should instead call SamService.getUserEmail when we need to resolve the email from a token.

}
_ <- logger.info(ctx.loggingCtx)(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some log statements to these methods.

Note: this class looks a lot like WSM SamService, but in Scala. :)

import org.scalatest.BeforeAndAfterAll
import org.scalatest.funspec.AnyFunSpecLike
import org.scalatestplus.mockito.MockitoSugar

import java.util.UUID
import scala.jdk.CollectionConverters._

class SamServiceInterpSpec extends AnyFunSpecLike with LeonardoTestSuite with BeforeAndAfterAll with MockitoSugar {
Copy link
Collaborator Author

@rtitle rtitle Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lot of tests here but I think it has pretty complete coverage of SamServiceInterp. It mocks everything so runs very fast. I tried to organize test cases a bit with ScalaTest FunSpec.

@rtitle
Copy link
Collaborator Author

rtitle commented Sep 5, 2024

@LizBaldo this might be relevant to your auth domain work. I'm trying to get SamService in Leo "ready for prime time". This PR doesn't actually start using it yet for authz, but improves some of the code.

private val defaultSamRetryConfig =
RetryConfig(addJitter(1 seconds, 1 seconds), _ * 2, 5, isRetryable)

/**
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

)
}

// All Leo resources have a creator role
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true for apps with the workspace-shared access policy? Leo's config has an "owner" role there. I'm assuming also that when you say "leo resource" you're referring only to v1 stuff? I'm not sure how WSM handles these roles.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, are there cases where the Leonardo SA is creating resources?

Copy link
Collaborator Author

@rtitle rtitle Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I was trying to assume "desired state" here, where we migrate away from WSM-owned resources, and use dedicated resource types for Leo. That was a recommendation in this doc.

Hm you're right that kubernetes-app-shared has owner and user roles. And kubernetes-app has creator and manager.

Maybe I'll revisit the method signature here… I was trying to see if createResource could be generic to all types of resources Leo supports, or if we need a createRuntimeResource, createDiskResource, createKubernetesAppResource, etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, are there cases where the Leonardo SA is creating resources?

I don't think so -- all Sam resources should be created by the user. It's also a recommendation to not create Sam resources async. So I was intentionally giving the createResource method a userToken parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. And I love the idea of a standardized Leo-managed resource with predictable roles. Maybe creating a resource gives you the Creator role but depending on the resource, the context, and the user, they could get other roles as well? This might just require a bit more design to nail down an approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could start with this method and add other methods for special cases if we need to.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mlilynolting what do you think of this change? Instead of assuming a creator role, I updated SamService.createResource to just take the policies directly. We can maybe add specialized methods (getRuntimePolicies, getAppPolicies, etc) later on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a solid change that reflects the Sam contract really faithfully - Leo calls Sam with a new policy it wants to create for this resource. The policy name doesn't need to be unique, right?

Copy link
Collaborator Author

@rtitle rtitle Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the policy name can be any string -- I think it just needs to be unique per resource (e.g. you can't have 2 foo policies on a resource). But the Sam request should fail if uniqueness is violated.

@@ -211,7 +217,7 @@ class SamServiceInterp[F[_]](apiClientProvider: SamApiClientProvider[F],
samResourceId: SamResourceId,
projectParent: Option[GoogleProject],
workspaceParent: Option[WorkspaceId],
creator: Option[WorkbenchEmail]
policies: Map[String, SamPolicyData]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to type the key here as SamPolicyName?

Copy link
Collaborator Author

@rtitle rtitle Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with String because I don't love how SamPolicyName is an enum in Leo. It's not really an enumeration that maps to anything in Sam, this is just an arbitrary name for the policy. (Often role names are reused for policy names, see WSM for example).

SamPolicyName used all over the place in the old auth code (and conflated with roles somewhat) so we can't remove/change it yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great reason not to do it, lol. Could you add a comment suggesting that we might want to move to a dedicated policyName type?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure 👍

Copy link

codecov bot commented Sep 5, 2024

Codecov Report

Attention: Patch coverage is 97.97980% with 2 lines in your changes missing coverage. Please review.

Project coverage is 74.36%. Comparing base (0f69d85) to head (8c7690e).
Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
...kbench/leonardo/dao/sam/SamApiClientProvider.scala 0.00% 1 Missing ⚠️
...ute/dsde/workbench/leonardo/dao/sam/SamRetry.scala 90.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #4779      +/-   ##
===========================================
+ Coverage    74.20%   74.36%   +0.16%     
===========================================
  Files          164      165       +1     
  Lines        14981    15060      +79     
  Branches      1243     1197      -46     
===========================================
+ Hits         11117    11200      +83     
+ Misses        3864     3860       -4     
Files with missing lines Coverage Δ
...dsde/workbench/leonardo/dao/sam/SamException.scala 100.00% <100.00%> (ø)
...e/dsde/workbench/leonardo/dao/sam/SamService.scala 100.00% <100.00%> (ø)
.../workbench/leonardo/dao/sam/SamServiceInterp.scala 98.29% <100.00%> (+2.29%) ⬆️
...ench/leonardo/http/service/DiskServiceInterp.scala 90.90% <100.00%> (ø)
...ch/leonardo/http/service/LeoAppServiceInterp.scala 86.96% <100.00%> (ø)
...h/leonardo/http/service/RuntimeServiceInterp.scala 87.90% <100.00%> (ø)
...kbench/leonardo/dao/sam/SamApiClientProvider.scala 0.00% <0.00%> (ø)
...ute/dsde/workbench/leonardo/dao/sam/SamRetry.scala 90.00% <90.00%> (ø)

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f69d85...8c7690e. Read the comment docs.

@mlilynolting
Copy link
Contributor

This all looks great to me. Are you planning anything else before you un-draft the PR?

@rtitle
Copy link
Collaborator Author

rtitle commented Sep 5, 2024

Thanks! I'm just tweaking some of the retry behavior, but not planning anything else substantial. I'll plan to un-draft it soon.

@rtitle rtitle marked this pull request as ready for review September 6, 2024 12:50
@rtitle
Copy link
Collaborator Author

rtitle commented Sep 6, 2024

@mlilynolting @LizBaldo I tested this on a BEE and things look good. Moved out of draft, requesting reviews -- thanks.

@rtitle rtitle changed the title [DRAFT] [IA-5053] Add retries to SamService, flesh out more authz methods [IA-5053] Add retries to SamService, flesh out more authz methods Sep 6, 2024
.adaptError { case e: ApiException =>
SamException.create("Error listing resources from Sam", e, ctx.traceId)
}
} yield resources.asScala.toList.map(_.getResourceId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"has access to" might be a bit limiting. I don't know that there is a use case for listing just the resource IDs here - there might be a specific access right that we want to check, like read. I don't think we necessarily need to change that in this PR, since this method isn't actually used yet. Just a headsup that this might need to change for Leo's use case

Copy link
Collaborator Author

@rtitle rtitle Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a good point - we might want to additionally check specific actions here. I agree this can be changed in a subsequent PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So y ou are saying adding a scope to this function that could be a role or an action? E.g. list all resources the user has read access to? That might be tricky since different resources can have completely different actions no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally the current use cases would be covered by adding a list of actions. We don't want to check roles if we can avoid it (we might have to if we have to use parent role as a standin for child action). This method is only called in the context of a single resource type so that will scope us to only the actions for that type.

Copy link
Collaborator Author

@rtitle rtitle Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @mlilynolting -- I'm imagining adding a new parameter actions: List[SamAction] to this method, which would further constrain the results.

.adaptError { case e: ApiException =>
SamException.create("Error listing resources from Sam", e, ctx.traceId)
}
} yield resources.asScala.toList.map(_.getResourceId)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So y ou are saying adding a scope to this function that could be a role or an action? E.g. list all resources the user has read access to? That might be tricky since different resources can have completely different actions no?

@rtitle rtitle merged commit 87b81f7 into develop Sep 6, 2024
23 of 24 checks passed
@rtitle rtitle deleted the rt-ia-5053 branch September 6, 2024 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants