Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vault: limit token delay to not exceed token TTL #504

Merged
merged 1 commit into from
Jan 30, 2025
Merged

Conversation

aead
Copy link
Member

@aead aead commented Jan 29, 2025

This commit fixes a bug in the Vault token renewal. When KES renews its current token T1 and receives a new token T2, KES waits a certain amount of time before using T2 to account for replication lag between Vault nodes.

Vault tokens are not signed but static secrets. Each Vault node in a dist. cluster needs to know the token before being able to verify it. Without the usage delay described above, KES might send a request to a Vault node that has not received the new token T2, yet.

Now, KES must also not wait longer then the remaining TTL of the current token T1. Otherwise, T1 expires BEFORE KES starts using T2. This results in auth errors like the following:

Error making API request.\n\nURL: GET http://127.0.0.1:8200/v1/kv/data/my-key3\nCode: 403. Errors:\n\n* 2 errors occurred:\n\t* permission denied\n\t* invalid token\n\n" req.method=POST req.path=/v1/key/create/my-key3 req.ip=127.0.0.1 req.identity=a49fea12c5d1c69f1eba1e4697e62ccdbe389a80f317191892711b47d83c3e85

This commit limits the max. time KES waits before using the new token T2 to either half the remaining TTL of T1 or 30s - whatever is shorter. This ensures that T1 is still valid once KES switches to T2.

@aead aead requested review from vadmeste and ramondeklein January 29, 2025 22:04
@@ -210,6 +213,9 @@ func (c *client) RenewToken(ctx context.Context, authenticate authFunc, secret *
if err == nil {
break
}
if resp, ok := err.(*vaultapi.ResponseError); ok && http.StatusBadRequest <= resp.StatusCode && resp.StatusCode < http.StatusInternalServerError {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduces the time we spent trying to renew the token

@aead aead force-pushed the var-token-renew-delay branch 3 times, most recently from 228d734 to 7d8d150 Compare January 29, 2025 22:13
@aead aead force-pushed the var-token-renew-delay branch from 11d57e4 to 5c7f6b0 Compare January 30, 2025 08:44
This commit fixes a bug in the Vault token renewal.
When KES renews its current token `T1` and receives
a new token `T2`, KES waits a certain amount of time
before using `T2` to account for replication lag between
Vault nodes.

Vault tokens are not signed but static secrets. Each Vault
node in a dist. cluster needs to know the token before being
able to verify it. Without the usage delay described above,
KES might send a request to a Vault node that has not received
the new token `T2`, yet.

Now, KES must also not wait longer then the remaining TTL of the
current token `T1`. Otherwise, `T1` expires BEFORE KES starts using
`T2`. This results in auth errors like the following:

```
Error making API request.\n\nURL: GET http://127.0.0.1:8200/v1/kv/data/my-key3\nCode: 403. Errors:\n\n* 2 errors occurred:\n\t* permission denied\n\t* invalid token\n\n" req.method=POST req.path=/v1/key/create/my-key3 req.ip=127.0.0.1 req.identity=a49fea12c5d1c69f1eba1e4697e62ccdbe389a80f317191892711b47d83c3e85
```

This commit limits the max. time KES waits before using the new token `T2`
to either half the remaining TTL of `T1` or 30s - whatever is shorter.
This ensures that `T1` is still valid once KES switches to `T2`.

Co-authored-by: Ramon de Klein <mail@ramondeklein.nl>
Signed-off-by: Andreas Auernhammer <github@aead.dev>
@aead aead force-pushed the var-token-renew-delay branch from 5c7f6b0 to f45aa1d Compare January 30, 2025 08:46
@aead aead merged commit 376928c into master Jan 30, 2025
7 checks passed
@aead aead deleted the var-token-renew-delay branch January 30, 2025 09:29
@ramondeklein
Copy link
Contributor

Do we know why did this start to fail with GenerateKey and not with Stat?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants