Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry decryption from a long-lived async task, instead of lots of detached ones #4715

Merged
merged 8 commits into from
Feb 28, 2025

Conversation

andybalaam
Copy link
Member

@andybalaam andybalaam commented Feb 25, 2025

Part of element-hq/element-meta#2697

In response to a comment on #4644 - avoid spawning a detached async task every time we have some things to decrypt because some Megolm sessions updated.

Instead, we create one async task and hold it inside a new struct called RetryDecryptionTask. This struct holds one end of channel, and sends retry requests to the async task, which stops when the channel is closed, which will happen when the RetryDecryptionTask struct is dropped.

So we keep the feature that retrying decryption does not block the main processing, but we lose the spawning of lots of async tasks that are not kept track of.

As written, the task is not cancellable, but this could be added if we think we need it.

I strongly suggest reviewing commit by commit, since this is structured as a set of refactorings that hopefully make sense individually.

Copy link

codecov bot commented Feb 25, 2025

Codecov Report

Attention: Patch coverage is 73.40426% with 25 lines in your changes missing coverage. Please review.

Project coverage is 86.14%. Comparing base (7a0bf9b) to head (9cb0630).
Report is 13 commits behind head on main.

Files with missing lines Patch % Lines
...i/src/timeline/controller/decryption_retry_task.rs 72.41% 24 Missing ⚠️
...rates/matrix-sdk-ui/src/timeline/controller/mod.rs 85.71% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4715   +/-   ##
=======================================
  Coverage   86.14%   86.14%           
=======================================
  Files         291      292    +1     
  Lines       34308    34327   +19     
=======================================
+ Hits        29553    29570   +17     
- Misses       4755     4757    +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@andybalaam
Copy link
Member Author

This lines that codecov is complaining about missing coverage are pretty much all lifted-and-shifted from the their old location into decryption_retry_task.rs. This code is tested by existing tests in crates/matrix-sdk-ui/src/timeline/tests/encryption.rs and crates/matrix-sdk-ui/src/timeline/tests/read_receipts.rs. Whilst it would be nice to add unit tests closer to the code, these cover the more real-world cases, and this PR is a detour-on-a-detour for me :-)

@andybalaam
Copy link
Member Author

@poljar is this something like what you intended in https://github.com/matrix-org/matrix-rust-sdk/pull/4644/files#r1963599148 ?


// Spawn the long-running task, providing the receiver so we can listen for
// decryption requests
matrix_sdk::executor::spawn(decryption_task(state, room_data_provider, receiver));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we're still dropping the JoinHandle though the task will get killed because the sender gets dropped, which in turn means the receiver will get a None.

That's a bit implicit compared to holding on to the JoinHandle and cancelling in the drop() call of `DecryptionRetryTask, but I guess there's nothing wrong with it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I tried to join in drop() but it's async, so I can't do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you wouldn't await the join in the drop() call, you would just abort() the task. But perhaps it's cleaner to let the task naturally die due to the Sender being gone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping the join handle and manually aborting shows a clear ownership of the task, and would be a tiny bit clearer, even in terms of discoverability: we'd see the JoinHandle in the struct, thus know that this is a task's owner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I was thinking this as well. But then the task could stop in two ways:

  1. It ends because the receiver receives a None.
  2. It gets cancelled.

And then I would need to think about cancel safety of all the things the task contains. Which made me conclude that it's fine to let the task die of natural causes the way it's done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair; my argument in terms of discoverability still holds IMO, can we store the task in the struct at least?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's probably a wise idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in f39a969

@poljar
Copy link
Contributor

poljar commented Feb 26, 2025

@poljar is this something like what you intended in https://github.com/matrix-org/matrix-rust-sdk/pull/4644/files#r1963599148 ?

Yeah, I think that's fine, I explained things a bit more here: #4715 (review)

@poljar poljar requested review from poljar and removed request for bnjbvr February 26, 2025 13:56
Copy link
Contributor

@poljar poljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thanks for splitting this into many small commits, made the review easier. I left one nit, but I'll approve this in advance.

There's also a typo in one commit:

Adjust tsts that retry decryption to wait with

In any case, this should make it much easier to move the redecryption logic into the correct place, wherever we decide this to be.

Copy link
Contributor

@poljar poljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to change my mind, could you please put the JoinHandle into the DecryptionRetryTask.

I shouldn't be used for anything, just to let people know that the task is owned by this struct.

@andybalaam andybalaam requested a review from poljar February 28, 2025 10:04
Copy link
Contributor

@poljar poljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the last minute change, we can merge this after you make the compiler happy with the right types.

@andybalaam andybalaam force-pushed the andybalaam/decrypt-in-owned-task branch from ec3bc04 to ddfe69e Compare February 28, 2025 11:20
@andybalaam andybalaam enabled auto-merge (rebase) February 28, 2025 11:27
@andybalaam andybalaam disabled auto-merge February 28, 2025 11:32
@andybalaam andybalaam force-pushed the andybalaam/decrypt-in-owned-task branch from ddfe69e to 9cb0630 Compare February 28, 2025 11:33
@andybalaam andybalaam enabled auto-merge (rebase) February 28, 2025 11:33
@andybalaam andybalaam merged commit 8cd7085 into main Feb 28, 2025
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants