Retry decryption from a long-lived async task, instead of lots of detached ones #4715

andybalaam · 2025-02-25T15:13:40Z

In response to a comment on #4644 - avoid spawning a detached async task every time we have some things to decrypt because some Megolm sessions updated.

Instead, we create one async task and hold it inside a new struct called DecryptionRetryTask. This struct holds one end of a channel and sends retry requests to the async task. The task stops when the channel is closed, which happens when the DecryptionRetryTask struct is dropped.
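
A minimal sketch of that lifecycle, for illustration only and not the code in this PR. To stay dependency-free it uses a std thread and std::sync::mpsc in place of the async task and channel. RetryRequest, RetryTask, worker and run_demo are hypothetical names, and the "retry" only records a string. new() hands the worker's handle back to the caller purely so the demo can wait for the worker to exit.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical request: the sessions whose events should be retried.
struct RetryRequest {
    session_ids: Vec<String>,
}

/// Owns the sending half of the channel. Dropping this struct closes the
/// channel, which is what stops the worker.
struct RetryTask {
    sender: Sender<RetryRequest>,
}

impl RetryTask {
    /// Spawn the single long-lived worker.
    fn new(log: Arc<Mutex<Vec<String>>>) -> (Self, JoinHandle<()>) {
        let (sender, receiver) = channel();
        let handle = thread::spawn(move || worker(receiver, log));
        (Self { sender }, handle)
    }

    /// Queue a retry; the caller does not wait for the work to happen.
    fn request_retry(&self, session_ids: Vec<String>) {
        // Sending fails only if the worker is already gone; ignored here.
        let _ = self.sender.send(RetryRequest { session_ids });
    }
}

fn worker(receiver: Receiver<RetryRequest>, log: Arc<Mutex<Vec<String>>>) {
    // `recv` returns Err once every Sender has been dropped, ending the loop.
    while let Ok(request) = receiver.recv() {
        for id in request.session_ids {
            log.lock().unwrap().push(format!("retried {id}"));
        }
    }
    log.lock().unwrap().push("worker stopped".to_owned());
}

fn run_demo() -> Vec<String> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let (task, handle) = RetryTask::new(Arc::clone(&log));
    task.request_retry(vec!["s1".to_owned(), "s2".to_owned()]);
    task.request_retry(vec!["s3".to_owned()]);
    drop(task); // closes the channel
    handle.join().unwrap(); // wait for the worker to notice and exit
    let result = log.lock().unwrap().clone();
    result
}
```

run_demo() returns the three "retried ..." entries followed by "worker stopped": requests that were already queued are still delivered before recv reports the closed channel.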

So we keep the feature that retrying decryption does not block the main processing, but we lose the spawning of lots of async tasks that are not kept track of.

As written, the task is not cancellable, but this could be added if we think we need it.

I strongly suggest reviewing commit by commit, since this is structured as a set of refactorings that hopefully make sense individually.

codecov · 2025-02-25T15:35:42Z

Codecov Report

Attention: Patch coverage is 73.40426% with 25 lines in your changes missing coverage. Please review.

Project coverage is 86.14%. Comparing base (7a0bf9b) to head (9cb0630).
Report is 13 commits behind head on main.

Files with missing lines	Patch %	Lines
...i/src/timeline/controller/decryption_retry_task.rs	72.41%	24 Missing ⚠️
...rates/matrix-sdk-ui/src/timeline/controller/mod.rs	85.71%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #4715   +/-   ##
=======================================
  Coverage   86.14%   86.14%           
=======================================
  Files         291      292    +1     
  Lines       34308    34327   +19     
=======================================
+ Hits        29553    29570   +17     
- Misses       4755     4757    +2

andybalaam · 2025-02-25T15:48:48Z

The lines that codecov is complaining about as missing coverage are pretty much all lifted-and-shifted from their old location into decryption_retry_task.rs. This code is tested by existing tests in crates/matrix-sdk-ui/src/timeline/tests/encryption.rs and crates/matrix-sdk-ui/src/timeline/tests/read_receipts.rs. Whilst it would be nice to add unit tests closer to the code, these cover the more real-world cases, and this PR is a detour-on-a-detour for me :-)

andybalaam · 2025-02-26T12:17:35Z

@poljar is this something like what you intended in https://github.com/matrix-org/matrix-rust-sdk/pull/4644/files#r1963599148 ?

poljar · 2025-02-26T12:22:04Z

crates/matrix-sdk-ui/src/timeline/controller/decryption_retry_task.rs

+
+        // Spawn the long-running task, providing the receiver so we can listen for
+        // decryption requests
+        matrix_sdk::executor::spawn(decryption_task(state, room_data_provider, receiver));


Hmm, we're still dropping the JoinHandle, though the task will get killed because the sender gets dropped, which in turn means the receiver will get a None.

That's a bit implicit compared to holding on to the JoinHandle and cancelling in the drop() call of DecryptionRetryTask, but I guess there's nothing wrong with it.

andybalaam · 2025-02-26

Yeah, I tried to join in drop() but it's async, so I can't do it.

poljar · 2025-02-26

Oh, you wouldn't await the join in the drop() call, you would just abort() the task. But perhaps it's cleaner to let the task naturally die due to the Sender being gone.

bnjbvr · 2025-02-27

I think keeping the join handle and manually aborting shows a clear ownership of the task, and would be a tiny bit clearer, even in terms of discoverability: we'd see the JoinHandle in the struct, thus know that this is a task's owner.

poljar · 2025-02-27

At first I was thinking this as well. But then the task could stop in two ways:

1. It ends because the receiver receives a None.
2. It gets cancelled.

And then I would need to think about the cancel safety of everything the task contains. That made me conclude that it's fine to let the task die of natural causes the way it's done.

bnjbvr · 2025-02-27

Fair; my argument in terms of discoverability still holds IMO, can we store the task in the struct at least?

poljar · 2025-02-27

Sure, that's probably a wise idea.

andybalaam · 2025-02-28

Done in f39a969
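
The outcome of this thread can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: it uses a std thread and std::sync::mpsc instead of the SDK's async executor, and OwnedWorker and drop_stops_worker are hypothetical names. The handle is stored in the struct only to make ownership visible. The worker still stops for one reason alone, the channel closing.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical owner of the background worker.
struct OwnedWorker {
    /// Dropping this closes the channel, which is what stops the worker.
    sender: Sender<u32>,
    /// Never read: kept only so the struct visibly owns the worker.
    _handle: JoinHandle<()>,
}

impl OwnedWorker {
    /// `stopped` is a side channel the demo uses to observe the worker exiting.
    fn spawn(stopped: Sender<&'static str>) -> Self {
        let (sender, receiver): (Sender<u32>, Receiver<u32>) = channel();
        let _handle = thread::spawn(move || {
            // Single exit path: the loop ends only when `recv` reports
            // that all senders are gone.
            while let Ok(_request) = receiver.recv() {}
            let _ = stopped.send("stopped after channel closed");
        });
        Self { sender, _handle }
    }
}

fn drop_stops_worker() -> Option<&'static str> {
    let (stopped_tx, stopped_rx) = channel();
    let owner = OwnedWorker::spawn(stopped_tx);
    owner.sender.send(1).unwrap();
    // Drops `sender` (closing the channel) and detaches `_handle`.
    drop(owner);
    // Bounded wait so a bug would show up as None rather than a hang.
    stopped_rx.recv_timeout(Duration::from_secs(5)).ok()
}
```

Because the loop has a single exit path, there is no cancellation point to reason about. A std thread handle has no abort() in any case, which is where the analogy with the async JoinHandle discussed above ends.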

poljar · 2025-02-26T12:29:18Z

> @poljar is this something like what you intended in https://github.com/matrix-org/matrix-rust-sdk/pull/4644/files#r1963599148 ?

Yeah, I think that's fine, I explained things a bit more here: #4715 (review)

poljar

Nice work, thanks for splitting this into many small commits, made the review easier. I left one nit, but I'll approve this in advance.

There's also a typo in one commit:

Adjust tsts that retry decryption to wait with

In any case, this should make it much easier to move the redecryption logic into the correct place, wherever we decide this to be.

crates/matrix-sdk-ui/src/timeline/tests/read_receipts.rs

poljar

Sorry to change my mind, could you please put the JoinHandle into the DecryptionRetryTask?

It shouldn't be used for anything, just to let people know that the task is owned by this struct.

poljar

Thanks for the last-minute change, we can merge this after you make the compiler happy with the right types.

andybalaam requested a review from a team as a code owner February 25, 2025 15:13

andybalaam requested review from bnjbvr and removed request for a team February 25, 2025 15:13

This was referenced Feb 25, 2025

fix(crypto): Redecrypt non-UTD messages to remove no-longer-relevant warning shields #4644

Merged

EX: Messages sent after an identity reset can sometimes be flagged as sent from insecure device element-hq/element-meta#2697

Closed

poljar reviewed Feb 26, 2025

poljar requested review from poljar and removed request for bnjbvr February 26, 2025 13:56

poljar approved these changes Feb 26, 2025

poljar requested changes Feb 27, 2025

andybalaam requested a review from poljar February 28, 2025 10:04

poljar approved these changes Feb 28, 2025

andybalaam force-pushed the andybalaam/decrypt-in-owned-task branch from ec3bc04 to ddfe69e Compare February 28, 2025 11:20

andybalaam enabled auto-merge (rebase) February 28, 2025 11:27

andybalaam disabled auto-merge February 28, 2025 11:32

andybalaam added 8 commits February 28, 2025 11:33

refactor(timeline): Move the decryption retrying into a separate struct

6e093f9

refactor(timeline): Move finding retry indices into DecryptionRetryTask

57fed3c

refactor(timeline): Split finding retry indices into its own function

4d17f1f

refactor(timeline): Adjust tests that retry decryption to wait with timeouts

224c33b

refactor(timeline): Pass requests to retry decryption through a channel, instead of spawning a task directly

a7efad2

refactor(timeline): Move the code to find which events to redecrypt into the async task

774314e

refactor(timeline): Share should_retry logic between the two places we use it

32f753e

refactor(crypto): Keep a long-lived DecryptionRetryTask in TimelineController

9cb0630

andybalaam force-pushed the andybalaam/decrypt-in-owned-task branch from ddfe69e to 9cb0630 Compare February 28, 2025 11:33

andybalaam enabled auto-merge (rebase) February 28, 2025 11:33

andybalaam merged commit 8cd7085 into main Feb 28, 2025
42 checks passed

andybalaam deleted the andybalaam/decrypt-in-owned-task branch February 28, 2025 12:35

richvdh mentioned this pull request Mar 19, 2025

UTDs not retried when key arrives at almost the same time element-hq/element-x-android#4202

Open
