Performance / memory usage fix: do not allocate memory in pg_tde_slot #265

dutow · 2024-08-26T06:55:55Z

Until now, for some reason we allocated memory for each decrypted tuple in pg_tde_slot, and only freed all that memory when the transaction ended. This caused two issues:

* palloc takes time, and the many small pallocs caused a significant performance drop in scans
* sequential scans could end up allocating memory for every tuple in the table, which requires far too much memory for large tables

This allocation is most likely a leftover from before we used slots at all, when selects were handled differently. With slots, we don't need it: a slot is not expected to hold multiple tuples at the same time, only the most recent one.

This commit removes the palloc completely and instead adds a single BLCKSZ-sized array to the slot structure. That is large enough for any decrypted tuple, since a heap tuple cannot be larger than a page.

To be safe, it also disables the get tuple function, forcing the core code to use copy instead when needed.
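The idea can be sketched outside PostgreSQL as follows. This is a minimal illustration, not the code from this PR: `DemoSlot`, `demo_decrypt`, `demo_slot_store`, `demo_slot_copy`, and `DEMO_BLCKSZ` are hypothetical names, the XOR "decryption" is only a stand-in for the real cipher, and plain `malloc` stands in for `palloc`. It assumes the default 8192-byte block size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: PostgreSQL's default block size. The real code uses BLCKSZ. */
#define DEMO_BLCKSZ 8192

/* Hypothetical stand-in for the slot: one reusable buffer embedded in the
 * structure, so storing a tuple never allocates. */
typedef struct DemoSlot
{
    char   decrypted_buffer[DEMO_BLCKSZ];
    size_t tuple_len;            /* 0 means the slot is empty */
} DemoSlot;

/* Toy "decryption" (XOR with a fixed byte); a placeholder for the real cipher. */
static void
demo_decrypt(const char *src, char *dst, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (char) (src[i] ^ 0x5A);
}

/* Decrypt into the slot's own buffer, overwriting the previous tuple.
 * The returned pointer is only valid until the next store. */
const char *
demo_slot_store(DemoSlot *slot, const char *encrypted, size_t len)
{
    assert(len <= DEMO_BLCKSZ);  /* a tuple never exceeds one block */
    demo_decrypt(encrypted, slot->decrypted_buffer, len);
    slot->tuple_len = len;
    return slot->decrypted_buffer;
}

/* Callers that need the tuple to outlive the slot contents take a copy.
 * The caller owns the result and must free() it. */
char *
demo_slot_copy(const DemoSlot *slot)
{
    char *copy = malloc(slot->tuple_len);

    if (copy != NULL)
        memcpy(copy, slot->decrypted_buffer, slot->tuple_len);
    return copy;
}
```

Because every store reuses the same buffer, memory use during a scan stays constant regardless of table size. The trade-off is that a pointer into the slot goes stale on the next store, which is why handing out a copy is the safer interface.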

This change results in:

1. no "memory spike" during sequential scans
2. ~1.55x overhead instead of ~2.2x

(The TODO in slot_copytuple will be addressed in a separate commit)

codeforall · 2024-08-26T15:57:37Z

Overall, the PR looks good. Just one minor comment: I believe we can now get rid of the TdeSlotForgetDecryptedTuple function, as we no longer need to pfree the decrypted_tuple.

dutow requested review from codeforall and dAdAbird August 26, 2024 07:36

dAdAbird approved these changes Aug 26, 2024

codeforall approved these changes Aug 27, 2024

dutow added 2 commits August 28, 2024 19:51

Fixing review comment: removing more stuff

77b6c0f

dutow force-pushed the perffixagain2 branch from bc52945 to 77b6c0f Compare August 28, 2024 18:51

dutow merged commit 0586660 into percona:main Aug 28, 2024
13 checks passed

dAdAbird mentioned this pull request Sep 27, 2024

PG-1056 xmin and xmax correctness #292

Merged
