diff calculation performance improvement #5478

ajtmccarty · 2025-01-15T22:26:20Z

IFC-1129

updates the DiffAllPathsQuery to speed up calculating an incremental diff on the base branch (main) portion of the diff. my tests for calculating/updating very large diffs indicate that this is a bottleneck that slows down updating a diff even when there are very few or no changes on the main branch.

my local testing indicates that this change speeds up calculating a diff on main when there are no/very few changes on main by a factor of 4, which is pretty good, but I think there is still more that we can do

codspeed-hq · 2025-01-15T22:32:38Z

CodSpeed Performance Report

Merging #5478 will not alter performance

_{Comparing ajtm-01152025-diff-perf (8344849) with stable (9ec036d)}

Summary

✅ 10 untouched benchmarks

ajtmccarty · 2025-01-16T00:34:25Z

backend/infrahub/core/diff/repository/repository.py

@@ -61,7 +61,6 @@ async def get(
        include_empty: bool = False,
    ) -> list[EnrichedDiffRoot]:
        final_max_depth = config.SETTINGS.database.max_depth_search_hierarchy
-        final_limit = limit or config.SETTINGS.database.query_size_limit


this will prevent updating diffs with over 5000 nodes correctly

ajtmccarty · 2025-01-16T00:40:31Z

backend/infrahub/core/query/diff.py

+    WITH nfs[0] AS uuid, nfs[1] AS field_name
+    WITH collect(DISTINCT uuid) as uuids, collect(DISTINCT field_name) AS field_names
+    RETURN uuids AS node_ids_list, field_names AS field_names_list
+}


when calculating a diff on the base branch (you can think of it as main b/c it almost always is) we need to tell the query which nodes and attributes/relationships it should process b/c the main diff only needs to look at nodes and attributes/relationships that are included in the branch diff
in the case of a big diff (think thousands of nodes), this list of "node_field_specifiers", can be over 100,000 elements. each element is just a UUID and a name, but this is still quite a long list

the previous filtering logic in this query, checked if the UUID of a Node node and the name/identifier of an Attribute or Relationship node was in this potentially huge list of node field specifiers. it could take a very long time to check every potential node against this big list

this change uses two smaller lists of unique node UUIDs and unique attribute names/relationship identifiers as a pre-check that can be executed much more quickly than the check against the node field specifier list

ajtmccarty added 2 commits January 15, 2025 13:19

separate node specifiers

ec78c6b

remove limit on diff get query

8344849

github-actions bot added the group/backend Issue related to the backend (API Server, Git Agent) label Jan 15, 2025

ajtmccarty marked this pull request as ready for review January 16, 2025 00:44

ajtmccarty requested a review from a team January 16, 2025 00:44

dgarros approved these changes Jan 16, 2025

View reviewed changes

ajtmccarty commented Jan 16, 2025

View reviewed changes

ajtmccarty merged commit 7cf1310 into stable Jan 16, 2025
32 checks passed

ajtmccarty deleted the ajtm-01152025-diff-perf branch January 16, 2025 14:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

diff calculation performance improvement #5478

diff calculation performance improvement #5478

ajtmccarty commented Jan 15, 2025 •

edited

Loading

codspeed-hq bot commented Jan 15, 2025

ajtmccarty Jan 16, 2025

ajtmccarty Jan 16, 2025

diff calculation performance improvement #5478

diff calculation performance improvement #5478

Conversation

ajtmccarty commented Jan 15, 2025 • edited Loading

codspeed-hq bot commented Jan 15, 2025

CodSpeed Performance Report

Merging #5478 will not alter performance

Summary

ajtmccarty Jan 16, 2025

Choose a reason for hiding this comment

ajtmccarty Jan 16, 2025

Choose a reason for hiding this comment

ajtmccarty commented Jan 15, 2025 •

edited

Loading