-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
diff calculation performance improvement #5478
Conversation
CodSpeed Performance ReportMerging #5478 will not alter performanceComparing Summary
|
@@ -61,7 +61,6 @@ async def get( | |||
include_empty: bool = False, | |||
) -> list[EnrichedDiffRoot]: | |||
final_max_depth = config.SETTINGS.database.max_depth_search_hierarchy | |||
final_limit = limit or config.SETTINGS.database.query_size_limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will prevent updating diffs with over 5000 nodes correctly
WITH nfs[0] AS uuid, nfs[1] AS field_name | ||
WITH collect(DISTINCT uuid) as uuids, collect(DISTINCT field_name) AS field_names | ||
RETURN uuids AS node_ids_list, field_names AS field_names_list | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when calculating a diff on the base branch (you can think of it as main
b/c it almost always is) we need to tell the query which nodes and attributes/relationships it should process b/c the main
diff only needs to look at nodes and attributes/relationships that are included in the branch diff
in the case of a big diff (think thousands of nodes), this list of "node_field_specifiers", can be over 100,000 elements. each element is just a UUID and a name, but this is still quite a long list
the previous filtering logic in this query, checked if the UUID of a Node
node and the name/identifier of an Attribute
or Relationship
node was in this potentially huge list of node field specifiers. it could take a very long time to check every potential node against this big list
this change uses two smaller lists of unique node UUIDs and unique attribute names/relationship identifiers as a pre-check that can be executed much more quickly than the check against the node field specifier list
IFC-1129
updates the
DiffAllPathsQuery
to speed up calculating an incremental diff on the base branch (main
) portion of the diff. my tests for calculating/updating very large diffs indicate that this is a bottleneck that slows down updating a diff even when there are very few or no changes on themain
branch.my local testing indicates that this change speeds up calculating a diff on
main
when there are no/very few changes onmain
by a factor of 4, which is pretty good, but I think there is still more that we can do