skip expensive diff operations when possible #5342

ajtmccarty · 2024-12-31T15:21:35Z

IFC-1004

the goal of this PR is to skip expensive diff operations (loading, calculating, saving, and combining) where possible. These operations are cheap for small diffs, but diffs can be very big and every one of these operations gets slower as the diff grows.

the previous approach for calculating an incremental diff (meaning at least one diff covering some of the requested time period already exists) was to load the existing diff, calculate the missing piece, add them together, repeat for all the missing time ranges, then save the whole updated diff. Sometimes this is still required, but we now try very hard to make sure that each of these expensive operations is absolutely necessary before running it.

this PR introduces the EnrichedDiffMetadata class, which is what it sounds like: all of the data about a diff without the actual diff data, so from_time, to_time, branches, tracking ID, and uuid, but no nodes. we try to use this class for as long as possible and only hydrate the EnrichedDiffMetadata instance if we really need to add two calculated diffs together.
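
To make the metadata/full-diff split concrete, here is a minimal sketch. The class names, field names, and the `hydrate` helper below are hypothetical stand-ins chosen for illustration, not the classes added in this PR; the only thing taken from the description is which pieces of data live on the metadata object and that nodes are loaded separately.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable


@dataclass
class DiffMetadataSketch:
    """Hypothetical stand-in: describes a diff without carrying its nodes."""

    uuid: str
    base_branch_name: str
    diff_branch_name: str
    from_time: datetime
    to_time: datetime
    tracking_id: str | None = None


@dataclass
class HydratedDiffSketch(DiffMetadataSketch):
    """Hypothetical stand-in: the same metadata plus the node data."""

    nodes: list[str] = field(default_factory=list)


def hydrate(metadata: DiffMetadataSketch, load_nodes: Callable[[str], list[str]]) -> HydratedDiffSketch:
    """Fetch node data for a diff only at the point where it is required."""
    return HydratedDiffSketch(
        uuid=metadata.uuid,
        base_branch_name=metadata.base_branch_name,
        diff_branch_name=metadata.diff_branch_name,
        from_time=metadata.from_time,
        to_time=metadata.to_time,
        tracking_id=metadata.tracking_id,
        nodes=load_nodes(metadata.uuid),  # the expensive part, deferred until now
    )
```

The point of the split is that everything needed to decide whether a diff has to be touched at all (time range, branches, identifiers) is available without paying for the node data.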

new checks that we perform to avoid hydrating a diff:

a quick query checks whether there are any changes on either branch involved in the diff during the missing time frame. if there are none, we can just extend the time frame of the aggregated diff to include the missing time frame
if we run the diff calculation for a time frame and it returns no changes, we can do basically the same thing: extend the time frame without loading or combining anything
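
The first check can be sketched roughly as follows. `DiffWindowSketch`, `cover_missing_range`, and the `has_changes` callable are hypothetical names for illustration; in the PR the check is a database query, which is replaced here by a plain callable so the example runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class DiffWindowSketch:
    """Hypothetical stand-in for the time frame tracked by a diff's metadata."""

    from_time: datetime
    to_time: datetime


def cover_missing_range(
    existing: DiffWindowSketch,
    missing: DiffWindowSketch,
    has_changes: Callable[[datetime, datetime], bool],
) -> tuple[DiffWindowSketch, bool]:
    """Return (window, needs_full_calculation) for one missing time range."""
    if has_changes(missing.from_time, missing.to_time):
        # Something changed: the caller still has to load, calculate, and combine.
        return existing, True
    # Nothing changed: widen the existing window and skip the expensive work.
    widened = replace(
        existing,
        from_time=min(existing.from_time, missing.from_time),
        to_time=max(existing.to_time, missing.to_time),
    )
    return widened, False
```

The second check would reuse the same widening step, just triggered by an empty calculation result instead of by the quick query.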

this PR got bigger than I would have liked b/c DiffCoordinator assumed that the EnrichedDiffs instances always had full data and needed a lot of changes to support the new "only get the diff data if you really need it" logic

codspeed-hq · 2024-12-31T15:33:57Z

CodSpeed Performance Report

Merging #5342 will improve performances by 35.91%

Comparing ajtm-12312024-skip-diff-no-changes (e10dc77) with stable (929df94)

Summary

⚡ 1 improvements
✅ 9 untouched benchmarks

Benchmarks breakdown

| | Benchmark | `stable` | `ajtm-12312024-skip-diff-no-changes` | Change |
| --- | --- | --- | --- | --- |
| ⚡ | `test_schemabranch_duplicate` | 443.6 µs | 326.4 µs | +35.91% |

ajtmccarty

plan to go through and add some more comments/docstrings b/c some of these changes are, unfortunately, a little hard to understand at first glance

ajtmccarty · 2025-01-03T02:44:22Z

backend/infrahub/core/branch/tasks.py

@@ -70,7 +70,7 @@ async def rebase_branch(branch: str) -> None:
            service=service,
        )
        diff_repository = await component_registry.get_component(DiffRepository, db=db, branch=obj)
-        enriched_diff = await diff_coordinator.update_branch_diff(base_branch=base_branch, diff_branch=obj)
+        enriched_diff = await diff_coordinator.update_branch_diff_and_return(base_branch=base_branch, diff_branch=obj)


made this change in a number of places b/c there are now two versions of this method

update_branch_diff might return EnrichedDiffsMetadata (no actual diff data) instead of EnrichedDiffs

update_branch_diff_and_return will always return EnrichedDiffs
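
A rough sketch of how the two methods could relate to each other is below. Only the two method names come from this PR; the class names, the simplified signatures, the `has_new_changes` flag, and the `_hydrate` helper are assumptions made so the example is self-contained.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class MetadataOnly:
    """Hypothetical stand-in for EnrichedDiffsMetadata (no diff data)."""

    diff_uuid: str


@dataclass
class FullDiff(MetadataOnly):
    """Hypothetical stand-in for EnrichedDiffs (metadata plus nodes)."""

    nodes: list[str] = field(default_factory=list)


class CoordinatorSketch:
    def __init__(self, stored_nodes: dict[str, list[str]]) -> None:
        self.stored_nodes = stored_nodes
        self.hydrations = 0  # counts how often the expensive load ran

    async def update_branch_diff(self, diff_uuid: str, has_new_changes: bool) -> MetadataOnly | FullDiff:
        """May return metadata only when no diff data had to be loaded."""
        if not has_new_changes:
            return MetadataOnly(diff_uuid=diff_uuid)
        return await self._hydrate(diff_uuid)

    async def update_branch_diff_and_return(self, diff_uuid: str, has_new_changes: bool) -> FullDiff:
        """Always returns the full diff, hydrating only if that is still needed."""
        result = await self.update_branch_diff(diff_uuid, has_new_changes)
        if isinstance(result, FullDiff):
            return result
        return await self._hydrate(result.diff_uuid)

    async def _hydrate(self, diff_uuid: str) -> FullDiff:
        self.hydrations += 1
        return FullDiff(diff_uuid=diff_uuid, nodes=list(self.stored_nodes.get(diff_uuid, [])))
```

Callers that only need the update to happen can use the first method and never pay for hydration; callers like `rebase_branch` that go on to read the diff use the second.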

ajtmccarty · 2025-01-03T02:45:05Z

backend/infrahub/core/diff/combiner.py

@@ -320,6 +320,7 @@ def _combine_relationships(
                combined_relationship = EnrichedDiffRelationship(
                    name=later_relationship.name,
                    label=later_relationship.label,
+                    identifier=later_relationship.identifier,


added the identifier of a relationship to the diff b/c it is easier and cleaner than always getting it out of the SchemaBranch

ajtmccarty · 2025-01-03T02:46:06Z

backend/infrahub/core/diff/coordinator.py

this is where most of the changes are. needed a lot of refactoring to support EnrichedDiffs and EnrichedDiffsMetadata

ajtmccarty · 2025-01-03T02:47:22Z

backend/infrahub/core/diff/coordinator.py

            )
+            if not isinstance(enriched_diffs, EnrichedDiffs):


this conditional shows up a lot. it is to check if the enriched_diffs variable is an instance of EnrichedDiffsMetadata or EnrichedDiffs in a manner that mypy accepts. EnrichedDiffs inherits from EnrichedDiffsMetadata
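
For anyone wondering why the check is written against the subclass: because of the inheritance, an `isinstance` check against the parent is true for both types and narrows nothing. A small sketch with hypothetical class names:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataSketch:
    """Hypothetical parent type: metadata only."""

    diff_uuid: str


@dataclass
class FullSketch(MetadataSketch):
    """Hypothetical child type: metadata plus node data."""

    nodes: list[str] = field(default_factory=list)


def node_count(diff: MetadataSketch) -> int:
    if not isinstance(diff, FullSketch):
        # Metadata only: there is no node data to look at.
        return 0
    # Past the early return, a type checker treats `diff` as FullSketch,
    # so accessing `.nodes` is accepted.
    return len(diff.nodes)
```

Checking `isinstance(diff, MetadataSketch)` instead would be true for a `FullSketch` as well, which is why the conditional is phrased as "not an instance of the subclass".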

ajtmccarty · 2025-01-03T02:48:51Z

backend/infrahub/core/diff/coordinator.py

-            for relationship in node.relationships:
-                relationship_schema = node_schema.get_relationship(name=relationship.name)
-                specifiers.add(NodeFieldSpecifier(node_uuid=node.uuid, field_name=relationship_schema.get_identifier()))
-        return specifiers


replaced with a simple database query

ajtmccarty · 2025-01-03T02:50:37Z

backend/infrahub/core/diff/data_check_synchronizer.py

+                )
+
+            data_conflicts = await self._get_data_conflicts(enriched_diff=enriched_diff)
+            enriched_conflicts_map = self._get_enriched_conflicts_map(enriched_diff=enriched_diff)


changes to handle the case when synchronize receives an EnrichedDiffsMetadata (which does not include conflicts data)
the has_validator conditional is used to determine if this is a new ProposedChange that needs to have its conflict data set for the first time

ajtmccarty · 2025-01-03T02:53:26Z

backend/infrahub/core/diff/query/field_specifiers.py

new query to get the (node_uuid, attribute_name or relationship_identifier) tuples that the diff calculation query uses to identify nodes to include
hopefully this can eventually be incorporated into the diff calculation query itself, so that we can skip loading these tuples into memory completely
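
As an illustration of pulling these tuples in batches rather than all at once, here is a minimal sketch. The `node_uuid` and `field_name` fields mirror the `NodeFieldSpecifier` call in the removed coordinator code; the class name, `iter_specifiers`, and the `fetch_page(offset, limit)` callable are hypothetical, with the callable standing in for the real database query.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass(frozen=True)
class NodeFieldSpecifierSketch:
    """Hypothetical stand-in: identifies one field (attribute or relationship) of one node."""

    node_uuid: str
    field_name: str


def iter_specifiers(
    fetch_page: Callable[[int, int], list[tuple[str, str]]],
    batch_size: int = 2,
) -> Iterator[NodeFieldSpecifierSketch]:
    """Yield specifiers page by page so only one batch is held at a time."""
    offset = 0
    while True:
        rows = fetch_page(offset, batch_size)
        for node_uuid, field_name in rows:
            yield NodeFieldSpecifierSketch(node_uuid=node_uuid, field_name=field_name)
        if len(rows) < batch_size:
            # A short (or empty) page means there is nothing left to fetch.
            return
        offset += batch_size
```

Because the dataclass is frozen, instances are hashable and can be collected into a set, as the removed code did with `specifiers.add(...)`.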

dgarros

Looks good, might be good to have someone from the CS team do some additional tests before merging it

ajtmccarty · 2025-01-07T22:01:33Z

converting to a draft to fix an issue that can crash the database on a very large diff

github-actions bot added the group/backend Issue related to the backend (API Server, Git Agent) label Dec 31, 2024

ajtmccarty mentioned this pull request Dec 31, 2024

skip diff update if no changes #5204

Closed

ajtmccarty changed the title ~~WIP skip diff calculations when possible~~ skip expensive diff operations when possible Jan 3, 2025

ajtmccarty commented Jan 3, 2025

ajtmccarty marked this pull request as ready for review January 3, 2025 02:54

ajtmccarty requested a review from a team January 3, 2025 02:54

dgarros approved these changes Jan 3, 2025

ajtmccarty force-pushed the ajtm-12312024-skip-diff-no-changes branch 2 times, most recently from 274883c to def1b18 Compare January 6, 2025 14:33

ajtmccarty marked this pull request as draft January 7, 2025 22:01

ajtmccarty added 16 commits January 8, 2025 14:10

skip diff update if no changes

deef4cb

missing unit test update

2030ea6

refactor to improve performance if no changes

e8d9e3a

typos

c74c4bc

pylint

6918255

add relationship identifier to saved diff

8f13125

unit test update

4065135

refactor coordinator to allow skipping diff loading and calculation

b76e3c7

deal with CoreDataCheck syncing

bd7e561

update some method and variable names

3745798

unit tests, little more refactoring

75176d2

some more comments and naming cleanup

8a7d839

add changelog

84f4904

add moved file

95a4b32

fix mock function call name

6c8557f

add more diff-related logging, get node field specifiers in batches

ffa586b

ajtmccarty force-pushed the ajtm-12312024-skip-diff-no-changes branch from def1b18 to ffa586b Compare January 8, 2025 22:10

add missing await in unit tests

92f4bc1

formatting

e10dc77

ajtmccarty marked this pull request as ready for review January 9, 2025 01:20

dgarros merged commit a120fb8 into stable Jan 9, 2025
34 checks passed

dgarros deleted the ajtm-12312024-skip-diff-no-changes branch January 9, 2025 10:30
