Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(blockifier): add cairo native cache rate metric #4393

Merged
merged 1 commit into from
Mar 2, 2025

Conversation

AvivYossef-starkware
Copy link
Contributor

No description provided.

@reviewable-StarkWare
Copy link

This change is Reviewable

Copy link
Contributor Author

AvivYossef-starkware commented Feb 23, 2025

@AvivYossef-starkware AvivYossef-starkware marked this pull request as ready for review February 23, 2025 15:21
@AvivYossef-starkware AvivYossef-starkware force-pushed the aviv/add_cairo_native_cache_rate_metric branch from 474752c to 6e1d0dc Compare February 23, 2025 15:22
@AvivYossef-starkware AvivYossef-starkware force-pushed the aviv/refactor_gauge_register branch from 1e9c1d7 to 67bf4e8 Compare February 23, 2025 15:44
@AvivYossef-starkware AvivYossef-starkware force-pushed the aviv/add_cairo_native_cache_rate_metric branch from 6e1d0dc to cb72896 Compare February 23, 2025 15:44
@AvivYossef-starkware AvivYossef-starkware force-pushed the aviv/add_cairo_native_cache_rate_metric branch from cb72896 to 8cd464e Compare February 24, 2025 08:12
@AvivYossef-starkware AvivYossef-starkware changed the base branch from aviv/refactor_gauge_register to main February 24, 2025 08:12
@AvivYossef-starkware AvivYossef-starkware force-pushed the aviv/add_cairo_native_cache_rate_metric branch 3 times, most recently from f8ad409 to 5f1026c Compare February 27, 2025 08:50
@avi-starkware
Copy link
Collaborator

crates/blockifier/src/state/native_class_manager.rs line 82 at r3 (raw file):

    }

    pub fn miss_rate(&self) -> Ratio<usize> {

Is there a reason to use Ratio<usize> and not f64? You convert it to f64 anyway when setting the number in the gauge right?

Suggestion:

    pub fn miss_rate(&self) -> f64 {

@avi-starkware
Copy link
Collaborator

crates/blockifier/src/state/native_class_manager.rs line 63 at r3 (raw file):

    cache_misses: Arc<AtomicUsize>,
    total_cache_calls: Arc<AtomicUsize>,
}

Do we want to also keep track of the number of cache hits getting a native compiled classes?

Code quote:

pub struct CacheMetrics {
    cache_misses: Arc<AtomicUsize>,
    total_cache_calls: Arc<AtomicUsize>,
}

@avi-starkware
Copy link
Collaborator

crates/blockifier/src/state/native_class_manager.rs line 63 at r3 (raw file):

    cache_misses: Arc<AtomicUsize>,
    total_cache_calls: Arc<AtomicUsize>,
}

I think this is slightly cleaner.
This way we just clone the Arc pointer when cloning the contract class manager, and not the struct itself.

None-blocking

Suggestion:

    cache_metrics: Arc<CacheMetrics>,
}

#[derive(Default)]

pub struct CacheMetrics {
    cache_misses: AtomicUsize,
    total_cache_calls: AtomicUsize,
}

@avi-starkware
Copy link
Collaborator

crates/blockifier/src/state/native_class_manager.rs line 138 at r3 (raw file):

            compiler: None,
            cache_metrics: CacheMetrics::new(),
        }

How about adding a private NativeClassManager::new() method for handling all the different possibilities for creating this struct in one place?

Non-blocking

Code quote:

        NativeClassManager {
            cairo_native_run_config,
            cache,
            sender: Some(sender),
            compiler: None,
            cache_metrics: CacheMetrics::new(),
        }

@avi-starkware
Copy link
Collaborator

crates/starknet_batcher/src/block_builder.rs line 354 at r3 (raw file):

    ) -> BlockBuilderResult<(Box<dyn BlockBuilderTrait>, AbortSignalSender)>;

    fn get_contract_class_manager_cache_miss_rate(&self) -> Ratio<usize>;

Suggestion:

    fn get_class_cache_miss_rate(&self) -> Ratio<usize>;

@avi-starkware
Copy link
Collaborator

crates/starknet_batcher/src/metrics.rs line 19 at r3 (raw file):

    CAIRO_NATIVE_CACHE_MISS_RATIO.register();
    CAIRO_NATIVE_CACHE_MISS_RATIO.set(0.0);

Why do you need to set the starting value here and it is not needed when registering other metrics?
Can't you just put an initial_value in the metric definition?

Code quote:

    CAIRO_NATIVE_CACHE_MISS_RATIO.set(0.0);

@avi-starkware
Copy link
Collaborator

crates/starknet_sequencer_metrics/src/metric_definitions.rs line 59 at r3 (raw file):

        MetricCounter { SYNCED_TRANSACTIONS, "batcher_synced_transactions", "Counter of synced transactions", 0 },
        MetricCounter { REVERTED_BLOCKS, "batcher_reverted_blocks", "Counter of reverted blocks", 0 },
        MetricGauge { CAIRO_NATIVE_CACHE_MISS_RATIO, "cairo_native_cache_miss_ratio","The cache miss of cairo native"}

Suggestion:

        MetricCounter { CLASS_CACHE_MISSES, "blockifier_class_cache_misses", "Counter of cache misses in the global class cache"}
        MetricCounter { CLASS_CACHE_CALLS, "blockifier_class_cache_calls", "Counter of total cache calls in the global class cache"}

@avi-starkware
Copy link
Collaborator

crates/starknet_sequencer_metrics/src/metric_definitions.rs line 59 at r3 (raw file):

        MetricCounter { SYNCED_TRANSACTIONS, "batcher_synced_transactions", "Counter of synced transactions", 0 },
        MetricCounter { REVERTED_BLOCKS, "batcher_reverted_blocks", "Counter of reverted blocks", 0 },
        MetricGauge { CAIRO_NATIVE_CACHE_MISS_RATIO, "cairo_native_cache_miss_ratio","The cache miss of cairo native"}

Actually, better make the second counter cache_hits. This way we don't have to update it on a cache miss.

@avi-starkware
Copy link
Collaborator

crates/blockifier/src/state/native_class_manager.rs line 63 at r3 (raw file):

    cache_misses: Arc<AtomicUsize>,
    total_cache_calls: Arc<AtomicUsize>,
}

This way we only have to update a single counter in each case.

Suggestion:

pub struct CacheMetrics {
    cache_misses: Arc<AtomicUsize>,
    cache_hits: Arc<AtomicUsize>,
}

Copy link
Collaborator

@avi-starkware avi-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment:
I think it is better to have two counter metrics:

  • cache_hits
  • cache_misses

No need to calculate the ratio - we can do the calculation in grafana
And, cache_hits is better than total_cache_calls since we only have to update it when we get a hit and not upon a miss.

Reviewed 5 of 11 files at r1, 6 of 6 files at r3, all commit messages.
Reviewable status: 11 of 12 files reviewed, 7 unresolved discussions (waiting on @AvivYossef-starkware and @noaov1)

Copy link
Collaborator

@avi-starkware avi-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 11 files at r1.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @AvivYossef-starkware and @noaov1)

@AvivYossef-starkware AvivYossef-starkware force-pushed the aviv/add_cairo_native_cache_rate_metric branch 2 times, most recently from f59fcde to cf25f1c Compare February 27, 2025 18:12
Copy link
Contributor Author

@AvivYossef-starkware AvivYossef-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 14 files reviewed, 6 unresolved discussions (waiting on @avi-starkware and @noaov1)


crates/blockifier/src/state/native_class_manager.rs line 63 at r3 (raw file):

Previously, avi-starkware (Avi Cohen) wrote…

Do we want to also keep track of the number of cache hits getting a native compiled classes?

added


crates/blockifier/src/state/native_class_manager.rs line 63 at r3 (raw file):

Previously, avi-starkware (Avi Cohen) wrote…

I think this is slightly cleaner.
This way we just clone the Arc pointer when cloning the contract class manager, and not the struct itself.

None-blocking

Liked it


crates/blockifier/src/state/native_class_manager.rs line 63 at r3 (raw file):

Previously, avi-starkware (Avi Cohen) wrote…

This way we only have to update a single counter in each case.

Thanks


crates/blockifier/src/state/native_class_manager.rs line 82 at r3 (raw file):

Previously, avi-starkware (Avi Cohen) wrote…

Is there a reason to use Ratio<usize> and not f64? You convert it to f64 anyway when setting the number in the gauge right?

Not relevant anymore


crates/blockifier/src/state/native_class_manager.rs line 138 at r3 (raw file):

Previously, avi-starkware (Avi Cohen) wrote…

How about adding a private NativeClassManager::new() method for handling all the different possibilities for creating this struct in one place?

Non-blocking

sounds good, I'll open a separate PR for it


crates/starknet_batcher/src/metrics.rs line 19 at r3 (raw file):

Previously, avi-starkware (Avi Cohen) wrote…

Why do you need to set the starting value here and it is not needed when registering other metrics?
Can't you just put an initial_value in the metric definition?

It changed last week


crates/starknet_sequencer_metrics/src/metric_definitions.rs line 59 at r3 (raw file):

Previously, avi-starkware (Avi Cohen) wrote…

Actually, better make the second counter cache_hits. This way we don't have to update it on a cache miss.

Done.


crates/starknet_batcher/src/block_builder.rs line 354 at r3 (raw file):

    ) -> BlockBuilderResult<(Box<dyn BlockBuilderTrait>, AbortSignalSender)>;

    fn get_contract_class_manager_cache_miss_rate(&self) -> Ratio<usize>;

Done

@avi-starkware
Copy link
Collaborator

crates/blockifier/src/state/native_class_manager.rs line 248 at r4 (raw file):

    pub fn take_cache_hit_counter(&self) -> u64 {
        self.cache_metrics.cache_hits.swap(0, Ordering::Relaxed)
    }

Why take and not get?
Won't you get the exact same number in the metric by updating it to the get value instead of incrementing by the take value?

Code quote:

    /// Retrieves the current cache miss counter and resets it to zero.
    pub fn take_cache_miss_counter(&self) -> u64 {
        self.cache_metrics.cache_misses.swap(0, Ordering::Relaxed)
    }

    /// Retrieves the current cache hit counter and resets it to zero.
    pub fn take_cache_hit_counter(&self) -> u64 {
        self.cache_metrics.cache_hits.swap(0, Ordering::Relaxed)
    }

Copy link
Collaborator

@avi-starkware avi-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 13 files at r4, all commit messages.
Reviewable status: 9 of 14 files reviewed, 3 unresolved discussions (waiting on @AvivYossef-starkware and @noaov1)


crates/starknet_sequencer_dashboard/src/dashboard_definitions.rs line 66 at r4 (raw file):

    "The ratio of cache misses in the Cairo native cache",
    formatcp!(
        "100 * ({} / ({} + {}))",

What happens when you have division by zero here? (note you will always have division by zero here when cairo_native is off)

Code quote:

        "100 * ({} / ({} + {}))",

crates/starknet_batcher/src/batcher_test.rs line 125 at r4 (raw file):

        let mut block_builder_factory = MockBlockBuilderFactoryTrait::new();
        block_builder_factory.expect_take_class_cache_miss_counter().return_const(0_u64);
        block_builder_factory.expect_take_class_cache_hit_counter().return_const(0_u64);

Why is the return_const now zero?
shouldn't the behavior be the same (the ratio of 1/2 we had before means 1 in both counters now)

I am not sure I fully understand what return_const does...

Code quote:

        block_builder_factory.expect_take_class_cache_miss_counter().return_const(0_u64);
        block_builder_factory.expect_take_class_cache_hit_counter().return_const(0_u64);

Copy link
Collaborator

@avi-starkware avi-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 13 files at r4.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @AvivYossef-starkware and @noaov1)

@avi-starkware
Copy link
Collaborator

crates/blockifier/src/state/native_class_manager.rs line 63 at r3 (raw file):

Previously, AvivYossef-starkware wrote…

added

My question in this comment was not about cache hits in general
I think we want to additionally have the number of cache hits that return a cairo_native contract
So we could keep track of how many of the entry_points are run with cairo_native and how many with casm

Code snippet:

pub struct CacheMetrics {
    cache_misses: Arc<AtomicUsize>,
    cache_hits: Arc<AtomicUsize>,
}

@AvivYossef-starkware AvivYossef-starkware force-pushed the aviv/add_cairo_native_cache_rate_metric branch from cf25f1c to 9424c39 Compare March 2, 2025 09:55
Copy link
Contributor Author

@AvivYossef-starkware AvivYossef-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 14 files reviewed, 4 unresolved discussions (waiting on @avi-starkware and @noaov1)


crates/blockifier/src/state/native_class_manager.rs line 63 at r3 (raw file):

Previously, avi-starkware (Avi Cohen) wrote…

My question in this comment was not about cache hits in general
I think we want to additionally have the number of cache hits that return a cairo_native contract
So we could keep track of how many of the entry_points are run with cairo_native and how many with casm

This PR aims to add the cache miss rate, which will help us decide the right cache size. It’s also a good idea to include the cairo_native compilation rate. I will do that in a separate PR.


crates/blockifier/src/state/native_class_manager.rs line 248 at r4 (raw file):

Previously, avi-starkware (Avi Cohen) wrote…

Why take and not get?
Won't you get the exact same number in the metric by updating it to the get value instead of incrementing by the take value?

MetricCounter has no set attribute, only an increment. This is beneficial because every time we create a new contract manager, it won’t need to track the previous cache hits and misses.


crates/starknet_batcher/src/batcher_test.rs line 125 at r4 (raw file):

Previously, avi-starkware (Avi Cohen) wrote…

Why is the return_const now zero?
shouldn't the behavior be the same (the ratio of 1/2 we had before means 1 in both counters now)

I am not sure I fully understand what return_const does...

The return const tells the mock object to return a constant value every time the function is called.
Nothing tests these values, so I could put every possible number here.


crates/starknet_sequencer_dashboard/src/dashboard_definitions.rs line 66 at r4 (raw file):

Previously, avi-starkware (Avi Cohen) wrote…

What happens when you have division by zero here? (note you will always have division by zero here when cairo_native is off)

According to chatgpt:

This 0/0 yields NaN, and Grafana will typically display this as a gap or missing datapoint in your graph.

but I prefer using clamp_min and avoid it

@avi-starkware
Copy link
Collaborator

crates/blockifier/src/state/native_class_manager.rs line 248 at r4 (raw file):

Previously, AvivYossef-starkware wrote…

MetricCounter has no set attribute, only an increment. This is beneficial because every time we create a new contract manager, it won’t need to track the previous cache hits and misses.

Does that mean that the metric keeps growing infinitely?
Will it reset to 0 when restarting the batcher?

I think we would like to be able to see how changing the cache size affects the cache miss rate, and that will only be possible if the metrics reset when restarting the batcher

@avi-starkware
Copy link
Collaborator

crates/starknet_batcher/src/batcher_test.rs line 125 at r4 (raw file):

Previously, AvivYossef-starkware wrote…

The return const tells the mock object to return a constant value every time the function is called.
Nothing tests these values, so I could put every possible number here.

Oh okay sounds good

@avi-starkware
Copy link
Collaborator

crates/starknet_batcher/src/metrics.rs line 12 at r5 (raw file):

        MetricCounter { PROPOSAL_STARTED, "batcher_proposal_started", "Counter of proposals started", init = 0 },
        MetricCounter { CAIRO_NATIVE_CACHE_MISSES, "cairo_native_cache_misses", "Counter for cache misses in the Cairo native class manager", init=0 },
        MetricCounter { CAIRO_NATIVE_CACHE_HITS, "cairo_native_cache_hits", "Counter for cache hits in the Cairo native class manager", init=0 },

Suggestion:

        MetricCounter { CLASS_CACHE_MISSES, "class_cache_misses", "Counter of global class cache misses", init=0 },
        MetricCounter { CLASS_CACHE_HITS, "class_cache_hits", "Counter of global class cache hits", init=0 },

Copy link
Collaborator

@avi-starkware avi-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r5, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @AvivYossef-starkware and @noaov1)

Copy link
Collaborator

@avi-starkware avi-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @AvivYossef-starkware and @noaov1)

@AvivYossef-starkware AvivYossef-starkware force-pushed the aviv/add_cairo_native_cache_rate_metric branch from 9424c39 to b7ba41e Compare March 2, 2025 11:19
Copy link
Contributor Author

@AvivYossef-starkware AvivYossef-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 14 files reviewed, 2 unresolved discussions (waiting on @avi-starkware, @Itay-Tsabary-Starkware, and @noaov1)


crates/blockifier/src/state/native_class_manager.rs line 248 at r4 (raw file):

Previously, avi-starkware (Avi Cohen) wrote…

Does that mean that the metric keeps growing infinitely?
Will it reset to 0 when restarting the batcher?

I think we would like to be able to see how changing the cache size affects the cache miss rate, and that will only be possible if the metrics reset when restarting the batcher

I think that the metric will reset to 0 every time that we reset the batcher because the register metric function is called
and the initial value is 0

    pub fn register(&self) {
        counter!(self.name).absolute(self.initial_value);
        describe_counter!(self.name, self.description);
    }

@Itay-Tsabary-Starkware am I right?


crates/starknet_batcher/src/metrics.rs line 12 at r5 (raw file):

        MetricCounter { PROPOSAL_STARTED, "batcher_proposal_started", "Counter of proposals started", init = 0 },
        MetricCounter { CAIRO_NATIVE_CACHE_MISSES, "cairo_native_cache_misses", "Counter for cache misses in the Cairo native class manager", init=0 },
        MetricCounter { CAIRO_NATIVE_CACHE_HITS, "cairo_native_cache_hits", "Counter for cache hits in the Cairo native class manager", init=0 },

Thanks

Copy link
Collaborator

@avi-starkware avi-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r6, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Itay-Tsabary-Starkware and @noaov1)

@AvivYossef-starkware AvivYossef-starkware added this pull request to the merge queue Mar 2, 2025
Merged via the queue into main with commit 36fc30d Mar 2, 2025
21 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 4, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants