Systematic approach to clear/set side metadata #1256

wks · 2025-01-08T10:22:32Z

TL;DR: No scattering side metadata clearing/setting operations everywhere! Don't blindly insert those operations before GC / before nursery GC / after GC / when allocating / etc., and per-space / per-chunk / per-block / per-object / etc. Do it systematically. Make a framework for it.

This issue is only about side metadata. Because free spaces are always zeroed, in-header metadata are always 0 upon allocation/copying of an object, and we don't need to worry about bulk-clearing. But we may start worrying about cyclic mark bits if we start to support in-header mark bits for ImmixSpace. (No. We still don't support in-header mark bits now!)

Problem

Take ImmixSpace as an example.

In PrepareBlockState (only in major GCs), it
- bulk clears mark bits (if on the side)
- bulk clears unlog bits (if on the side)
In SweepChunk, it
- iterate through every block and
  - clears the forwarding bits (if on the side, and if the block is a defrag source)
  - iterate through every line in the block (or just every block if BLOCK_ONLY) and
    - bulk clear VO bits (if the line is free)
    - (should) bulk clear unlog bits (if the line is free and if unlog bits are on the side) See Only reset unlog bits of objects in modbuf for nursery GCs #1169
    - (should) bulk clear pin bits (if the line is free and if the pin bits are on the side) See Stale forwarding bits in nursery GC in StickyImmix #1118 (comment)

Some metadata have undergone several refactoring. The forwarding bits, for example, was

cleared in prepare for every block in the beginning of every major GC when the forwarding bits were first allowed to be on the side (Allow forwarding bits to be on the side for Immix #753)
then "optimized" to clear only forwarding bits of defrag sources (Fix stale forwarding bits bug #777)
then the "optimization" was reverted because minor GCs can move objects, too. (Clear all side forwarding bits #974)
then changed to be cleared after GC because young objects may be allocated over free spaces with stale forwarding bits, and we don't clear forwarding bits before minor GCs. (Clear side forwarding bits properly #1138)

This example just shows how difficult it is to get one metadata right. The ImmixSpace has (1) local mark bits, (2) local forwarding bits, (3) local pinning bits (optional), (4) global unlog bits (conditionally needed), and (5) global VO bits (optional). Properly taking care of all of those metadata bits take a lot of effort if we do them one by one.

Characteristics of metadata

Each kind of metadata has several properties that dictate their implementation.

Observers: Which thread observes the metadata? Mutator? Or GC worker?
Temporal Scope: During which time is the metadata meaningful?
- Persists Across GC: Is it only meaningful in one GC, or across different GCs?
- Nursery/Mature GC: Is it only meaningful for nursery GC, mature GC, or both?
- Copying/Non-copying GC: Is it only meaningful for copying GC, non-copying GC, or both?
Spatial Scope: In which part of the space is the metadata meaningful?
- Nursery/mature space: Is it only meaningful for the nursery, the mature space, or both?
- From space: Is it only meaningful for the from space / defrag sources?
Stale bits: Is it OK to leave stale bits from live objects when they die? (Here we only consider free spaces, and disregard new objects allocated over them.)

And those properties may change for for different plans and different spaces.

StickyImmix

StickyImmix only has an ImmixSpace, and both young and mature objects are allocated into it. But each line either contains only young objects or only old objects, but not a mixture of them. The constraints can be summarized in the following table.

Metadata	mark bits	forwarding bits	pinning bits	unlog bits	VO bits
When are they set?	during GC	during copying GC	by mutator	promoted/survived	obj alloc
Observer	GC	GC	GC	mutator	both
Persists across GC?	to next minor	no	yes	yes	yes
When must it be observed as clean?	before major GC	before copying GC	obj alloc	obj alloc	during mutator time
Where must it be observed as clean?	whole space	from-space	free space	free space	where there's no object
Are stale bits OK to mutators?	yes	yes	yes	yes	no

Let's look at those metadata one by one.

The mark bits are used as sticky bits to distinguish between young and old objects, and must persist from one GC to the next GC, unless the next GC is a major GC. And we don't know if the next GC is a major GC. Therefore, we must clear mark bits in the beginning of a major GC and there is no other choice.
The forwarding bits are used to synchronize between GC workers for copying. It is only needed for the from-space (i.e. where objects will be moved out. The problem is, the definition of "from-space" varies between major/minor GC. For StickyImmix, (1) during minor GCs, the nursery is the from-space, and (2) during major GCs, defrag sources are the from-space. So we need to ensure the forwarding bits are clean for at those times in those places.
The pinning bits are a per-object property. They are set by the mutator, and remain valid until the object dies. We only need to ensure that newly allocated objects have no pinning bits set.
The unlog bits are per-object, too. They are set when the object is promoted, and are refreshed when an object survives a GC, and they remain valid until and object dies. We just need to ensure newly allocated objects have no unlog bits set.
The VO bits are per-object, too. It has a unique property that it must be precise w.r.t. the live and death of objects. Not only do we need to ensure that allocatable lines do not contain stale VO bits, but we must clear VO bits for partially-occupied un-allocatable lines, too.

Given those constraints,

We must clear mark bits in the beginning of a major GC, and there is no other choice.
For forwarding bits, pinning bits and unlog bits, there are quite some flexibility about when and where to clear them.
- We can clear forwarding bits in the beginning or the end of a GC, for the entire space. But doing so may be costly (measurement needed). In the current code base, we clear the forwarding bits in the end of a GC, for the from-space (defrag source for mature GC, or the entire space for nursery GC). This may be subject to optimization because clearing forwarding bits after every nursery GC for the entire space is costly. Currently we recommend using in-header forwarding bits, but the poor CRuby binding doesn't have spare bits in the header.
- For pinning bits and unlog bits,
  - We just need to ensure they are 0 when an object is allocated. The possible time to clear the bits include (1) when a line is swept and becomes free, (2) when allocating a new object. We currently do the former.
  - But stale bits are OK when new objects cannot be allocated. So if a line is partially occupied, we don't need to clear stale bits in the line.
VO bits are tricky. We may reconstruct the VO bits during tracing, or copy over the on-the-side mark bits. The latter is faster. See Fix VO bits for Immix #849 for more details.

GenImmix

GenImmix is also generational, but we don't use mark bits as sticky bits. Young objects are never allocated into the ImmixSpace. And because GenImmix doesn't support pinning, the pin bits are useless for GenImmix.

For the CopySpace:

| Metadata | forwarding bits | VO bits |
|---|---|---|---|
| When are they set? | during copying GC | obj alloc |
| Observer | GC | both |
| Persists across GC? | no | yes |
| When must it be observed as clean? | before every GC |during mutator time |
| Where must it be observed as clean? | whole space |where there's no object |
| Are stale bits OK to mutators? | yes | no |

The CopySpace only has the forwarding bits and the VO bits (optional). The forwarding bits can be cleared at the end of a GC or at the beginning of a GC. It doesn't matter. In the current code base, we do bulk clearing at the end of a GC (if on the side).

For the ImmixSpace:

Metadata	mark bits	forwarding bits	~~pinning bits~~	unlog bits	VO bits
When are they set?	during GC	during copying GC	/	allocated	obj alloc
Observer	GC	GC	/	mutator	both
Persists across GC?	no	no	/	yes	yes
When must it be observed as clean?	before major GC	before copying GC	/	never	during mutator time
Where must it be observed as clean?	whole space	from-space	/	nowhere	where there's no object
Are stale bits OK to mutators?	yes	yes	/	yes	no

It's mostly the same as StickyImmix, except that:

The mark bits are no longer used as the sticky bits, so it no longer persists through GC. We may clear the mark bits in the beginning or the end of a major GC. Currently we do it in the beginning so that it works for StickyImmix, too.
Because the ImmixSpace only holds mature objects, all objects in the ImmixSpace have unlog bits set when they are copied into the ImmixSpace. We currently set unlog bits when forwarding. Also because we never allocate young objects into the ImmixSpace, we never need to clear stale unlog bits.

Solution 1: Declarative approach

Programmers provide properties

We may declare the properties of each metadata. For example, in StickyImmix,

The mark bits are (1) persistent from one GC to the next nursery GC, and (2) must be observed as clean before a major GC, and (3) set in every GC. After some computation, it will figure out that the only possible place to clear the metadata is in the beginning of a GC, and will do it in prepare for the entire space.
The forwarding bits are (1) per-GC, and not persistent, and (2) must be observed as clean before copying GC in from-space, and (3) set during copying GC. The algorithm will find two possible times: (1) before nursery and defrag GC, or (2) after copying GC. And it may pick one solution according to some configuration, such as preferring the end of a GC over the beginning of a GC.
The unlog bits are (1) persistent, and (2) must be observed as clean for new objects allocated in the ImmixSpace, and (3) set when promoted, when tracing, or restored in ProcModBuf. It may find several solutions: (1) bulk clearing in the end of GC for free lines, (2) clearing when allocating object.

But this may require some kind of constraint solvers, which may be too general, given that we only have 5 kinds of metadata to deal with for StickyImmix.

Programmers decide a time and range

A simpler but still declarative approach is simply letting the programmer specify a time each metadata is cleared. Possible times can be:

In the beginning of a GC (when preparing)
In the end of a GC (when releasing)

and possible ranges can be

The whole space
From space (interpreted as defrag source in major GC, or the whole space in nursery GC for StickyImmix)
Nursery
Mature space
Free lines (only when releasing)
Free blocks (only when releasing)

The framework will insert hooks and do something like:

fn prepare_chunk() {
    for block in chunk {
        for (spec, clearing_policy) in space.metadata().zipWith(plan.clearing_policies()) {
            if meta.is_on_side() && clearing_policy.clear_in_prepare() {
                if clearing_policy.should_clear_the_block(block.is_nursery(), block.is_from_space()) {
                    spec.bzero_metadata(block.start(), Block::SIZE);
                } else if clearing_policy.clear_at_line_granularity() {
                    for line in block.lines() {
                        if clearing_policy.should_clear_the_line(line.is_nursery(), line.is_from_space()) {
                            spec.bzero_metadata(line.start(), Line::SIZE);
}}}}}}}

In practice, due to the cost of iterating through all side metadata specs, we may reorder the above loops to skip unneeded metadata.

Solution 2: Aspect-oriented programming (AOP)

This approach simply provides a trait that include callbacks, for example,

fn on_prepare_block(block: Block, is_defrag_source: bool, is_nursery_gc: bool, is_copying_gc: bool)
fn on_release_block(block: Block, is_defrag_source: bool, is_nursery_gc: bool, is_copying_gc: bool)
fn on_release_line(line: Line, is_free: bool, is_nursery_gc: bool, is_copying_gc: bool)
...

And the programmer implements this trait for each Space. Inside each function, it will clear the metadata needed to clear For example,

fn on_release_line(line: Line, is_free: bool, ...) {
    #[cfg(feature="pinning")]
    if pinning_bits.is_on_side() && is_free {
        pinning_bits.bulk_zero_metadata(line);
    }
    if is_sticky_immix && unlog_bits.is_on_side() && is_free {
         unlog_bits.bulk_zero_metadata(line);
    }
    #[cfg(feature="vo_bit")]
    if is_free {
        vo_bits.bulk_zero_metadata(line);
    } else {
        vo_bits.copy_from(mark_bits, line);
    }
}

This simply moves the code into one place, but doesn't change the fact that those code are hand-written. Given that we only have 5 different metadata, this not-so-intelligent approach may still be the most practical one for now.

The text was updated successfully, but these errors were encountered:

k-sareen · 2025-01-08T10:35:40Z

I think the appropriate API should be:

fn initialize_metadata_for_region(start: Address, size: usize);

(or range instead of region if you prefer that).

This function would be called whenever we go into the slow-path to allocate a block, TLAB, or line. And that should be it I think. Then we should set all the relevant metadata either when the object is allocated or during tracing time.

VO-bits might have to be handled separately in this however.

wks · 2025-01-08T11:30:28Z

I think the appropriate API should be:
fn initialize_metadata_for_region(start: Address, size: usize);
(or range instead of region if you prefer that).

This function would be called whenever we go into the slow-path to allocate a block, TLAB, or line. And that should be it I think. Then we should set all the relevant metadata either when the object is allocated or during tracing time.

This is only suitable for unlog bits and pinning bits. The common part between them is that they only need to be 0 for newly allocated objects, and stale bits are benign.

It doesn't work, for example, for the mark bits because (1) free regions are certainly not marked, and (2) occupied regions still needs to be cleared before tracing (before full-heap tracing for StickyImmix).

k-sareen · 2025-01-08T11:34:40Z

stale bits are benign.

Stale unlog bits are definitely not benign as per the StickyImmix bug.

It doesn't work, for example, for the mark bits because (1) free regions are certainly not marked, and (2) occupied regions still needs to be cleared before tracing (before full-heap tracing for StickyImmix).

I don't get the first point, but yes I agree with the second one.

wks · 2025-01-08T11:35:50Z

I don't get the first point, but yes I agree with the second one.

If a region contains mark bits, it must contain live objects, and we can't allocate new objects into it.

k-sareen · 2025-01-08T11:40:56Z

I still a bit confused. I'm not sure why is that relevant. Do you mean if you don't reset the mark bits for StickyImmix then you will not know which regions are empty? That is subsumed by the second point, no?

wks · 2025-01-08T13:27:55Z

I still a bit confused. I'm not sure why is that relevant. Do you mean if you don't reset the mark bits for StickyImmix then you will not know which regions are empty? That is subsumed by the second point, no?

My point is, the time "whenever we go into the slow-path to allocate a block, TLAB, or line" is not the right time to reset the mark bits. Mark bits must be set at the beginning of a major GC in StickyImmix, and may be at the beginning or the end of a major GC for GenImmix. The API fn initialize_metadata_for_region(start: Address, size: usize); is not general enough to cover all kinds of metadata.

k-sareen · 2025-01-09T00:31:02Z

Right. But we already have the MarkState module which abstracts that away. In some ways I feel like if we try to do everything in one API then it'll be just as confusing/annoying to implement. What I was suggesting was to ensure that stale bits are never possible. Not a one-API-to-rule-them-all.

wks · 2025-01-09T02:14:36Z

Right. But we already have the MarkState module which abstracts that away. In some ways I feel like if we try to do everything in one API then it'll be just as confusing/annoying to implement. What I was suggesting was to ensure that stale bits are never possible. Not a one-API-to-rule-them-all.

MarkState provides methods for doing operations, but the exact timing still depends on the concrete plan. For example,

// TODO: Currently only ImmortalSpace uses this struct. Any policy that needs mark bit can use this (immix, mark compact, mark sweep).
// We should do some refactoring for other policies as well.
pub struct MarkState { ... }

impl MarkState
    /// This has to be called when a space resets its memory regions. This can be either called before the GC tracing, or
    /// after a GC tracing (eagerly). This method will reset the mark bit. The policy should not use the mark bit before
    /// doing another tracing.
    pub fn on_block_reset<VM: VMBinding>(&self, start: Address, size: usize) {
        if let crate::util::metadata::MetadataSpec::OnSide(side) =
            *VM::VMObjectModel::LOCAL_MARK_BIT_SPEC
        {
            side.bzero_metadata(start, size);
        }
    }

The comment on on_block_reset says it can be called either before or after GC tracing. As the TODO comment says, it is currently used by ImmortalSpace, and it is always OK for the ImmortalSpace regardless whether we call it before or after GC tracing. But for StickyImmix, we have no choice. It must be called before tracing. If we call it after tracing, it will clear the sticky mark bits, making it impossible to tell old objects from young ones in the next nursery GC.

None the less, on_block_reset, along with on_object_metadata_initialization and on_global_release, provide a good guideline for its user to call the right methods at the right time, although the "right time" is still plan-specific. I think the methods in MarkState and the "framework" I mentioned in this issue are complementary to each other.

k-sareen · 2025-01-09T03:48:46Z

Idk. I feel like having a generic API for all possible metadata will be too complex given the extremely different semantics between all of them.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Systematic approach to clear/set side metadata #1256

Systematic approach to clear/set side metadata #1256

wks commented Jan 8, 2025

k-sareen commented Jan 8, 2025 •

edited

Loading

wks commented Jan 8, 2025

k-sareen commented Jan 8, 2025

wks commented Jan 8, 2025

k-sareen commented Jan 8, 2025

wks commented Jan 8, 2025

k-sareen commented Jan 9, 2025 •

edited

Loading

wks commented Jan 9, 2025

k-sareen commented Jan 9, 2025

Systematic approach to clear/set side metadata #1256

Systematic approach to clear/set side metadata #1256

Comments

wks commented Jan 8, 2025

Problem

Characteristics of metadata

StickyImmix

GenImmix

Solution 1: Declarative approach

Programmers provide properties

Programmers decide a time and range

Solution 2: Aspect-oriented programming (AOP)

k-sareen commented Jan 8, 2025 • edited Loading

wks commented Jan 8, 2025

k-sareen commented Jan 8, 2025

wks commented Jan 8, 2025

k-sareen commented Jan 8, 2025

wks commented Jan 8, 2025

k-sareen commented Jan 9, 2025 • edited Loading

wks commented Jan 9, 2025

k-sareen commented Jan 9, 2025

k-sareen commented Jan 8, 2025 •

edited

Loading

k-sareen commented Jan 9, 2025 •

edited

Loading