Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Systematic approach to clear/set side metadata #1256

Open
wks opened this issue Jan 8, 2025 · 9 comments
Open

Systematic approach to clear/set side metadata #1256

wks opened this issue Jan 8, 2025 · 9 comments

Comments

@wks
Copy link
Collaborator

wks commented Jan 8, 2025

TL;DR: No scattering side metadata clearing/setting operations everywhere! Don't blindly insert those operations before GC / before nursery GC / after GC / when allocating / etc., and per-space / per-chunk / per-block / per-object / etc. Do it systematically. Make a framework for it.

This issue is only about side metadata. Because free spaces are always zeroed, in-header metadata are always 0 upon allocation/copying of an object, and we don't need to worry about bulk-clearing. But we may start worrying about cyclic mark bits if we start to support in-header mark bits for ImmixSpace. (No. We still don't support in-header mark bits now!)

Problem

Take ImmixSpace as an example.

  • In PrepareBlockState (only in major GCs), it
    • bulk clears mark bits (if on the side)
    • bulk clears unlog bits (if on the side)
  • In SweepChunk, it

Some metadata have undergone several refactoring. The forwarding bits, for example, was

This example just shows how difficult it is to get one metadata right. The ImmixSpace has (1) local mark bits, (2) local forwarding bits, (3) local pinning bits (optional), (4) global unlog bits (conditionally needed), and (5) global VO bits (optional). Properly taking care of all of those metadata bits take a lot of effort if we do them one by one.

Characteristics of metadata

Each kind of metadata has several properties that dictate their implementation.

  • Observers: Which thread observes the metadata? Mutator? Or GC worker?
  • Temporal Scope: During which time is the metadata meaningful?
    • Persists Across GC: Is it only meaningful in one GC, or across different GCs?
    • Nursery/Mature GC: Is it only meaningful for nursery GC, mature GC, or both?
    • Copying/Non-copying GC: Is it only meaningful for copying GC, non-copying GC, or both?
  • Spatial Scope: In which part of the space is the metadata meaningful?
    • Nursery/mature space: Is it only meaningful for the nursery, the mature space, or both?
    • From space: Is it only meaningful for the from space / defrag sources?
  • Stale bits: Is it OK to leave stale bits from live objects when they die? (Here we only consider free spaces, and disregard new objects allocated over them.)

And those properties may change for for different plans and different spaces.

StickyImmix

StickyImmix only has an ImmixSpace, and both young and mature objects are allocated into it. But each line either contains only young objects or only old objects, but not a mixture of them. The constraints can be summarized in the following table.

Metadata mark bits forwarding bits pinning bits unlog bits VO bits
When are they set? during GC during copying GC by mutator promoted/survived obj alloc
Observer GC GC GC mutator both
Persists across GC? to next minor no yes yes yes
When must it be observed as clean? before major GC before copying GC obj alloc obj alloc during mutator time
Where must it be observed as clean? whole space from-space free space free space where there's no object
Are stale bits OK to mutators? yes yes yes yes no

Let's look at those metadata one by one.

  • The mark bits are used as sticky bits to distinguish between young and old objects, and must persist from one GC to the next GC, unless the next GC is a major GC. And we don't know if the next GC is a major GC. Therefore, we must clear mark bits in the beginning of a major GC and there is no other choice.
  • The forwarding bits are used to synchronize between GC workers for copying. It is only needed for the from-space (i.e. where objects will be moved out. The problem is, the definition of "from-space" varies between major/minor GC. For StickyImmix, (1) during minor GCs, the nursery is the from-space, and (2) during major GCs, defrag sources are the from-space. So we need to ensure the forwarding bits are clean for at those times in those places.
  • The pinning bits are a per-object property. They are set by the mutator, and remain valid until the object dies. We only need to ensure that newly allocated objects have no pinning bits set.
  • The unlog bits are per-object, too. They are set when the object is promoted, and are refreshed when an object survives a GC, and they remain valid until and object dies. We just need to ensure newly allocated objects have no unlog bits set.
  • The VO bits are per-object, too. It has a unique property that it must be precise w.r.t. the live and death of objects. Not only do we need to ensure that allocatable lines do not contain stale VO bits, but we must clear VO bits for partially-occupied un-allocatable lines, too.

Given those constraints,

  • We must clear mark bits in the beginning of a major GC, and there is no other choice.
  • For forwarding bits, pinning bits and unlog bits, there are quite some flexibility about when and where to clear them.
    • We can clear forwarding bits in the beginning or the end of a GC, for the entire space. But doing so may be costly (measurement needed). In the current code base, we clear the forwarding bits in the end of a GC, for the from-space (defrag source for mature GC, or the entire space for nursery GC). This may be subject to optimization because clearing forwarding bits after every nursery GC for the entire space is costly. Currently we recommend using in-header forwarding bits, but the poor CRuby binding doesn't have spare bits in the header.
    • For pinning bits and unlog bits,
      • We just need to ensure they are 0 when an object is allocated. The possible time to clear the bits include (1) when a line is swept and becomes free, (2) when allocating a new object. We currently do the former.
      • But stale bits are OK when new objects cannot be allocated. So if a line is partially occupied, we don't need to clear stale bits in the line.
  • VO bits are tricky. We may reconstruct the VO bits during tracing, or copy over the on-the-side mark bits. The latter is faster. See Fix VO bits for Immix #849 for more details.

GenImmix

GenImmix is also generational, but we don't use mark bits as sticky bits. Young objects are never allocated into the ImmixSpace. And because GenImmix doesn't support pinning, the pin bits are useless for GenImmix.

For the CopySpace:

| Metadata | forwarding bits | VO bits |
|---|---|---|---|
| When are they set? | during copying GC | obj alloc |
| Observer | GC | both |
| Persists across GC? | no | yes |
| When must it be observed as clean? | before every GC |during mutator time |
| Where must it be observed as clean? | whole space |where there's no object |
| Are stale bits OK to mutators? | yes | no |

The CopySpace only has the forwarding bits and the VO bits (optional). The forwarding bits can be cleared at the end of a GC or at the beginning of a GC. It doesn't matter. In the current code base, we do bulk clearing at the end of a GC (if on the side).

For the ImmixSpace:

Metadata mark bits forwarding bits pinning bits unlog bits VO bits
When are they set? during GC during copying GC / allocated obj alloc
Observer GC GC / mutator both
Persists across GC? no no / yes yes
When must it be observed as clean? before major GC before copying GC / never during mutator time
Where must it be observed as clean? whole space from-space / nowhere where there's no object
Are stale bits OK to mutators? yes yes / yes no

It's mostly the same as StickyImmix, except that:

  • The mark bits are no longer used as the sticky bits, so it no longer persists through GC. We may clear the mark bits in the beginning or the end of a major GC. Currently we do it in the beginning so that it works for StickyImmix, too.
  • Because the ImmixSpace only holds mature objects, all objects in the ImmixSpace have unlog bits set when they are copied into the ImmixSpace. We currently set unlog bits when forwarding. Also because we never allocate young objects into the ImmixSpace, we never need to clear stale unlog bits.

Solution 1: Declarative approach

Programmers provide properties

We may declare the properties of each metadata. For example, in StickyImmix,

  • The mark bits are (1) persistent from one GC to the next nursery GC, and (2) must be observed as clean before a major GC, and (3) set in every GC. After some computation, it will figure out that the only possible place to clear the metadata is in the beginning of a GC, and will do it in prepare for the entire space.
  • The forwarding bits are (1) per-GC, and not persistent, and (2) must be observed as clean before copying GC in from-space, and (3) set during copying GC. The algorithm will find two possible times: (1) before nursery and defrag GC, or (2) after copying GC. And it may pick one solution according to some configuration, such as preferring the end of a GC over the beginning of a GC.
  • The unlog bits are (1) persistent, and (2) must be observed as clean for new objects allocated in the ImmixSpace, and (3) set when promoted, when tracing, or restored in ProcModBuf. It may find several solutions: (1) bulk clearing in the end of GC for free lines, (2) clearing when allocating object.

But this may require some kind of constraint solvers, which may be too general, given that we only have 5 kinds of metadata to deal with for StickyImmix.

Programmers decide a time and range

A simpler but still declarative approach is simply letting the programmer specify a time each metadata is cleared. Possible times can be:

  • In the beginning of a GC (when preparing)
  • In the end of a GC (when releasing)

and possible ranges can be

  • The whole space
  • From space (interpreted as defrag source in major GC, or the whole space in nursery GC for StickyImmix)
  • Nursery
  • Mature space
  • Free lines (only when releasing)
  • Free blocks (only when releasing)

The framework will insert hooks and do something like:

fn prepare_chunk() {
    for block in chunk {
        for (spec, clearing_policy) in space.metadata().zipWith(plan.clearing_policies()) {
            if meta.is_on_side() && clearing_policy.clear_in_prepare() {
                if clearing_policy.should_clear_the_block(block.is_nursery(), block.is_from_space()) {
                    spec.bzero_metadata(block.start(), Block::SIZE);
                } else if clearing_policy.clear_at_line_granularity() {
                    for line in block.lines() {
                        if clearing_policy.should_clear_the_line(line.is_nursery(), line.is_from_space()) {
                            spec.bzero_metadata(line.start(), Line::SIZE);
}}}}}}}

In practice, due to the cost of iterating through all side metadata specs, we may reorder the above loops to skip unneeded metadata.

Solution 2: Aspect-oriented programming (AOP)

This approach simply provides a trait that include callbacks, for example,

  • fn on_prepare_block(block: Block, is_defrag_source: bool, is_nursery_gc: bool, is_copying_gc: bool)
  • fn on_release_block(block: Block, is_defrag_source: bool, is_nursery_gc: bool, is_copying_gc: bool)
  • fn on_release_line(line: Line, is_free: bool, is_nursery_gc: bool, is_copying_gc: bool)
  • ...

And the programmer implements this trait for each Space. Inside each function, it will clear the metadata needed to clear For example,

fn on_release_line(line: Line, is_free: bool, ...) {
    #[cfg(feature="pinning")]
    if pinning_bits.is_on_side() && is_free {
        pinning_bits.bulk_zero_metadata(line);
    }
    if is_sticky_immix && unlog_bits.is_on_side() && is_free {
         unlog_bits.bulk_zero_metadata(line);
    }
    #[cfg(feature="vo_bit")]
    if is_free {
        vo_bits.bulk_zero_metadata(line);
    } else {
        vo_bits.copy_from(mark_bits, line);
    }
}

This simply moves the code into one place, but doesn't change the fact that those code are hand-written. Given that we only have 5 different metadata, this not-so-intelligent approach may still be the most practical one for now.

@k-sareen
Copy link
Collaborator

k-sareen commented Jan 8, 2025

I think the appropriate API should be:

fn initialize_metadata_for_region(start: Address, size: usize);

(or range instead of region if you prefer that).

This function would be called whenever we go into the slow-path to allocate a block, TLAB, or line. And that should be it I think. Then we should set all the relevant metadata either when the object is allocated or during tracing time.

VO-bits might have to be handled separately in this however.

@wks
Copy link
Collaborator Author

wks commented Jan 8, 2025

I think the appropriate API should be:

fn initialize_metadata_for_region(start: Address, size: usize);

(or range instead of region if you prefer that).

This function would be called whenever we go into the slow-path to allocate a block, TLAB, or line. And that should be it I think. Then we should set all the relevant metadata either when the object is allocated or during tracing time.

This is only suitable for unlog bits and pinning bits. The common part between them is that they only need to be 0 for newly allocated objects, and stale bits are benign.

It doesn't work, for example, for the mark bits because (1) free regions are certainly not marked, and (2) occupied regions still needs to be cleared before tracing (before full-heap tracing for StickyImmix).

@k-sareen
Copy link
Collaborator

k-sareen commented Jan 8, 2025

stale bits are benign.

Stale unlog bits are definitely not benign as per the StickyImmix bug.

It doesn't work, for example, for the mark bits because (1) free regions are certainly not marked, and (2) occupied regions still needs to be cleared before tracing (before full-heap tracing for StickyImmix).

I don't get the first point, but yes I agree with the second one.

@wks
Copy link
Collaborator Author

wks commented Jan 8, 2025

I don't get the first point, but yes I agree with the second one.

If a region contains mark bits, it must contain live objects, and we can't allocate new objects into it.

@k-sareen
Copy link
Collaborator

k-sareen commented Jan 8, 2025

I still a bit confused. I'm not sure why is that relevant. Do you mean if you don't reset the mark bits for StickyImmix then you will not know which regions are empty? That is subsumed by the second point, no?

@wks
Copy link
Collaborator Author

wks commented Jan 8, 2025

I still a bit confused. I'm not sure why is that relevant. Do you mean if you don't reset the mark bits for StickyImmix then you will not know which regions are empty? That is subsumed by the second point, no?

My point is, the time "whenever we go into the slow-path to allocate a block, TLAB, or line" is not the right time to reset the mark bits. Mark bits must be set at the beginning of a major GC in StickyImmix, and may be at the beginning or the end of a major GC for GenImmix. The API fn initialize_metadata_for_region(start: Address, size: usize); is not general enough to cover all kinds of metadata.

@k-sareen
Copy link
Collaborator

k-sareen commented Jan 9, 2025

Right. But we already have the MarkState module which abstracts that away. In some ways I feel like if we try to do everything in one API then it'll be just as confusing/annoying to implement. What I was suggesting was to ensure that stale bits are never possible. Not a one-API-to-rule-them-all.

@wks
Copy link
Collaborator Author

wks commented Jan 9, 2025

Right. But we already have the MarkState module which abstracts that away. In some ways I feel like if we try to do everything in one API then it'll be just as confusing/annoying to implement. What I was suggesting was to ensure that stale bits are never possible. Not a one-API-to-rule-them-all.

MarkState provides methods for doing operations, but the exact timing still depends on the concrete plan. For example,

// TODO: Currently only ImmortalSpace uses this struct. Any policy that needs mark bit can use this (immix, mark compact, mark sweep).
// We should do some refactoring for other policies as well.
pub struct MarkState { ... }

impl MarkState
    /// This has to be called when a space resets its memory regions. This can be either called before the GC tracing, or
    /// after a GC tracing (eagerly). This method will reset the mark bit. The policy should not use the mark bit before
    /// doing another tracing.
    pub fn on_block_reset<VM: VMBinding>(&self, start: Address, size: usize) {
        if let crate::util::metadata::MetadataSpec::OnSide(side) =
            *VM::VMObjectModel::LOCAL_MARK_BIT_SPEC
        {
            side.bzero_metadata(start, size);
        }
    }

The comment on on_block_reset says it can be called either before or after GC tracing. As the TODO comment says, it is currently used by ImmortalSpace, and it is always OK for the ImmortalSpace regardless whether we call it before or after GC tracing. But for StickyImmix, we have no choice. It must be called before tracing. If we call it after tracing, it will clear the sticky mark bits, making it impossible to tell old objects from young ones in the next nursery GC.

None the less, on_block_reset, along with on_object_metadata_initialization and on_global_release, provide a good guideline for its user to call the right methods at the right time, although the "right time" is still plan-specific. I think the methods in MarkState and the "framework" I mentioned in this issue are complementary to each other.

@k-sareen
Copy link
Collaborator

k-sareen commented Jan 9, 2025

Idk. I feel like having a generic API for all possible metadata will be too complex given the extremely different semantics between all of them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants