Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implements interuption of compactions and adds an IT to verify compaction cancels on tablet migration #5395

Open
wants to merge 4 commits into
base: 2.1
Choose a base branch
from

Conversation

keith-turner
Copy link
Contributor

@keith-turner keith-turner commented Mar 10, 2025

Changed this from adding a test to adding a test and adding functionality. Added the ability to interrupt compaction threads on tablet close. Added an IT that covers the two ways compactions can be interrupted on compaction close.

@keith-turner
Copy link
Contributor Author

After running this test, see the following in the logs.

$ grep 'canceled' *
TabletServer_1449570859.out:2025-03-10T22:00:19,779 309 [compaction.FileCompactor] DEBUG: Compaction canceled 1;005000;004000
TabletServer_1449570859.out:2025-03-10T22:00:19,779 309 [tablet.CompactableImpl] DEBUG: Compaction canceled 1;005000;004000 
TabletServer_1449570859.out:2025-03-10T22:00:22,794 327 [compaction.FileCompactor] DEBUG: Compaction canceled 1;007000;006000
TabletServer_1449570859.out:2025-03-10T22:00:22,794 327 [tablet.CompactableImpl] DEBUG: Compaction canceled 1;007000;006000 
TabletServer_1449570859.out:2025-03-10T22:00:25,801 324 [compaction.FileCompactor] DEBUG: Compaction canceled 1;008000;007000
TabletServer_1449570859.out:2025-03-10T22:00:25,801 324 [tablet.CompactableImpl] DEBUG: Compaction canceled 1;008000;007000 
TabletServer_941199483.out:2025-03-10T22:00:19,760 324 [compaction.FileCompactor] DEBUG: Compaction canceled 1;015000;014000
TabletServer_941199483.out:2025-03-10T22:00:19,760 324 [tablet.CompactableImpl] DEBUG: Compaction canceled 1;015000;014000 
TabletServer_941199483.out:2025-03-10T22:00:28,708 297 [compaction.FileCompactor] DEBUG: Compaction canceled 1;010000;009000
TabletServer_941199483.out:2025-03-10T22:00:28,708 297 [tablet.CompactableImpl] DEBUG: Compaction canceled 1;010000;009000 
TabletServer_941199483.out:2025-03-10T22:00:28,767 322 [compaction.FileCompactor] DEBUG: Compaction canceled 1;016000;015000
TabletServer_941199483.out:2025-03-10T22:00:28,768 322 [tablet.CompactableImpl] DEBUG: Compaction canceled 1;016000;015000 

@keith-turner keith-turner marked this pull request as ready for review March 11, 2025 22:23
@keith-turner keith-turner changed the title adds an IT to verify compaction cancels on tablet migration implements interuption of compactions and adds an IT to verify compaction cancels on tablet migration Mar 11, 2025
is.addOption(SLEEP_UNINTERRUPTIBLY, Boolean.toString(b));
}

private void sleep(long time) throws IOException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked these changes by jstacking a tablet server running the new test and verifying that compaction threads were sleeping both ways.

.map(ac -> ac.getTablet().getTable())
.filter(tid -> tid.equals(tableId1) || tid.equals(tableId2)).count();
log.debug("Running compactions {}", runningCompactions);
return runningCompactions == 40;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test usually passes for me, every once in a while it will get stuck here w/ less than the expected 40 compactions. Probably some issue w/ the propagation of compaction config changes.

// Limit interrupting compactions to the part that reads and writes files and avoid
// interrupting the metadata table updates.
try (var ignored = new CompactionInterrupter()) {
stats = CompactableUtils.compact(tablet, job, cInfo, compactEnv, compactFiles, tmpFileName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand how this PR is different than what is happening inside CompactableUtils.compact at:

final Runnable compactionCancellerTask = () -> {
if (!cenv.isCompactionEnabled()) {
compactor.interrupt();
}
};
final ScheduledFuture<?> future = tablet.getContext().getScheduledExecutor()
.scheduleWithFixedDelay(compactionCancellerTask, 10, 10, TimeUnit.SECONDS);

When CompactableImpl.close is called and sets closed=true, then cenv.isCompactionEnabled() should return false and interrupt the FileCompactor. I'm thinking the difference here is checking the Thread.interrupted state and throwing an InterruptedException in CompactionInterrupter.close. I'm wondering if the code below is functionally equivalent:

  try {
    stats = CompactableUtils.compact(tablet, job, cInfo, compactEnv, compactFiles, tmpFileName);
  } finally {
    if (Thread.interrupted()) {
        throw new InterruptedException();
    }
  }

Copy link
Contributor Author

@keith-turner keith-turner Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main difference is that compactor.interrupt() was not calling Thread.interrupt(). However it could call that, it was already tracking the thread running the compaction. Modified it do that in 7d39c8d moving where Thread.interrupt() is called down a level (this will probably benefit external compactions also). The test still pass w/ this change, however they were a bit slower because the interrupting the thread could take up 10 seconds after calling close(). So changed the scheduled timer to run every 3 seconds. The previous code would immediately interrupt the thread when close() was called.

@dlmarion
Copy link
Contributor

I'm wondering if we should change:

public boolean isCompactionEnabled() {
return inexpensiveCheck.get() && expensiveCheck.get();
}

to:

    public boolean isCompactionEnabled() {
      return !Thread.currentThread.isInterrupted() && inexpensiveCheck.get() && expensiveCheck.get();
    }

@keith-turner
Copy link
Contributor Author

For the change to isCompactionEnabled() the way the code is working w/ the latest that changes it would probably not help to check for interrupted there. The interrupt stats will be set on the compaction thread when is compaction isCompactionEnabled() returns false. Also another thead besides the compaction thread could call isCompactionEnabled(), so could not use Thread.currentThread()

Copy link
Contributor

@dlmarion dlmarion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants