Prefetch blocks and place into data BlockCache for major compactions #5302
base: main
Conversation
Looking at the new vectored read API in Hadoop has been on my to-do list. Another good resource for understanding it is here. I attempted to use this, but was unable to figure out a good way to apply it, as we don't directly deal with HDFS blocks. Instead, we deal with RFile blocks, and we cache them at a much different layer than where the HDFS block is retrieved.

Instead, I attempted to create something similar in this PR: prefetching RFile blocks and preemptively caching them. I think this might make sense for operations that perform sequential reads, like compactions. So I wired this up in the FileCompactor for major compactions, and I targeted the main branch because major compactions only run in Compactors. In earlier releases this change would cause churn in the data block cache and might decrease scan performance due to eviction of other blocks.

There are still some changes to be made, like adding the BlockCache to the Compactor, making the number of blocks to prefetch a property, and moving the ThreadPoolExecutor out of the Reader to somewhere else. But I wanted to get early feedback on the concept before putting more work into it.
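To illustrate the idea, here is a minimal sketch of read-ahead caching for a sequential block reader. All names here (`BlockPrefetchSketch`, `prefetch`, `PREFETCH_COUNT`, `loadBlock`) are hypothetical stand-ins for illustration only, not Accumulo or Hadoop APIs: the cache is a plain map standing in for the data BlockCache, and a thread pool plays the role of the ThreadPoolExecutor mentioned above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the prefetch concept: as a sequential reader
// (e.g. a major compaction) consumes block i, the next few blocks are
// loaded in the background and placed into the cache ahead of need.
public class BlockPrefetchSketch {
    // Number of blocks to read ahead; the PR notes this should become
    // a configurable property.
    static final int PREFETCH_COUNT = 3;

    // Stand-in for the data BlockCache: block index -> block bytes.
    static final Map<Integer, byte[]> cache = new ConcurrentHashMap<>();

    // Stand-in for fetching an RFile block from underlying storage.
    static byte[] loadBlock(int index) {
        return ("block-" + index).getBytes();
    }

    // Schedule the next PREFETCH_COUNT blocks after `current` to be
    // loaded and cached in the background.
    static void prefetch(ExecutorService pool, int current, int totalBlocks) {
        for (int i = current + 1; i <= current + PREFETCH_COUNT && i < totalBlocks; i++) {
            final int idx = i;
            pool.submit(() -> cache.computeIfAbsent(idx, BlockPrefetchSketch::loadBlock));
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        int totalBlocks = 10;
        // Sequential read, as a compaction would perform.
        for (int i = 0; i < totalBlocks; i++) {
            byte[] block = cache.computeIfAbsent(i, BlockPrefetchSketch::loadBlock);
            prefetch(pool, i, totalBlocks);
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("cached=" + cache.size());
    }
}
```

The design choice this mirrors is that prefetching only pays off for sequential access patterns; for random-access scans the same read-ahead would evict useful blocks, which is why the PR confines it to major compactions on the main branch.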
Full IT build completed successfully
Related to #2770