[VL] Allow to reuse local SSD cache on Spark context restart #8891

Open
zhouyuan opened this issue Mar 4, 2025 · 5 comments · May be fixed by #8892
Labels
enhancement New feature or request

Comments

zhouyuan (Contributor) commented Mar 4, 2025

Description

Currently, the local SSD cache is discarded when the Spark context shuts down. It would be better to make it reusable across restarts, controlled by a new optional config.

zhouyuan added the enhancement (New feature or request) label on Mar 4, 2025
jackylee-ch (Contributor) commented
I'm curious about the scenarios where reuse would be necessary. In a cluster environment, if a Spark application is relaunched, its executors are re-allocated, so the cache on the new executors may not line up with the data they are asked to read. Even if some cache is present, the hit rate might not be high.

zhouyuan (Contributor, Author) commented Mar 4, 2025

@jackylee-ch Thanks. I was told that soft cache affinity can help alleviate this issue, but I'm still looking for more resources to verify it:
https://github.com/apache/incubator-gluten/blob/main/shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala#L691

jackylee-ch (Contributor) commented
> I was told the soft cache affinity can help to alleviate this issue, but I'm trying to find more resource verify

AFAIK, the cache involved in duplicateReading becomes invalid after Spark restarts, unless the cache itself can be reused and the executor rescheduling issue is resolved. Nevertheless, the PR is still helpful for cache reuse within the same application.

FelixYBW (Contributor) commented Mar 4, 2025

@jackylee-ch It's actually more for benchmark testing, so that the warm run in the second test can get a 100% hit rate.
You are right that in production the local SSD cache may be reused within the same query, but it is hard to reuse it in a second query.

zhouyuan (Contributor, Author) commented Mar 5, 2025

For single-instance tests, it should still be useful.

3 participants