[VL] Allow to reuse local SSD cache on Spark context restart #8891
Comments
Curious about the scenarios where reuse would be necessary. In a cluster environment, if a Spark application is restarted, the executors will be re-allocated. The cache on the new executors may not align with the old one, and even if some cache is present, the hit rate might not be high.
@jackylee-ch Thanks. I was told the soft cache affinity can help to alleviate this issue, but I'm trying to find more resources to verify that.
AFAIK, the cache involved in duplicateReading will become invalid after Spark restarts, unless the cache can be reused and the executor rescheduling issue is resolved. Nevertheless, the PR is still helpful for cache reuse within the same application.
@jackylee-ch It's more for benchmark testing, actually, so the warm run in the second test can get a 100% hit rate.
For single-instance tests it should still be useful.
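As a side note on the "soft cache affinity" mentioned above, here is a minimal sketch of the general idea, assuming it means hashing each file split to a stable set of preferred executors so repeated scans tend to land where the cache already lives. The names and logic are illustrative only, not Gluten's actual implementation.

```scala
// Illustrative sketch only -- not Gluten's soft affinity code. The idea:
// derive a stable set of preferred executors from the file path so that
// repeated scans of the same file tend to be scheduled where its cache lives.
object SoftAffinitySketch {
  /** Pick up to `replicas` preferred executors for a file path. */
  def preferredExecutors(filePath: String,
                         executors: IndexedSeq[String],
                         replicas: Int = 2): Seq[String] = {
    if (executors.isEmpty) Seq.empty
    else {
      // Non-negative, stable start index derived from the path.
      val start = ((filePath.hashCode % executors.length) + executors.length) % executors.length
      (0 until math.min(replicas, executors.length)).map(i => executors((start + i) % executors.length))
    }
  }
}

// The same file maps to the same preferred executors as long as the executor
// list is stable, which is why affinity helps the cache hit rate within one
// application but not after a restart that re-allocates executors.
// SoftAffinitySketch.preferredExecutors("/warehouse/t1/part-0001.parquet",
//   Vector("exec-1", "exec-2", "exec-3"))
```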
Description
Currently, the local SSD cache is discarded after the Spark context shuts down. It would be better to make it reusable via a new optional config.
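For reference, a minimal sketch of how such an opt-in setting might be wired up on the application side. The `ssdCacheReuse` key, and the exact names and values of the existing cache settings, are assumptions for illustration and may not match Gluten's actual options.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: the config keys below are placeholders for illustration
// and may not match Gluten's real option names.
val conf = new SparkConf()
  // Point the Velox SSD cache at a persistent local path (example values).
  .set("spark.gluten.sql.columnar.backend.velox.ssdCachePath", "/mnt/nvme/gluten-cache")
  .set("spark.gluten.sql.columnar.backend.velox.ssdCacheSize", "100g")
  // Proposed optional switch: keep the SSD cache files on context shutdown
  // so a later Spark context on the same node can pick them up again.
  .set("spark.gluten.sql.columnar.backend.velox.ssdCacheReuse", "true")

val spark = SparkSession.builder().config(conf).getOrCreate()
```

With such a flag left at its default (off), behavior would be unchanged and the warm-run benefit described above would apply only within a single application.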