Commit ca2ab6a

[DOC] Add doc about experimental feature using off-heap to store broadcast build relation (#8882)
1 parent e2ca934 commit ca2ab6a

2 files changed

+25
-0
lines changed

docs/Configuration.md

+1
@@ -98,6 +98,7 @@ The following configurations are related to Velox settings.
| spark.gluten.velox.fs.s3a.connect.timeout | Timeout for AWS s3 connection. | 1s |
| spark.gluten.sql.columnar.backend.velox.orc.scan.enabled | Enable velox orc scan. If disabled, vanilla spark orc scan will be used. | true |
| spark.gluten.sql.complexType.scan.fallback.enabled | Force fallback for complex type scan, including struct, map, array. | true |
| spark.gluten.velox.offHeapBroadcastBuildRelation.enabled | Experimental: If enabled, broadcast build relations are stored in off-heap memory. Otherwise, they are stored in on-heap memory. | false |

Additionally, you can control Gluten configurations at the thread level through local properties.

docs/get-started/Velox.md

+24
@@ -545,6 +545,30 @@ I20231121 10:19:42.348845 90094332 WholeStageResultIterator.cc:220] Native Plan
queuedWallNanos sum: 2.00us, count: 1, min: 2.00us, max: 2.00us
```

## Broadcast Build Relations to Off-Heap (Experimental)

The experimental feature **Off-Heap Broadcast Build Relations** aims to mitigate out-of-memory (OOM) issues caused by heap memory consumption during broadcast operations. The detailed design is described in [this design document](https://docs.google.com/document/d/1eZNWPUEdiz2JPJfhyVn9hrk6SqJFRNzOMZm6u5Yredk/edit?tab=t.0).

### Purpose & how it works
- **Avoid OOM**: Prevent OOM errors when broadcasting large datasets.
- **Reduce Heap Memory Usage**: Store broadcast build relations in Spark off-heap memory instead of on-heap memory.

### Configuration

#### Enable Off-Heap Broadcast
To enable this feature, set the following Spark configuration:

| Property                                                    | Default | Description                                                       |
|-------------------------------------------------------------|---------|-------------------------------------------------------------------|
| `spark.gluten.velox.offHeapBroadcastBuildRelation.enabled`  | `false` | Enable/disable off-heap storage for broadcast build relations.    |

This feature has passed a series of tests, and we are collecting more feedback from users. If you run into memory problems with broadcast build relations, please try this feature and share your feedback.

**Note**: This feature will become the default behavior once stabilized. Stay tuned for updates!

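As a minimal sketch of the configuration described above, the fragment below opts in to the experimental feature in `spark-defaults.conf`. Only `spark.gluten.velox.offHeapBroadcastBuildRelation.enabled` comes from this change. The two `spark.memory.offHeap.*` entries are standard Spark properties included here as an assumption about a typical off-heap setup, and the `20g` size is an arbitrary placeholder that should be tuned for your workload.

```properties
# Assumption: Spark off-heap memory is enabled and sized for the job.
# The 20g value is a placeholder, not a recommendation.
spark.memory.offHeap.enabled                               true
spark.memory.offHeap.size                                  20g

# Experimental: store broadcast build relations off-heap (default: false).
spark.gluten.velox.offHeapBroadcastBuildRelation.enabled   true
```

The same properties can be passed per job as `--conf key=value` arguments to `spark-submit` or `spark-shell`, which makes it easier to try the feature on a single workload before changing cluster-wide defaults.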
# Accelerators

Please refer to [HBM](VeloxHBM.md), [QAT](VeloxQAT.md), and [IAA](VeloxIAA.md) for details.
