Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-8912][VL] Add Offset support for CollectLimitExec #8914

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ArnavBalyan
Copy link
Contributor

What changes were proposed in this pull request?

  • Add offset ability for collect limit exec operator.
  • Also makes it compatible with newer spark versions - 3.4 and 3.5

How was this patch tested?

  • Unit Tests added.

@github-actions github-actions bot added CORE works for Gluten Core VELOX CLICKHOUSE labels Mar 5, 2025
Copy link

github-actions bot commented Mar 5, 2025

#8912

Copy link

github-actions bot commented Mar 5, 2025

Run Gluten Clickhouse CI on x86

@@ -58,7 +58,7 @@ class GlutenSQLCollectLimitExecSuite extends WholeStageTransformerSuite {

testWithSpecifiedSparkVersion(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testWithSpecifiedSparkVersion -> test

@@ -67,7 +67,9 @@ class GlutenSQLCollectLimitExecSuite extends WholeStageTransformerSuite {
assertGlutenOperatorMatch[ColumnarCollectLimitBaseExec](df, checkMatch = true)
}

testWithSpecifiedSparkVersion("ColumnarCollectLimitExec - with filter", Array("3.2", "3.3")) {
testWithSpecifiedSparkVersion(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, so as others

@@ -138,4 +140,34 @@ class GlutenSQLCollectLimitExecSuite extends WholeStageTransformerSuite {

assertGlutenOperatorMatch[ColumnarCollectLimitBaseExec](unionDf, checkMatch = true)
}

testWithSpecifiedSparkVersion("ColumnarCollectLimitExec - offset test", Array("3.4", "3.5")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the result for spark3.3? Is the result also correct but operator not matched?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that, please also add the result check for spark3.2 and spark3.3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the test to cover more code path, such as limit(12)

partition => {
val droppedRows = dropLimitedRows(partition, offset)
val adjustedLimit = Math.max(0, limit - offset)
collectLimitedRows(droppedRows, adjustedLimit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we enhance the collectLimitedRows, we can slice the input RowVector from offset to adjustedLimit?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLICKHOUSE CORE works for Gluten Core VELOX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants