
[VL] Support read old ORC file without column names #8862

Open

ccat3z wants to merge 6 commits into apache:main from ccat3z:feat/old-orc

ccat3z (Contributor) commented Feb 28, 2025 (edited)

What changes were proposed in this pull request?

An ORC file written by an old writer version has no real field names in its physical schema. To read such a file, the table schema must be mapped to the file schema by column index.
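A minimal sketch of positional mapping. It assumes an old file can be recognized by placeholder column names such as `_col0`, `_col1` (the convention Spark's ORC reader checks for). The helpers `looks_positional` and `map_by_index` are hypothetical and are not Gluten or Velox APIs. Each table column resolves to the file column at the same position, and table columns past the end of the file schema resolve to `None` (read as nulls).

```python
from typing import List, Optional


def looks_positional(file_columns: List[str]) -> bool:
    # Assumption: old writers store placeholder names (_col0, _col1, ...)
    # instead of the real column names.
    return bool(file_columns) and all(name.startswith("_col") for name in file_columns)


def map_by_index(table_columns: List[str], file_columns: List[str]) -> List[Optional[int]]:
    # Table column i reads file column i; columns the file does not have
    # (e.g. added later by schema evolution) map to None and read as nulls.
    return [i if i < len(file_columns) else None for i in range(len(table_columns))]


table = ["id", "name", "age"]
old_file = ["_col0", "_col1"]
if looks_positional(old_file):
    print(map_by_index(table, old_file))  # [0, 1, None]
```

Because positions come from the table schema, the reader needs the full table schema, not just the projected columns, to know which file column each requested column corresponds to.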

Pass ScanTransformer#getDataColumns to Velox as the table schema.
Enable k{Parquet,Orc}UseColumnNames in Velox to match Spark's default behavior, which maps the table schema to the physical file schema by column name.
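By contrast, name-based mapping looks up each table column in the file schema by name, so the column order in the file does not matter. The sketch below is hypothetical (`map_by_name` is not a Velox or Gluten API). It folds case by default, assuming Spark's default case-insensitive resolution (`spark.sql.caseSensitive=false`), and it does not handle file columns whose names differ only in case.

```python
from typing import Dict, List, Optional


def map_by_name(
    table_columns: List[str],
    file_columns: List[str],
    case_sensitive: bool = False,
) -> List[Optional[int]]:
    def key(name: str) -> str:
        return name if case_sensitive else name.lower()

    # Index file columns by (possibly case-folded) name; the first match wins.
    positions: Dict[str, int] = {}
    for i, name in enumerate(file_columns):
        positions.setdefault(key(name), i)
    # Table columns absent from the file map to None and read as nulls.
    return [positions.get(key(name)) for name in table_columns]


print(map_by_name(["ID", "name", "email"], ["name", "id"]))  # [1, 0, None]
```

Applied to an old file whose physical names are `_col0`, `_col1`, name-based lookup matches nothing. Such files therefore need index-based mapping instead.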

This PR depends on facebookincubator/velox#12489 (support for old ORC files) and facebookincubator/velox#12490 (matching Spark's index-mapping behavior).

Fixed #5638.

How was this patch tested?

Unit tests.

github-actions bot commented Feb 28, 2025

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename the commit message and pull request title using the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Other pull requests

github-actions bot commented Feb 28, 2025

Run Gluten Clickhouse CI on x86

ccat3z force-pushed the feat/old-orc branch from 48e928f to a3a5a30 on February 28, 2025 11:35

ccat3z force-pushed the feat/old-orc branch from a3a5a30 to 77bd5be on March 3, 2025 06:20

ccat3z marked this pull request as ready for review on March 3, 2025 06:37

ccat3z (Contributor, Author) commented Mar 3, 2025

ccat3z force-pushed the feat/old-orc branch from 77bd5be to 44c8ea1 on March 3, 2025 07:14

ccat3z force-pushed the feat/old-orc branch from 44c8ea1 to 656d113 on March 3, 2025 07:55

This was referenced Mar 3, 2025:

[VL] Support read ORC using table schema by index #8861 (Closed)
[VL] Support orc.force.positional.evolution #8876 (Draft)

ccat3z force-pushed the feat/old-orc branch from 656d113 to e9046a5 on March 3, 2025 08:17

ccat3z force-pushed the feat/old-orc branch from e9046a5 to 20c43c9 on March 3, 2025 08:25

ccat3z force-pushed the feat/old-orc branch from 20c43c9 to eb66b37 on March 3, 2025 09:10

ccat3z force-pushed the feat/old-orc branch from eb66b37 to 6e75d47 on March 3, 2025 09:27

ccat3z force-pushed the feat/old-orc branch from 6e75d47 to 7a4767a on March 3, 2025 09:52

ccat3z and others added 2 commits on March 4, 2025 11:38:

Passthrough table schema from scan transformer to velox (47fd935)
Fix ut (421dbee)

ccat3z added 3 commits on March 4, 2025 11:38:

Revert unused change (9aabe0c)
Add ut for schema evolution (55cb4f2)
Set UseColumnNames (09fa5a8)

ccat3z force-pushed the feat/old-orc branch from 7a4767a to ae6562b on March 4, 2025 03:39


[DNM] Change velox repo for test

ccat3z force-pushed the feat/old-orc branch from ae6562b to 7787528 on March 4, 2025 05:23

Labels

BUILD CLICKHOUSE CORE VELOX