[VL] Support read old ORC file without column names #8862

Open: ccat3z wants to merge 6 commits into main

ccat3z (Contributor) commented Feb 28, 2025

What changes were proposed in this pull request?

An ORC file written by an old writer version has no real field names in its physical schema. To read such a file, the table schema must be mapped to the file schema by column index.
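The index-based mapping described above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the Gluten or Velox implementation: it assumes a file "has no column names" when every physical field name has the placeholder form _col<N>, and for each table column it returns the file column index to read, or -1 when the file has fewer columns than the table. The helpers isPlaceholderName, hasNoColumnNames, and mapByIndex are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if a physical field name looks like the
// positional placeholder "_col<N>" (assumed form for files without names).
bool isPlaceholderName(const std::string& name) {
  const std::string prefix = "_col";
  if (name.size() <= prefix.size() ||
      name.compare(0, prefix.size(), prefix) != 0) {
    return false;
  }
  for (std::size_t i = prefix.size(); i < name.size(); ++i) {
    if (!std::isdigit(static_cast<unsigned char>(name[i]))) {
      return false;
    }
  }
  return true;
}

// Hypothetical helper: true if no physical field carries a real name.
bool hasNoColumnNames(const std::vector<std::string>& fileNames) {
  if (fileNames.empty()) {
    return false;
  }
  for (const auto& name : fileNames) {
    if (!isPlaceholderName(name)) {
      return false;
    }
  }
  return true;
}

// Maps table column i to file column i. Table columns beyond the width of
// the file schema get -1, meaning the reader would fill them with nulls.
std::vector<int> mapByIndex(std::size_t numTableColumns,
                            std::size_t numFileColumns) {
  std::vector<int> mapping(numTableColumns, -1);
  for (std::size_t i = 0; i < numTableColumns && i < numFileColumns; ++i) {
    mapping[i] = static_cast<int>(i);
  }
  return mapping;
}
```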

  1. Pass ScanTransformer#getDataColumns to Velox as the table schema.
  2. Enable k{Parquet,Orc}UseColumnNames in Velox to match Spark's default behavior, which maps the table schema to the physical file schema by name (falling back to index mapping only for ORC files without column names).
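For the name-based matching that item 2 enables, a minimal C++ sketch might look like this. It assumes case-insensitive matching (Spark's default when spark.sql.caseSensitive is false) and marks table columns that are missing from the file with -1 so a reader could fill them with nulls. The helpers toLower and mapByName are hypothetical; they are not the Velox code behind kOrcUseColumnNames or kParquetUseColumnNames.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Lower-cases ASCII letters so lookups ignore case (assumption: mirrors
// Spark's default case-insensitive resolution).
std::string toLower(std::string s) {
  for (char& c : s) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return s;
}

// For each table column, returns the index of the file column with the same
// (case-insensitive) name, or -1 if the file does not contain that column.
std::vector<int> mapByName(const std::vector<std::string>& tableNames,
                           const std::vector<std::string>& fileNames) {
  std::unordered_map<std::string, int> fileIndex;
  for (std::size_t i = 0; i < fileNames.size(); ++i) {
    // emplace keeps the first occurrence if names collide after lower-casing.
    fileIndex.emplace(toLower(fileNames[i]), static_cast<int>(i));
  }
  std::vector<int> mapping;
  mapping.reserve(tableNames.size());
  for (const auto& name : tableNames) {
    auto it = fileIndex.find(toLower(name));
    mapping.push_back(it == fileIndex.end() ? -1 : it->second);
  }
  return mapping;
}
```

Matching by name tolerates reordered or newly added table columns, which index mapping cannot; index mapping remains the only option when the file carries placeholder names. This sketch keeps the first match on colliding names, whereas a production reader would likely need to reject ambiguous matches instead.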

This PR depends on facebookincubator/velox#12489 (reading old ORC files) and facebookincubator/velox#12490 (matching Spark's index-mapping behavior).

Fixed #5638.

How was this patch tested?

Unit tests.

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Run Gluten Clickhouse CI on x86

ccat3z marked this pull request as ready for review March 3, 2025 06:37
ccat3z (Contributor Author) commented Mar 3, 2025

cc @kecookier

Successfully merging this pull request may close these issues.

[VL] Read OrcFile error when schema in Orc file and the table file don't consist