-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support scan filter for ORC decimal reader #11067
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for meta-velox canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need to add some tests to E2EFilterTest
for decimal types
// Fill decimals before applying filter. | ||
fillDecimals(); | ||
|
||
const auto rawNulls = nullsInReadRange_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract this logic to a static function or trait class so that it can be later reused in parquet as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parquet decimal uses SelectiveIntegerColumnReader::processFilter
so it might don't need to reuse this logic. How do you think? Thanks.
if constexpr (std::is_same_v<DataT, int64_t>) { | ||
processFilter(filter, rows, rawNulls); | ||
} else { | ||
VELOX_UNSUPPORTED("Unsupported filter: {}.", (int)filterKind); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be the case for schema evolution. Just throw VELOX_FAIL
here or VELOX_NYI
for these and add the requested type and file type information to the error message.
@Yuhta I find the E2EFilterTest relies on DWRF writer, but for now bigint-decimal will be written as int64_t, and hugeint-decimal is not supported. Do we need to support them first? velox/velox/dwio/dwrf/writer/ColumnWriter.cpp Lines 2039 to 2041 in 3a1a60a
velox/velox/dwio/dwrf/writer/ColumnWriter.cpp Line 2090 in 3a1a60a
|
@rui-mo Yes it would be ideal to support writing decimals first. Otherwise the tests are very limited and we have no confidence the new code would work correctly. |
2937bac
to
e1b7ef8
Compare
e1b7ef8
to
3117603
Compare
3117603
to
edfb582
Compare
edfb582
to
86a8f95
Compare
86a8f95
to
1a86c34
Compare
1a86c34
to
6750596
Compare
6750596
to
6e1ad03
Compare
6e1ad03
to
ed04a8e
Compare
Summary: 'SelectiveDecimalColumnReader' is typically used for reading decimal column in ORC file. This PR supports corresponding writer for decimal column in ORC file. Adds 'format' in 'WriterOptions' to distinguish between DWRF and ORC when writing. #11067 (comment) Pull Request resolved: #11431 Reviewed By: Yuhta Differential Revision: D69859556 Pulled By: xiaoxmeng fbshipit-source-id: a09b06a0d47fb6095a9ebe26b5b2ade91b683cbb
a38ca17
to
e063f7f
Compare
@rui-mo do we still need to check in the orc files if we now have the writer? |
e063f7f
to
fb700fe
Compare
cee517e
to
5552315
Compare
Hi @majetideepak, I removed the orc file tests and added E2E filter tests. I find current footer writing is incompatible with orc file format, seeing details in #12418. To enable the orc decimal E2E tests, some status adjustment only for testing are added in |
5552315
to
10c280a
Compare
@rui-mo I feel if we use the proto::orc::Footer for writes, we can then do the right thing and generate valid ORC files? |
@majetideepak I suppose so. My workaround in the test is all about the incompatible footer. |
@Yuhta can you please help review this change? Thanks. |
10c280a
to
4baad3d
Compare
@Yuhta @majetideepak I integrated this PR with the ORC writer support in #12487 and the E2E filter tests work, seeing rui-mo@f04fbfc. Would you rather we merge this PR first, or should I wait for the ORC writer to be landed? This PR contains a few workarounds on the footer for testing purposes only, as the ORC writer is not supported. Thanks. |
4baad3d
to
f61dc1f
Compare
'leafCallToSubfieldFilter' converts decimal filter as Subfield filter, but
'SelectiveDecimalColumnReader::read' rejects scan spec filter. This PR supports
scan filter for ORC decimal reader.