[SPARK-51296][SQL] Support collecting corrupt data in singleVariantColumn mode #50051
+60
−11
What changes were proposed in this pull request?
Currently, if the singleVariantColumn option is specified, the schema is a single variant column. It is then impossible to collect corrupt data, because that requires the schema to contain a column for corrupt data. This PR enables collecting corrupt data in singleVariantColumn mode by adding a new option, corruptRecordColumnWithSingleVariantColumn. It only takes effect when singleVariantColumn is specified. It defines the column name for the corrupt record, and the schema will contain exactly two columns: the single variant column to capture valid data, and the corrupt record column to capture corrupt data.
Why are the changes needed?
It allows collecting corrupt data in singleVariantColumn mode.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test.
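As a rough illustration of the schema rule described in this PR (not the code from the patch, and not the unit test itself), the following toy Python model shows which columns the resulting schema would contain for a given set of reader options. The function name `resolve_columns` and the column names `v` and `_corrupt_record` are hypothetical; only the two option names come from the description above.

```python
from typing import Dict, List, Optional


def resolve_columns(options: Dict[str, str]) -> Optional[List[str]]:
    """Toy model of the schema rule in the PR description (not Spark's code).

    Returns the column names of the fixed schema in singleVariantColumn mode,
    or None when singleVariantColumn is not set, in which case the schema is
    determined some other way and the new option is ignored.
    """
    variant_col = options.get("singleVariantColumn")
    if variant_col is None:
        # The new option only takes effect together with singleVariantColumn.
        return None

    columns = [variant_col]  # single variant column for valid data
    corrupt_col = options.get("corruptRecordColumnWithSingleVariantColumn")
    if corrupt_col is not None:
        columns.append(corrupt_col)  # extra column for corrupt records
    return columns


if __name__ == "__main__":
    print(resolve_columns({"singleVariantColumn": "v"}))
    print(resolve_columns({
        "singleVariantColumn": "v",
        "corruptRecordColumnWithSingleVariantColumn": "_corrupt_record",
    }))
```

In a real reader the same two keys would presumably be passed through the data source's option mechanism, for example as `.option("singleVariantColumn", "v").option("corruptRecordColumnWithSingleVariantColumn", "_corrupt_record")`. The exact behavior is defined by the patch and its unit test, not by this sketch.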
Was this patch authored or co-authored using generative AI tooling?
No.