-
Notifications
You must be signed in to change notification settings - Fork 888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unsafe improvements: arrow/{array_reader,buffer}. #6025
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -173,6 +173,7 @@ impl ArrayReader for StructArrayReader { | |
array_data_builder.null_bit_buffer(Some(bitmap_builder.into())); | ||
} | ||
|
||
// SAFETY: FIXME: document why this call is safe. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The argument basically goes something like the code in this module correctly constructed the struct array values and validated the input during the construction. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, that's what I assumed, but the method was a bit too long and I am not familiar enough with the library to really figure out the details here... |
||
let array_data = unsafe { array_data_builder.build_unchecked() }; | ||
Ok(Arc::new(StructArray::from(array_data))) | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -162,6 +162,10 @@ impl<K: ArrowNativeType + Ord, V: OffsetSizeTrait> DictionaryBuffer<K, V> { | |
|
||
let data = match cfg!(debug_assertions) { | ||
true => builder.build().unwrap(), | ||
// SAFETY: FIXME: this is unsound. data_type is passed by the caller without | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will review this more carefully -- I think this code is still sound as the only callers of However, maybe we could double check the data_type with an assert There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Checking data_type SGTM. Technically speaking, relying on callers doing "the right thing" is unsound (which is not the same as "exhibits undefined behaviour"): unsafe code is sound if no safe code could cause UB, regardless of current usage. Of course, if this is not exposed to the user, it is less important to fix, but would still be nice IMO. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. BTW this PR is still on my list to help clean, but I am struggling to find time. Basically i plan to do some more code investigation to improve the comments here. If there are any specific API changes / proposal to improve safety I think we should file a separate ticket to discuss them. I will do so if I find one during analysis There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds good, take your time! I find with these things speediness should not trump thoroughness :-) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @veluca93 -- I wonder if you have had any more time to do some more code investigation? If you think this is actually unsound I will try and find more time to review this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I haven't had a lot of time - I've been busy and I am at a programming competition this week - but I can confirm that the code as written is unsound (even if the unsoundness cannot trigger undefined behaviour): validating the data type, or marking the function as unsafe and writing down in the caller why the data type is correct would fix the issue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @alamb - any updates on when you might be able to dig a bit more into the safety invariants of this code? :-) unfortunately I don't think I am familiar enough with the codebase to effectively look more into it myself... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am likewise not familar with this part of the code -- and I haven't been able to find time to spend studying it in detail. One of the nice things about open source code is that it is all available for everyone to look -- I welcome additional thoughts / reviews / assistance on this item FRom my perspective, this ticket is current a reasearch item as it is a seemingly theoretical problem. Thus it will likely will remain a relatively low priority for me personally unless someone can cook up a rust example showing actual unsafe/unsound behavior resulting thank you again for the help. It would be great if you could find time to analyze the code or make an example showing a problem |
||
// any validation, which might result in creating invalid arrays, and this is a | ||
// safe function which cannot have preconditions. Similar considerations apply | ||
// to null_buffer. | ||
false => unsafe { builder.build_unchecked() }, | ||
}; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the safety argument goes something like the RecordReader writes only valid values for type
T
intorecord_data
and that the record_reader's null buffer was correctly created.