-
Notifications
You must be signed in to change notification settings - Fork 888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Align buffers of array data imported through the FFI if they aren't aligned #7137
Conversation
I wonder if we could make this behaviour opt-in, whilst I accept systems may produce unaligned data, and people may want to accommodate this (although historically the consensus has been this is a bug), IMO I would not expect an FFI interface to be silently reallocating memory buffers without being explicitly asked to do so. Returning an error instead of panicking would be a reasonable parallel improvement |
I considered that, but couldn't think of any scenario where a Rust program importing data from another system would not want to make sure the buffers are aligned. And note that buffers are checked already, that's how the panic is triggered after all.
It's important to consider here that reallocation won't happen if the producer is a good citizen and produces aligned buffers already. Often by reallocation on their side as well. That's what I think is being currently implemented in the C++ implementation of Flight. The buffers coming from gRPC can't possibly be aligned according to Arrow types so no matter who performs the copy for alignment, it will have to be done at least once.
I can't see how that error could be handled at a higher level layer |
I think it boils down to a subjective judgement over which of the following is better.
Or
I personally think the former is better as it drives the ecosystem forwards, but appreciate some people are more in the camp of "I just want it to work".
Neither, but some people have an aversion to panics
Whilst this technically isn't true, I agree practically speaking it is unavoidable in this instance, and tbh given network IO is involved is likely irrelevant. My concern would more be for systems where this copy is unnecessary, e.g. simply not using aligned_alloc. |
Note that [1] https://github.com/apache/arrow-rs/blob/main/arrow-data/src/data.rs#L1591 |
Most arrow implementations actually provide 64 byte alignment
You can always obtain an aligned allocation, by over allocating by the alignment, and then slicing. This is in fact what many arrow implementations do, Go included https://github.com/apache/arrow-go/blob/main/arrow/memory/go_allocator.go#L23 |
So it seems like the request for next steps on this PR is to add a opt-in flag. Something like let stream_reader = ArrowArrayStreamReader::try_new(ffi_stream)
.with_align_buffers() Which would set a flag that would do the opt in automatic data alignment |
If this is the conclusion after the discussion, I should probably close this PR. It will require a lot of work to pipe this flag all the way from ADBC to the code that instantiates the stream readers. More importantly, I don't see a scenario where it would make sense to not set this flag -- reallocation only happens if necessary and the check is cheap. It's easier to maintain a set of locally applied patches than to argue about the need for robustness in software. |
Closes #7136.
This will also fix this issue in
arrow-adbc
once thearrow-array
crate version is bumped: apache/arrow-adbc#2526