
refactor(rust): Trim sliced-out memory from ListArrays in list arithmetic #19276

Merged: 3 commits merged into pola-rs:main on Oct 17, 2024


@nameexhaustion (Collaborator) commented Oct 17, 2024

For example, given this ListArray:

values: 0..10_000
offsets: [5000, 5001, 5002]

repr: [[5000], [5001]]
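A minimal Rust sketch of how the offsets select each list from the shared values buffer. It assumes plain `Vec`/slice stand-ins rather than the polars-arrow `ListArray` type, ignores validity, and the helper `decode_lists` is hypothetical:

```rust
/// Hypothetical helper: list `i` is `values[offsets[i]..offsets[i + 1]]`.
/// Values outside offsets[0]..offsets[last] are never referenced.
fn decode_lists(values: &[i64], offsets: &[usize]) -> Vec<Vec<i64>> {
    offsets
        .windows(2)
        .map(|w| values[w[0]..w[1]].to_vec())
        .collect()
}

fn main() {
    // values: 0..10_000, offsets: [5000, 5001, 5002]
    let values: Vec<i64> = (0..10_000).collect();
    let offsets: [usize; 3] = [5000, 5001, 5002];
    let lists = decode_lists(&values, &offsets);
    assert_eq!(lists, vec![vec![5000], vec![5001]]);
    println!("{:?}", lists);
}
```

Only 2 of the 10_000 values are reachable through these offsets; the other 9_998 are sliced-out memory that the array still holds.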

Currently, the list arithmetic kernel would:

  • Allocate a result array of 10_000 values when only 5002 - 5000 = 2 values are needed.
  • If one side is a broadcasting primitive, call the ArithmeticKernel on all 10_000 values when it only needs to run on the 2-value slice values[5000..5002].
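To make the cost concrete, here is a hedged sketch that counts how many times a broadcast scalar op runs with and without restricting it to values[offsets[0]..offsets[last]]. It uses plain slices, not the actual ArithmeticKernel; `broadcast_untrimmed` and `broadcast_trimmed` are hypothetical names:

```rust
/// Hypothetical untrimmed kernel: applies `op` to every element of the values
/// buffer, including sliced-out data. Returns the result and the number of op calls.
fn broadcast_untrimmed(values: &[i64], rhs: i64, op: impl Fn(i64, i64) -> i64) -> (Vec<i64>, usize) {
    let mut calls = 0;
    let out: Vec<i64> = values
        .iter()
        .map(|&v| {
            calls += 1;
            op(v, rhs)
        })
        .collect();
    (out, calls)
}

/// Hypothetical trimmed kernel: only touches the range the offsets reference.
fn broadcast_trimmed(values: &[i64], offsets: &[usize], rhs: i64, op: impl Fn(i64, i64) -> i64) -> (Vec<i64>, usize) {
    let start = offsets[0];
    let end = offsets[offsets.len() - 1];
    broadcast_untrimmed(&values[start..end], rhs, op)
}

fn main() {
    let values: Vec<i64> = (0..10_000).collect();
    let offsets: [usize; 3] = [5000, 5001, 5002];
    let add = |a: i64, b: i64| a + b;

    let (full, full_calls) = broadcast_untrimmed(&values, 1, add);
    let (trimmed, trimmed_calls) = broadcast_trimmed(&values, &offsets, 1, add);

    // Untrimmed: 10_000 allocations and op calls for 2 logical elements.
    assert_eq!((full.len(), full_calls), (10_000, 10_000));
    // Trimmed: only the 2 referenced values are allocated and computed.
    assert_eq!((trimmed, trimmed_calls), (vec![5001, 5002], 2));
}
```

Note that the trimmed result lines up with offsets rebased to start at 0, while the untrimmed result keeps the original offsets.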

This can be a performance footgun when performing list arithmetic across sliced chunks of a DataFrame: for every chunk, we would end up allocating and computing over the entire DataFrame.

This PR trims the sliced-out memory from the ListArrays so that we neither over-allocate nor perform compute on sliced-out memory.
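A minimal sketch of what trimming means for a single-level list, assuming `Vec`/slice stand-ins: keep only values[offsets[0]..offsets[last]] and rebase the offsets so they start at 0. `trim_to_normalized_offsets` is a hypothetical name, not necessarily the helper added in this PR, and it copies for clarity where a real implementation might slice a shared buffer instead:

```rust
/// Hypothetical helper: drops sliced-out values and rebases offsets to start at 0.
/// Copies for clarity; a real implementation could instead slice a shared buffer.
fn trim_to_normalized_offsets(values: &[i64], offsets: &[usize]) -> (Vec<i64>, Vec<usize>) {
    let start = offsets[0];
    let end = *offsets.last().expect("an offsets buffer has at least one entry");
    let trimmed_values = values[start..end].to_vec();
    let rebased_offsets = offsets.iter().map(|&o| o - start).collect();
    (trimmed_values, rebased_offsets)
}

/// List `i` is `values[offsets[i]..offsets[i + 1]]`.
fn decode_lists(values: &[i64], offsets: &[usize]) -> Vec<Vec<i64>> {
    offsets.windows(2).map(|w| values[w[0]..w[1]].to_vec()).collect()
}

fn main() {
    let values: Vec<i64> = (0..10_000).collect();
    let offsets: Vec<usize> = vec![5000, 5001, 5002];
    let (new_values, new_offsets) = trim_to_normalized_offsets(&values, &offsets);
    assert_eq!(new_values, vec![5000, 5001]);
    assert_eq!(new_offsets, vec![0, 1, 2]);
    // The logical contents are unchanged: still [[5000], [5001]].
    assert_eq!(decode_lists(&new_values, &new_offsets), decode_lists(&values, &offsets));
}
```

After trimming, a result buffer sized to `new_values.len()` is exactly as large as the data the lists reference.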

@github-actions github-actions bot added internal An internal refactor or improvement rust Related to Rust Polars labels Oct 17, 2024
@nameexhaustion nameexhaustion marked this pull request as ready for review October 17, 2024 10:16

codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 93.61702% with 3 lines in your changes missing coverage. Please review.

Project coverage is 80.13%. Comparing base (7472a76) to head (758d266).
Report is 2 commits behind head on main.

| File with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-arrow/src/array/list/mod.rs | 90.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19276      +/-   ##
==========================================
+ Coverage   80.11%   80.13%   +0.01%     
==========================================
  Files        1526     1526              
  Lines      209338   209359      +21     
  Branches     2418     2418              
==========================================
+ Hits       167707   167761      +54     
+ Misses      41081    41048      -33     
  Partials      550      550              


@@ -58,10 +58,12 @@ impl NumericListOp {
rhs.len(),
{
let (a, b) = lhs.list_offsets_and_validities_recursive();
assert!(a.iter().all(|x| *x.first() as usize == 0));
Member


Can we make this debug_assert?

Collaborator Author


added
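A sketch of what the reviewer's suggestion changes. It assumes nested offsets are modeled as one `Vec<usize>` per nesting level; the real code iterates `list_offsets_and_validities_recursive()`, and `offsets_start_at_zero` is a hypothetical helper. `assert!` always runs, while `debug_assert!` only evaluates its condition when debug assertions are enabled:

```rust
/// Hypothetical check of the invariant the kernel relies on after trimming:
/// the offsets buffer at every nesting level starts at 0.
fn offsets_start_at_zero(levels: &[Vec<usize>]) -> bool {
    levels.iter().all(|offsets| offsets.first() == Some(&0))
}

fn main() {
    let trimmed: Vec<Vec<usize>> = vec![vec![0, 1, 2], vec![0, 3, 5]];
    // `debug_assert!` evaluates its condition only when debug assertions are
    // enabled (on by default in dev builds, off by default with `--release`).
    debug_assert!(offsets_start_at_zero(&trimmed));

    let untrimmed: Vec<Vec<usize>> = vec![vec![5000, 5001, 5002]];
    assert!(!offsets_start_at_zero(&untrimmed));
}
```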

/// Returns an iterator over the offsets of this chunked array.
///
/// The offsets are returned as though the array consisted of a single chunk.
pub fn iter_offsets(&self) -> impl Iterator<Item = i64> + '_ {
Collaborator Author


remove unused code

@ritchie46 ritchie46 merged commit 8cb6539 into pola-rs:main Oct 17, 2024
20 checks passed
@c-peters c-peters added the accepted Ready for implementation label Oct 21, 2024
@nameexhaustion nameexhaustion deleted the sliced-list-trim branch October 28, 2024 04:54
Labels
accepted Ready for implementation internal An internal refactor or improvement rust Related to Rust Polars
Projects
Archived in project