refactor(rust): Trim sliced-out memory from ListArrays in list arithmetic #19276

nameexhaustion · 2024-10-17T09:50:41Z

For example, given this ListArray:

values: 0..10_000
offsets: [5000, 5001, 5002]

repr: [[5000], [5001]]

Currently, the list arithmetic kernel would:

Allocate a result array of 10_000 values, when we only need 5002 - 5000 = 2 values.
If one side is a broadcasting primitive, call the ArithmeticKernel on all 10_000 values, when we only need to call it on the 2-value slice values[5000..5002].

This can be a performance footgun when performing list arithmetic across sliced chunks of a DataFrame: we would end up performing allocation and compute over the entire DataFrame for every chunk.

This PR trims the sliced-out memory from the ListArrays so that we don't over-allocate / perform compute on sliced-out memory.
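
The trimming idea can be sketched on the example above. This is a minimal illustration, not the code from this PR: `SimpleList`, `trim_sliced_out`, `add_scalar` and `sliced_example` are hypothetical names, and a plain `Vec` stands in for the Arrow buffers.

```rust
/// Hypothetical stand-in for a list array: a flat values buffer plus
/// offsets delimiting each sub-list. Not the polars-arrow `ListArray` type.
#[derive(Debug, Clone, PartialEq)]
struct SimpleList {
    values: Vec<i64>,
    /// Assumed invariant: non-empty and non-decreasing.
    offsets: Vec<usize>,
}

impl SimpleList {
    /// Copy only the values the offsets actually reference and shift the
    /// offsets so that they start at 0.
    fn trim_sliced_out(&self) -> SimpleList {
        let first = *self.offsets.first().expect("offsets must be non-empty");
        let last = *self.offsets.last().expect("offsets must be non-empty");
        SimpleList {
            values: self.values[first..last].to_vec(),
            offsets: self.offsets.iter().map(|o| o - first).collect(),
        }
    }

    /// Add a broadcast scalar to every element of the values buffer.
    /// The work done is proportional to `values.len()`, which is why
    /// trimming first matters.
    fn add_scalar(&self, rhs: i64) -> SimpleList {
        SimpleList {
            values: self.values.iter().map(|v| v + rhs).collect(),
            offsets: self.offsets.clone(),
        }
    }
}

/// The example from the description: values 0..10_000, offsets [5000, 5001, 5002].
fn sliced_example() -> SimpleList {
    SimpleList {
        values: (0..10_000).collect(),
        offsets: vec![5000, 5001, 5002],
    }
}
```

Calling `sliced_example().trim_sliced_out().add_scalar(1)` touches 2 values instead of 10_000, and the trimmed offsets become `[0, 1, 2]`.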

codecov · 2024-10-17T10:26:24Z

Codecov Report

Attention: Patch coverage is 93.61702% with 3 lines in your changes missing coverage. Please review.

Project coverage is 80.13%. Comparing base (7472a76) to head (758d266).
Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/polars-arrow/src/array/list/mod.rs	90.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #19276      +/-   ##
==========================================
+ Coverage   80.11%   80.13%   +0.01%     
==========================================
  Files        1526     1526              
  Lines      209338   209359      +21     
  Branches     2418     2418              
==========================================
+ Hits       167707   167761      +54     
+ Misses      41081    41048      -33     
  Partials      550      550


ritchie46 · 2024-10-17T10:58:23Z

crates/polars-core/src/series/arithmetic/list_borrowed.rs

@@ -58,10 +58,12 @@ impl NumericListOp {
                rhs.len(),
                {
                    let (a, b) = lhs.list_offsets_and_validities_recursive();
+                    assert!(a.iter().all(|x| *x.first() as usize == 0));


Can we make this debug_assert?
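
For context on this suggestion: `debug_assert!` is only evaluated when debug assertions are enabled, which by default excludes release builds, so the check costs nothing in optimized binaries. A minimal sketch of the same invariant, assuming plain `Vec<i64>` offsets per nesting level (the helper names are hypothetical and not part of the PR):

```rust
/// Hypothetical helper: `offsets_per_level` stands in for the offsets of
/// each nesting level. Returns true if every level's offsets start at 0.
fn offsets_all_start_at_zero(offsets_per_level: &[Vec<i64>]) -> bool {
    offsets_per_level.iter().all(|o| o.first() == Some(&0))
}

/// Checks the "already trimmed" invariant in debug builds only; with debug
/// assertions disabled the condition is not evaluated at all.
fn check_trimmed(offsets_per_level: &[Vec<i64>]) {
    debug_assert!(offsets_all_start_at_zero(offsets_per_level));
}
```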

nameexhaustion · 2024-10-17T12:14:57Z

crates/polars-core/src/chunked_array/list/mod.rs

-    /// Returns an iterator over the offsets of this chunked array.
-    ///
-    /// The offsets are returned as though the array consisted of a single chunk.
-    pub fn iter_offsets(&self) -> impl Iterator<Item = i64> + '_ {


remove unused code

c

81a0fd3

github-actions bot added internal An internal refactor or improvement rust Related to Rust Polars labels Oct 17, 2024

c

758d266

nameexhaustion marked this pull request as ready for review October 17, 2024 10:16

nameexhaustion requested review from ritchie46, orlp and c-peters as code owners October 17, 2024 10:16

ritchie46 reviewed Oct 17, 2024

debug_assert

4496ece

nameexhaustion commented Oct 17, 2024

ritchie46 merged commit 8cb6539 into pola-rs:main Oct 17, 2024
20 checks passed

c-peters added the accepted Ready for implementation label Oct 21, 2024

c-peters assigned nameexhaustion Oct 21, 2024

nameexhaustion deleted the sliced-list-trim branch October 28, 2024 04:54
