Not clear if Parquet statistics are used when filter applied #16740
Labels: A-io-parquet (Area: reading/writing Parquet files), bug (Something isn't working), P-medium (Priority: medium), python (Related to Python Polars)
Checks
Reproducible example
Log output
Issue description
I've been trying to demonstrate the effect of statistics in Parquet files, but I'm not seeing any: the query takes the same amount of time when reading with and without statistics. With help from @deanm0000, I've seen that the verbose log message "statistics not sufficient for predicate" appears for each row group.
No verbose output appears if we use a `pl.datetime` instead of a Python `datetime` in the filter, but the query is still no faster.
Any ideas what's happening?
Expected behavior
I would expect the statistics to be used, resulting in a faster query.
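To make the expectation concrete, here is a minimal pure-Python sketch of the row-group skipping I have in mind. It is not Polars' implementation: `RowGroupStats`, `build_stats`, and `groups_to_read` are hypothetical names, and the data is made up. It only assumes that each row group stores a min and max for the column, so a filter like `ts > cutoff` can skip any row group whose max is not above the cutoff.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RowGroupStats:
    """Min/max of one column in one row group (hypothetical stand-in
    for the per-row-group statistics stored in a Parquet file)."""
    min_value: datetime
    max_value: datetime


def build_stats(values, row_group_size):
    """Split values into consecutive row groups and record each group's min/max."""
    stats = []
    for start in range(0, len(values), row_group_size):
        chunk = values[start:start + row_group_size]
        stats.append(RowGroupStats(min(chunk), max(chunk)))
    return stats


def groups_to_read(stats, cutoff):
    """Indices of row groups that could contain a row with value > cutoff.
    A group whose max is <= cutoff cannot match and is skipped."""
    return [i for i, s in enumerate(stats) if s.max_value > cutoff]


base = datetime(2024, 1, 1)
sorted_ts = [base + timedelta(minutes=i) for i in range(1_000)]
# Deterministic permutation that spreads late timestamps across every row group.
scattered_ts = [sorted_ts[(i % 10) * 100 + i // 10] for i in range(1_000)]

cutoff = sorted_ts[900]
sorted_stats = build_stats(sorted_ts, row_group_size=100)
scattered_stats = build_stats(scattered_ts, row_group_size=100)

# Sorted data: only the last row group can match, so 1 of 10 is read.
print(len(groups_to_read(sorted_stats, cutoff)), "of", len(sorted_stats))
# Scattered data: every row group's max exceeds the cutoff, so all 10 are read.
print(len(groups_to_read(scattered_stats, cutoff)), "of", len(scattered_stats))
```

In this sketch, statistics only help when the min/max ranges of the row groups don't all overlap the filter value, which is why I'd expect the layout of the data to matter as well as the presence of statistics.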
P.S. If I sort by the `grp` string column and do an equality filter, the verbose output shows the statistics do get used and the query is 2x faster.
Installed versions