Result mismatch with Spark in statistical aggregation function when DivideByZero occurs during expression evaluation #12542

NEUpanning · 2025-03-05T11:27:20Z

Bug description

In Spark, when the configuration setting spark.sql.legacy.statisticalAggregate is set to true, statistical aggregate functions return Double.NaN instead of NULL when a divide-by-zero occurs during expression evaluation. For example, with this setting enabled, stddev(2.0) returns NaN in Spark but NULL in Velox. The affected functions are stddev, stddev_samp, variance, var_samp, skewness, kurtosis, covar_samp, and corr.

Spark PR: apache/spark#29983
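
To make the expected semantics concrete, here is a minimal sketch of a sample standard deviation that models the two behaviors described above. It is plain Python, not Spark or Velox code: the function name stddev_samp and the legacy parameter are hypothetical stand-ins for the aggregate and for spark.sql.legacy.statisticalAggregate, and returning None for empty input is an assumption about how NULL is modeled here.

```python
import math
from typing import Optional, Sequence


def stddev_samp(values: Sequence[float], legacy: bool = False) -> Optional[float]:
    """Sample standard deviation; None stands in for SQL NULL.

    `legacy` is a hypothetical flag modeling
    spark.sql.legacy.statisticalAggregate=true.
    """
    n = len(values)
    if n == 0:
        # Assumption: no input rows yields NULL regardless of the flag.
        return None
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    if n == 1:
        # The divisor n - 1 is zero here. Legacy mode yields NaN,
        # the non-legacy mode yields NULL.
        return math.nan if legacy else None
    return math.sqrt(m2 / (n - 1))


if __name__ == "__main__":
    print(stddev_samp([2.0]))               # non-legacy: NULL
    print(stddev_samp([2.0], legacy=True))  # legacy: NaN
```

With a single input value, the non-legacy call corresponds to what Velox currently returns, and the legacy call corresponds to what Spark returns when the setting is enabled.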

System information

/

Relevant logs

NEUpanning added the bug and triage labels on Mar 5, 2025

NEUpanning mentioned this issue on Mar 6, 2025:

fix(function): Support Spark legacy behavior for central moments functions when 'divide by zero' occurs during expression evaluation #12566 (Open)
