Result mismatch with Spark in statistical aggregation function when DivideByZero occurs during expression evaluation #12542

Open
NEUpanning opened this issue Mar 5, 2025 · 0 comments
Labels
bug Something isn't working triage Newly created issue that needs attention.


NEUpanning commented Mar 5, 2025

Bug description

In Spark, when the configuration setting spark.sql.legacy.statisticalAggregate is set to true, statistical aggregate functions return Double.NaN instead of NULL when a divide-by-zero occurs during expression evaluation. For example, stddev(2.0) returns NaN in Spark with this setting enabled, but Velox returns NULL. The affected functions are stddev, stddev_samp, variance, var_samp, skewness, kurtosis, covar_samp, and corr.
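To make the expected semantics concrete, here is a minimal Python sketch of a sample standard deviation that models the flag. It is not Spark's or Velox's actual implementation: the function name `stddev_samp` and the parameter `legacy_statistical_aggregate` are hypothetical stand-ins, and the sketch assumes that an empty input yields NULL regardless of the flag.

```python
import math
from typing import Optional, Sequence


def stddev_samp(
    values: Sequence[float],
    legacy_statistical_aggregate: bool = False,  # hypothetical stand-in for the Spark setting
) -> Optional[float]:
    """Sample standard deviation; None plays the role of SQL NULL."""
    n = len(values)
    if n == 0:
        # Assumption: no input rows gives NULL in both modes.
        return None
    if n == 1:
        # The divisor n - 1 is zero: NaN in legacy mode, NULL otherwise.
        return math.nan if legacy_statistical_aggregate else None
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)  # sum of squared deviations
    return math.sqrt(m2 / (n - 1))


print(stddev_samp([2.0]))                                     # non-legacy mode
print(stddev_samp([2.0], legacy_statistical_aggregate=True))  # legacy mode
```

Under this model, the single-value case is the only one where the two modes differ, which is the mismatch reported here: Velox currently behaves like the non-legacy branch even when the Spark setting is enabled.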

Spark PR: apache/spark#29983

System information

/

Relevant logs
