Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: date_part to extract only the requested part (not the overall interval) #7189

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

delamarch3
Copy link
Contributor

@delamarch3 delamarch3 commented Feb 24, 2025

Which issue does this PR close?

Rationale for this change

date_part with an interval returns results inconsistent with implementations in duckdb and postgres.

What changes are included in this PR?

Parts are excluded from the interval:
Milliseconds, microseconds and nanoseconds will exclude the minutes and hours from the date part, but not seconds, so date_part('ms', interval '61s') will return 1000
Seconds will exclude minutes and hours, so date_part('s', interval '3600s') will return 0
Hours will not exclude days, so date_part('h', interval '25h') returns 25
Months will exclude years, so date_part('month', interval '13 months') returns 1

Are there any user-facing changes?

Yes, date_part with IntervalDayTimeType and IntervalMonthDayNanoType will return a different result

@github-actions github-actions bot added the arrow Changes to the arrow crate label Feb 24, 2025
@@ -488,25 +492,31 @@ impl ExtractDatePartExt for PrimitiveArray<IntervalMonthDayNanoType> {
fn date_part(&self, part: DatePart) -> Result<Int32Array, ArrowError> {
match part {
DatePart::Year => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months / 12))),
DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months))),
DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months % 12))),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D SELECT datepart('year', interval '12 month');
┌────────────────────────────────────────────────┐
│ datepart('year', CAST('12 month' AS INTERVAL)) │
│                     int64                      │
├────────────────────────────────────────────────┤
│                       1                        │
└────────────────────────────────────────────────┘
D SELECT datepart('month', interval '12 month');
┌─────────────────────────────────────────────────┐
│ datepart('month', CAST('12 month' AS INTERVAL)) │
│                      int64                      │
├─────────────────────────────────────────────────┤
│                        0                        │
└─────────────────────────────────────────────────┘

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres agrees

postgres=# SELECT extract (month from interval '12 month');
 extract
---------
       0
(1 row)

Before this PR arrow also currently says 12.

> SELECT datepart('month', interval '12 month');
+---------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("month"),IntervalMonthDayNano("IntervalMonthDayNano { months: 12, days: 0, nanoseconds: 0 }")) |
+---------------------------------------------------------------------------------------------------------------+
| 12                                                                                                            |
+---------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

I think this change makes sense

}
DatePart::Nanosecond => {
Ok(self.unary_opt(|d| (d.milliseconds % (60 * 1_000)).checked_mul(1_000_000)))
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D SELECT datepart('m', interval '1m 61s 33ms 44us');
┌─────────────────────────────────────────────────────┐
│ datepart('m', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│                        int64                        │
├─────────────────────────────────────────────────────┤
│                          2                          │
└─────────────────────────────────────────────────────┘
D SELECT datepart('s', interval '1m 61s 33ms 44us');
┌─────────────────────────────────────────────────────┐
│ datepart('s', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│                        int64                        │
├─────────────────────────────────────────────────────┤
│                          1                          │
└─────────────────────────────────────────────────────┘
D SELECT datepart('ms', interval '1m 61s 33ms 44us');
┌──────────────────────────────────────────────────────┐
│ datepart('ms', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│                        int64                         │
├──────────────────────────────────────────────────────┤
│                         1033                         │
└──────────────────────────────────────────────────────┘
D SELECT datepart('us', interval '1m 61s 33ms 44us');
┌──────────────────────────────────────────────────────┐
│ datepart('us', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│                        int64                         │
├──────────────────────────────────────────────────────┤
│                       1033044                        │
│                    (1.03 million)                    │
└──────────────────────────────────────────────────────┘

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a behavior change for arrow-rs.

duckdb says 1

SELECT datepart('s', interval '1m 61s 33ms 44us'); ┌─────────────────────────────────────────────────────┐ │ datepart('s', CAST('1m 61s 33ms 44us' AS INTERVAL)) │ │ int64 │ ├─────────────────────────────────────────────────────┤ │ 1 │ └─────────────────────────────────────────────────────┘

postgres says almost 1:

postgres=# SELECT extract (seconds from  interval '1m 61s 33ms 44us');
 extract
----------
 1.033044
(1 row)

Arrow says 121 (before this PR)

> SELECT datepart('s', interval '1m 61s 33ms 44us');
+---------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("s"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 121033044000 }")) |
+---------------------------------------------------------------------------------------------------------------------+
| 121                                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing for millisecond and microseconds

@@ -464,11 +464,15 @@ impl ExtractDatePartExt for PrimitiveArray<IntervalDayTimeType> {
DatePart::Week => Ok(self.unary_opt(|d| Some(d.days / 7))),
DatePart::Day => Ok(self.unary_opt(|d| Some(d.days))),
DatePart::Hour => Ok(self.unary_opt(|d| Some(d.milliseconds / (60 * 60 * 1_000)))),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D SELECT datepart('hour', interval '25 hour');
┌───────────────────────────────────────────────┐
│ datepart('hour', CAST('25 hour' AS INTERVAL)) │
│                     int64                     │
├───────────────────────────────────────────────┤
│                      25                       │
└───────────────────────────────────────────────┘
D SELECT datepart('day', interval '25 hour');
┌──────────────────────────────────────────────┐
│ datepart('day', CAST('25 hour' AS INTERVAL)) │
│                    int64                     │
├──────────────────────────────────────────────┤
│                      0                       │
└──────────────────────────────────────────────┘

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, the current arrow behavior is the same it seems:

> SELECT datepart('hour', interval '25 hour');
+--------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("hour"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+--------------------------------------------------------------------------------------------------------------------------+
| 25                                                                                                                       |
+--------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.030 seconds.

> SELECT datepart('day', interval '25 hour');
+-------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("day"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+-------------------------------------------------------------------------------------------------------------------------+
| 0                                                                                                                       |
+-------------------------------------------------------------------------------------------------------------------------+

@delamarch3 delamarch3 marked this pull request as ready for review February 27, 2025 21:34
@alamb alamb changed the title Extract date parts out of intervals Fix: date_part to extract only the requested part (not the overall interval) Mar 7, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @delamarch3 -- this makes sense to me

cc @Omega359

Related

Thank you very much

@@ -464,11 +464,15 @@ impl ExtractDatePartExt for PrimitiveArray<IntervalDayTimeType> {
DatePart::Week => Ok(self.unary_opt(|d| Some(d.days / 7))),
DatePart::Day => Ok(self.unary_opt(|d| Some(d.days))),
DatePart::Hour => Ok(self.unary_opt(|d| Some(d.milliseconds / (60 * 60 * 1_000)))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, the current arrow behavior is the same it seems:

> SELECT datepart('hour', interval '25 hour');
+--------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("hour"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+--------------------------------------------------------------------------------------------------------------------------+
| 25                                                                                                                       |
+--------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.030 seconds.

> SELECT datepart('day', interval '25 hour');
+-------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("day"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+-------------------------------------------------------------------------------------------------------------------------+
| 0                                                                                                                       |
+-------------------------------------------------------------------------------------------------------------------------+

}
DatePart::Nanosecond => {
Ok(self.unary_opt(|d| (d.milliseconds % (60 * 1_000)).checked_mul(1_000_000)))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a behavior change for arrow-rs.

duckdb says 1

SELECT datepart('s', interval '1m 61s 33ms 44us'); ┌─────────────────────────────────────────────────────┐ │ datepart('s', CAST('1m 61s 33ms 44us' AS INTERVAL)) │ │ int64 │ ├─────────────────────────────────────────────────────┤ │ 1 │ └─────────────────────────────────────────────────────┘

postgres says almost 1:

postgres=# SELECT extract (seconds from  interval '1m 61s 33ms 44us');
 extract
----------
 1.033044
(1 row)

Arrow says 121 (before this PR)

> SELECT datepart('s', interval '1m 61s 33ms 44us');
+---------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("s"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 121033044000 }")) |
+---------------------------------------------------------------------------------------------------------------------+
| 121                                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.

}
DatePart::Nanosecond => {
Ok(self.unary_opt(|d| (d.milliseconds % (60 * 1_000)).checked_mul(1_000_000)))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing for millisecond and microseconds

@@ -488,25 +492,31 @@ impl ExtractDatePartExt for PrimitiveArray<IntervalMonthDayNanoType> {
fn date_part(&self, part: DatePart) -> Result<Int32Array, ArrowError> {
match part {
DatePart::Year => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months / 12))),
DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months))),
DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months % 12))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres agrees

postgres=# SELECT extract (month from interval '12 month');
 extract
---------
       0
(1 row)

Before this PR arrow also currently says 12.

> SELECT datepart('month', interval '12 month');
+---------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("month"),IntervalMonthDayNano("IntervalMonthDayNano { months: 12, days: 0, nanoseconds: 0 }")) |
+---------------------------------------------------------------------------------------------------------------+
| 12                                                                                                            |
+---------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

I think this change makes sense

@Omega359
Copy link
Contributor

Omega359 commented Mar 7, 2025

I agree this change makes sense for consistency with established systems.

@tustvold
Copy link
Contributor

tustvold commented Mar 8, 2025

As this is a behaviour change, even if the consensus appears to be that the current behaviour is somewhere between undesirable and a bug, should this wait until the next major release?

@alamb
Copy link
Contributor

alamb commented Mar 8, 2025

As this is a behaviour change, even if the consensus appears to be that the current behaviour is somewhere between undesirable and a bug, should this wait until the next major release?

That probably makes sense

That will be in April so not that long:

@alamb alamb added the next-major-release the PR has API changes and it waiting on the next major version label Mar 8, 2025
@alamb
Copy link
Contributor

alamb commented Mar 8, 2025

Marking with label

@alamb alamb added the api-change Changes to the arrow API label Mar 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate next-major-release the PR has API changes and it waiting on the next major version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

date_part using an interval returns an incorrect result
4 participants