-
Notifications
You must be signed in to change notification settings - Fork 888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: date_part
to extract only the requested part (not the overall interval)
#7189
base: main
Are you sure you want to change the base?
Conversation
@@ -488,25 +492,31 @@ impl ExtractDatePartExt for PrimitiveArray<IntervalMonthDayNanoType> { | |||
fn date_part(&self, part: DatePart) -> Result<Int32Array, ArrowError> { | |||
match part { | |||
DatePart::Year => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months / 12))), | |||
DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months))), | |||
DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months % 12))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D SELECT datepart('year', interval '12 month');
┌────────────────────────────────────────────────┐
│ datepart('year', CAST('12 month' AS INTERVAL)) │
│ int64 │
├────────────────────────────────────────────────┤
│ 1 │
└────────────────────────────────────────────────┘
D SELECT datepart('month', interval '12 month');
┌─────────────────────────────────────────────────┐
│ datepart('month', CAST('12 month' AS INTERVAL)) │
│ int64 │
├─────────────────────────────────────────────────┤
│ 0 │
└─────────────────────────────────────────────────┘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Postgres agrees
postgres=# SELECT extract (month from interval '12 month');
extract
---------
0
(1 row)
Before this PR arrow also currently says 12.
> SELECT datepart('month', interval '12 month');
+---------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("month"),IntervalMonthDayNano("IntervalMonthDayNano { months: 12, days: 0, nanoseconds: 0 }")) |
+---------------------------------------------------------------------------------------------------------------+
| 12 |
+---------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
I think this change makes sense
} | ||
DatePart::Nanosecond => { | ||
Ok(self.unary_opt(|d| (d.milliseconds % (60 * 1_000)).checked_mul(1_000_000))) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D SELECT datepart('m', interval '1m 61s 33ms 44us');
┌─────────────────────────────────────────────────────┐
│ datepart('m', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│ int64 │
├─────────────────────────────────────────────────────┤
│ 2 │
└─────────────────────────────────────────────────────┘
D SELECT datepart('s', interval '1m 61s 33ms 44us');
┌─────────────────────────────────────────────────────┐
│ datepart('s', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│ int64 │
├─────────────────────────────────────────────────────┤
│ 1 │
└─────────────────────────────────────────────────────┘
D SELECT datepart('ms', interval '1m 61s 33ms 44us');
┌──────────────────────────────────────────────────────┐
│ datepart('ms', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│ int64 │
├──────────────────────────────────────────────────────┤
│ 1033 │
└──────────────────────────────────────────────────────┘
D SELECT datepart('us', interval '1m 61s 33ms 44us');
┌──────────────────────────────────────────────────────┐
│ datepart('us', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│ int64 │
├──────────────────────────────────────────────────────┤
│ 1033044 │
│ (1.03 million) │
└──────────────────────────────────────────────────────┘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a behavior change for arrow-rs.
duckdb says 1
SELECT datepart('s', interval '1m 61s 33ms 44us'); ┌─────────────────────────────────────────────────────┐ │ datepart('s', CAST('1m 61s 33ms 44us' AS INTERVAL)) │ │ int64 │ ├─────────────────────────────────────────────────────┤ │ 1 │ └─────────────────────────────────────────────────────┘
postgres says almost 1:
postgres=# SELECT extract (seconds from interval '1m 61s 33ms 44us');
extract
----------
1.033044
(1 row)
Arrow says 121
(before this PR)
> SELECT datepart('s', interval '1m 61s 33ms 44us');
+---------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("s"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 121033044000 }")) |
+---------------------------------------------------------------------------------------------------------------------+
| 121 |
+---------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing for millisecond and microseconds
@@ -464,11 +464,15 @@ impl ExtractDatePartExt for PrimitiveArray<IntervalDayTimeType> { | |||
DatePart::Week => Ok(self.unary_opt(|d| Some(d.days / 7))), | |||
DatePart::Day => Ok(self.unary_opt(|d| Some(d.days))), | |||
DatePart::Hour => Ok(self.unary_opt(|d| Some(d.milliseconds / (60 * 60 * 1_000)))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D SELECT datepart('hour', interval '25 hour');
┌───────────────────────────────────────────────┐
│ datepart('hour', CAST('25 hour' AS INTERVAL)) │
│ int64 │
├───────────────────────────────────────────────┤
│ 25 │
└───────────────────────────────────────────────┘
D SELECT datepart('day', interval '25 hour');
┌──────────────────────────────────────────────┐
│ datepart('day', CAST('25 hour' AS INTERVAL)) │
│ int64 │
├──────────────────────────────────────────────┤
│ 0 │
└──────────────────────────────────────────────┘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, the current arrow behavior is the same it seems:
> SELECT datepart('hour', interval '25 hour');
+--------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("hour"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+--------------------------------------------------------------------------------------------------------------------------+
| 25 |
+--------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.030 seconds.
> SELECT datepart('day', interval '25 hour');
+-------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("day"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+-------------------------------------------------------------------------------------------------------------------------+
| 0 |
+-------------------------------------------------------------------------------------------------------------------------+
date_part
to extract only the requested part (not the overall interval)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -464,11 +464,15 @@ impl ExtractDatePartExt for PrimitiveArray<IntervalDayTimeType> { | |||
DatePart::Week => Ok(self.unary_opt(|d| Some(d.days / 7))), | |||
DatePart::Day => Ok(self.unary_opt(|d| Some(d.days))), | |||
DatePart::Hour => Ok(self.unary_opt(|d| Some(d.milliseconds / (60 * 60 * 1_000)))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, the current arrow behavior is the same it seems:
> SELECT datepart('hour', interval '25 hour');
+--------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("hour"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+--------------------------------------------------------------------------------------------------------------------------+
| 25 |
+--------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.030 seconds.
> SELECT datepart('day', interval '25 hour');
+-------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("day"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+-------------------------------------------------------------------------------------------------------------------------+
| 0 |
+-------------------------------------------------------------------------------------------------------------------------+
} | ||
DatePart::Nanosecond => { | ||
Ok(self.unary_opt(|d| (d.milliseconds % (60 * 1_000)).checked_mul(1_000_000))) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a behavior change for arrow-rs.
duckdb says 1
SELECT datepart('s', interval '1m 61s 33ms 44us'); ┌─────────────────────────────────────────────────────┐ │ datepart('s', CAST('1m 61s 33ms 44us' AS INTERVAL)) │ │ int64 │ ├─────────────────────────────────────────────────────┤ │ 1 │ └─────────────────────────────────────────────────────┘
postgres says almost 1:
postgres=# SELECT extract (seconds from interval '1m 61s 33ms 44us');
extract
----------
1.033044
(1 row)
Arrow says 121
(before this PR)
> SELECT datepart('s', interval '1m 61s 33ms 44us');
+---------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("s"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 121033044000 }")) |
+---------------------------------------------------------------------------------------------------------------------+
| 121 |
+---------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.
} | ||
DatePart::Nanosecond => { | ||
Ok(self.unary_opt(|d| (d.milliseconds % (60 * 1_000)).checked_mul(1_000_000))) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing for millisecond and microseconds
@@ -488,25 +492,31 @@ impl ExtractDatePartExt for PrimitiveArray<IntervalMonthDayNanoType> { | |||
fn date_part(&self, part: DatePart) -> Result<Int32Array, ArrowError> { | |||
match part { | |||
DatePart::Year => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months / 12))), | |||
DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months))), | |||
DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months % 12))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Postgres agrees
postgres=# SELECT extract (month from interval '12 month');
extract
---------
0
(1 row)
Before this PR arrow also currently says 12.
> SELECT datepart('month', interval '12 month');
+---------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("month"),IntervalMonthDayNano("IntervalMonthDayNano { months: 12, days: 0, nanoseconds: 0 }")) |
+---------------------------------------------------------------------------------------------------------------+
| 12 |
+---------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
I think this change makes sense
I agree this change makes sense for consistency with established systems. |
As this is a behaviour change, even if the consensus appears to be that the current behaviour is somewhere between undesirable and a bug, should this wait until the next major release? |
That probably makes sense That will be in April so not that long: |
Marking with label |
Which issue does this PR close?
date_part
using an interval returns an incorrect result #7182Rationale for this change
date_part
with an interval returns results inconsistent with implementations in duckdb and postgres.What changes are included in this PR?
Parts are excluded from the interval:
Milliseconds, microseconds and nanoseconds will exclude the minutes and hours from the date part, but not seconds, so
date_part('ms', interval '61s')
will return 1000Seconds will exclude minutes and hours, so
date_part('s', interval '3600s')
will return 0Hours will not exclude days, so
date_part('h', interval '25h')
returns 25Months will exclude years, so
date_part('month', interval '13 months')
returns 1Are there any user-facing changes?
Yes,
date_part
withIntervalDayTimeType
andIntervalMonthDayNanoType
will return a different result