Spark quarter function implementation #20808
kazantsev-maksim wants to merge 25 commits into apache:main
Conversation
Thanks for putting this together — the new quarter UDF is headed in a good direction, but I think there are a couple of compatibility and robustness issues we should address before merging.
The main blocker is that the current registration uses an exact Date32 signature, which looks narrower than Spark’s documented behavior for quarter. I also left a couple of smaller follow-ups around reusing the existing date_part("quarter", ...) path and avoiding the unwrap() panic path.
I’d also love to see coverage added for Spark’s documented string-literal example so we lock that compatibility in.
```rust
impl SparkQuarter {
    pub fn new() -> Self {
        Self {
            signature: Signature::exact(vec![DataType::Date32], Volatility::Immutable),
```
I think this is the main thing we should fix before merging. Right now the UDF is registered with an exact Date32 signature, which means we no longer preserve Spark’s documented call shape for quarter.
Spark’s SQL docs show SELECT quarter('2016-08-31'); returning 3, and this SLT file used to carry that example before it was replaced with explicit ::DATE casts. With the current signature, we only validate the casted form and could end up rejecting the plain string-literal case that Spark accepts.
Could we switch this to a coercible signature, or possibly just route through the existing date_part('quarter', ...) behavior, and add coverage for the uncasted query?
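For illustration, a broadened registration could look roughly like this, sketched against datafusion_expr's `Signature` API (a non-authoritative sketch; exact variant names can differ across DataFusion versions):

```rust
use arrow::datatypes::DataType;
use datafusion_expr::{Signature, TypeSignature, Volatility};

// Sketch only: accept both string literals and Date32 at planning time,
// rather than pinning the UDF to an exact Date32 signature.
let signature = Signature::one_of(
    vec![
        TypeSignature::Exact(vec![DataType::Utf8]),
        TypeSignature::Exact(vec![DataType::Date32]),
    ],
    Volatility::Immutable,
);
```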
```rust
fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
    let [arg] = take_function_args("quarter", args.args)?;
```
Small suggestion here: this seems to repeat the same scalar/array Date32 dispatching pattern we already have in other datetime helpers, while datafusion_functions::datetime::date_part() already supports "quarter".
Would it make sense to delegate to that existing implementation instead? It feels like that would help keep coercion rules, null handling, and any future date-part behavior aligned in one place.
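As a rough sketch of that delegation (assuming a Date32 `array: ArrayRef` is already in hand; arrow's `date_part` kernel is what this PR's later imports pull in):

```rust
use arrow::compute::{DatePart, date_part};

// Sketch only: let the shared kernel compute the quarter instead of
// hand-rolling scalar/array Date32 dispatch; error handling elided.
let quarters = date_part(array.as_ref(), DatePart::Quarter)?;
```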
```rust
fn spark_quarter(days: i32) -> Result<i32> {
    let quarter = Date32Type::to_naive_date_opt(days).unwrap().quarter();
```
One thing that made me a little nervous here is the unwrap(). That introduces a panic path if a malformed Date32 value ever makes it down to this helper.
last_day handles the same kind of conversion by returning an explicit error instead, which feels a bit safer for a public UDF. Could we do the same here so bad inputs show up as query errors rather than aborting execution?
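As a minimal, self-contained sketch of that error-returning shape (plain Rust, no chrono; `spark_quarter_checked` and the year bounds are hypothetical stand-ins, not the PR's actual helper):

```rust
// Convert days since 1970-01-01 to (year, month, day) using Howard
// Hinnant's civil_from_days algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, in [0, 146096]
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365;
    let y = yoe + era * 400;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March = 0
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    (if m <= 2 { y + 1 } else { y }, m, d)
}

// Hypothetical panic-free variant: return an error for out-of-range days
// (mirroring chrono's fallible to_naive_date_opt) instead of unwrap().
fn spark_quarter_checked(days: i32) -> Result<i32, String> {
    let (year, month, _) = civil_from_days(days as i64);
    // Illustrative bound; chrono's NaiveDate supports roughly this range.
    if !(-262_144..=262_142).contains(&year) {
        return Err(format!("invalid Date32 value: {days}"));
    }
    Ok(((month - 1) / 3 + 1) as i32)
}

fn main() {
    // 17044 days after the epoch is 2016-08-31, Spark's documented example.
    println!("{:?}", spark_quarter_checked(17044)); // prints Ok(3)
}
```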
I tried to rework it.
Thanks for the review @kosiew. Could you please take another look when you have time?
kosiew left a comment
@kazantsev-maksim
Thanks for the update.
The unwrap panic concern in quarter.rs looks resolved, and accepting string inputs is a good step forward.
I still see two follow-ups that seem important before this is ready.
```rust
TypeSignature::Exact(vec![DataType::Utf8View]),
TypeSignature::Exact(vec![DataType::LargeUtf8]),
TypeSignature::Exact(vec![DataType::Date32]),
TypeSignature::Exact(vec![DataType::Timestamp(
```
I think there is still one important gap here.
quarter is still declared with an exact Timestamp(Millisecond, None) signature, while Spark's date_part wrapper already uses the broader coercible timestamp path.
Because of that, timestamp inputs with other units or timezones can still get rejected during planning, even though the implementation below handles DataType::Timestamp(_, _) once execution starts.
Could we align this with the existing Spark datetime coercion model so quarter behaves consistently with the rest of that path?
```
1

query I
SELECT quarter('2009-01-12'::string);
```
Nice to see the string coverage added here.
I think we still need the specific regression case from Spark's documented uncasted form.
Right now this file checks quarter('2009-01-12'::string), but it does not restore a plain string literal query like quarter('2016-08-31').
Since preserving that call shape was the reason for broadening the signature, could we add that case back as well?
kosiew left a comment
@kazantsev-maksim
Thanks for the patch.
I noticed a few issues that look worth fixing before this lands.
```rust
Ok(Arc::new(Field::new(
    self.name(),
    DataType::Int32,
    args.arg_fields[0].is_nullable(),
```
I think the return-field nullability needs to be loosened here.
Right now return_field_from_args mirrors the input field nullability, but the new string path can produce NULL even when the input is non-null. This patch adds cases like quarter('abc'::string) and quarter(''::string) returning NULL, so quarter(non_null_utf8_col) would still be advertised as Int32 NOT NULL even though execution can yield nulls.
That looks like a schema contract bug. It also differs from existing Spark helpers like next_day, which force nullable output when invalid strings can map to NULL.
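A minimal sketch of the loosened contract (assuming the surrounding `return_field_from_args` body stays as in the diff above):

```rust
// Sketch only: always advertise a nullable Int32, because the string path
// can produce NULL (e.g. quarter('abc')) even for non-null inputs.
Ok(Arc::new(Field::new(self.name(), DataType::Int32, true)))
```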
Thanks, fixed.
```rust
}
DataType::Utf8 | DataType::Utf8View | DataType::LargeUtf8 => {
    let date_array =
        cast_with_options(array, &DataType::Date32, &CastOptions::default())?;
```
I am a bit concerned that the new string handling is narrower than the shared datetime coercion path.
This currently forces every string through a Date32 cast before calling date_part. That can reject valid timestamp-shaped strings that date_part already accepts elsewhere, for example date_part('second', '2020-09-08T12:00:12.12345678+00:00') in datafusion/sqllogictest/test_files/datetime/date_part.slt.
Because this does not route through the existing date_part('quarter', ...) behavior, quarter can still diverge from the rest of the datetime coercion model for string inputs. Could we reuse the same coercion path here so the behavior stays aligned?
@kosiew datafusion's date_part does not support string types, you'll still have to cast to a date type first.
```rust
use arrow::array::{Array, ArrayRef};
use arrow::compute::{CastOptions, DatePart, cast_with_options, date_part};
use arrow::datatypes::{DataType, Field, FieldRef};
use datafusion::logical_expr::{
```
Nice addition overall. One thing to fix here is the direct import from datafusion::logical_expr.
In datafusion/spark/Cargo.toml, datafusion is still an optional dependency behind the core feature, so this regresses a CI-critical configuration. cargo check -p datafusion-spark --no-default-features now fails with use of unresolved module or unlinked crate 'datafusion'.
The rest of datafusion-spark pulls these expression types from datafusion_expr, which keeps the no-default-features build working. Could we switch this import to match the existing pattern?
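Concretely, the swap could look like this (a sketch; the exact set of imported names should match what the rest of datafusion-spark already uses):

```rust
// Import expression types from datafusion_expr instead of datafusion::logical_expr,
// so the --no-default-features build of datafusion-spark keeps compiling.
use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, ScalarUDFImpl, Signature, Volatility};
```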
Thanks, fixed
This reverts commit 88a8503.
Which issue does this PR close?
N/A
Rationale for this change
Add a new Spark function: https://spark.apache.org/docs/latest/api/sql/index.html#quarter
What changes are included in this PR?
Are these changes tested?
Yes, tests added as part of this PR.
Are there any user-facing changes?
No, this is a new function.