Skip to content

[SPARK-57168][SQL] Simplify GetTimestamp codegen under ANSI mode#56218

Open
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:spark-get-timestamp-codegen
Open

[SPARK-57168][SQL] Simplify GetTimestamp codegen under ANSI mode#56218
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:spark-get-timestamp-codegen

Conversation

@gengliangwang
Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Add DateTimeExpressionUtils.parseToTimestampExact(TimestampFormatter formatter, String input, long downScaleFactor, boolean forTimestampNTZ, String suggestedFuncOnFail) and route the ANSI (failOnError = true) eval and codegen paths of ToTimestamp (the base of GetTimestamp / to_timestamp, unix_timestamp, etc.) through it. The helper parses via parseWithoutTimeZone (TIMESTAMP_NTZ, no down-scaling) or parse + / downScaleFactor, and translates a DateTimeException (which also covers DateTimeParseException) or a ParseException to ansiDateTimeParseError; any other exception (e.g. IllegalStateException) propagates unchanged, matching the previous behavior.

ToTimestamp.doGenCode previously emitted the same 5-line try { parse } catch (DateTimeException) catch (ParseException) block at both string call sites (the cached-formatter path and the per-row-formatter path). A local parseTimestampCode now dispatches on failOnError: the ANSI branch emits a single parseToTimestampExact(...) call, while the non-ANSI branch keeps the inline try/catch -> isNull form (the same shape used by the already-merged MakeDate / MakeInterval cleanups). The eval path delegates to the same helper for consistency.

Why are the changes needed?

Part of SPARK-56908 (umbrella). Collapsing the duplicated inline try/catch to a single helper call shrinks the generated Java for the common to_timestamp / unix_timestamp family, helping with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work.

Does this PR introduce any user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

How was this patch tested?

build/sbt "catalyst/testOnly *DateExpressionsSuite"

75/75 pass (exercised both with and without whole-stage codegen).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

@gengliangwang gengliangwang requested a review from LuciferYang May 31, 2026 02:31
Copy link
Copy Markdown
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test failures appear unrelated to this PR. Still, it's better to wait until CI turns green before merging.

if (failOnError) {
s"""${ev.value} = $utils.parseToTimestampExact(""" +
s"""$formatterExpr, $inputExpr, ${downScaleFactor}L, $forTimestampNTZ, """ +
s""""$suggestedFuncOnFail");"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s"""${ev.value} = $utils.parseToTimestampExact(
    |    $formatterExpr, $inputExpr, ${downScaleFactor}L,
    |    $forTimestampNTZ, "$suggestedFuncOnFail");""".stripMargin

Add `DateTimeExpressionUtils.parseToTimestampExact(TimestampFormatter formatter,
String input, long downScaleFactor, boolean forTimestampNTZ, String
suggestedFuncOnFail)` and route the ANSI (`failOnError = true`) eval and codegen
paths of `ToTimestamp` (the base of `GetTimestamp` / `to_timestamp` etc.) through
it. The helper parses via `parseWithoutTimeZone` (TIMESTAMP_NTZ, no down-scaling)
or `parse` + `/ downScaleFactor`, and translates a `DateTimeException` (which
covers `DateTimeParseException`) or `ParseException` to `ansiDateTimeParseError`;
any other exception (e.g. `IllegalStateException`) propagates unchanged, matching
the previous behavior.

`ToTimestamp.doGenCode` previously emitted the same 5-line
`try { parse } catch (DateTimeException) catch (ParseException)` block at both
string call sites (the cached-formatter path and the per-row-formatter path). A
local `parseTimestampCode` now dispatches on `failOnError`: the ANSI branch emits
a single `parseToTimestampExact(...)` call; the non-ANSI branch keeps the inline
`try/catch -> isNull`. The eval path delegates to the same helper for consistency.

Part of SPARK-56908 (umbrella). Collapsing the duplicated inline try/catch to a
single helper call shrinks the generated Java for the common `to_timestamp` /
`unix_timestamp` family.

No. The compiled behavior is identical; only the emitted Java source text changes.

```
build/sbt "catalyst/testOnly *DateExpressionsSuite"
```

75/75 pass (exercised both with and without whole-stage codegen).

Generated-by: Claude Code (Opus 4.8)

Co-authored-by: Isaac
@gengliangwang gengliangwang force-pushed the spark-get-timestamp-codegen branch from e12ec96 to 8b5c18e Compare May 31, 2026 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants