Skip to content

[SPARK-57149][SQL] Skip statically-dead common sub-expression scaffold in Project and aggregate codegen#56210

Closed
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:spark-project-cse-codegen
Closed

[SPARK-57149][SQL] Skip statically-dead common sub-expression scaffold in Project and aggregate codegen#56210
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:spark-project-cse-codegen

Conversation

@gengliangwang
Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

This is a sub-task of SPARK-56908.

ProjectExec.doConsume and the three aggregate codegen sites (AggregateCodegenSupport and two sites in HashAggregateExec) unconditionally emit a // common sub-expressions scaffold -- the comment plus the subexpression-elimination (CSE) code blocks -- into the generated Java, even when subexpression elimination is disabled or finds no common subexpressions. In that (overwhelmingly common) case the CSE interpolations are empty strings, so each operator contributes a dead comment line plus blank lines to every generated stage.

This PR gates the scaffold so it is only emitted when CSE actually produced code, mirroring how FilterExec already gates its CSE block (SPARK-56032). ProjectExec still calls evaluateVariables unconditionally for its code-clearing side effect.

Why are the changes needed?

Reduces the size and Janino parse cost of the Java emitted by whole-stage codegen, per the SPARK-56908 umbrella, by not emitting text that is statically dead at codegen time.

Does this PR introduce any user-facing change?

No. Only the generated code's dead comments/blank scaffolding change; behavior and results are identical.

How was this patch tested?

New test in WholeStageCodegenSuite asserting the scaffold is absent for a plain projection and a simple aggregate, present for a projection with a repeated subexpression, with checkAnswer for correctness. Also ran WholeStageCodegenSuite, SubexpressionEliminationSuite, and DataFrameAggregateSuite (207 tests passed).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

…d in Project and aggregate codegen

### What changes were proposed in this pull request?

This is a sub-task of [SPARK-56908](https://issues.apache.org/jira/browse/SPARK-56908).

`ProjectExec.doConsume` and the three aggregate codegen sites
(`AggregateCodegenSupport` and two sites in `HashAggregateExec`) unconditionally emit a
`// common sub-expressions` scaffold -- the comment plus the subexpression-elimination (CSE)
code blocks -- into the generated Java, even when subexpression elimination is disabled or
finds no common subexpressions. In that (overwhelmingly common) case the CSE interpolations
are empty strings, so each operator contributes a dead comment line plus blank lines to every
generated stage.

This PR gates the scaffold so it is only emitted when CSE actually produced code, mirroring
how `FilterExec` already gates its CSE block (SPARK-56032). `ProjectExec` still calls
`evaluateVariables` unconditionally for its code-clearing side effect.

### Why are the changes needed?

Reduces the size and Janino parse cost of the Java emitted by whole-stage codegen, per the
SPARK-56908 umbrella, by not emitting text that is statically dead at codegen time.

### Does this PR introduce _any_ user-facing change?

No. Only the generated code's dead comments/blank scaffolding change; behavior and results
are identical.

### How was this patch tested?

New test in `WholeStageCodegenSuite` asserting the scaffold is absent for a plain projection
and a simple aggregate, present for a projection with a repeated subexpression, with
`checkAnswer` for correctness. Also ran `WholeStageCodegenSuite`, `SubexpressionEliminationSuite`,
and `DataFrameAggregateSuite` (207 tests passed).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

Co-authored-by: Isaac
@gengliangwang
Copy link
Copy Markdown
Member Author

The value of this one is small. I just just close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant