[SPARK-57149][SQL] Skip statically-dead common sub-expression scaffold in Project and aggregate codegen#56210
Closed
gengliangwang wants to merge 1 commit into
Closed
Conversation
…d in Project and aggregate codegen ### What changes were proposed in this pull request? This is a sub-task of [SPARK-56908](https://issues.apache.org/jira/browse/SPARK-56908). `ProjectExec.doConsume` and the three aggregate codegen sites (`AggregateCodegenSupport` and two sites in `HashAggregateExec`) unconditionally emit a `// common sub-expressions` scaffold -- the comment plus the subexpression-elimination (CSE) code blocks -- into the generated Java, even when subexpression elimination is disabled or finds no common subexpressions. In that (overwhelmingly common) case the CSE interpolations are empty strings, so each operator contributes a dead comment line plus blank lines to every generated stage. This PR gates the scaffold so it is only emitted when CSE actually produced code, mirroring how `FilterExec` already gates its CSE block (SPARK-56032). `ProjectExec` still calls `evaluateVariables` unconditionally for its code-clearing side effect. ### Why are the changes needed? Reduces the size and Janino parse cost of the Java emitted by whole-stage codegen, per the SPARK-56908 umbrella, by not emitting text that is statically dead at codegen time. ### Does this PR introduce _any_ user-facing change? No. Only the generated code's dead comments/blank scaffolding change; behavior and results are identical. ### How was this patch tested? New test in `WholeStageCodegenSuite` asserting the scaffold is absent for a plain projection and a simple aggregate, present for a projection with a repeated subexpression, with `checkAnswer` for correctness. Also ran `WholeStageCodegenSuite`, `SubexpressionEliminationSuite`, and `DataFrameAggregateSuite` (207 tests passed). ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Opus 4.8) Co-authored-by: Isaac
Member
Author
|
The value of this one is small. I just just close it. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
This is a sub-task of SPARK-56908.
ProjectExec.doConsumeand the three aggregate codegen sites (AggregateCodegenSupportand two sites inHashAggregateExec) unconditionally emit a// common sub-expressionsscaffold -- the comment plus the subexpression-elimination (CSE) code blocks -- into the generated Java, even when subexpression elimination is disabled or finds no common subexpressions. In that (overwhelmingly common) case the CSE interpolations are empty strings, so each operator contributes a dead comment line plus blank lines to every generated stage.This PR gates the scaffold so it is only emitted when CSE actually produced code, mirroring how
FilterExecalready gates its CSE block (SPARK-56032).ProjectExecstill callsevaluateVariablesunconditionally for its code-clearing side effect.Why are the changes needed?
Reduces the size and Janino parse cost of the Java emitted by whole-stage codegen, per the SPARK-56908 umbrella, by not emitting text that is statically dead at codegen time.
Does this PR introduce any user-facing change?
No. Only the generated code's dead comments/blank scaffolding change; behavior and results are identical.
How was this patch tested?
New test in
WholeStageCodegenSuiteasserting the scaffold is absent for a plain projection and a simple aggregate, present for a projection with a repeated subexpression, withcheckAnswerfor correctness. Also ranWholeStageCodegenSuite,SubexpressionEliminationSuite, andDataFrameAggregateSuite(207 tests passed).Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8)