[SPARK-57143][SQL][TESTS] Extend SQL test coverage for grouping analytics #56202
Closed
vladimirg-db wants to merge 1 commit into
### What changes were proposed in this pull request?

This PR extends `group-analytics.sql` with additional query-level coverage for GROUPING SETS / CUBE / ROLLUP, exercising combinations that were previously under-covered:

- Aggregate functions in `HAVING` and `ORDER BY` over grouping analytics, including filtering/sorting rolled-up groups and aggregate arguments that are also grouping keys.
- The no-argument `grouping_id()` function and lateral column aliases that reference `grouping()` / `grouping_id()` results.
- `DISTINCT` aggregates and aggregate `FILTER (WHERE ...)` over grouping analytics, and a grouping function combined with an aggregate predicate in `HAVING`.
- Struct field access inside aggregates over grouping analytics.
- Uncorrelated subqueries (scalar / `IN` in the SELECT list, `IN` / `EXISTS` / `NOT IN` in `WHERE`) combined with grouping analytics, including the `NULL` grouping key of the grand-total row on the left side of `IN`.
- Multiple, nested, and complex subqueries with grouping analytics: subqueries in several clauses at once, subqueries nested inside subqueries, subqueries whose inner query itself uses grouping analytics, subquery values combined with aggregates, and pre-aggregation correlated subqueries in `WHERE`.
- Ordinal references inside ROLLUP / GROUPING SETS, a wide (34-column) grouping set, and the empty grouping set.
- Negative cases: `grouping()` / `grouping_id()` on a non-grouping column, and a window function in `GROUP BY`.

Input data is defined as temporary views, each query is formatted multi-line for readability, and all temporary views are dropped at the end of the file.

### Why are the changes needed?

These combinations were not covered by the existing golden tests. Locking down the current behavior guards against regressions.

### Does this PR introduce any user-facing change?

No. Test-only change.

### How was this patch tested?

Golden files regenerated with `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z group-analytics.sql"` and the suite passes.

### Was this patch authored or co-authored using generative AI tooling?

Yes.
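For readers unfamiliar with why the `NULL` grouping key on the left side of `IN` deserves its own test, here is a minimal pure-Python sketch of SQL three-valued logic for `IN` / `NOT IN`. It is a model of standard SQL semantics, not Spark's implementation; the names `sql_in` and `sql_not` are hypothetical, Python `None` stands in for SQL `NULL` / UNKNOWN, and the candidate list is assumed to be non-empty.

```python
from typing import Hashable, Optional, Sequence


def sql_in(value: Optional[Hashable],
           candidates: Sequence[Optional[Hashable]]) -> Optional[bool]:
    """Model `value IN (candidates)` under three-valued logic.

    Hypothetical helper for illustration only. None represents SQL NULL
    (as an input) or UNKNOWN (as a result). Assumes `candidates` is
    non-empty; the empty case is outside this sketch.
    """
    if value is None:
        # NULL compared with anything is UNKNOWN.
        return None
    if any(c is not None and c == value for c in candidates):
        # At least one comparison is TRUE, so the whole predicate is TRUE.
        return True
    if any(c is None for c in candidates):
        # No match, but a NULL candidate leaves the answer UNKNOWN.
        return None
    return False


def sql_not(result: Optional[bool]) -> Optional[bool]:
    """Model SQL NOT: UNKNOWN stays UNKNOWN."""
    return None if result is None else (not result)


# Grouping keys as a rollup might emit them; the last entry models the
# grand-total row, whose grouping key is NULL.
keys = ["a", "b", None]

# A filter keeps a row only when the predicate is TRUE, so the NULL key
# is dropped by both IN and NOT IN.
kept_by_in = [k for k in keys if sql_in(k, ["a", "c"]) is True]
kept_by_not_in = [k for k in keys if sql_not(sql_in(k, ["a", "c"])) is True]
```

Under this model the grand-total row survives neither `IN` nor `NOT IN`, which is the kind of behavior a golden file pins down.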
cloud-fan
approved these changes
May 29, 2026
mihailotim-db
approved these changes
May 29, 2026
LGTM, merging to master and 4.x
dtenedor
pushed a commit
that referenced
this pull request
May 29, 2026
### What changes were proposed in this pull request?

This PR extends `group-analytics.sql` with additional query-level coverage for GROUPING SETS / CUBE / ROLLUP, exercising scenarios that were previously under-covered:

- `grouping_id()` (no-arg and explicit-arg) across GROUPING SETS, CUBE, and ROLLUP.
- Lateral column aliases that reference `grouping()` / `grouping_id()` results.
- Aggregate functions in `HAVING` and `ORDER BY` over grouping analytics (including rolled-up groups and aggregate arguments that are also grouping keys).
- Expression grouping keys, `SELECT *` with CUBE, and ordinal references inside ROLLUP / GROUPING SETS.
- Struct field access inside aggregates over grouping analytics.
- Scalar / EXISTS / NOT IN subqueries combined with grouping analytics.

The input data is defined as temporary views and each query is formatted multi-line for readability.

### Why are the changes needed?

These combinations (notably aggregate functions in HAVING/ORDER BY over rolled-up groups, lateral column aliases over grouping functions, and struct field access) were not covered by the existing golden tests. Locking down the current, correct behavior guards against regressions.

### Does this PR introduce any user-facing change?

No. Test-only change.

### How was this patch tested?

Golden files regenerated with `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z group-analytics.sql"` and the suite passes.

### Was this patch authored or co-authored using generative AI tooling?

Yes.

Co-authored-by: Claude

Closes #56202 from vladimirg-db/import-grouping-analytics-goldens.

Authored-by: Vladimir Golubev <vladimir.golubev@databricks.com>
Signed-off-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
(cherry picked from commit 54dbb38)
Signed-off-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
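As background for the `grouping_id()` coverage across GROUPING SETS, CUBE, and ROLLUP, the following pure-Python sketch models how ROLLUP and CUBE expand into grouping sets and how a grouping ID bitmask is derived for each set. It is an illustrative model, not code from Spark or from this PR; the function names `rollup_sets`, `cube_sets`, and `grouping_id` are hypothetical, and the bit convention assumed is the documented one where a column contributes 1 when it is aggregated away and the first grouping column is the most significant bit.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def rollup_sets(cols: Sequence[str]) -> List[Tuple[str, ...]]:
    """ROLLUP(c1, ..., cn) expands to every prefix, longest first, down to ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols: Sequence[str]) -> List[Tuple[str, ...]]:
    """CUBE(c1, ..., cn) expands to every subset of the columns."""
    sets: List[Tuple[str, ...]] = []
    for size in range(len(cols), -1, -1):
        sets.extend(combinations(cols, size))
    return sets


def grouping_id(cols: Sequence[str], grouping_set: Sequence[str]) -> int:
    """Bitmask over `cols`: bit is 1 when the column is NOT in the grouping set.

    The first column in `cols` maps to the most significant bit.
    """
    gid = 0
    for col in cols:
        gid = (gid << 1) | (0 if col in grouping_set else 1)
    return gid


cols = ["a", "b"]
rollup_ids = [(s, grouping_id(cols, s)) for s in rollup_sets(cols)]
cube_ids = [(s, grouping_id(cols, s)) for s in cube_sets(cols)]
```

For two columns, the rollup yields IDs 0, 1, and 3 (there is no set that groups by `b` alone), while the cube also yields 2. The empty grouping set always maps to the all-ones mask, which is the grand-total row.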