[SPARK-57143][SQL][TESTS] Extend SQL test coverage for grouping analytics by vladimirg-db · Pull Request #56202 · apache/spark

vladimirg-db · 2026-05-29T12:34:46Z

What changes were proposed in this pull request?

This PR extends group-analytics.sql with additional query-level coverage for GROUPING SETS / CUBE / ROLLUP, exercising scenarios that were previously under-covered:

grouping_id() (no-arg and explicit-arg) across GROUPING SETS, CUBE, and ROLLUP.
Lateral column aliases that reference grouping() / grouping_id() results.
Aggregate functions in HAVING and ORDER BY over grouping analytics (including rolled-up groups and aggregate arguments that are also grouping keys).
Expression grouping keys, SELECT * with CUBE, and ordinal references inside ROLLUP / GROUPING SETS.
Struct field access inside aggregates over grouping analytics.
Scalar / EXISTS / NOT IN subqueries combined with grouping analytics.
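The grouping-set expansion and the `grouping()` / `grouping_id()` values these tests exercise can be modeled in a few lines of plain Python. This is a hypothetical sketch for illustration, not Spark code. The helper names (`rollup_sets`, `cube_sets`, `grouping`, `grouping_id`) are assumptions. The bit layout follows the formula given in the Spark SQL built-in function reference, `(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)`. As a further assumption, the no-argument `grouping_id()` is modeled by passing all grouping columns in order.

```python
from itertools import combinations

def rollup_sets(cols):
    """ROLLUP(c1, ..., cn) -> grouping sets (c1..cn), (c1..cn-1), ..., ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    """CUBE(c1, ..., cn) -> every subset of the columns (2^n grouping sets)."""
    return [tuple(c) for k in range(len(cols), -1, -1) for c in combinations(cols, k)]

def grouping(col, grouping_set):
    """grouping(col): 1 if col is rolled up (absent from this set), else 0."""
    return 0 if col in grouping_set else 1

def grouping_id(cols, grouping_set):
    """grouping_id(c1, ..., cn): the grouping() bits, with c1 as the most significant bit."""
    gid = 0
    for col in cols:
        gid = (gid << 1) | grouping(col, grouping_set)
    return gid

# Hypothetical two-column example: GROUP BY ROLLUP(a, b).
cols = ["a", "b"]
for s in rollup_sets(cols):
    print(s, grouping_id(cols, s))
# ('a', 'b') 0
# ('a',) 1
# () 3
```

The grand-total row always corresponds to the empty grouping set, so its `grouping_id()` has every bit set (`2^n - 1`).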

The input data is defined as temporary views and each query is formatted multi-line for readability.

Why are the changes needed?

These combinations (notably aggregate functions in HAVING/ORDER BY over rolled-up groups, lateral column aliases over grouping functions, and struct field access) were not covered by the existing golden tests. Locking down the current, correct behavior guards against regressions.

Does this PR introduce any user-facing change?

No. Test-only change.

How was this patch tested?

Golden files regenerated with
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z group-analytics.sql" and the suite passes.

Was this patch authored or co-authored using generative AI tooling?

Yes.

Co-authored-by: Claude

vladimirg-db · 2026-05-29T12:57:04Z

cc @stefankandic @mihailoale-db @mihailotim-db

### What changes were proposed in this pull request?

This PR extends `group-analytics.sql` with additional query-level coverage for GROUPING SETS / CUBE / ROLLUP, exercising combinations that were previously under-covered:

- Aggregate functions in `HAVING` and `ORDER BY` over grouping analytics, including filtering/sorting rolled-up groups and aggregate arguments that are also grouping keys.
- The no-argument `grouping_id()` function and lateral column aliases that reference `grouping()` / `grouping_id()` results.
- `DISTINCT` aggregates and aggregate `FILTER (WHERE ...)` over grouping analytics, and a grouping function combined with an aggregate predicate in `HAVING`.
- Struct field access inside aggregates over grouping analytics.
- Uncorrelated subqueries (scalar / `IN` in the SELECT list, `IN` / `EXISTS` / `NOT IN` in `WHERE`) combined with grouping analytics, including the `NULL` grouping key of the grand-total row on the left side of `IN`.
- Multiple, nested, and complex subqueries with grouping analytics: subqueries in several clauses at once, subqueries nested inside subqueries, subqueries whose inner query itself uses grouping analytics, subquery values combined with aggregates, and pre-aggregation correlated subqueries in `WHERE`.
- Ordinal references inside ROLLUP / GROUPING SETS, a wide (34-column) grouping set, and the empty grouping set.
- Negative cases: `grouping()` / `grouping_id()` on a non-grouping column, and a window function in `GROUP BY`.

Input data is defined as temporary views, each query is formatted multi-line for readability, and all temporary views are dropped at the end of the file.

### Why are the changes needed?

These combinations were not covered by the existing golden tests. Locking down the current behavior guards against regressions.

### Does this PR introduce any user-facing change?

No. Test-only change.

### How was this patch tested?

Golden files regenerated with `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z group-analytics.sql"` and the suite passes.

### Was this patch authored or co-authored using generative AI tooling?

Yes.
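The `IN` case with the grand-total row's `NULL` grouping key on the left side comes down to SQL three-valued logic. Below is a minimal Python sketch of standard SQL `IN` semantics, with `None` standing in for both `NULL` and `UNKNOWN`. The helper names (`sql_in`, `sql_not`, `keeps_row`) are hypothetical, and the sketch does not claim to reproduce every edge case of Spark's implementation.

```python
def sql_in(value, candidates):
    """Three-valued `value IN (candidates)`; None models SQL NULL / UNKNOWN."""
    candidates = list(candidates)
    if not candidates:
        return False  # x IN (<empty>) is FALSE under standard semantics, even for a NULL x
    if value is None:
        return None   # NULL IN (<non-empty>) is UNKNOWN
    if any(c is not None and c == value for c in candidates):
        return True
    # No match: UNKNOWN if any candidate is NULL, otherwise FALSE
    return None if any(c is None for c in candidates) else False

def sql_not(v):
    """Three-valued NOT: NOT UNKNOWN stays UNKNOWN."""
    return None if v is None else not v

def keeps_row(predicate_result):
    """A filter such as HAVING keeps a row only when the predicate is TRUE."""
    return predicate_result is True

# Hypothetical grouping keys; None models the grand-total row's NULL key.
keys = ["x", "y", None]
print([k for k in keys if keeps_row(sql_not(sql_in(k, ["x"])))])
# ['y']
```

Under these semantics, projecting `k IN (subquery)` yields `NULL` for the grand-total row, and a filter on either `IN` or `NOT IN` drops that row.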

dtenedor · 2026-05-29T21:43:25Z

LGTM, merging to master and 4.x

Co-authored-by: Claude

Closes #56202 from vladimirg-db/import-grouping-analytics-goldens.

Authored-by: Vladimir Golubev <vladimir.golubev@databricks.com>
Signed-off-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
(cherry picked from commit 54dbb38)
Signed-off-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>

vladimirg-db force-pushed the import-grouping-analytics-goldens branch 2 times, most recently from 0c9f5d3 to 15195c7 May 29, 2026 12:55

vladimirg-db changed the title ~~[SPARK-57143][SQL][TESTS] Add SQL test coverage for grouping analytics~~ [SPARK-57143][SQL][TESTS] Extend SQL test coverage for grouping analytics May 29, 2026

vladimirg-db force-pushed the import-grouping-analytics-goldens branch 2 times, most recently from 6fb5e26 to 94206df May 29, 2026 13:01

vladimirg-db force-pushed the import-grouping-analytics-goldens branch from 94206df to 639ef42 May 29, 2026 13:41

cloud-fan approved these changes May 29, 2026

mihailotim-db approved these changes May 29, 2026

dtenedor closed this in 54dbb38 May 29, 2026

vladimirg-db wants to merge 1 commit into apache:master from vladimirg-db:import-grouping-analytics-goldens

4 participants
