[SPARK-52812][CONNECT] Preserve spark.sql.sources.default for eager createTable(tableName, path) by haoyangeng-db · Pull Request #56211 · apache/spark

haoyangeng-db · 2026-05-29T20:02:09Z

What changes were proposed in this pull request?

SPARK-52812 (#56064) made Spark Connect Catalog.createTable eager by re-routing the two-argument createTable(tableName, path) overload through createTable(tableName, path, "parquet"). That hardcodes the parquet provider and drops the spark.sql.sources.default fallback that the overload previously relied on.

This PR restores the original behavior: the two-argument overload again leaves the source unset so the server resolves spark.sql.sources.default, while keeping the eager execution introduced by SPARK-52812. A regression test is added to CatalogSuite.
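The difference between the two delegation strategies can be sketched with a small, language-neutral model. This is not the Spark Connect client or server code: `ToyServer`, `create_table_hardcoded`, and `create_table_default` are hypothetical names introduced only for illustration, and the model assumes nothing beyond what the description states (an unset source is resolved from `spark.sql.sources.default`, whose built-in default is `parquet`).

```python
from typing import Optional

DEFAULT_SOURCE_KEY = "spark.sql.sources.default"


class ToyServer:
    """Hypothetical stand-in for the server: picks a provider when none is sent."""

    def __init__(self, conf: dict):
        self.conf = conf
        self.tables = {}  # table name -> provider

    def create_table(self, name: str, path: str, source: Optional[str]) -> str:
        # An unset source falls back to the session configuration;
        # Spark's built-in default for this key is "parquet".
        if source is not None:
            provider = source
        else:
            provider = self.conf.get(DEFAULT_SOURCE_KEY, "parquet")
        self.tables[name] = provider
        return provider


def create_table_hardcoded(server: ToyServer, name: str, path: str) -> str:
    # Models the behavior after SPARK-52812: delegate with an explicit "parquet".
    return server.create_table(name, path, "parquet")


def create_table_default(server: ToyServer, name: str, path: str) -> str:
    # Models the behavior this PR restores: leave the source unset.
    return server.create_table(name, path, None)


server = ToyServer({DEFAULT_SOURCE_KEY: "json"})
hardcoded = create_table_hardcoded(server, "t1", "/tmp/t1")
restored = create_table_default(server, "t2", "/tmp/t2")
print(hardcoded, restored)
```

With the session default set to `json`, the hardcoded variant still records `parquet`, while the variant that leaves the source unset picks up `json`. When the key is not set at all, both variants agree on `parquet`, which is presumably why the regression only shows up with a non-default configuration.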

Why are the changes needed?

The two-argument createTable(tableName, path) overload is documented as "It will use the default data source configured by spark.sql.sources.default." After SPARK-52812 it always used parquet regardless of that configuration, contradicting its own contract and the classic Catalog behavior.

Does this PR introduce any user-facing change?

Yes, within the unreleased master branch. spark.catalog.createTable(tableName, path) on Spark Connect once again honors spark.sql.sources.default instead of always creating a parquet table. The eager-execution behavior from SPARK-52812 is preserved.

How was this patch tested?

Added a regression test in CatalogSuite that sets spark.sql.sources.default to json, writes JSON data, creates the table via the two-argument overload, and asserts the resulting table uses the json provider and is readable. The test fails on the previous hardcoded-parquet behavior.

Was this patch authored or co-authored using generative AI tooling?

Co-authored with Claude Code.


haoyangeng-db wants to merge 1 commit into apache:master from haoyangeng-db:spark-52812-followup-createtable-default-source
