[SPARK-52812][CONNECT] Preserve spark.sql.sources.default for eager createTable(tableName, path) #56211
Open
haoyangeng-db wants to merge 1 commit into
### What changes were proposed in this pull request?

SPARK-52812 (#56064) made Spark Connect `Catalog.createTable` eager by re-routing the two-argument `createTable(tableName, path)` overload through `createTable(tableName, path, "parquet")`. That hardcodes the parquet provider and drops the `spark.sql.sources.default` fallback that the overload previously relied on.

This PR restores the original behavior: the two-argument overload again leaves the source unset so the server resolves `spark.sql.sources.default`, while keeping the eager execution introduced by SPARK-52812. A regression test is added to `CatalogSuite`.

### Why are the changes needed?

The two-argument `createTable(tableName, path)` overload is documented as "It will use the default data source configured by spark.sql.sources.default." After SPARK-52812 it always used parquet regardless of that configuration, contradicting its own contract and the classic Catalog behavior.

### Does this PR introduce _any_ user-facing change?

Yes, within the unreleased master branch. `spark.catalog.createTable(tableName, path)` on Spark Connect once again honors `spark.sql.sources.default` instead of always creating a parquet table. The eager-execution behavior from SPARK-52812 is preserved.

### How was this patch tested?

Added a regression test in `CatalogSuite` that sets `spark.sql.sources.default` to `json`, writes JSON data, creates the table via the two-argument overload, and asserts the resulting table uses the json provider and is readable. The test fails on the previous hardcoded-parquet behavior.

### Was this patch authored or co-authored using generative AI tooling?

Co-authored with Claude Code.
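### Illustrative sketch of the behavior change

For reviewers who want the regression and the fix side by side, here is a toy model in plain Python. It is not Spark code: `SESSION_CONF`, `resolve_provider`, `create_table_hardcoded`, and `create_table_unset` are hypothetical names invented for this sketch, and the dictionary only stands in for the server-side session configuration. It assumes, as the description above states, that an unset source is resolved on the server from `spark.sql.sources.default`.

```python
from typing import Optional

# Hypothetical stand-in for the server-side session configuration.
SESSION_CONF = {"spark.sql.sources.default": "json"}


def resolve_provider(source: Optional[str]) -> str:
    """Return the explicit source, or fall back to the configured default."""
    if source is not None:
        return source
    return SESSION_CONF["spark.sql.sources.default"]


def create_table_hardcoded(table_name: str, path: str) -> str:
    """Models the regression: the two-argument overload pins parquet."""
    return resolve_provider("parquet")


def create_table_unset(table_name: str, path: str) -> str:
    """Models the fix: the source stays unset, so the default applies."""
    return resolve_provider(None)


if __name__ == "__main__":
    print(create_table_hardcoded("t", "/tmp/t"))
    print(create_table_unset("t", "/tmp/t"))
```

With the default set to `json`, the hardcoded variant still resolves to parquet while the unset variant follows the configuration, which is the distinction the regression test in `CatalogSuite` is described as checking.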