[AURON #2252] Add support for auron.never.convert.reason in Hudi scan#2293
[AURON #2252] Add support for auron.never.convert.reason in Hudi scan#2293guixiaowen wants to merge 10 commits into
Conversation
…i scan scenarios
…i scan scenarios
…i scan scenarios
|
@slfan1989 @cxzl25 @yew1eb y Could you please take a look at this PR? Thank you very much! |
| } yield major == 3 && minor >= 0 && minor <= 5).getOrElse(false) | ||
| assert( | ||
| SparkAuronConfiguration.ENABLE_HUDI_SCAN.get(), | ||
| "Conversion disabled: auron.enable.hudi.scan=false.") |
There was a problem hiding this comment.
Can auron.enable.hudi.scan be used directly with ENABLE_HUDI_SCAN.key()
There was a problem hiding this comment.
Can
auron.enable.hudi.scanbe used directly withENABLE_HUDI_SCAN.key()
@cxzl25 Thank you so much for reviewing the code. I've updated this part accordingly. Really appreciate your help!
| } | ||
| assert( | ||
| SparkAuronConfiguration.ENABLE_SCAN_PARQUET.get(), | ||
| "Conversion disabled: auron.enable.scan.parquet=false.") |
There was a problem hiding this comment.
ditto
@cxzl25 Thank you so much for reviewing the code. I've updated this part accordingly. Really appreciate your help!
There was a problem hiding this comment.
Pull request overview
This PR improves Spark plan conversion diagnostics for Hudi-backed FileSourceScanExec by ensuring the auron.never.convert.reason tag is preserved/recorded when Hudi scan conversion is skipped or falls back, enabling clearer explanations in Spark UI.
Changes:
- Add assertion-based failure reasons in the Hudi scan convert provider so fallback scenarios carry an explicit “why” message.
- Preserve an existing
auron.never.convert.reasontag rather than overwriting it with a generic “not supported yet” message. - Add a Hudi unit test that validates
auron.never.convert.reasonis set for a timestamp-based fallback case.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
thirdparty/auron-hudi/src/main/scala/org/apache/spark/sql/auron/hudi/HudiConvertProvider.scala |
Emit specific failure reasons via assertions during Hudi scan enable/support/convert checks. |
spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala |
Catch provider enable/support failures for scans and record auron.never.convert.reason. |
spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConvertStrategy.scala |
Avoid overwriting an already-populated auron.never.convert.reason. |
thirdparty/auron-hudi/src/test/scala/org/apache/spark/sql/auron/hudi/HudiScanSupportSuite.scala |
Add a test asserting the never-convert reason is recorded for a Hudi fallback scenario. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| case Some(HudiScanSupport.OrcFormat) => | ||
| if (!SparkAuronConfiguration.ENABLE_SCAN_ORC.get()) { | ||
| return exec | ||
| } | ||
| assert( | ||
| SparkAuronConfiguration.ENABLE_SCAN_PARQUET.get(), | ||
| "Conversion disabled: auron.enable.scan.orc=false.") | ||
| // ORC follows the same timestamp fallback rule as Parquet. | ||
| assert( | ||
| SparkAuronConfiguration.ENABLE_SCAN_ORC_TIMESTAMP.get(), | ||
| "Conversion disabled: auron.enable.scan.orc.timestamp=false.") | ||
| if (!SparkAuronConfiguration.ENABLE_SCAN_ORC_TIMESTAMP.get()) { |
| return exec | ||
| } | ||
| assert( | ||
| SparkAuronConfiguration.ENABLE_SCAN_PARQUET.get(), |
There was a problem hiding this comment.
Bug: wrong config checked here. This is the ORC branch but it asserts ENABLE_SCAN_PARQUET instead of ENABLE_SCAN_ORC. Should be:
assert(
SparkAuronConfiguration.ENABLE_SCAN_ORC.get(),
s"Conversion disabled: ${SparkAuronConfiguration.ENABLE_SCAN_ORC.key()} is false.")There was a problem hiding this comment.
Bug: wrong config checked here. This is the ORC branch but it asserts
ENABLE_SCAN_PARQUETinstead ofENABLE_SCAN_ORC. Should be:assert( SparkAuronConfiguration.ENABLE_SCAN_ORC.get(), s"Conversion disabled: ${SparkAuronConfiguration.ENABLE_SCAN_ORC.key()} is false.")
@yew1eb ye Thank you very much for the code review. I have made the changes accordingly. Really appreciate your guidance!
| SparkAuronConfiguration.ENABLE_HUDI_SCAN.get(), | ||
| s"Conversion disabled: ${SparkAuronConfiguration.ENABLE_HUDI_SCAN.key()} is false.") | ||
| assert(supported, "Conversion disabled: Supported Spark versions: 3.0 to 3.5.") | ||
| SparkAuronConfiguration.ENABLE_HUDI_SCAN.get() && supported |
There was a problem hiding this comment.
The asserts in isEnabled change the contract of this method: it's supposed to return false when not enabled (so find can skip it), but now it throws instead. The exception is caught upstream in AuronConverters, so it works — but it's fragile: if both isEnabled and isSupported throw, only the first reason is ever recorded. Also, the final SparkAuronConfiguration.ENABLE_HUDI_SCAN.get() && supported is now dead code — it can never be reached.
Consider keeping isEnabled/isSupported as pure predicates (return false) and recording the reason only in convert, where the context is clearer.
There was a problem hiding this comment.
The asserts in
isEnabledchange the contract of this method: it's supposed to returnfalsewhen not enabled (sofindcan skip it), but now it throws instead. The exception is caught upstream inAuronConverters, so it works — but it's fragile: if bothisEnabledandisSupportedthrow, only the first reason is ever recorded. Also, the finalSparkAuronConfiguration.ENABLE_HUDI_SCAN.get() && supportedis now dead code — it can never be reached.Consider keeping
isEnabled/isSupportedas pure predicates (returnfalse) and recording the reason only inconvert, where the context is clearer.
@yew1eb ye Thank you very much for the code review.
Regarding isEnabled and isSupported – they are called within AuronConverters. If isEnabled returns false, isSupported should not be called at all, so I believe both methods will not throw exceptions at the same time. The reasons for not being converted are different between isEnabled and isSupported, and those reasons are only known within these methods, which is why I added the assertions there.
May I ask if your suggestion is to keep isEnabled and isSupported as pure predicates that simply return true or false – without handling too many scenarios inside them – and move the logic for checking Spark version and Hudi format type into the convert method instead?
Is my understanding correct? If so, I will make the changes accordingly. I would really appreciate your confirmation.
Thank you again for your guidance!
… scenarios
Which issue does this PR close?
Closes #2252
Rationale for this change
In issue #1419, the reasons for fallback (i.e., why a conversion was not applied) are recorded using the property auron.never.convert.reason.
In issue #1471, these reasons can be observed in the Spark UI, helping users understand why the physical execution plan was not converted.
However, in the current Hudi fallback scenarios, the property auron.never.convert.reason is not recorded.
The purpose of this issue is to fill this gap by ensuring that auron.never.convert.reason is properly recorded in Hudi-related fallback cases as well.
What changes are included in this PR?
In scenarios where Hudi conversion fails, throw an exception to provide a prompt indicating why the conversion was not performed.
Are there any user-facing changes?
Nothing.
How was this patch tested?
UT