Skip to content

[AURON #2252] Add support for auron.never.convert.reason in Hudi scan#2293

Open
guixiaowen wants to merge 10 commits into
apache:masterfrom
guixiaowen:add_hudi_convert
Open

[AURON #2252] Add support for auron.never.convert.reason in Hudi scan#2293
guixiaowen wants to merge 10 commits into
apache:masterfrom
guixiaowen:add_hudi_convert

Conversation

@guixiaowen
Copy link
Copy Markdown
Contributor

… scenarios

Which issue does this PR close?

Closes #2252

Rationale for this change

In issue #1419, the reasons for fallback (i.e., why a conversion was not applied) are recorded using the property auron.never.convert.reason.

In issue #1471, these reasons can be observed in the Spark UI, helping users understand why the physical execution plan was not converted.

However, in the current Hudi fallback scenarios, the property auron.never.convert.reason is not recorded.

The purpose of this issue is to fill this gap by ensuring that auron.never.convert.reason is properly recorded in Hudi-related fallback cases as well.

What changes are included in this PR?

In scenarios where Hudi conversion fails, throw an exception to provide a prompt indicating why the conversion was not performed.

Are there any user-facing changes?

Nothing.

How was this patch tested?

UT

@guixiaowen guixiaowen changed the title [AURON #2252] Add support for auron.never.convert.reason in Hudi scan… [AURON #2252] Add support for auron.never.convert.reason in Hudi scan May 31, 2026
@github-actions github-actions Bot added the spark label May 31, 2026
@guixiaowen
Copy link
Copy Markdown
Contributor Author

@slfan1989 @cxzl25 @yew1eb y Could you please take a look at this PR? Thank you very much!

} yield major == 3 && minor >= 0 && minor <= 5).getOrElse(false)
assert(
SparkAuronConfiguration.ENABLE_HUDI_SCAN.get(),
"Conversion disabled: auron.enable.hudi.scan=false.")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can auron.enable.hudi.scan be used directly with ENABLE_HUDI_SCAN.key()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can auron.enable.hudi.scan be used directly with ENABLE_HUDI_SCAN.key()

@cxzl25 Thank you so much for reviewing the code. I've updated this part accordingly. Really appreciate your help!

}
assert(
SparkAuronConfiguration.ENABLE_SCAN_PARQUET.get(),
"Conversion disabled: auron.enable.scan.parquet=false.")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@cxzl25 Thank you so much for reviewing the code. I've updated this part accordingly. Really appreciate your help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves Spark plan conversion diagnostics for Hudi-backed FileSourceScanExec by ensuring the auron.never.convert.reason tag is preserved/recorded when Hudi scan conversion is skipped or falls back, enabling clearer explanations in Spark UI.

Changes:

  • Add assertion-based failure reasons in the Hudi scan convert provider so fallback scenarios carry an explicit “why” message.
  • Preserve an existing auron.never.convert.reason tag rather than overwriting it with a generic “not supported yet” message.
  • Add a Hudi unit test that validates auron.never.convert.reason is set for a timestamp-based fallback case.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
thirdparty/auron-hudi/src/main/scala/org/apache/spark/sql/auron/hudi/HudiConvertProvider.scala Emit specific failure reasons via assertions during Hudi scan enable/support/convert checks.
spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala Catch provider enable/support failures for scans and record auron.never.convert.reason.
spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConvertStrategy.scala Avoid overwriting an already-populated auron.never.convert.reason.
thirdparty/auron-hudi/src/test/scala/org/apache/spark/sql/auron/hudi/HudiScanSupportSuite.scala Add a test asserting the never-convert reason is recorded for a Hudi fallback scenario.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 80 to 88
case Some(HudiScanSupport.OrcFormat) =>
if (!SparkAuronConfiguration.ENABLE_SCAN_ORC.get()) {
return exec
}
assert(
SparkAuronConfiguration.ENABLE_SCAN_PARQUET.get(),
"Conversion disabled: auron.enable.scan.orc=false.")
// ORC follows the same timestamp fallback rule as Parquet.
assert(
SparkAuronConfiguration.ENABLE_SCAN_ORC_TIMESTAMP.get(),
"Conversion disabled: auron.enable.scan.orc.timestamp=false.")
if (!SparkAuronConfiguration.ENABLE_SCAN_ORC_TIMESTAMP.get()) {
Comment thread spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala Outdated
return exec
}
assert(
SparkAuronConfiguration.ENABLE_SCAN_PARQUET.get(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: wrong config checked here. This is the ORC branch but it asserts ENABLE_SCAN_PARQUET instead of ENABLE_SCAN_ORC. Should be:

assert(
  SparkAuronConfiguration.ENABLE_SCAN_ORC.get(),
  s"Conversion disabled: ${SparkAuronConfiguration.ENABLE_SCAN_ORC.key()} is false.")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: wrong config checked here. This is the ORC branch but it asserts ENABLE_SCAN_PARQUET instead of ENABLE_SCAN_ORC. Should be:

assert(
  SparkAuronConfiguration.ENABLE_SCAN_ORC.get(),
  s"Conversion disabled: ${SparkAuronConfiguration.ENABLE_SCAN_ORC.key()} is false.")

@yew1eb ye Thank you very much for the code review. I have made the changes accordingly. Really appreciate your guidance!

SparkAuronConfiguration.ENABLE_HUDI_SCAN.get(),
s"Conversion disabled: ${SparkAuronConfiguration.ENABLE_HUDI_SCAN.key()} is false.")
assert(supported, "Conversion disabled: Supported Spark versions: 3.0 to 3.5.")
SparkAuronConfiguration.ENABLE_HUDI_SCAN.get() && supported
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The asserts in isEnabled change the contract of this method: it's supposed to return false when not enabled (so find can skip it), but now it throws instead. The exception is caught upstream in AuronConverters, so it works — but it's fragile: if both isEnabled and isSupported throw, only the first reason is ever recorded. Also, the final SparkAuronConfiguration.ENABLE_HUDI_SCAN.get() && supported is now dead code — it can never be reached.

Consider keeping isEnabled/isSupported as pure predicates (return false) and recording the reason only in convert, where the context is clearer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The asserts in isEnabled change the contract of this method: it's supposed to return false when not enabled (so find can skip it), but now it throws instead. The exception is caught upstream in AuronConverters, so it works — but it's fragile: if both isEnabled and isSupported throw, only the first reason is ever recorded. Also, the final SparkAuronConfiguration.ENABLE_HUDI_SCAN.get() && supported is now dead code — it can never be reached.

Consider keeping isEnabled/isSupported as pure predicates (return false) and recording the reason only in convert, where the context is clearer.

@yew1eb ye Thank you very much for the code review.

Regarding isEnabled and isSupported – they are called within AuronConverters. If isEnabled returns false, isSupported should not be called at all, so I believe both methods will not throw exceptions at the same time. The reasons for not being converted are different between isEnabled and isSupported, and those reasons are only known within these methods, which is why I added the assertions there.

May I ask if your suggestion is to keep isEnabled and isSupported as pure predicates that simply return true or false – without handling too many scenarios inside them – and move the logic for checking Spark version and Hudi format type into the convert method instead?

Is my understanding correct? If so, I will make the changes accordingly. I would really appreciate your confirmation.

Thank you again for your guidance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for auron.never.convert.reason in Hudi scan scenarios

4 participants