Unify cast handling by removing CastColumnExpr branches in pruning and ordering equivalence#21545
Unify cast handling by removing CastColumnExpr branches in pruning and ordering equivalence#21545adriangb merged 5 commits intoapache:mainfrom
CastColumnExpr branches in pruning and ordering equivalence#21545Conversation
Eliminate CastColumnExpr compatibility in pruning logic. Consolidate equivalence cast substitution to handle only unified CastExpr. Remove redundant CastColumn-specific equivalence regression from dependency imports.
Rebuild unified CastExpr to preserve target field metadata using new_with_target_field(...). Update regression test in dependency.rs to utilize this field-aware CastExpr for better consistency and reliability in test cases.
Renamed the private helper `substitute_cast_like_ordering` to `substitute_cast_ordering`. Replaced the mutable substitution push loop with an iterator-based collection for better performance. Inlined single-use `arrow_schema` and `from_type` locals in cast pruning branches. Simplified the monotonic sort regression test by removing over-general `sort_columns`, collapsing one-element equality vectors, and using distinct case names for clarity.
Remove the unnecessary intermediate Vec in mod.rs. Implement rewrite_cast_child_to_prunable in pruning_predicate.rs and reuse it for both CastExpr and TryCastExpr to streamline casting operation handling.
|
@alamb |
| // Predicate pruning / statistics generally don't support struct columns yet. | ||
| // In the future we may want to support pruning on nested fields, in which case we probably need to | ||
| // do something more sophisticated here. | ||
| // But for now since we don't support pruning on nested fields, we can just cast to the target type directly. | ||
| let left = Arc::new(phys_expr::CastExpr::new(left, to_type.clone(), None)); |
There was a problem hiding this comment.
Since this comment was removed I'm curious what the current state of this is.
AFAIK:
- PruningPredicate doesn't support nested fields (did that change in this PR? if so we should have tests)
- Statistics extraction for Parquet doesn't support nested fields (hence even if PruningPredicate did e2e would still not work); since PruningPredicate ultimately produces a stats expression to evaluate against a RecordBatch of stats we'd need some agreement on the column names for nested expressions (
"get_field("col", 'field')_min"?).
If this comment is still true lets preserve it.
There was a problem hiding this comment.
Good point. This PR does not add nested-field pruning support, and I agree the old comment was still documenting a real limitation rather than a CastColumnExpr-only detail. The branch was collapsed as part of the CastExpr unification, but the nested-field/statistics naming question is still unchanged.
I will restore the note in the unified CastExpr handling path, adjusted to avoid referencing CastColumnExpr, and keep it clear that this PR only preserves existing behavior. I do not plan to add nested-field pruning tests here because this PR intentionally does not change that behavior; those should come with the future nested-field pruning/statistics work.
Include an adapted limitation comment alongside the unified CastExpr reconstruction to clarify behavior regarding nested fields in type casting operations. This enhances code readability and maintainability.
Which issue does this PR close?
Rationale for this change
Downstream components in DataFusion currently branch on both
CastExprandCastColumnExpr, even though they represent equivalent semantics after recent cast unification work. This duplication increases maintenance burden, introduces unnecessary complexity, and raises the risk of inconsistent behavior when evolving cast-related logic.This PR simplifies downstream logic by collapsing these dual branches into a single
CastExpr-based handling path, aligning with the broader cast unification effort.What changes are included in this PR?
Removed
CastColumnExprhandling in downstream logic:Eliminated branching specific to
CastColumnExprin:equivalence/properties/mod.rspruning/pruning_predicate.rsUnified cast substitution logic in ordering equivalence:
Replaced
substitute_cast_like_orderingwithsubstitute_cast_orderingSimplified implementation to only consider
CastExprPreserved correctness by ensuring:
Refactored pruning rewrite logic:
rewrite_cast_child_to_prunableCastExprandTryCastExprhandling to reuse the helperTest updates and cleanup:
CastExpr::new_with_target_fieldCastColumnExprMinor code cleanups:
Are these changes tested?
Yes.
Existing pruning and equivalence tests were updated to reflect the unified cast handling.
Obsolete tests relying on
CastColumnExprwere removed.Test cases were adjusted to ensure behavior remains unchanged, including:
These updates ensure no regression in behavior while validating the simplified implementation.
Are there any user-facing changes?
No.
This change is an internal refactor and does not modify user-facing APIs or query behavior. It preserves existing semantics while improving maintainability.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.