Widen expression predicate term field type for mypy compatibility#3421
Widen expression predicate term field type for mypy compatibility#3421aashish-g03 wants to merge 2 commits into
Conversation
Without the pydantic mypy plugin, mypy generates __init__ signatures from field declarations rather than from explicit __init__ overrides. This caused EqualTo(term="col", value=42) to produce: error: Argument "term" has incompatible type "str"; expected "UnboundTerm" Fix: widen field type from UnboundTerm to str | UnboundTerm with a BeforeValidator that coerces str to Reference (matching the existing _to_unbound_term helper). The validator ensures the stored value is always UnboundTerm at runtime. Closes apache#3101
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR updates UnboundPredicate/LiteralPredicate term field validation to accept raw string terms via Pydantic v2 BeforeValidator, improving model parsing ergonomics.
Changes:
- Introduces
Annotated+BeforeValidator(_to_unbound_term)ontermfields to coercestrintoReference. - Adds mypy
type: ignore[union-attr]inbind()methods due to widenedtermtype. - Updates imports to include
AnnotatedandBeforeValidator.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| model_config = ConfigDict(arbitrary_types_allowed=True) | ||
|
|
||
| term: UnboundTerm | ||
| term: Annotated[str | UnboundTerm, BeforeValidator(_to_unbound_term)] |
There was a problem hiding this comment.
Thanks for the suggestion. I considered Annotated[UnboundTerm, BeforeValidator(_to_unbound_term)] first, but it does not fix the issue.
Without the pydantic mypy plugin (which this project does not use), mypy generates __init__ signatures from the outer Annotated type. So Annotated[UnboundTerm, ...] still produces:
error: Argument "term" has incompatible type "str"; expected "UnboundTerm" [arg-type]
I verified this with a minimal repro — only str | UnboundTerm as the outer type makes mypy accept str in the constructor without the plugin. The type: ignore[union-attr] on the 3 bind() calls is the trade-off, but the validator guarantees the runtime type is always UnboundTerm.
|
|
||
| def bind(self, schema: Schema, case_sensitive: bool = True) -> BoundUnaryPredicate: | ||
| bound_term = self.term.bind(schema, case_sensitive) | ||
| bound_term = self.term.bind(schema, case_sensitive) # type: ignore[union-attr] |
|
|
||
| def bind(self, schema: Schema, case_sensitive: bool = True) -> BoundSetPredicate: | ||
| bound_term = self.term.bind(schema, case_sensitive) | ||
| bound_term = self.term.bind(schema, case_sensitive) # type: ignore[union-attr] |
| class LiteralPredicate(UnboundPredicate, ABC): | ||
| type: TypingLiteral["lt", "lt-eq", "gt", "gt-eq", "eq", "not-eq", "starts-with", "not-starts-with"] = Field(alias="type") | ||
| term: UnboundTerm | ||
| term: Annotated[str | UnboundTerm, BeforeValidator(_to_unbound_term)] |
|
|
||
| def bind(self, schema: Schema, case_sensitive: bool = True) -> BoundLiteralPredicate: | ||
| bound_term = self.term.bind(schema, case_sensitive) | ||
| bound_term = self.term.bind(schema, case_sensitive) # type: ignore[union-attr] |
SetPredicate.__getnewargs__ return annotation used UnboundTerm, but the field is now str | UnboundTerm. Widen the tuple type to match.
Rationale for this change
Without the pydantic mypy plugin, mypy generates
__init__signatures from field declarations rather than from the explicit__init__overrides onUnboundPredicateand its subclasses. This causes downstream users to get type errors when constructing predicates with string column names:Closes #3101.
What changes were included in this PR?
Widen the
termfield declaration fromUnboundTermtoAnnotated[str | UnboundTerm, BeforeValidator(_to_unbound_term)]inUnboundPredicateandLiteralPredicate. This uses the sameAnnotated+BeforeValidatorpattern already established inpartitioning.pyfortransformfield coercion.The
BeforeValidatorcalls the existing_to_unbound_termhelper, which coercesstrtoReference. The stored value is alwaysUnboundTermat runtime.Changes:
UnboundPredicate.term:UnboundTerm->Annotated[str | UnboundTerm, BeforeValidator(_to_unbound_term)]LiteralPredicate.term: same widening (re-declares the field)# type: ignore[union-attr]on 3bind()call sites whereself.term.bind()is called -- the validator guarantees the runtime type, but mypy sees the union__init__methods are preserved unchanged (they handle more than just term coercion)How was this patch tested?
All 757 existing expression and conversion tests pass. Verified with mypy that keyword construction
EqualTo(term="col", value=42)no longer produces thearg-typeerror onterm.Note: Positional construction
EqualTo("col", 42)still produces mypy errors (Too many positional arguments) because mypy without the pydantic plugin cannot infer positional signatures from explicit__init__overrides. That is a broader issue affecting all Pydantic model constructors in the codebase and is outside the scope of this fix.