Skip to content

fix(postgres): correctly infer nullability for LEFT JOIN rewritten as RIGHT JOIN#4285

Open
luca992 wants to merge 2 commits into
transact-rs:mainfrom
luca992:fix/postgres-left-join-rewrite-nullability
Open

fix(postgres): correctly infer nullability for LEFT JOIN rewritten as RIGHT JOIN#4285
luca992 wants to merge 2 commits into
transact-rs:mainfrom
luca992:fix/postgres-left-join-rewrite-nullability

Conversation

@luca992
Copy link
Copy Markdown

@luca992 luca992 commented May 28, 2026

PostgreSQL's planner may execute A LEFT JOIN B as a hash join with Join Type: Right to put the smaller relation on the hash-build side.

This is documented behavior:

  • [Postgres docs, 14.3 Controlling the Planner with Explicit JOIN Clauses]

    "Most practical cases involving LEFT JOIN or RIGHT JOIN can be
    rearranged to some extent."
    https://www.postgresql.org/docs/current/explicit-joins.html

  • [Postgres Pro, "Queries in PostgreSQL: 6. Hashing"]

    "On the physical level, the planner determines which set is the
    inner one and which is the outer one not by their positions in
    the query, but by the relative join cost. ... So, the join type
    switches from left to right in the plan."
    https://postgrespro.com/blog/pgsql/5969673

After the swap the SQL right operand (the nullable side under LEFT JOIN semantics) ends up as the plan's Outer child instead of the Inner child. The old visit_plan only marked Inner children nullable, which on a Join Type: Right plan:

  • marked the SQL left operand (always preserved) as nullable, causing spurious Option<T> in macro output for NOT NULL columns
  • failed to mark the SQL right operand as nullable, masking real NULLs and panicking at decode time when no LEFT JOIN row matched

The old visit_plan also only inspected children of outer joins, never the join node's own Output. Computed expressions like b.x || 'y' or COALESCE(batx.X, NULL) materialize at the join node itself; the nullable child only carries raw column refs, so nullability inference silently dropped these columns. Postgres further deparses these expressions with an extra outer paren pair at root (((expr))) compared to the join node ((expr)), so exact-string matching missed them.

Fix threads in_nullable through visit_plan and picks the NULL-fill side from the current node's join type:

  • Left -> Inner child is nullable
  • Right -> Outer child is nullable
  • Full -> both children are nullable

It also marks the join node's own outputs that aren't pass-throughs from the preserved-side child, catching computed expressions that don't appear on the nullable-side child. Output comparison normalizes redundant outer parens (Postgres-lexical tokenizer for '…', "…", E'…', B'…' / X'…', U&'…' / U&"…", $tag$…$tag$) so root ((expr)) matches join (expr).

It also recurses into all child plans, not only when the current node is Left/Right, so nested joins reached through non-join intermediates like Hash are walked.

Does your PR solve an issue?

fixes #3202

Is this a breaking change?

Behavior change yes. The old behavior was wrong inference, not a documented contract.

For queries with a LEFT JOIN that postgres rewrites as a Hash Right Join (driven by pg_statistic cost estimates, fires on production sized data with plan_cache_mode = force_generic_plan which sqlx-macros-core itself sets), generated types change:

  • Columns from the SQL left operand that are NOT NULL in their base table go from Option<T> to T. So code handling these unneeded nulls would need to be dropped.

  • Columns from the SQL right operand (the LEFT JOIN nullable side) go from T to Option<T>. Code treating these as non-null was already at risk of unexpected null; try decoding as an Option panics at runtime whenever a row had no matching join partner. After this it stops compiling and the field needs to become Option<T>.

  • Computed expressions on the nullable side of any outer join (COALESCE(b.x, …), b.x || y, function calls, CASE, etc.) go from T to Option<T>. Same latent panic risk and same migration as the case above.

The right-operand and computed-expression cases are the bigger ones. Code passed type checks before only because LEFT JOINs happened to always find a match in test data. After this fix the type system exposes the real nullability.

… RIGHT JOIN

PostgreSQL's planner may execute `A LEFT JOIN B` as a hash join with
`Join Type: Right` to put the smaller relation on the hash-build side.

This is documented behavior of the planner:

* [Postgres docs, 14.3 Controlling the Planner with Explicit JOIN Clauses]
  > "Most practical cases involving LEFT JOIN or RIGHT JOIN can be
  >  rearranged to some extent."
  https://www.postgresql.org/docs/current/explicit-joins.html

* [Postgres Pro, "Queries in PostgreSQL: 6. Hashing"]
  > "On the physical level, the planner determines which set is the
  >  inner one and which is the outer one not by their positions in
  >  the query, but by the relative join cost. ... So, the join type
  >  switches from left to right in the plan."
  https://postgrespro.com/blog/pgsql/5969673

After the swap, the SQL right operand (the nullable side under LEFT JOIN
semantics) appears as the plan's `Outer` child rather than the `Inner`
child. The previous `visit_plan` only marked `Inner` children as
nullable, which on a `Join Type: Right` plan:

  * incorrectly marked the SQL left operand (always preserved) as
    nullable — causing spurious `Option<T>` in macro output for
    NOT NULL columns; and
  * failed to mark the SQL right operand as nullable — masking real
    NULLs and panicking at decode time when no LEFT JOIN row matched.

Thread the parent join type into `visit_plan` and decide which child is
the NULL-fill side based on it:
  * `Left`  → `Inner` child is nullable (no change)
  * `Right` → `Outer` child is nullable (new)
  * `Full`  → both children are nullable (no change)

Also recurse into all child plans (not only when the current node is
`Left`/`Right`), so nested joins reached through non-join intermediates
like `Hash` are walked.

Closes transact-rs#3202.
@luca992 luca992 force-pushed the fix/postgres-left-join-rewrite-nullability branch from 977cbbb to 691420b Compare May 28, 2026 16:58
@die-herdplatte
Copy link
Copy Markdown

I observed an issue with cargo sqlx prepare returning different metadata between a populated and a non-populated DB.
This PR fixes the original discrepancy (which was related to a LEFT JOIN) but opens up another discrepancy elsewhere.

In my case, the query plan on the unpopulated DB contains a root output of the shape ((<expr>)).
During traversal via visit_plan, there's a plan node where parent_nulls_this_side evaluates to true and contains the root output in its own outputs.
The only problem here is that the plan output is represented as (<expr>) (with only one pair of outer parentheses), so the exact match with the root output will fail and the the recorded nullability will remain unknown / None.

The generated query plan for the populated DB is different and doesn't contain such an expression as one of its root outputs.
Here, nullability for the same output will be detected as true.
This results in cargo sqlx prepare --check to fail depending on the state of the reference DB.

@luca992
Copy link
Copy Markdown
Author

luca992 commented May 29, 2026

@die-herdplatte hmm is there a way you can share for me to reproduce it?

@luca992 luca992 force-pushed the fix/postgres-left-join-rewrite-nullability branch from 7f594c8 to 179277d Compare May 29, 2026 15:34
@luca992
Copy link
Copy Markdown
Author

luca992 commented May 29, 2026

@die-herdplatte I made changes which I think should handle your issues if you want to check

…in boundaries

Walk each Left/Right join node's own outputs (not just its children),
marking those not pass-through from the preserved-side child. Tolerate
redundant outer parens in EXPLAIN deparse so root `((expr))` matches the
join's `(expr)`. Add real-PG-derived regression tests.
@luca992 luca992 force-pushed the fix/postgres-left-join-rewrite-nullability branch from 179277d to b9b36ae Compare May 29, 2026 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sqlx::query_as!() returns unexpected null; try decoding as an Option when multiple (left) joins are used

2 participants