
Backport #8541 to release-14 via new sql path 14.0-1--14.1-1#8558

Closed
onurctirtir wants to merge 39 commits into release-14.0 from
release-14.0-cp-mx-new

Conversation

Member

@onurctirtir onurctirtir commented May 4, 2026

Changes that differ from #8541:

Also,

  • Tests under "Test Citus Lib N-1" are failing because we're adding new UDFs and those tests exercise scenarios where the Citus binary is behind the SQL version; we're okay with those failures.
  • check-style is failing because it incorrectly thinks that "src/backend/distributed/sql/citus--14.0-1--15.0-1.sql" is updated in this PR, which is not the case as that file doesn't even exist in this release branch.
  • Finally, isolation tests under the N-1 tests are failing because isolation_schema_based_sharding_from_any_node tests a new feature, and we don't have something like multi_1_create_citus_schedule for isolation tests where we could temporarily put the tests that cause "expected" N-1 incompatibilities.

onurctirtir and others added 30 commits April 29, 2026 15:01
DESCRIPTION: PR description that will go into the change log, up to 78
characters

(cherry picked from commit 48e83f7)

Conflicts:
	.devcontainer/src/test/regress/Pipfile
	.devcontainer/src/test/regress/Pipfile.lock
	src/test/regress/Pipfile
	src/test/regress/Pipfile.lock
…ate distributed-schema tables

- create table
- create table as
- create table partition of
- create table partitioned by
Even before this change, the run_command_on_master_and_workers_temp() UDF that
we were using to create those views was silently failing when creating
the views on workers. This is because "EXECUTE p_sql;" on the
coordinator was already propagating the command to the workers.
However, we were not checking "PERFORM run_command_on_workers(p_sql);"
for errors, so any failure there was ignored.
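A safer variant of that helper would inspect the per-node success flag returned by run_command_on_workers() instead of discarding it. A hedged sketch (the function name run_everywhere_or_error is hypothetical; run_command_on_workers() is the existing Citus UDF, which returns one row of (nodename, nodeport, success, result) per worker):

```sql
-- Hypothetical replacement for run_command_on_master_and_workers_temp():
-- raise instead of silently ignoring per-worker failures.
CREATE OR REPLACE FUNCTION run_everywhere_or_error(p_sql text)
RETURNS void AS $$
DECLARE
  rec record;
BEGIN
  EXECUTE p_sql;  -- run locally on the coordinator

  FOR rec IN SELECT nodename, nodeport, success, result
             FROM run_command_on_workers(p_sql)
  LOOP
    IF NOT rec.success THEN
      RAISE EXCEPTION 'command failed on %:%: %',
        rec.nodename, rec.nodeport, rec.result;
    END IF;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```

This only runs against a live Citus cluster; it is a sketch of the error-checking pattern, not the fix this PR actually applies.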
.. not when we stop syncing metadata to it.

Resetting the local group id means setting it to 0, which is the same as setting
it to COORDINATOR_GROUP_ID. But when we stop syncing metadata to a worker, we
don't want it to assume it's the coordinator until we actually remove it,
as it's still part of this cluster and can be re-added later. This mostly
doesn't cause an issue because we distinguish between the commands to be
sent to worker nodes with metadata vs. without metadata, so we mostly don't
send a command to a worker without metadata if the command has to check,
e.g., whether the node is the coordinator or not.

However, it seems that in some cases we don't make such a distinction.
For example, we create sequences that a table depends on due to having a
serial column or using a column that defaults to nextval('my_sequence')
on all remote nodes, including the ones that we marked as
"hasmetadata = false", when creating prerequisite objects using
PropagatePrerequisiteObjectsForDistributedTable() before creating shards.
This actually doesn't make much sense because a non-MX worker doesn't need
sequences: we always resolve nextval() calls on the coordinator
and MX workers before preparing shard-level queries, so we never
need sequences when executing shard-level queries on non-MX workers. And
with the next commit, worker_apply_sequence_command() will need to check whether
the node it's sent to is the coordinator during execution; doing
so would be problematic if we didn't reset the local group id when stopping
metadata sync to a worker, as such a worker would then assume it's the
coordinator and would try to execute coordinator-specific logic in
worker_apply_sequence_command() too.
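For reference, a node's local group id lives in the pg_dist_local_group catalog; 0 is COORDINATOR_GROUP_ID, which is why resetting it makes a worker believe it's the coordinator. On any Citus node it can be inspected directly:

```sql
-- Returns 0 on the coordinator (COORDINATOR_GROUP_ID) and the
-- worker's nonzero group id on a metadata-synced worker.
SELECT groupid FROM pg_dist_local_group;
```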
…buted-schema tables from workers

A further refactor should ideally make the rest of the code base use
SetLocalEnableDDLPropagation() as well.

Also, another refactor should ideally get rid of the extra args of
AlterSequenceMinMax().

Finally, the sequence-processing related parts of PostprocessAlterTableStmt
should ideally be moved into a separate function.
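For context, SetLocalEnableDDLPropagation() is the C-side helper for the citus.enable_ddl_propagation GUC; the SQL-level equivalent, scoped to the current transaction, is a plain SET LOCAL (the schema name below is illustrative):

```sql
BEGIN;
-- Turn off DDL propagation just for this transaction;
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK.
SET LOCAL citus.enable_ddl_propagation TO off;
CREATE SCHEMA local_only_schema;  -- not propagated to workers
COMMIT;
```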
We never meant to support calling this UDF from workers because
EnsureReferenceTablesExistOnAllNodesExtended() always assumed
that it's run from the coordinator. That might change very
soon, but we still wouldn't want to support
replicate_reference_tables() from workers, for scoping purposes.
During the tests where we create distributed-schema tables from a
worker node, altering citus.next_shard_id / citus.next_placement_id
using run_command_on_coordinator() doesn't guarantee that we'll use the
values we set them to, e.g., if the citus internal connection that we
use to connect to the coordinator changes. For this reason, let's always
directly alter pg_dist_shardid_seq / pg_dist_placement_placementid_seq
at the system level by connecting to the coordinator.
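Concretely, pinning the sequences on the coordinator itself sidesteps whichever internal connection later consumes them; a sketch with illustrative values (pg_dist_shardid_seq and pg_dist_placement_placementid_seq are the actual Citus sequences, the numbers are examples):

```sql
-- Run directly on the coordinator instead of going through
-- run_command_on_coordinator(); the restart values are examples.
ALTER SEQUENCE pg_dist_shardid_seq RESTART WITH 1230000;
ALTER SEQUENCE pg_dist_placement_placementid_seq RESTART WITH 1230000;
```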
…un from a worker

Given the recent support added to create distributed-schema tables from
any node; in the code-paths that can now acquire a colocation id lock
from a worker, make sure to acquire the lock on the coordinator, maybe
by using a remote connection that won't be closed until the end of the
transaction.

EnsureReferenceTablesExistOnAllNodesExtended() is also one of those
code-paths, but normally it needs to release the lock at certain points,
so it'll be handled in a separate commit differently. The code-paths
that are handled in this commit are the ones that don't need to release
the lock until the end of the transaction.
…n from a worker

Given the recent support for running some operations from any node; in the
code-paths that can acquire a lock on pg_dist_node from a worker, make
sure to acquire the lock on the coordinator as well, maybe by using
a remote connection that won't be closed until the end of the
transaction.

EnsureReferenceTablesExistOnAllNodesExtended() is also one of those
code-paths but it'll be handled in a separate commit.
…on is run from a worker

Given the recent support added to create distributed-schema tables from
any node; in the code-paths that can now acquire a placement colocation
lock from a worker, make sure to acquire the lock on the coordinator,
maybe by using a remote connection that won't be closed until the end of
the transaction.
…ing a distributed-schema table from a worker

Acquire a transactional advisory lock and release it by committing the
remote transaction because, in Postgres, there is no other way to
release a transactional advisory lock in a separate command.
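That constraint comes from Postgres itself: transaction-level advisory locks have no explicit unlock function, so ending the transaction is the only release path. A minimal illustration (the lock key 42 is arbitrary):

```sql
BEGIN;
-- Held until the transaction ends; Postgres provides
-- pg_advisory_unlock() only for session-level locks, and
-- there is no pg_advisory_xact_unlock().
SELECT pg_advisory_xact_lock(42);
COMMIT;  -- the advisory lock is released here
```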
…mand via session-level conn

.. during isolation tests.

See the code comments.
…node

- Add tests for reference table replication as well as
  a failure test for #6592.
- Make sure to test create / drop schema / table for non-super
  user in schema_based_sharding_from_workers_a.sql.
- Add more non-super user tests.
- Add tests for the cases when the relation name is not qualified.
- Add tests for schema / table names that need proper quoting.
- Add more tests for foreign key constraints.
- Add isolation tests.

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 87.75510% with 90 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.72%. Comparing base (d5731d7) to head (498bfdc).

Additional details and impacted files
@@               Coverage Diff                @@
##           release-14.0    #8558      +/-   ##
================================================
- Coverage         88.73%   88.72%   -0.01%     
================================================
  Files               287      287              
  Lines             63238    63725     +487     
  Branches           7925     8004      +79     
================================================
+ Hits              56114    56540     +426     
- Misses             4862     4888      +26     
- Partials           2262     2297      +35     

onurctirtir added a commit that referenced this pull request May 4, 2026
14.1-1. We should reflect the state at 14.1-1 in main branch too. And while doing
that, we can delete the changes that have been made on main to introduce them at
15.0-1, because we don't need to re-introduce SQL objects introduced at 14.1-1
again at 15.0-1.
onurctirtir added a commit that referenced this pull request May 4, 2026
onurctirtir added a commit that referenced this pull request May 4, 2026
@onurctirtir onurctirtir requested a review from eaydingol May 4, 2026 10:07
@onurctirtir onurctirtir changed the title backport #8541 to release-14 Backport #8541 to release-14 via new sql path 14.0-1--14.1-1 May 4, 2026
@onurctirtir onurctirtir force-pushed the release-14.0-cp-mx-new branch from 952bf18 to 9555806 Compare May 5, 2026 12:13
@ihalatci ihalatci added this to the 14.1 milestone May 5, 2026
This needs to be taken to main branch too.

1. Move more tests that are creating Citus extension into multi_1_create_citus_schedule.
2. Remove unnecessary worker tests from schema_based_sharding.sql, as we already
   test them extensively in other test files.
3. Also add a placeholder section at the end of multi_1_create_citus_schedule for the
   tests that need to be moved back to their original schedules with the next major
   version of Citus.
@onurctirtir onurctirtir force-pushed the release-14.0-cp-mx-new branch from 9555806 to 498bfdc Compare May 6, 2026 08:45
Member Author

closing this for now

@onurctirtir onurctirtir closed this May 6, 2026
