From db4dc3239ea33f222ce3706980ef60d0a652b373 Mon Sep 17 00:00:00 2001 From: Jinpei Su Date: Thu, 18 Jun 2026 06:01:28 +0000 Subject: [PATCH 1/2] docs: add PostgreSQL KB how-to and troubleshooting guides (MIDDLEWARE-31526) Consolidate historical internal KB solutions into the product manual, modernized to the current acid.zalan.do/v1 postgresql CR and verified live on ACP 4.2/4.3: how_to: install pgvector, install zhparser, configure pg_hba whitelist, run as root, disable NodePort exposure. trouble_shooting: connection SSL off, pg_wal disk full, coredump from huge pages, repair streaming replica. Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/en/how_to/configure_pg_hba_whitelist.mdx | 72 ++++++++++++ docs/en/how_to/disable_nodeport_exposure.mdx | 74 ++++++++++++ docs/en/how_to/install_pgvector_extension.mdx | 109 ++++++++++++++++++ docs/en/how_to/install_zhparser_extension.mdx | 102 ++++++++++++++++ docs/en/how_to/run_postgresql_as_root.mdx | 75 ++++++++++++ .../trouble_shooting/connection_ssl_off.mdx | 66 +++++++++++ .../trouble_shooting/coredump_huge_pages.mdx | 56 +++++++++ .../fix_streaming_replication.mdx | 65 +++++++++++ docs/en/trouble_shooting/pg_wal_disk_full.mdx | 63 ++++++++++ 9 files changed, 682 insertions(+) create mode 100644 docs/en/how_to/configure_pg_hba_whitelist.mdx create mode 100644 docs/en/how_to/disable_nodeport_exposure.mdx create mode 100644 docs/en/how_to/install_pgvector_extension.mdx create mode 100644 docs/en/how_to/install_zhparser_extension.mdx create mode 100644 docs/en/how_to/run_postgresql_as_root.mdx create mode 100644 docs/en/trouble_shooting/connection_ssl_off.mdx create mode 100644 docs/en/trouble_shooting/coredump_huge_pages.mdx create mode 100644 docs/en/trouble_shooting/fix_streaming_replication.mdx create mode 100644 docs/en/trouble_shooting/pg_wal_disk_full.mdx diff --git a/docs/en/how_to/configure_pg_hba_whitelist.mdx b/docs/en/how_to/configure_pg_hba_whitelist.mdx new file mode 100644 index 0000000..805aab3 --- 
/dev/null +++ b/docs/en/how_to/configure_pg_hba_whitelist.mdx @@ -0,0 +1,72 @@ +--- +weight: 42 +title: Configuring the pg_hba Client Authentication Whitelist +--- + +# Configuring the pg_hba Client Authentication Whitelist + +## Overview + +PostgreSQL client authentication is controlled by `pg_hba.conf`. In a cluster +managed by the PostgreSQL Operator, this file is rendered and managed by +Patroni — **editing `pg_hba.conf` inside the container has no effect** because +Patroni overwrites it. Instead, declare the rules in the `postgresql` custom +resource under `spec.patroni.pg_hba`, and the Operator/Patroni will apply and +reload them. + +## Prerequisites + +- A running PostgreSQL cluster managed by the PostgreSQL Operator. +- Permission to edit the `postgresql` custom resource. + +## Procedure + +### 1. Locate the custom resource + +```bash +kubectl get postgresql -n $NAMESPACE +``` + +### 2. Set the pg_hba rules + +Edit the `postgresql` resource and add the whitelist under `spec.patroni.pg_hba`. +Keep the internal Patroni/replication entries, and append your own rules. Order +matters — the first matching rule wins. + +```yaml +spec: + patroni: + pg_hba: + - local all all trust + - hostssl all +zalandos 127.0.0.1/32 pam + - host all all 127.0.0.1/32 md5 + - hostssl all +zalandos ::1/128 pam + - host all all ::1/128 md5 + - hostssl replication standby all md5 + - hostssl all +zalandos all pam + - hostssl all all all md5 + - host all all 0.0.0.0/0 md5 + - host all all ::0/0 md5 +``` + +Apply with `kubectl apply` / `kubectl edit`. Patroni reloads the configuration +without a database restart. + +### 3. Verify + +```bash +kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- \ + psql -U postgres -c "SELECT type, database, user_name, address, auth_method FROM pg_hba_file_rules ORDER BY line_number;" +``` + +The output should reflect the rules you declared. `pg_hba_file_rules` also +reports parse errors in the `error` column if a rule is malformed. 
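As a complement to the check above, the declared rules can be screened before they are applied. The following is a minimal sketch, not part of the Operator or Patroni: the helper name `find_plain_catch_all` is hypothetical, and it assumes the rules are supplied one per line on standard input, in the same form as the `spec.patroni.pg_hba` entries. It prints every plain `host` rule whose address field matches any client.

```bash
# Hypothetical helper (not an Operator or Patroni command): reads pg_hba
# rules from stdin, one per line, and prints every plain "host" rule whose
# address field matches any client.
find_plain_catch_all() {
  awk '$1 == "host" && ($4 == "0.0.0.0/0" || $4 == "::0/0" || $4 == "::/0" || $4 == "all")'
}

# Example input: three of the rules from the procedure above.
printf '%s\n' \
  'hostssl all all all md5' \
  'host all all 127.0.0.1/32 md5' \
  'host all all 0.0.0.0/0 md5' | find_plain_catch_all
```

With this input, only the `host all all 0.0.0.0/0 md5` rule is printed. An empty result suggests that no plain `host` rule is open to every address, although the sketch does not evaluate narrower CIDR ranges.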
+ +## Notes + +- Prefer `hostssl ... md5` over plain `host ... md5` when exposing the database + beyond the cluster, so that credentials are not sent over an unencrypted + connection. See also + [Connection fails with "SSL off"](../trouble_shooting/connection_ssl_off.mdx). +- `+zalandos` is an internal role group used by the Operator; do not remove the + `+zalandos` lines or internal components may lose access. diff --git a/docs/en/how_to/disable_nodeport_exposure.mdx b/docs/en/how_to/disable_nodeport_exposure.mdx new file mode 100644 index 0000000..c49254f --- /dev/null +++ b/docs/en/how_to/disable_nodeport_exposure.mdx @@ -0,0 +1,74 @@ +--- +weight: 44 +title: Disabling NodePort Exposure for a PostgreSQL Cluster +--- + +# Disabling NodePort Exposure for a PostgreSQL Cluster + +## Overview + +By default the Service that fronts a PostgreSQL cluster is of type `NodePort`, +which opens a port on every node. In environments where exposing a node port is +not acceptable, you can switch the Service to type `LoadBalancer` and disable +node-port allocation, so the database is no longer reachable through a node +port. + +:::info +This requires the platform to provide a LoadBalancer implementation (for example +MetalLB). If no `IPAddressPool` is configured, the Service's `EXTERNAL-IP` stays +`<pending>` — the node port is still removed, but no external address is +assigned until a pool exists. On OpenShift Container Platform, prefer exposing +the database through a Route / passthrough instead of a node port. +::: + +## Prerequisites + +- A running PostgreSQL cluster managed by the PostgreSQL Operator. +- A LoadBalancer provider on the cluster if external reachability is required. + +## Procedure + +Set `$CLUSTER_NAME` and `$NAMESPACE` for the target cluster. + +### 1. 
Switch the Services to LoadBalancer + +```bash +kubectl patch postgresql -n $NAMESPACE $CLUSTER_NAME --type merge \ + -p '{"spec":{"enableMasterLoadBalancer":true,"enableReplicaLoadBalancer":true}}' +``` + +Wait ~30 seconds for the Operator to reconcile and the Service type to change to +`LoadBalancer`: + +```bash +kubectl get svc -n $NAMESPACE $CLUSTER_NAME -o jsonpath='{.spec.type}{"\n"}' +``` + +### 2. Remove node-port allocation + +Patch the master Service (and the `-repl` Service if you enabled the replica +LoadBalancer) to stop allocating node ports: + +```bash +kubectl patch service -n $NAMESPACE $CLUSTER_NAME \ + -p '{"spec":{"allocateLoadBalancerNodePorts":false,"ports":[{"name":"postgresql","nodePort":null,"port":5432,"protocol":"TCP","targetPort":5432}]}}' + +kubectl patch service -n $NAMESPACE $CLUSTER_NAME-repl \ + -p '{"spec":{"allocateLoadBalancerNodePorts":false,"ports":[{"name":"postgresql","nodePort":null,"port":5432,"protocol":"TCP","targetPort":5432}]}}' +``` + +### 3. Verify + +```bash +kubectl get svc -n $NAMESPACE $CLUSTER_NAME \ + -o custom-columns=NAME:.metadata.name,TYPE:.spec.type,NODEPORT:.spec.ports[0].nodePort,ALLOC:.spec.allocateLoadBalancerNodePorts +``` + +Expected: `TYPE=LoadBalancer`, `NODEPORT=<none>`, `ALLOC=false`. + +:::note +The `ports[].name` in the patch must match the existing port name on the +Service. Inspect it first with +`kubectl get svc -n $NAMESPACE $CLUSTER_NAME -o jsonpath='{.spec.ports[*].name}'` +and adjust the patch accordingly. 
+::: diff --git a/docs/en/how_to/install_pgvector_extension.mdx b/docs/en/how_to/install_pgvector_extension.mdx new file mode 100644 index 0000000..b07d21f --- /dev/null +++ b/docs/en/how_to/install_pgvector_extension.mdx @@ -0,0 +1,109 @@ +--- +weight: 40 +title: Installing the pgvector Extension +--- + +# Installing the pgvector Extension + +## Overview + +[pgvector](https://github.com/pgvector/pgvector) adds a `vector` data type and +nearest-neighbor search to PostgreSQL, which is commonly used for embedding / +similarity-search workloads. The extension is pre-bundled in the Spilo image +shipped with the PostgreSQL Operator, so no image rebuild is required — you only +need to create the extension inside the target database. + +## Prerequisites + +- A running PostgreSQL cluster managed by the PostgreSQL Operator. +- A database user with privileges to create extensions (the `postgres` + superuser, used below). + +## Procedure + +### 1. Verify the extension is available + +```bash +kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- \ + psql -U postgres -tAc \ + "SELECT name, default_version FROM pg_available_extensions WHERE name = 'vector';" +``` + +Expected output (version may differ depending on the operand release): + +``` +vector|0.8.2 +``` + +### 2. Create the extension + +```sql +CREATE EXTENSION IF NOT EXISTS vector; +``` + +### 3. 
Smoke test + +```sql +-- Create a table with a 3-dimensional vector column +CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3)); + +-- Insert sample data +INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]'); + +-- Order by L2 distance to a query vector +SELECT id, embedding <-> '[3,1,2]' AS l2_distance FROM items ORDER BY l2_distance; +``` + +The distance operators are: + +| Operator | Distance | +|----------|----------| +| `<->` | L2 (Euclidean) | +| `<#>` | negative inner product | +| `<=>` | cosine | + +## Indexing for approximate nearest-neighbor search + +By default pgvector performs an exact search (perfect recall). For larger +datasets you can add an approximate index, trading some recall for speed. + +### IVFFlat + +Build the index **after** the table contains data. A good starting point for +the number of lists is `rows / 1000` (up to 1M rows) or `sqrt(rows)` beyond +that. + +```sql +-- L2 distance +CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100); + +-- Tune probes at query time (higher = better recall, slower) +SET ivfflat.probes = 10; +``` + +### HNSW + +HNSW has slower build time and higher memory usage than IVFFlat but better +query performance, and can be created on an empty table. + +```sql +CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64); + +-- Tune the search candidate list at query time (default 40) +SET hnsw.ef_search = 100; +``` + +Use `vector_ip_ops` (inner product) or `vector_cosine_ops` (cosine) instead of +`vector_l2_ops` to index the corresponding distance function. 
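The IVFFlat starting point described above can be computed from a row count. This is a minimal sketch under stated assumptions: the helper name `suggest_ivfflat_lists` is hypothetical (it is not part of pgvector), the row count is passed in by the caller, and results below 1 are raised to 1, which is an assumption of this sketch rather than guidance from the text.

```bash
# Hypothetical helper: suggest a starting IVFFlat "lists" value for a row count.
# Assumption of this sketch: results below 1 are raised to 1.
suggest_ivfflat_lists() {
  local rows=$1
  local lists
  if [ "$rows" -le 1000000 ]; then
    # rows / 1000 for tables of up to 1M rows
    lists=$(( rows / 1000 ))
  else
    # sqrt(rows) beyond 1M rows, truncated to an integer
    lists=$(awk -v n="$rows" 'BEGIN { printf "%d", sqrt(n) }')
  fi
  if [ "$lists" -lt 1 ]; then
    lists=1
  fi
  echo "$lists"
}

suggest_ivfflat_lists 250000    # prints 250
suggest_ivfflat_lists 4000000   # prints 2000
```

The printed value is only a starting point for `WITH (lists = ...)`; recall and latency should still be measured against the actual workload, together with `ivfflat.probes`.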
+ +## Upgrading the extension + +```sql +ALTER EXTENSION vector UPDATE; +``` + +## Verification + +```sql +SELECT extname, extversion FROM pg_extension WHERE extname = 'vector'; +``` diff --git a/docs/en/how_to/install_zhparser_extension.mdx b/docs/en/how_to/install_zhparser_extension.mdx new file mode 100644 index 0000000..756c0ae --- /dev/null +++ b/docs/en/how_to/install_zhparser_extension.mdx @@ -0,0 +1,102 @@ +--- +weight: 41 +title: Installing the zhparser Extension +--- + +# Installing the zhparser Extension + +## Overview + +[zhparser](https://github.com/amutu/zhparser) is a PostgreSQL full-text search +parser for Chinese, based on SCWS. It is pre-bundled in the Spilo image shipped +with the PostgreSQL Operator, so you only need to create the extension and a +text-search configuration that uses it. + +## Prerequisites + +- A running PostgreSQL cluster managed by the PostgreSQL Operator. +- A database user with privileges to create extensions (the `postgres` + superuser, used below). Managing the custom dictionary requires superuser + privileges. + +## Procedure + +### 1. Create the extension + +```sql +CREATE EXTENSION IF NOT EXISTS zhparser; +``` + +### 2. Create a text-search configuration + +```sql +CREATE TEXT SEARCH CONFIGURATION testzhcfg (PARSER = zhparser); +ALTER TEXT SEARCH CONFIGURATION testzhcfg ADD MAPPING FOR n,v,a,i,e,l WITH simple; +``` + +### 3. Tokenize and build search vectors + +```sql +-- Inspect raw tokenization +SELECT * FROM ts_parse('zhparser', '保障房资金压力'); + +-- Build a tsvector using the configuration created above +SELECT to_tsvector('testzhcfg', '2011年保障房进入了更大规模的建设阶段'); + +-- Build a tsquery +SELECT to_tsquery('testzhcfg', '保障房资金压力'); +``` + +## Custom dictionary + +The custom dictionary is scoped per **database** (not per instance) and is +stored under the data directory. Adding custom words requires superuser +privileges. 
+ +```sql +-- Add a custom word +INSERT INTO zhparser.zhprs_custom_word VALUES ('资金压力'); + +-- Synchronize the dictionary +SELECT sync_zhprs_custom_word(); +``` + +Re-establish your session (reconnect) for the change to take effect. After that, +`资金压力` is tokenized as a single word instead of `资金` + `压力`. + +## Parser configuration + +The following options control dictionary loading and tokenization behavior +(PostgreSQL 9.2+). All default to `false`: + +| Option | Purpose | +|--------|---------| +| `zhparser.punctuation_ignore` | Ignore punctuation and special symbols | +| `zhparser.seg_with_duality` | Aggregate loose characters using bigram segmentation | +| `zhparser.dict_in_memory` | Load the whole dictionary into memory | +| `zhparser.multi_short` | Compound short words | +| `zhparser.multi_duality` | Compound loose characters into bigrams | +| `zhparser.multi_zmain` | Compound important single characters | +| `zhparser.multi_zall` | Compound all single characters | + +```sql +SHOW zhparser.punctuation_ignore; +ALTER SYSTEM SET zhparser.punctuation_ignore = true; +SELECT pg_reload_conf(); +``` + +`zhparser.extra_dicts` and `zhparser.dict_in_memory` must be set before the +backend starts (set them in the configuration and reload; new connections pick +them up). The other options can be set per session. + +## Upgrading the extension + +```sql +ALTER EXTENSION zhparser UPDATE; +``` + +## Verification + +```sql +SELECT extname, extversion FROM pg_extension WHERE extname = 'zhparser'; +``` diff --git a/docs/en/how_to/run_postgresql_as_root.mdx b/docs/en/how_to/run_postgresql_as_root.mdx new file mode 100644 index 0000000..ca89e2a --- /dev/null +++ b/docs/en/how_to/run_postgresql_as_root.mdx @@ -0,0 +1,75 @@ +--- +weight: 43 +title: Running PostgreSQL Internal Processes as root +--- + +# Running PostgreSQL Internal Processes as root + +## Overview + +By default the PostgreSQL Operator runs the database container as a non-root +user for security. 
Some integrations — for example traditional storage backends +that require root to mount or access volumes — only work when the container runs +as root. This guide shows how to opt into running as root. + +:::warning +Running the database as root is **not recommended**. It increases the attack +surface and violates least-privilege principles: container-escape / privilege +escalation become more damaging, account isolation is weakened, and it may +violate security-compliance requirements. Only enable this when an integration +genuinely requires it. +::: + +## Prerequisites + +- A running PostgreSQL cluster managed by the PostgreSQL Operator. +- Permission to edit the `postgresql` custom resource. +- On OpenShift Container Platform (OCP): the target namespace's ServiceAccount + must be allowed to run privileged pods (for example by binding the + `privileged` SCC). Without this, the pods will be rejected by the Security + Context Constraints admission. + +## Procedure + +### 1. Set the root security fields + +Edit the `postgresql` resource and set the following fields: + +```yaml +spec: + spiloRunAsUser: 0 + spiloRunAsGroup: 0 + spiloPrivileged: true + spiloAllowPrivilegeEscalation: true +``` + +Apply the change. The Operator rolls the pods so the new pod security context +takes effect. + +### 2. Verify + +```bash +kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- id +``` + +Expected output (uid 0): + +``` +uid=0(root) gid=0(root) groups=0(root),103(postgres) +``` + +You can also confirm the pod security context: + +```bash +kubectl get pod $CLUSTER_NAME-0 -n $NAMESPACE \ + -o jsonpath='{.spec.securityContext}{"\n"}{.spec.containers[0].securityContext}{"\n"}' +``` + +It should show `runAsUser: 0`, `runAsGroup: 0`, `privileged: true` and +`allowPrivilegeEscalation: true`. + +## Reverting + +Remove the four fields (or set `spiloRunAsUser`/`spiloRunAsGroup` back to the +non-root defaults `101`/`103` and the privileged flags to `false`) and apply. 
+The Operator rolls the pods back to the non-root security context. diff --git a/docs/en/trouble_shooting/connection_ssl_off.mdx b/docs/en/trouble_shooting/connection_ssl_off.mdx new file mode 100644 index 0000000..bf9ce4f --- /dev/null +++ b/docs/en/trouble_shooting/connection_ssl_off.mdx @@ -0,0 +1,66 @@ +--- +weight: 40 +title: Connection Fails with "SSL off" +--- + +# Connection Fails with "SSL off" + +## Problem Description + +A client fails to connect to PostgreSQL and the server rejects the connection +with an error similar to: + +``` +[PostgreSQL error] failed to retrieve PostgreSQL server_version_num: +FATAL: pg_hba.conf rejects connection for host "172.x.x.x", user "postgres", database "iapi", SSL off +``` + +The key part is `SSL off`: the client connected without SSL, and no `pg_hba.conf` +rule matches a non-SSL (`host`) connection for that client, so PostgreSQL +rejects it. + +## Root Cause + +`pg_hba.conf` only contains `hostssl` (SSL-only) entries for the client's +address range, or is missing a catch-all rule for the client. A client that +does not negotiate SSL therefore has no matching rule. + +## Diagnosis + +Inspect the effective rules: + +```bash +kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- \ + psql -U postgres -c "SELECT type, database, user_name, address, auth_method, error FROM pg_hba_file_rules ORDER BY line_number;" +``` + +Confirm there is no `host` (or `hostssl` if the client does use SSL) rule that +matches the client's address. + +## Resolution + +Add a matching rule under `spec.patroni.pg_hba` in the `postgresql` custom +resource. Prefer requiring SSL where possible: + +```yaml +spec: + patroni: + pg_hba: + - local all all trust + - host all all 127.0.0.1/32 md5 + - hostssl replication standby all md5 + - hostssl all +zalandos all pam + - hostssl all all all md5 + # Add this if the client cannot use SSL: + - host all all 0.0.0.0/0 md5 +``` + +Patroni reloads the configuration without a restart. 
See +[Configuring the pg_hba Client Authentication Whitelist](../how_to/configure_pg_hba_whitelist.mdx) +for the full procedure and verification. + +:::warning +Adding `host all all 0.0.0.0/0 md5` allows unencrypted password authentication +from any address. Prefer fixing the **client** to use SSL and keeping only +`hostssl` rules whenever possible. +::: diff --git a/docs/en/trouble_shooting/coredump_huge_pages.mdx b/docs/en/trouble_shooting/coredump_huge_pages.mdx new file mode 100644 index 0000000..cdab3b8 --- /dev/null +++ b/docs/en/trouble_shooting/coredump_huge_pages.mdx @@ -0,0 +1,56 @@ +--- +weight: 60 +title: PostgreSQL Coredump Caused by Huge Pages +--- + +# PostgreSQL Coredump Caused by Huge Pages + +## Problem Description + +PostgreSQL crashes on start-up with a bus error / coredump. The bootstrap log +ends with: + +``` +selecting default shared_buffers ... 400kB +selecting default time zone ... Etc/UTC +creating configuration files ... ok +running bootstrap script ... Bus error (core dumped) +``` + +## Root Cause + +Huge pages are enabled on the host, but the pod has no huge-page resource +allocated, so the container cannot use them. PostgreSQL tries to request huge +pages by default; the kernel sends `SIGBUS`, which produces the coredump. + +## Resolution + +Disable huge pages for the database. Set the `huge_pages` parameter to `off` in +the `postgresql` custom resource: + +```yaml +spec: + postgresql: + parameters: + huge_pages: "off" +``` + +Apply the change and let the Operator reconcile. Verify: + +```bash +kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- \ + psql -U postgres -tAc "SHOW huge_pages;" +``` + +Expected output: + +``` +off +``` + +:::note +On current operand releases, setting the `huge_pages` parameter is sufficient — +it is applied both at runtime and during database initialization. (Older +guidance that mounted a `postgresql.conf.sample` ConfigMap per PostgreSQL major +version is no longer required.) 
+::: diff --git a/docs/en/trouble_shooting/fix_streaming_replication.mdx b/docs/en/trouble_shooting/fix_streaming_replication.mdx new file mode 100644 index 0000000..afebd57 --- /dev/null +++ b/docs/en/trouble_shooting/fix_streaming_replication.mdx @@ -0,0 +1,65 @@ +--- +weight: 65 +title: Repairing a Broken Streaming Replica +--- + +# Repairing a Broken Streaming Replica + +## Problem Description + +A standby in a Patroni-managed PostgreSQL cluster is not replicating: it shows a +large lag, is stuck, or is otherwise out of sync with the leader. The leader has +no active streaming standby for it. + +## Diagnosis + +### 1. Inspect the cluster topology + +```bash +kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- patronictl list +``` + +A member with a large `Lag in MB`, a `Pending restart`, or a state other than +`running` or `streaming` is the broken replica. + +### 2. Check replication state on the leader + +```sql +-- On the leader: a healthy standby appears here in state 'streaming' +SELECT application_name, state, sent_lsn, replay_lsn, sync_state +FROM pg_stat_replication; + +-- An inactive slot / stale restart_lsn indicates a stuck standby +SELECT slot_name, active, restart_lsn FROM pg_replication_slots; +``` + +If `pg_stat_replication` returns no row for the standby, it is not streaming. + +## Resolution + +Reinitialize the broken member from the leader. This re-clones the standby's +data directory from the current leader. + +```bash +kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- \ + patronictl reinit $CLUSTER_NAME $CLUSTER_NAME-1 --force +``` + +Replace `$CLUSTER_NAME-1` with the name of the broken member. Without `--force`, +`patronictl` prompts for confirmation. + +After the reinit completes, confirm the member is healthy: + +```bash +kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- patronictl list +``` + +The repaired member should show role `Replica`, state `running`/`streaming`, +and `Lag in MB` of `0`. 
On the leader, `pg_stat_replication` should now list the +member in state `streaming`. + +:::note +`patronictl reinit` performs a fresh base backup of the member from the leader. +On large databases this can take a while and consumes leader I/O; run it during +a low-traffic window where possible. +::: diff --git a/docs/en/trouble_shooting/pg_wal_disk_full.mdx b/docs/en/trouble_shooting/pg_wal_disk_full.mdx new file mode 100644 index 0000000..21c4204 --- /dev/null +++ b/docs/en/trouble_shooting/pg_wal_disk_full.mdx @@ -0,0 +1,63 @@ +--- +weight: 50 +title: Disk Full Due to pg_wal Accumulation +--- + +# Disk Full Due to pg_wal Accumulation + +## Problem Description + +The data volume fills up because Write-Ahead Log (WAL) segments under the +`pg_wal` directory accumulate and are not recycled. These segments cannot simply +be deleted: removing WAL files by hand can corrupt the cluster. + +## Root Cause + +WAL segments are retained until they are no longer needed by every consumer +(replicas, replication slots, archiver). The most common cause is that a standby +cannot keep up — for example because of slow disk I/O — so replication lag grows +and the primary must retain WAL for the lagging standby, causing `pg_wal` to +grow without bound. + +## Diagnosis + +1. Confirm the cluster is otherwise healthy: + + ```bash + kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- patronictl list + ``` + + A large, growing `Lag in MB` on a replica points to replication lag as the + cause. + +2. Check replication slots and current WAL position: + + ```sql + SELECT slot_name, active, restart_lsn FROM pg_replication_slots; + SELECT * FROM pg_stat_replication; + ``` + + An inactive slot whose `restart_lsn` is far behind pins WAL on the primary. + +## Resolution + +1. **Reduce the write rate.** Lower the application's insert/update throughput + (for example from 10 rows/s to 5 rows/s, or pause non-essential writers) so + the standby can catch up and WAL can be recycled. + +2. 
**Reduce to a single node temporarily**, if acceptable to the customer, so + there is no lagging standby retaining WAL. Edit the `postgresql` resource: + + ```bash + kubectl get postgresql -A + # set spec.numberOfInstances: 1 + ``` + + After the lag clears, WAL is archived/recycled automatically and the disk + space is released. Scale back up once the situation is stable. + +:::danger +Never delete files under `pg_wal` manually. Removing WAL that the database still +needs will corrupt the cluster. Always resolve the underlying retention cause +(lagging standby, stale replication slot, or stalled archiver) instead. +::: From 2580206f7c70ba05b63918e6320bb4f281f72cd8 Mon Sep 17 00:00:00 2001 From: Jinpei Su Date: Thu, 18 Jun 2026 07:34:26 +0000 Subject: [PATCH 2/2] docs: address CodeRabbit review on PG KB guides - pg_hba whitelist: warn about permissive catch-all 0.0.0.0/0 / ::0/0 rules - pg_wal disk full: show concrete kubectl patch instead of a get + comment - zhparser: add zhparser.extra_dicts to the options table Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/en/how_to/configure_pg_hba_whitelist.mdx | 11 +++++++++++ docs/en/how_to/install_zhparser_extension.mdx | 1 + docs/en/trouble_shooting/pg_wal_disk_full.mdx | 11 +++++++---- 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/docs/en/how_to/configure_pg_hba_whitelist.mdx b/docs/en/how_to/configure_pg_hba_whitelist.mdx index 805aab3..d7fd64b 100644 --- a/docs/en/how_to/configure_pg_hba_whitelist.mdx +++ b/docs/en/how_to/configure_pg_hba_whitelist.mdx @@ -45,6 +45,8 @@ spec: - hostssl replication standby all md5 - hostssl all +zalandos all pam - hostssl all all all md5 + # The two catch-all rules below permit UNENCRYPTED password auth from any + # address. Include them only if clients cannot use SSL (see the warning). - host all all 0.0.0.0/0 md5 - host all all ::0/0 md5 ``` @@ -52,6 +54,15 @@ spec: Apply with `kubectl apply` / `kubectl edit`. 
Patroni reloads the configuration without a database restart. +:::warning +`host all all 0.0.0.0/0 md5` (and its IPv6 form `::0/0`) allow unencrypted +password authentication from any address, exposing credentials to network +sniffing. Prefer the `hostssl ... md5` rules and require clients to use SSL. +Only add the plain `host` catch-all rules when a client genuinely cannot use +SSL — see +[Connection fails with "SSL off"](../trouble_shooting/connection_ssl_off.mdx). +::: + ### 3. Verify ```bash diff --git a/docs/en/how_to/install_zhparser_extension.mdx b/docs/en/how_to/install_zhparser_extension.mdx index 756c0ae..115ec6b 100644 --- a/docs/en/how_to/install_zhparser_extension.mdx +++ b/docs/en/how_to/install_zhparser_extension.mdx @@ -78,6 +78,7 @@ The following options control dictionary loading and tokenization behavior | `zhparser.multi_duality` | Compound loose characters into bigrams | | `zhparser.multi_zmain` | Compound important single characters | | `zhparser.multi_zall` | Compound all single characters | +| `zhparser.extra_dicts` | Comma-separated extra dictionary files (`.txt`/`.xdb`) loaded in addition to the built-in dictionary; must be set before the backend starts | ```sql SHOW zhparser.punctuation_ignore; diff --git a/docs/en/trouble_shooting/pg_wal_disk_full.mdx b/docs/en/trouble_shooting/pg_wal_disk_full.mdx index 21c4204..7bf13f7 100644 --- a/docs/en/trouble_shooting/pg_wal_disk_full.mdx +++ b/docs/en/trouble_shooting/pg_wal_disk_full.mdx @@ -46,15 +46,18 @@ grow without bound. the standby can catch up and WAL can be recycled. 2. **Reduce to a single node temporarily**, if acceptable to the customer, so - there is no lagging standby retaining WAL. Edit the `postgresql` resource: + there is no lagging standby retaining WAL. 
Patch the `postgresql` resource to + one instance: ```bash - kubectl get postgresql -A - # set spec.numberOfInstances: 1 + # Find the cluster name/namespace first if needed: kubectl get postgresql -A + kubectl patch postgresql -n $NAMESPACE $CLUSTER_NAME --type merge \ + -p '{"spec":{"numberOfInstances":1}}' ``` After the lag clears, WAL is archived/recycled automatically and the disk - space is released. Scale back up once the situation is stable. + space is released. Scale back up (restore `numberOfInstances`) once the + situation is stable. :::danger Never delete files under `pg_wal` manually. Removing WAL that the database still