83 changes: 83 additions & 0 deletions docs/en/how_to/configure_pg_hba_whitelist.mdx
@@ -0,0 +1,83 @@
---
weight: 42
title: Configuring the pg_hba Client Authentication Whitelist
---

# Configuring the pg_hba Client Authentication Whitelist

## Overview

PostgreSQL client authentication is controlled by `pg_hba.conf`. In a cluster
managed by the PostgreSQL Operator, this file is rendered and managed by
Patroni — **editing `pg_hba.conf` inside the container has no effect** because
Patroni overwrites it. Instead, declare the rules in the `postgresql` custom
resource under `spec.patroni.pg_hba`, and the Operator/Patroni will apply and
reload them.

## Prerequisites

- A running PostgreSQL cluster managed by the PostgreSQL Operator.
- Permission to edit the `postgresql` custom resource.

## Procedure

### 1. Locate the custom resource

```bash
kubectl get postgresql -n $NAMESPACE
```

### 2. Set the pg_hba rules

Edit the `postgresql` resource and add the whitelist under `spec.patroni.pg_hba`.
Keep the internal Patroni/replication entries, and append your own rules. Order
matters — the first matching rule wins.

```yaml
spec:
  patroni:
    pg_hba:
      - local all all trust
      - hostssl all +zalandos 127.0.0.1/32 pam
      - host all all 127.0.0.1/32 md5
      - hostssl all +zalandos ::1/128 pam
      - host all all ::1/128 md5
      - hostssl replication standby all md5
      - hostssl all +zalandos all pam
      - hostssl all all all md5
      # The two catch-all rules below permit UNENCRYPTED password auth from any
      # address. Include them only if clients cannot use SSL (see the warning).
      - host all all 0.0.0.0/0 md5
      - host all all ::0/0 md5
```

Apply the change with `kubectl apply` or `kubectl edit`. Patroni reloads the
configuration without a database restart.

:::warning
`host all all 0.0.0.0/0 md5` (and its IPv6 form `::0/0`) allow unencrypted
password authentication from any address, exposing credentials to network
sniffing. Prefer the `hostssl ... md5` rules and require clients to use SSL.
Only add the plain `host` catch-all rules when a client genuinely cannot use
SSL — see
[Connection fails with "SSL off"](../trouble_shooting/connection_ssl_off.mdx).
:::
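
To make the warning above easier to act on, the following is a minimal sketch of a check that flags plain `host` rules with a catch-all address in a rule list. The function name `find_plaintext_catchall` is hypothetical and not part of the Operator or Patroni. It assumes the rules arrive on stdin one per line, optionally prefixed with the YAML list marker `- `, and it treats `0.0.0.0/0`, `::0/0`, `::/0`, and `all` as catch-all addresses.

```shell
# Hypothetical helper: print every plain `host` rule whose address field
# matches any client. Input: pg_hba rules on stdin, one per line.
find_plaintext_catchall() {
  awk '
    { sub(/^[ \t]*-[ \t]*/, "") }        # drop a leading YAML list marker
    NF == 0 || $1 ~ /^#/ { next }        # skip blank lines and comments
    $1 == "host" && ($4 == "0.0.0.0/0" || $4 == "::0/0" || $4 == "::/0" || $4 == "all") { print }
  '
}

# Usage: pipe in the entries from spec.patroni.pg_hba.
printf '%s\n' \
  '- hostssl all all all md5' \
  '- host all all 127.0.0.1/32 md5' \
  '- host all all 0.0.0.0/0 md5' \
  '- host all all ::0/0 md5' | find_plaintext_catchall
```

With the sample input, only the last two rules are printed. The loopback `host` rule is not flagged because its address is not a catch-all.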

### 3. Verify

```bash
kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- \
psql -U postgres -c "SELECT type, database, user_name, address, auth_method FROM pg_hba_file_rules ORDER BY line_number;"
```

The output should reflect the rules you declared. `pg_hba_file_rules` also
reports parse errors in the `error` column if a rule is malformed.
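
As a sketch of how the `error` column could be checked on its own, the helper below lists only the rules that failed to parse and returns a non-zero status if any exist. The function names `hba_parse_errors` and `check_hba` are hypothetical. The pod name, container name, and `postgres` user are taken from the verification command above.

```shell
# Hypothetical helper: list malformed pg_hba rules (empty output = none).
# $1 = namespace, $2 = cluster name
hba_parse_errors() {
  kubectl exec -n "$1" "$2-0" -c postgres -- \
    psql -U postgres -tAc \
    "SELECT line_number, error FROM pg_hba_file_rules WHERE error IS NOT NULL ORDER BY line_number;"
}

# Returns 0 if no rule has a parse error, 1 if at least one does,
# and 2 if the query itself could not be run.
check_hba() {
  local errors
  errors=$(hba_parse_errors "$1" "$2") || return 2
  if [ -n "$errors" ]; then
    printf 'pg_hba parse errors:\n%s\n' "$errors" >&2
    return 1
  fi
  return 0
}

# Usage: check_hba "$NAMESPACE" "$CLUSTER_NAME"
```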

## Notes

- Prefer `hostssl ... md5` over plain `host ... md5` when exposing the database
beyond the cluster, so that credentials are not sent over an unencrypted
connection. See also
[Connection fails with "SSL off"](../trouble_shooting/connection_ssl_off.mdx).
- `+zalandos` is an internal role group used by the Operator; do not remove the
`+zalandos` lines or internal components may lose access.
74 changes: 74 additions & 0 deletions docs/en/how_to/disable_nodeport_exposure.mdx
@@ -0,0 +1,74 @@
---
weight: 44
title: Disabling NodePort Exposure for a PostgreSQL Cluster
---

# Disabling NodePort Exposure for a PostgreSQL Cluster

## Overview

By default the Service that fronts a PostgreSQL cluster is of type `NodePort`,
which opens a port on every node. In environments where exposing a node port is
not acceptable, you can switch the Service to type `LoadBalancer` and disable
node-port allocation, so the database is no longer reachable through a node
port.

:::info
This requires the platform to provide a LoadBalancer implementation (for example
MetalLB). If no `IPAddressPool` is configured, the Service's `EXTERNAL-IP` stays
`<pending>` — the node port is still removed, but no external address is
assigned until a pool exists. On OpenShift Container Platform, prefer exposing
the database through a Route / passthrough instead of a node port.
:::

## Prerequisites

- A running PostgreSQL cluster managed by the PostgreSQL Operator.
- A LoadBalancer provider on the cluster if external reachability is required.

## Procedure

Set `$CLUSTER_NAME` and `$NAMESPACE` for the target cluster.

### 1. Switch the Services to LoadBalancer

```bash
kubectl patch postgresql -n $NAMESPACE $CLUSTER_NAME --type merge \
-p '{"spec":{"enableMasterLoadBalancer":true,"enableReplicaLoadBalancer":true}}'
```

Wait ~30 seconds for the Operator to reconcile and the Service type to change to
`LoadBalancer`:

```bash
kubectl get svc -n $NAMESPACE $CLUSTER_NAME -o jsonpath='{.spec.type}{"\n"}'
```
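
Instead of waiting a fixed time, the Service type can be polled until it changes. The following is a minimal sketch; the function name `wait_for_service_type`, the 2-second polling interval, and the 60-second default timeout are assumptions chosen for illustration.

```shell
# Hypothetical helper: poll a Service until .spec.type equals the wanted value.
# $1 = namespace, $2 = Service name, $3 = wanted type, $4 = timeout in seconds
wait_for_service_type() {
  local ns=$1 svc=$2 want=$3 timeout=${4:-60} waited=0 got
  while [ "$waited" -lt "$timeout" ]; do
    got=$(kubectl get svc -n "$ns" "$svc" -o jsonpath='{.spec.type}' 2>/dev/null) || got=""
    if [ "$got" = "$want" ]; then
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  return 1    # timed out without seeing the wanted type
}

# Usage: wait_for_service_type "$NAMESPACE" "$CLUSTER_NAME" LoadBalancer 60
```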

### 2. Remove node-port allocation

Patch the master Service (and the `-repl` Service if you enabled the replica
LoadBalancer) to stop allocating node ports:

```bash
kubectl patch service -n $NAMESPACE $CLUSTER_NAME \
-p '{"spec":{"allocateLoadBalancerNodePorts":false,"ports":[{"name":"postgresql","nodePort":null,"port":5432,"protocol":"TCP","targetPort":5432}]}}'

kubectl patch service -n $NAMESPACE $CLUSTER_NAME-repl \
-p '{"spec":{"allocateLoadBalancerNodePorts":false,"ports":[{"name":"postgresql","nodePort":null,"port":5432,"protocol":"TCP","targetPort":5432}]}}'
```

### 3. Verify

```bash
kubectl get svc -n $NAMESPACE $CLUSTER_NAME \
  -o 'custom-columns=NAME:.metadata.name,TYPE:.spec.type,NODEPORT:.spec.ports[0].nodePort,ALLOC:.spec.allocateLoadBalancerNodePorts'
```

Expected: `TYPE=LoadBalancer`, `NODEPORT=<none>`, `ALLOC=false`.

:::note
The `ports[].name` in the patch must match the existing port name on the
Service. Inspect it first with
`kubectl get svc -n $NAMESPACE $CLUSTER_NAME -o jsonpath='{.spec.ports[*].name}'`
and adjust the patch accordingly.
:::
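
One way to keep the port name in the patch consistent with the Service is to generate the patch body from the name that was looked up. This is a sketch only: `nodeport_patch` is a hypothetical helper, and it assumes a single port on 5432/TCP, as in the patch shown in step 2.

```shell
# Hypothetical helper: print the step-2 patch body for a given port name.
# $1 = port name as reported by the Service (defaults to "postgresql")
nodeport_patch() {
  local port_name=${1:-postgresql}
  printf '{"spec":{"allocateLoadBalancerNodePorts":false,"ports":[{"name":"%s","nodePort":null,"port":5432,"protocol":"TCP","targetPort":5432}]}}\n' "$port_name"
}

# Usage (requires cluster access, so it is left commented out here):
# port_name=$(kubectl get svc -n "$NAMESPACE" "$CLUSTER_NAME" -o jsonpath='{.spec.ports[0].name}')
# kubectl patch service -n "$NAMESPACE" "$CLUSTER_NAME" -p "$(nodeport_patch "$port_name")"
```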
109 changes: 109 additions & 0 deletions docs/en/how_to/install_pgvector_extension.mdx
@@ -0,0 +1,109 @@
---
weight: 40
title: Installing the pgvector Extension
---

# Installing the pgvector Extension

## Overview

[pgvector](https://github.com/pgvector/pgvector) adds a `vector` data type and
nearest-neighbor search to PostgreSQL, which is commonly used for embedding /
similarity-search workloads. The extension is pre-bundled in the Spilo image
shipped with the PostgreSQL Operator, so no image rebuild is required — you only
need to create the extension inside the target database.

## Prerequisites

- A running PostgreSQL cluster managed by the PostgreSQL Operator.
- A database user with privileges to create extensions (the `postgres`
superuser, used below).

## Procedure

### 1. Verify the extension is available

```bash
kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- \
psql -U postgres -tAc \
"SELECT name, default_version FROM pg_available_extensions WHERE name = 'vector';"
```

Expected output (version may differ depending on the operand release):

```
vector|0.8.2
```

### 2. Create the extension

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

### 3. Smoke test

```sql
-- Create a table with a 3-dimensional vector column
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));

-- Insert sample data
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

-- Order by L2 distance to a query vector
SELECT id, embedding <-> '[3,1,2]' AS l2_distance FROM items ORDER BY l2_distance;
```

The distance operators are:

| Operator | Distance |
|----------|----------|
| `<->` | L2 (Euclidean) |
| `<#>` | negative inner product |
| `<=>` | cosine |
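
To make the three operators concrete, the sketch below recomputes them by hand for the first smoke-test row, `[1,2,3]`, against the query vector `[3,1,2]`. It runs outside the database and only illustrates the arithmetic; the function name `vector_distances` is hypothetical, and it assumes both vectors are comma-separated and of equal length.

```shell
# Hypothetical helper: compute L2 distance, negative inner product, and
# cosine distance for two comma-separated vectors of equal length.
vector_distances() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); split(b, y, ",")
    for (i = 1; i <= n; i++) {
      d = x[i] - y[i]
      l2 += d * d           # sum of squared differences
      ip += x[i] * y[i]     # inner product
      na += x[i] * x[i]     # squared norm of a
      nb += y[i] * y[i]     # squared norm of b
    }
    printf "l2=%.3f neg_ip=%.3f cosine=%.3f\n", sqrt(l2), -ip, 1 - ip / (sqrt(na) * sqrt(nb))
  }'
}

vector_distances "1,2,3" "3,1,2"
# prints: l2=2.449 neg_ip=-11.000 cosine=0.214
```

The `l2` value should match the `l2_distance` that the smoke-test query returns for the first row, up to rounding.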

## Indexing for approximate nearest-neighbor search

By default pgvector performs an exact search (perfect recall). For larger
datasets you can add an approximate index, trading some recall for speed.

### IVFFlat

Build the index **after** the table contains data. A good starting point for
the number of lists is `rows / 1000` (up to 1M rows) or `sqrt(rows)` beyond
that.

```sql
-- L2 distance
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- Tune probes at query time (higher = better recall, slower)
SET ivfflat.probes = 10;
```
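
The starting-point rule for `lists` can be sketched as a small calculation. The function name `ivfflat_lists` is hypothetical, and clamping the result to at least 1 for very small tables is an added assumption, not part of the rule itself.

```shell
# Hypothetical helper: suggest a starting value for `lists`.
# $1 = approximate row count of the table
ivfflat_lists() {
  awk -v rows="$1" 'BEGIN {
    # rows / 1000 up to 1M rows, sqrt(rows) beyond that
    lists = (rows <= 1000000) ? int(rows / 1000) : int(sqrt(rows))
    if (lists < 1) lists = 1    # assumption: never suggest fewer than one list
    print lists
  }'
}

ivfflat_lists 500000     # prints 500
ivfflat_lists 4000000    # prints 2000
```

Treat the result as a first guess and adjust `lists` and `ivfflat.probes` after measuring recall and latency on your own data.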

### HNSW

HNSW has slower build time and higher memory usage than IVFFlat but better
query performance, and can be created on an empty table.

```sql
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64);

-- Tune the search candidate list at query time (default 40)
SET hnsw.ef_search = 100;
```

Use `vector_ip_ops` (inner product) or `vector_cosine_ops` (cosine) instead of
`vector_l2_ops` to index the corresponding distance function.

## Upgrading the extension

```sql
ALTER EXTENSION vector UPDATE;
```

## Verification

```sql
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
```
103 changes: 103 additions & 0 deletions docs/en/how_to/install_zhparser_extension.mdx
@@ -0,0 +1,103 @@
---
weight: 41
title: Installing the zhparser Extension
---

# Installing the zhparser Extension

## Overview

[zhparser](https://github.com/amutu/zhparser) is a PostgreSQL full-text search
parser for Chinese, based on SCWS. It is pre-bundled in the Spilo image shipped
with the PostgreSQL Operator, so you only need to create the extension and a
text-search configuration that uses it.

## Prerequisites

- A running PostgreSQL cluster managed by the PostgreSQL Operator.
- A database user with privileges to create extensions (the `postgres`
superuser, used below). Managing the custom dictionary requires superuser
privileges.

## Procedure

### 1. Create the extension

```sql
CREATE EXTENSION IF NOT EXISTS zhparser;
```

### 2. Create a text-search configuration

```sql
CREATE TEXT SEARCH CONFIGURATION testzhcfg (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION testzhcfg ADD MAPPING FOR n,v,a,i,e,l WITH simple;
```

### 3. Tokenize and build search vectors

```sql
-- Inspect raw tokenization
SELECT * FROM ts_parse('zhparser', '保障房资金压力');

-- Build a tsvector using the configuration created above
SELECT to_tsvector('testzhcfg', '2011年保障房进入了更大规模的建设阶段');

-- Build a tsquery
SELECT to_tsquery('testzhcfg', '保障房资金压力');
```

## Custom dictionary

The custom dictionary is scoped per **database** (not per instance) and is
stored under the data directory. Adding custom words requires superuser
privileges.

```sql
-- Add a custom word
INSERT INTO zhparser.zhprs_custom_word VALUES ('资金压力');

-- Synchronize the dictionary
SELECT sync_zhprs_custom_word();
```

Re-establish your session (reconnect) for the change to take effect. After that,
`资金压力` is tokenized as a single word instead of `资金` + `压力`.

## Parser configuration

The following options control dictionary loading and tokenization behavior
(PostgreSQL 9.2+). The boolean options default to `false`;
`zhparser.extra_dicts` is a string option that takes a list of files:

| Option | Purpose |
|--------|---------|
| `zhparser.punctuation_ignore` | Ignore punctuation and special symbols |
| `zhparser.seg_with_duality` | Aggregate loose characters using bigram segmentation |
| `zhparser.dict_in_memory` | Load the whole dictionary into memory |
| `zhparser.multi_short` | Compound short words |
| `zhparser.multi_duality` | Compound loose characters into bigrams |
| `zhparser.multi_zmain` | Compound important single characters |
| `zhparser.multi_zall` | Compound all single characters |
| `zhparser.extra_dicts` | Comma-separated extra dictionary files (`.txt`/`.xdb`) loaded in addition to the built-in dictionary; must be set before the backend starts |

```sql
SHOW zhparser.punctuation_ignore;
ALTER SYSTEM SET zhparser.punctuation_ignore = true;
SELECT pg_reload_conf();
```

`zhparser.extra_dicts` and `zhparser.dict_in_memory` must be set before the
backend starts (set them in the configuration and reload; new connections pick
them up). The other options can be set per session.

## Upgrading the extension

```sql
ALTER EXTENSION zhparser UPDATE;
```

## Verification

```sql
SELECT extname, extversion FROM pg_extension WHERE extname = 'zhparser';
```