docs: add PostgreSQL KB how-to and troubleshooting guides (MIDDLEWARE-31526) #35
Open: SuJinpei wants to merge 2 commits into `master` from `docs/add-pg-kb-howtos-31526`
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,83 @@ | ||
| --- | ||
| weight: 42 | ||
| title: Configuring the pg_hba Client Authentication Whitelist | ||
| --- | ||
|
|
||
| # Configuring the pg_hba Client Authentication Whitelist | ||
|
|
||
| ## Overview | ||
|
|
||
| PostgreSQL client authentication is controlled by `pg_hba.conf`. In a cluster | ||
| managed by the PostgreSQL Operator, this file is rendered and managed by | ||
| Patroni — **editing `pg_hba.conf` inside the container has no effect** because | ||
| Patroni overwrites it. Instead, declare the rules in the `postgresql` custom | ||
| resource under `spec.patroni.pg_hba`, and the Operator/Patroni will apply and | ||
| reload them. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - A running PostgreSQL cluster managed by the PostgreSQL Operator. | ||
| - Permission to edit the `postgresql` custom resource. | ||
|
|
||
| ## Procedure | ||
|
|
||
| ### 1. Locate the custom resource | ||
|
|
||
| ```bash | ||
| kubectl get postgresql -n $NAMESPACE | ||
| ``` | ||
|
|
||
| ### 2. Set the pg_hba rules | ||
|
|
||
| Edit the `postgresql` resource and add the whitelist under `spec.patroni.pg_hba`. | ||
| Keep the internal Patroni/replication entries, and append your own rules. Order | ||
| matters — the first matching rule wins. | ||
|
|
||
| ```yaml | ||
| spec: | ||
| patroni: | ||
| pg_hba: | ||
| - local all all trust | ||
| - hostssl all +zalandos 127.0.0.1/32 pam | ||
| - host all all 127.0.0.1/32 md5 | ||
| - hostssl all +zalandos ::1/128 pam | ||
| - host all all ::1/128 md5 | ||
| - hostssl replication standby all md5 | ||
| - hostssl all +zalandos all pam | ||
| - hostssl all all all md5 | ||
| # The two catch-all rules below permit UNENCRYPTED password auth from any | ||
| # address. Include them only if clients cannot use SSL (see the warning). | ||
| - host all all 0.0.0.0/0 md5 | ||
| - host all all ::0/0 md5 | ||
| ``` | ||
|
|
||
| Apply with `kubectl apply` / `kubectl edit`. Patroni reloads the configuration | ||
| without a database restart. | ||
|
|
||
| :::warning | ||
| A `host` rule matches both SSL and non-SSL connections, so | ||
| `host all all 0.0.0.0/0 md5` (and its IPv6 form `::0/0`) lets clients from any | ||
| address authenticate over an unencrypted connection, exposing the md5 exchange | ||
| and all session traffic to network sniffing. Prefer the `hostssl ... md5` rules | ||
| and require clients to use SSL. Add the plain `host` catch-all rules only when | ||
| a client genuinely cannot use SSL; see | ||
| [Connection fails with "SSL off"](../trouble_shooting/connection_ssl_off.mdx). | ||
| ::: | ||
|
|
||
| ### 3. Verify | ||
|
|
||
| ```bash | ||
| kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- \ | ||
| psql -U postgres -c "SELECT type, database, user_name, address, auth_method FROM pg_hba_file_rules ORDER BY line_number;" | ||
| ``` | ||
|
|
||
| The output should reflect the rules you declared. `pg_hba_file_rules` also | ||
| reports parse errors in the `error` column if a rule is malformed. | ||
|
|
||
| ## Notes | ||
|
|
||
| - Prefer `hostssl ... md5` over plain `host ... md5` when exposing the database | ||
| beyond the cluster, so that credentials are not sent over an unencrypted | ||
| connection. See also | ||
| [Connection fails with "SSL off"](../trouble_shooting/connection_ssl_off.mdx). | ||
| - `+zalandos` matches every role that is a member of the `zalandos` group role, | ||
| which the Operator uses internally; do not remove the `+zalandos` lines or | ||
| internal components may lose access. | ||
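
The "first matching rule wins" behaviour in the pg_hba how-to above can be illustrated with a small simulation. This is a minimal sketch, not PostgreSQL's actual matcher: the `first_match` helper and its parameters are hypothetical, it models TCP rules only (so the `local` rule is omitted), and it ignores replication connections, netmask syntax, and hostnames. Group membership for `+role` entries is passed in explicitly as an assumption.

```python
import ipaddress

# Network rules taken from the example above; `local` (Unix-socket) rules are
# left out because this sketch only models TCP connections.
RULES = [
    "hostssl all +zalandos all pam",
    "hostssl all all all md5",
    "host all all 0.0.0.0/0 md5",
    "host all all ::0/0 md5",
]


def first_match(rules, *, ssl, database, user, client_ip, groups=frozenset()):
    """Return the first rule matching the connection, or None if none does."""
    ip = ipaddress.ip_address(client_ip)
    for rule in rules:
        conn_type, db, role, address, _method = rule.split()
        # hostssl only matches SSL connections; plain host matches both.
        if conn_type == "hostssl" and not ssl:
            continue
        if conn_type == "hostnossl" and ssl:
            continue
        if db not in ("all", database):
            continue
        if role.startswith("+"):
            # +name matches members of the role `name` (simplified lookup).
            if role[1:] not in groups:
                continue
        elif role not in ("all", user):
            continue
        if address != "all":
            network = ipaddress.ip_network(address)
            if ip.version != network.version or ip not in network:
                continue
        return rule
    return None
```

With these rules, an SSL client that is not in `zalandos` falls through to `hostssl all all all md5`, while a non-SSL client is only admitted by the plain `host` catch-all. Dropping the last two rules makes the same non-SSL connection match nothing, which is the effect the warning recommends.
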
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| --- | ||
| weight: 44 | ||
| title: Disabling NodePort Exposure for a PostgreSQL Cluster | ||
| --- | ||
|
|
||
| # Disabling NodePort Exposure for a PostgreSQL Cluster | ||
|
|
||
| ## Overview | ||
|
|
||
| By default the Service that fronts a PostgreSQL cluster is of type `NodePort`, | ||
| which opens a port on every node. In environments where exposing a node port is | ||
| not acceptable, you can switch the Service to type `LoadBalancer` and disable | ||
| node-port allocation, so the database is no longer reachable through a node | ||
| port. | ||
|
|
||
| :::info | ||
| This requires the platform to provide a LoadBalancer implementation (for example | ||
| MetalLB). If no `IPAddressPool` is configured, the Service's `EXTERNAL-IP` stays | ||
| `<pending>` — the node port is still removed, but no external address is | ||
| assigned until a pool exists. On OpenShift Container Platform, prefer exposing | ||
| the database through a Route / passthrough instead of a node port. | ||
| ::: | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - A running PostgreSQL cluster managed by the PostgreSQL Operator. | ||
| - A LoadBalancer provider on the cluster if external reachability is required. | ||
|
|
||
| ## Procedure | ||
|
|
||
| Set `$CLUSTER_NAME` and `$NAMESPACE` for the target cluster. | ||
|
|
||
| ### 1. Switch the Services to LoadBalancer | ||
|
|
||
| ```bash | ||
| kubectl patch postgresql -n $NAMESPACE $CLUSTER_NAME --type merge \ | ||
| -p '{"spec":{"enableMasterLoadBalancer":true,"enableReplicaLoadBalancer":true}}' | ||
| ``` | ||
|
|
||
| Wait ~30 seconds for the Operator to reconcile and the Service type to change to | ||
| `LoadBalancer`: | ||
|
|
||
| ```bash | ||
| kubectl get svc -n $NAMESPACE $CLUSTER_NAME -o jsonpath='{.spec.type}{"\n"}' | ||
| ``` | ||
|
|
||
| ### 2. Remove node-port allocation | ||
|
|
||
| Patch the master Service (and the `-repl` Service if you enabled the replica | ||
| LoadBalancer) to stop allocating node ports: | ||
|
|
||
| ```bash | ||
| kubectl patch service -n $NAMESPACE $CLUSTER_NAME \ | ||
| -p '{"spec":{"allocateLoadBalancerNodePorts":false,"ports":[{"name":"postgresql","nodePort":null,"port":5432,"protocol":"TCP","targetPort":5432}]}}' | ||
|
|
||
| kubectl patch service -n $NAMESPACE $CLUSTER_NAME-repl \ | ||
| -p '{"spec":{"allocateLoadBalancerNodePorts":false,"ports":[{"name":"postgresql","nodePort":null,"port":5432,"protocol":"TCP","targetPort":5432}]}}' | ||
| ``` | ||
|
|
||
| ### 3. Verify | ||
|
|
||
| ```bash | ||
| kubectl get svc -n $NAMESPACE $CLUSTER_NAME \ | ||
| -o 'custom-columns=NAME:.metadata.name,TYPE:.spec.type,NODEPORT:.spec.ports[0].nodePort,ALLOC:.spec.allocateLoadBalancerNodePorts' | ||
| ``` | ||
|
|
||
| Expected: `TYPE=LoadBalancer`, `NODEPORT=<none>`, `ALLOC=false`. | ||
|
|
||
| :::note | ||
| The `ports[].name` in the patch must match the existing port name on the | ||
| Service. Inspect it first with | ||
| `kubectl get svc -n $NAMESPACE $CLUSTER_NAME -o jsonpath='{.spec.ports[*].name}'` | ||
| and adjust the patch accordingly. | ||
| ::: |
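
Because the port name in the patch has to match the existing Service port, it can help to generate the patch body rather than edit the JSON by hand. The sketch below is one way to do that; `build_nodeport_patch` is a hypothetical helper, not part of the Operator or kubectl, and it assumes a single TCP port whose `targetPort` equals `port`, as in the commands above.

```python
import json


def build_nodeport_patch(port_name="postgresql", port=5432):
    """Return the JSON patch body that disables node-port allocation."""
    patch = {
        "spec": {
            "allocateLoadBalancerNodePorts": False,
            "ports": [
                {
                    "name": port_name,
                    "nodePort": None,  # serialized as JSON null
                    "port": port,
                    "protocol": "TCP",
                    "targetPort": port,
                }
            ],
        }
    }
    # Compact separators keep the output easy to paste into `kubectl patch -p`.
    return json.dumps(patch, separators=(",", ":"))
```

Called with its defaults, the helper reproduces the patch string used in step 2; pass the name reported by the `jsonpath` query if the Service uses a different port name.
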
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,109 @@ | ||
| --- | ||
| weight: 40 | ||
| title: Installing the pgvector Extension | ||
| --- | ||
|
|
||
| # Installing the pgvector Extension | ||
|
|
||
| ## Overview | ||
|
|
||
| [pgvector](https://github.com/pgvector/pgvector) adds a `vector` data type and | ||
| nearest-neighbor search to PostgreSQL, which is commonly used for embedding / | ||
| similarity-search workloads. The extension is pre-bundled in the Spilo image | ||
| shipped with the PostgreSQL Operator, so no image rebuild is required — you only | ||
| need to create the extension inside the target database. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - A running PostgreSQL cluster managed by the PostgreSQL Operator. | ||
| - A database user with privileges to create extensions (the `postgres` | ||
| superuser, used below). | ||
|
|
||
| ## Procedure | ||
|
|
||
| ### 1. Verify the extension is available | ||
|
|
||
| ```bash | ||
| kubectl exec -n $NAMESPACE $CLUSTER_NAME-0 -c postgres -- \ | ||
| psql -U postgres -tAc \ | ||
| "SELECT name, default_version FROM pg_available_extensions WHERE name = 'vector';" | ||
| ``` | ||
|
|
||
| Expected output (version may differ depending on the operand release): | ||
|
|
||
| ``` | ||
| vector|0.8.2 | ||
| ``` | ||
|
|
||
| ### 2. Create the extension | ||
|
|
||
| ```sql | ||
| CREATE EXTENSION IF NOT EXISTS vector; | ||
| ``` | ||
|
|
||
| ### 3. Smoke test | ||
|
|
||
| ```sql | ||
| -- Create a table with a 3-dimensional vector column | ||
| CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3)); | ||
|
|
||
| -- Insert sample data | ||
| INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]'); | ||
|
|
||
| -- Order by L2 distance to a query vector | ||
| SELECT id, embedding <-> '[3,1,2]' AS l2_distance FROM items ORDER BY l2_distance; | ||
| ``` | ||
|
|
||
| The distance operators are: | ||
|
|
||
| | Operator | Distance | | ||
| |----------|----------| | ||
| | `<->` | L2 (Euclidean) | | ||
| | `<#>` | negative inner product | | ||
| | `<=>` | cosine | | ||
|
|
||
| ## Indexing for approximate nearest-neighbor search | ||
|
|
||
| By default pgvector performs an exact search (perfect recall). For larger | ||
| datasets you can add an approximate index, trading some recall for speed. | ||
|
|
||
| ### IVFFlat | ||
|
|
||
| Build the index **after** the table contains data. A good starting point for | ||
| the number of lists is `rows / 1000` (up to 1M rows) or `sqrt(rows)` beyond | ||
| that. | ||
|
|
||
| ```sql | ||
| -- L2 distance | ||
| CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100); | ||
|
|
||
| -- Tune probes at query time (higher = better recall, slower) | ||
| SET ivfflat.probes = 10; | ||
| ``` | ||
|
|
||
| ### HNSW | ||
|
|
||
| HNSW builds more slowly and uses more memory than IVFFlat but gives a better | ||
| speed-recall tradeoff at query time. It has no training step, so the index can | ||
| be created on an empty table. | ||
|
|
||
| ```sql | ||
| CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64); | ||
|
|
||
| -- Tune the search candidate list at query time (default 40) | ||
| SET hnsw.ef_search = 100; | ||
| ``` | ||
|
|
||
| Use `vector_ip_ops` (inner product) or `vector_cosine_ops` (cosine) instead of | ||
| `vector_l2_ops` to index the corresponding distance function. | ||
|
|
||
| ## Upgrading the extension | ||
|
|
||
| ```sql | ||
| ALTER EXTENSION vector UPDATE; | ||
| ``` | ||
|
|
||
| ## Verification | ||
|
|
||
| ```sql | ||
| SELECT extname, extversion FROM pg_extension WHERE extname = 'vector'; | ||
| ``` |
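
To make the three distance operators and the IVFFlat `lists` starting point in the pgvector how-to concrete, the following sketch reproduces the arithmetic in plain Python. It is an illustration of the formulas only, not pgvector's implementation; the function names are hypothetical, and `suggested_ivfflat_lists` simply encodes the `rows / 1000` and `sqrt(rows)` rule of thumb quoted above.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, as returned by the <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """Negated dot product, as returned by the <#> operator."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity, as returned by the <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


def suggested_ivfflat_lists(rows):
    """Starting value for `lists`: rows / 1000 up to 1M rows, else sqrt(rows)."""
    if rows <= 1_000_000:
        return max(1, rows // 1000)
    return round(math.sqrt(rows))
```

For the smoke-test data, the query vector `[3,1,2]` is at squared L2 distance 6 from `[1,2,3]` and 33 from `[4,5,6]`, so the row holding `[1,2,3]` sorts first under `<->`. Under `<=>` the order flips, because `[4,5,6]` points in a direction closer to the query even though it is farther away.
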
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,103 @@ | ||
| --- | ||
| weight: 41 | ||
| title: Installing the zhparser Extension | ||
| --- | ||
|
|
||
| # Installing the zhparser Extension | ||
|
|
||
| ## Overview | ||
|
|
||
| [zhparser](https://github.com/amutu/zhparser) is a PostgreSQL full-text search | ||
| parser for Chinese, based on SCWS. It is pre-bundled in the Spilo image shipped | ||
| with the PostgreSQL Operator, so you only need to create the extension and a | ||
| text-search configuration that uses it. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - A running PostgreSQL cluster managed by the PostgreSQL Operator. | ||
| - A database user with privileges to create extensions (the `postgres` | ||
| superuser, used below). Managing the custom dictionary requires superuser | ||
| privileges. | ||
|
|
||
| ## Procedure | ||
|
|
||
| ### 1. Create the extension | ||
|
|
||
| ```sql | ||
| CREATE EXTENSION IF NOT EXISTS zhparser; | ||
| ``` | ||
|
|
||
| ### 2. Create a text-search configuration | ||
|
|
||
| ```sql | ||
| CREATE TEXT SEARCH CONFIGURATION testzhcfg (PARSER = zhparser); | ||
| ALTER TEXT SEARCH CONFIGURATION testzhcfg ADD MAPPING FOR n,v,a,i,e,l WITH simple; | ||
| ``` | ||
|
|
||
| ### 3. Tokenize and build search vectors | ||
|
|
||
| ```sql | ||
| -- Inspect raw tokenization | ||
| SELECT * FROM ts_parse('zhparser', '保障房资金压力'); | ||
|
|
||
| -- Build a tsvector using the configuration created above | ||
| SELECT to_tsvector('testzhcfg', '2011年保障房进入了更大规模的建设阶段'); | ||
|
|
||
| -- Build a tsquery | ||
| SELECT to_tsquery('testzhcfg', '保障房资金压力'); | ||
| ``` | ||
|
|
||
| ## Custom dictionary | ||
|
|
||
| The custom dictionary is scoped per **database** (not per instance) and is | ||
| stored under the data directory. Adding custom words requires superuser | ||
| privileges. | ||
|
|
||
| ```sql | ||
| -- Add a custom word | ||
| INSERT INTO zhparser.zhprs_custom_word VALUES ('资金压力'); | ||
|
|
||
| -- Synchronize the dictionary | ||
| SELECT sync_zhprs_custom_word(); | ||
| ``` | ||
|
|
||
| Re-establish your session (reconnect) for the change to take effect. After that, | ||
| `资金压力` is tokenized as a single word instead of `资金` + `压力`. | ||
|
|
||
| ## Parser configuration | ||
|
|
||
| The following options control dictionary loading and tokenization behavior | ||
| (PostgreSQL 9.2+). The boolean options default to `false`; | ||
| `zhparser.extra_dicts` takes a list of file names instead: | ||
|
|
||
| | Option | Purpose | | ||
| |--------|---------| | ||
| | `zhparser.punctuation_ignore` | Ignore punctuation and special symbols | | ||
| | `zhparser.seg_with_duality` | Aggregate loose characters using bigram segmentation | | ||
| | `zhparser.dict_in_memory` | Load the whole dictionary into memory | | ||
| | `zhparser.multi_short` | Compound short words | | ||
| | `zhparser.multi_duality` | Compound loose characters into bigrams | | ||
| | `zhparser.multi_zmain` | Compound important single characters | | ||
| | `zhparser.multi_zall` | Compound all single characters | | ||
| | `zhparser.extra_dicts` | Comma-separated extra dictionary files (`.txt`/`.xdb`) loaded in addition to the built-in dictionary; must be set before the backend starts | | ||
|
|
||
| ```sql | ||
| SHOW zhparser.punctuation_ignore; | ||
| ALTER SYSTEM SET zhparser.punctuation_ignore = true; | ||
| SELECT pg_reload_conf(); | ||
| ``` | ||
|
|
||
| `zhparser.extra_dicts` and `zhparser.dict_in_memory` must be set before the | ||
| backend starts (set them in the configuration and reload; new connections pick | ||
| them up). The other options can be set per session. | ||
|
|
||
| ## Upgrading the extension | ||
|
|
||
| ```sql | ||
| ALTER EXTENSION zhparser UPDATE; | ||
| ``` | ||
|
|
||
| ## Verification | ||
|
|
||
| ```sql | ||
| SELECT extname, extversion FROM pg_extension WHERE extname = 'zhparser'; | ||
| ``` |