diff --git a/mission-control-chart b/mission-control-chart index fb6fe4a4..56c1597b 160000 --- a/mission-control-chart +++ b/mission-control-chart @@ -1 +1 @@ -Subproject commit fb6fe4a4b98f54fdbf322ff1b0d85fd7469bfec5 +Subproject commit 56c1597b7ab63405de74a5953aa5dacaa3e7b978 diff --git a/mission-control/docs/guide/notifications/concepts/inhibition.mdx b/mission-control/docs/guide/notifications/concepts/inhibition.mdx index beb48200..fba30163 100644 --- a/mission-control/docs/guide/notifications/concepts/inhibition.mdx +++ b/mission-control/docs/guide/notifications/concepts/inhibition.mdx @@ -6,15 +6,95 @@ sidebar_custom_props: import Inhibition from '../../../reference/notifications/_inhibition.mdx'; -Multiple related notifications may be generated within a short time window. Instead of sending each alert separately, -you can use notification inhibition to inhibit notifications based on the resource hierarchy. +When something breaks in your infrastructure, it rarely breaks alone. A crashing pod makes its ReplicaSet unhealthy, +which makes its Deployment unhealthy — and one root cause turns into three notifications. -_Example_: When a Kubernetes pod becomes unhealthy, its replicaset and the deployment will also become unhealthy. -If you have a notification set up to alert on `config.unhealthy`, you'll receive 3 different notifications for the same cause. +Inhibition lets you keep the notification that points closest to the root cause and automatically suppress the related +notifications that follow it. -```yaml title="deployment-with-inhibition.yaml" file=/modules/mission-control/fixtures/notifications/deployment-with-inhibition.yaml +## How it works + +An inhibition rule has two sides: + +- `from` — the config type whose notification you want to **keep** (the inhibitor) +- `to` — the related config types whose notifications you want to **suppress** + +Once a notification is sent for a `from` resource, it starts inhibiting. For the length of the notification's +`repeatInterval`, any new event for a related `to` resource is recorded as `inhibited` instead of being delivered. + +Walking through the pod example: + +1. A pod crashes and a `config.unhealthy` notification for it is sent. The rule below lists `Kubernetes::Pod` in + `from`, so this notification becomes an inhibitor. +2. Moments later, the pod's ReplicaSet and Deployment also turn unhealthy. Their types are listed in `to`, so Mission + Control walks the relationship graph from each of them, finds the pod that already notified, and suppresses both. +3. You receive one notification — the pod alert — instead of three. + +```yaml +inhibitions: + - direction: incoming + from: Kubernetes::Pod + to: + - Kubernetes::ReplicaSet + - Kubernetes::Deployment ``` - +:::note Things to keep in mind + +- Inhibition requires `repeatInterval` on the notification — it doubles as the inhibition window. Without it, + inhibition rules are ignored. +- Both the kept and the suppressed alerts must come from the **same** Notification resource, so the notification's + `events` and `filter` must match all the resource types involved. +- Inhibition works on catalog (config) events such as `config.unhealthy` — not on check or component events. +- Order matters: only an already-sent `from` notification can inhibit. If the Deployment's alert happens to arrive + before the Pod's, both are sent. +- Inhibited notifications aren't lost — they appear in the notification send history with the status `inhibited`. + +::: + +## Writing your own rule + +1. **Pick the alert to keep.** Choose the resource type that gives the clearest signal about the root cause — that's + your `from`. For Kubernetes roll-up health, that's usually the Pod. +2. **List the noise.** The related types whose alerts repeat the same information go in `to`. +3. **Choose a direction.** Ask where the `to` resources sit relative to `from` in the relationship graph: + - They're parents or owners (Pod → its ReplicaSet/Deployment): use `incoming`. + - They're children or dependents (Node → its Pods): use `outgoing`. + - Could be either: use `all`. +4. **Count the hops and set `depth`.** Each relationship level is one hop: Pod → ReplicaSet is 1, Pod → ReplicaSet → + Deployment is 2. Defaults to 5 when omitted. +5. **Set `soft: true` for soft relationships.** Ownership links like Deployment → Pod are hard relationships and match + by default. Placement links like Node → Pod are soft, and are only followed when `soft: true`. + +## Examples +### Keep the Pod alert, suppress its ReplicaSet and Deployment +A pod's failure usually explains why its parents are unhealthy, so this notification keeps the pod alert and inhibits +the parent alerts that follow within the 4-hour window. The direction is `incoming` because ReplicaSets and Deployments +are parents of the pod, and `depth: 2` covers the two hops from Pod up to Deployment. + +```yaml title="deployment-with-inhibition.yaml" file=/modules/mission-control/fixtures/notifications/deployment-with-inhibition.yaml {9,12-18} +``` + +How this plays out: + +| Time | Resource | Event | Action | +| ----- | ----------------------- | ------------------ | ----------------------------------------------- | +| 10:00 | Pod `api-7d9f` | `config.unhealthy` | Notification sent _(becomes the inhibitor)_ | +| 10:01 | ReplicaSet `api-7d9f` | `config.unhealthy` | Inhibited _(related pod already notified)_ | +| 10:02 | Deployment `api` | `config.unhealthy` | Inhibited _(related pod already notified)_ | +| 15:30 | Deployment `api` | `config.unhealthy` | Notification sent _(4h window expired)_ | + +### Keep the Node alert, suppress its Pods + +When a node goes down, every pod scheduled on it raises an alert. This notification keeps the node alert and inhibits +the pod alerts. The direction is `outgoing` because the pods sit below the node, and `soft: true` is required because +Node-to-Pod is a soft relationship. + +```yaml title="node-with-inhibition.yaml" file=/modules/mission-control/fixtures/notifications/node-with-inhibition.yaml {9,12-18} +``` + +## Fields + + diff --git a/mission-control/docs/reference/notifications/_inhibition.mdx b/mission-control/docs/reference/notifications/_inhibition.mdx index f4aec630..5a783ba8 100644 --- a/mission-control/docs/reference/notifications/_inhibition.mdx +++ b/mission-control/docs/reference/notifications/_inhibition.mdx @@ -3,31 +3,30 @@ { field: 'depth', scheme: 'int', - description: 'Defines how many levels of child or parent resources to traverse.' + description: 'Maximum number of relationship levels to traverse. Defaults to 5 when omitted.' }, { field: 'direction', - scheme: '`inoming`|`outgoing`|`both`', + scheme: '`incoming` | `outgoing` | `all`', required: true, - description: 'Specifies the traversal direction in relation to the "From" resource. Can be "outgoing" (looks for child resources), "incoming" (looks for parent resources), or "all" (considers both).' + description: 'Relationship direction from `from` to `to`. Use `outgoing` when `to` resources are downstream or child resources, `incoming` when `to` resources are upstream or parent resources, and `all` to check both directions.' }, { field: 'from', scheme: '`string`', required: true, - description: 'Specifies the starting resource type (for example, "Kubernetes::Deployment").' + description: 'Config type whose sent notification can inhibit notifications for related `to` resources. For example, `Kubernetes::Deployment`.' }, { field: 'soft', scheme: 'bool', - description: 'When true, relates using soft relationships. Example: Deployment to Pod is hard relationship, but Node to Pod is soft relationship.' + description: 'When false, only hard relationships are considered. When true, both hard and soft relationships are considered. For example, Deployment to Pod is a hard relationship, but Node to Pod is a soft relationship.' }, { field: 'to', scheme: '`[]string`', required: true, - description: 'Specifies the traversal direction in relation to the `from` resource. `outgoing` looks for child resources and `incoming` looks for parent resources.' + description: 'Config types that can be inhibited when they are related to a `from` resource that already sent this notification within the `repeatInterval` window.' } ]} /> - diff --git a/modules/canary-checker b/modules/canary-checker index cb7214ad..8cc042df 160000 --- a/modules/canary-checker +++ b/modules/canary-checker @@ -1 +1 @@ -Subproject commit cb7214ad82fb344bb910458c2d12e6f19dfff562 +Subproject commit 8cc042df1520b271314471f2b1c86a3edef234d2 diff --git a/modules/config-db b/modules/config-db index 96960c71..06d2e5b9 160000 --- a/modules/config-db +++ b/modules/config-db @@ -1 +1 @@ -Subproject commit 96960c71e57caec1fc618dd2cddb8b5da6ffee27 +Subproject commit 06d2e5b986023a1320258eb3c51798ef909d9c75 diff --git a/modules/duty b/modules/duty index b4d8a784..e95facc4 160000 --- a/modules/duty +++ b/modules/duty @@ -1 +1 @@ -Subproject commit b4d8a7845ccd361c7704486d2ccfacf1fe9f20bd +Subproject commit e95facc465d7bc8f2508c4d28ea48e69953431a9 diff --git a/modules/mission-control b/modules/mission-control index 3b906d69..76687d52 160000 --- a/modules/mission-control +++ b/modules/mission-control @@ -1 +1 @@ -Subproject commit 3b906d69ae3cbc419479208efd4dd3ec3403b856 +Subproject commit 76687d52e8d45d4647d2f9a2ca9f791b24e0d94a diff --git a/modules/mission-control-chart b/modules/mission-control-chart index fb6fe4a4..56c1597b 160000 --- a/modules/mission-control-chart +++ b/modules/mission-control-chart @@ -1 +1 @@ -Subproject commit fb6fe4a4b98f54fdbf322ff1b0d85fd7469bfec5 +Subproject commit 56c1597b7ab63405de74a5953aa5dacaa3e7b978 diff --git a/modules/mission-control-registry b/modules/mission-control-registry index 0e5c5a9f..e70f3ea4 160000 --- a/modules/mission-control-registry +++ b/modules/mission-control-registry @@ -1 +1 @@ -Subproject commit 0e5c5a9ff21f5160dbc18d3c9653f42c68cf1a90 +Subproject commit e70f3ea40e50e714ada9153933450208ec81038d