Merged
92 changes: 86 additions & 6 deletions mission-control/docs/guide/notifications/concepts/inhibition.mdx
@@ -6,15 +6,95 @@ sidebar_custom_props:

import Inhibition from '../../../reference/notifications/_inhibition.mdx';

Multiple related notifications may be generated within a short time window. Instead of sending each alert separately,
you can use notification inhibition to inhibit notifications based on the resource hierarchy.
When something breaks in your infrastructure, it rarely breaks alone. A crashing pod makes its ReplicaSet unhealthy,
which makes its Deployment unhealthy — and one root cause turns into three notifications.

_Example_: When a Kubernetes pod becomes unhealthy, its replicaset and the deployment will also become unhealthy.
If you have a notification set up to alert on `config.unhealthy`, you'll receive 3 different notifications for the same cause.
Inhibition lets you keep the notification that points closest to the root cause and automatically suppress the related
notifications that follow it.

```yaml title="deployment-with-inhibition.yaml" file=<rootDir>/modules/mission-control/fixtures/notifications/deployment-with-inhibition.yaml
## How it works

An inhibition rule has two sides:

- `from` — the config type whose notification you want to **keep** (the inhibitor)
- `to` — the related config types whose notifications you want to **suppress**

Once a notification is sent for a `from` resource, it starts inhibiting. For the length of the notification's
`repeatInterval`, any new event for a related `to` resource is recorded as `inhibited` instead of being delivered.

Walking through the pod example:

1. A pod crashes and a `config.unhealthy` notification for it is sent. The rule below lists `Kubernetes::Pod` in
`from`, so this notification becomes an inhibitor.
2. Moments later, the pod's ReplicaSet and Deployment also turn unhealthy. Their types are listed in `to`, so Mission
Control walks the relationship graph from each of them, finds the pod that already notified, and suppresses both.
3. You receive one notification — the pod alert — instead of three.

```yaml
inhibitions:
  - direction: incoming
    from: Kubernetes::Pod
    to:
      - Kubernetes::ReplicaSet
      - Kubernetes::Deployment
```

<Inhibition />
:::note Things to keep in mind

- Inhibition requires `repeatInterval` on the notification — it doubles as the inhibition window. Without it,
inhibition rules are ignored.
- Both the kept and the suppressed alerts must come from the **same** Notification resource, so the notification's
`events` and `filter` must match all the resource types involved.
- Inhibition works on catalog (config) events such as `config.unhealthy` — not on check or component events.
- Order matters: only an already-sent `from` notification can inhibit. If the Deployment's alert happens to arrive
before the Pod's, both are sent.
- Inhibited notifications aren't lost — they appear in the notification send history with the status `inhibited`.

:::
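
As a minimal sketch of the first and third points, the fragment below pairs a catalog event with a `repeatInterval` and an inhibition rule. It is not the fixture from the repository: it assumes `events`, `repeatInterval`, and `inhibitions` sit side by side in the Notification spec, and it leaves out `filter` and delivery settings. The `4h` value mirrors the 4-hour window used in the examples.

```yaml
# Assumed fragment of a Notification spec; the nesting is illustrative only.
events:
  - config.unhealthy        # catalog event, so inhibition can apply
repeatInterval: 4h          # also serves as the inhibition window
inhibitions:
  - direction: incoming
    from: Kubernetes::Pod
    to:
      - Kubernetes::ReplicaSet
      - Kubernetes::Deployment
```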

## Writing your own rule

1. **Pick the alert to keep.** Choose the resource type that gives the clearest signal about the root cause — that's
your `from`. For Kubernetes roll-up health, that's usually the Pod.
2. **List the noise.** The related types whose alerts repeat the same information go in `to`.
3. **Choose a direction.** Ask where the `to` resources sit relative to `from` in the relationship graph:
   - They're parents or owners (Pod → its ReplicaSet/Deployment): use `incoming`.
   - They're children or dependents (Node → its Pods): use `outgoing`.
   - Could be either: use `all`.
4. **Count the hops and set `depth`.** Each relationship level is one hop: Pod → ReplicaSet is 1, Pod → ReplicaSet →
Deployment is 2. Defaults to 5 when omitted.
5. **Set `soft: true` for soft relationships.** Ownership links like Deployment → Pod are hard relationships and match
by default. Placement links like Node → Pod are soft, and are only followed when `soft: true`.
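
Applied to the Node case, the five steps might produce a rule like the sketch below. The field names come from the Fields table in this document. `depth: 1` is an assumption (it presumes Pods are a single hop from their Node), and the enclosing Notification spec is omitted.

```yaml
inhibitions:
  - direction: outgoing      # step 3: Pods sit below the Node
    from: Kubernetes::Node   # step 1: the alert to keep
    to:                      # step 2: the alerts to suppress
      - Kubernetes::Pod
    depth: 1                 # step 4: assumed single hop, Node -> Pod
    soft: true               # step 5: Node -> Pod is a soft relationship
```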

## Examples

### Keep the Pod alert, suppress its ReplicaSet and Deployment

A pod's failure usually explains why its parents are unhealthy, so this notification keeps the pod alert and inhibits
the parent alerts that follow within the 4-hour window. The direction is `incoming` because ReplicaSets and Deployments
are parents of the pod, and `depth: 2` covers the two hops from Pod up to Deployment.

```yaml title="deployment-with-inhibition.yaml" file=<rootDir>/modules/mission-control/fixtures/notifications/deployment-with-inhibition.yaml {9,12-18}
```

How this plays out:

| Time | Resource | Event | Action |
| ----- | ----------------------- | ------------------ | ----------------------------------------------- |
| 10:00 | Pod `api-7d9f` | `config.unhealthy` | Notification sent _(becomes the inhibitor)_ |
| 10:01 | ReplicaSet `api-7d9f` | `config.unhealthy` | Inhibited _(related pod already notified)_ |
| 10:02 | Deployment `api` | `config.unhealthy` | Inhibited _(related pod already notified)_ |
| 15:30 | Deployment `api` | `config.unhealthy` | Notification sent _(4h window expired)_ |

### Keep the Node alert, suppress its Pods

When a node goes down, every pod scheduled on it raises an alert. This notification keeps the node alert and inhibits
the pod alerts. The direction is `outgoing` because the pods sit below the node, and `soft: true` is required because
Node-to-Pod is a soft relationship.

```yaml title="node-with-inhibition.yaml" file=<rootDir>/modules/mission-control/fixtures/notifications/node-with-inhibition.yaml {9,12-18}
```

## Fields

<Inhibition />
13 changes: 6 additions & 7 deletions mission-control/docs/reference/notifications/_inhibition.mdx
@@ -3,31 +3,30 @@
{
field: 'depth',
scheme: 'int',
description: 'Defines how many levels of child or parent resources to traverse.'
description: 'Maximum number of relationship levels to traverse. Defaults to 5 when omitted.'
},
{
field: 'direction',
scheme: '`inoming`|`outgoing`|`both`',
scheme: '`incoming` | `outgoing` | `all`',
required: true,
description: 'Specifies the traversal direction in relation to the "From" resource. Can be "outgoing" (looks for child resources), "incoming" (looks for parent resources), or "all" (considers both).'
description: 'Relationship direction from `from` to `to`. Use `outgoing` when `to` resources are downstream or child resources, `incoming` when `to` resources are upstream or parent resources, and `all` to check both directions.'
},
{
field: 'from',
scheme: '`string`',
required: true,
description: 'Specifies the starting resource type (for example, "Kubernetes::Deployment").'
description: 'Config type whose sent notification can inhibit notifications for related `to` resources. For example, `Kubernetes::Deployment`.'
},
{
field: 'soft',
scheme: 'bool',
description: 'When true, relates using soft relationships. Example: Deployment to Pod is hard relationship, but Node to Pod is soft relationship.'
description: 'When false, only hard relationships are considered. When true, both hard and soft relationships are considered. For example, Deployment to Pod is a hard relationship, but Node to Pod is a soft relationship.'
},
{
field: 'to',
scheme: '`[]string`',
required: true,
description: 'Specifies the traversal direction in relation to the `from` resource. `outgoing` looks for child resources and `incoming` looks for parent resources.'
description: 'Config types that can be inhibited when they are related to a `from` resource that already sent this notification within the `repeatInterval` window.'
}
]}
/>

2 changes: 1 addition & 1 deletion modules/canary-checker
Submodule canary-checker updated 2 files
+6 −3 go.mod
+12 −6 go.sum
2 changes: 1 addition & 1 deletion modules/mission-control
Submodule mission-control updated 218 files
2 changes: 1 addition & 1 deletion modules/mission-control-registry