add a one-shot retry for object not found errors by glightfoot · Pull Request #6759 · tilt-dev/tilt

glightfoot · 2026-05-01T17:25:40Z

Summary

Fix a rare transient NotFound error during Kubernetes apply reloads.

During rapid reloads, Tilt can hit errors like:

rolebindings.rbac.authorization.k8s.io "..." not found

The same YAML usually succeeds if applied again manually.

What was happening

Tilt uses kubectl's apply implementation for Kubernetes upserts. Client-side apply reads the current object, computes a patch, and then sends the patch or update.

There is a race where the object can be deleted between those steps. For example, a previous reload may have started an async delete, then the next reload begins applying the same object before the API server has fully converged.

Kubectl apply already handles the simple case where the initial read returns NotFound: it creates the object. But it does not recover if the read succeeds and the later patch/update returns NotFound.

Tilt already had retry handling for a related case where apply returns an object with a deletion timestamp. This change covers the adjacent failure mode where apply fails before returning an updated object.

Fix

When Apply returns a Kubernetes-style NotFound error, Tilt now treats it as a transient apply race:

Rebuild the resource list so it reflects the latest cluster state.
Retry apply once.
Return the retry error if the object still cannot be applied.

This keeps the retry narrow:

only NotFound errors are retried
only one retry is attempted
non-transient apply errors are still returned immediately

I am not sure that this is the best way to fix this long-term, but it does solve the issue in our environment.

Why this works

If the object was deleted during kubectl apply's read/patch window, retrying starts a fresh apply operation after the cluster has had another chance to converge. On retry, kubectl either sees that the object no longer exists and creates it, or sees the current object and patches it normally.

Tests

Added a regression test that simulates a RoleBinding apply returning:

rolebindings.rbac.authorization.k8s.io "app-worker-discovery" not found

and verifies that Tilt retries and successfully applies the object.

Signed-off-by: Greg Lightfoot <greg.lightfoot@reddit.com>

Commit 6a5c584: add a one-shot retry for object not found errors

glightfoot wants to merge 1 commit into tilt-dev:master from glightfoot:retry-type-ordering-bug
