
ci: stop #freenet-dev alert spam from chronic gateway backpressure #57

Open
sanity wants to merge 1 commit into main from fix/quiet-demo-mirror-spam


sanity commented Jun 19, 2026


Problem

rescue-demos and the reusable mirror-repo workflow have been paging #freenet-dev on nearly every run for weeks. Confirmed root cause across recent runs: upstream Freenet gateway backpressure, not anything these workflows can fix. The gateway behind FREENET_GIT_WS_URL returns:

  • "contract queue full, try again later" on the repo-state probe (per-contract fair-queue saturation, #4251), and
  • "put timed out after N peer attempt(s)" on the PUTs.

The freenet-git-side robustness fixes that targeted exactly this (#4291 mirror stabilization → v0.2.69, #4253 queue-full amplification → v0.2.64) already shipped and did not resolve it, so the alerts are non-actionable noise: classic alert fatigue that trains everyone to ignore the channel.

Approach

  • rescue-demos: pause the schedule: trigger (keep workflow_dispatch). Every scheduled run is currently a guaranteed no-op plus two 🚨 alerts per day. Re-enable once the gateway can service rescue again (e.g. after the 0.2.79 #4499 event-loop wedge fix propagates): dispatch manually, confirm the run is green, then uncomment the cron.
  • mirror-repo (reusable): classify the push outcome. A failure whose log matches one of the transient gateway-backpressure signatures (contract queue full / try again later / put timed out / host backpressure or timeout) sets transient=true, and the Matrix-notify step is skipped; the next push or the daily safety-net cron serves as the retry. Genuine failures (auth, pack corruption, helper bugs), job-timeout cancellations, and pre-push step failures still alert.
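The push classification can be sketched in bash roughly as follows. This is a minimal sketch, not the step that actually ships in mirror-repo.yml: the function name classify_push, the case-insensitive grep, and the exact regex (in particular how the "host backpressure or timeout" signature is spelled in real logs) are assumptions. It takes the push exit status and a captured log and prints a transient=... line suitable for appending to $GITHUB_OUTPUT.

```shell
#!/usr/bin/env bash
# Minimal sketch of the push-outcome classifier (not the actual step in
# mirror-repo.yml). classify_push and the regex below are assumptions.
set -euo pipefail

# Transient gateway-backpressure signatures named in this PR. The spelling
# of the "host backpressure or timeout" variant is a guess.
TRANSIENT_RE='contract queue full|try again later|put timed out|host (backpressure|timeout)'

# classify_push EXIT_STATUS LOGFILE
# Prints transient=true only for a failed push whose log matches a
# backpressure signature; everything else prints transient=false, which
# leaves the notify step enabled for genuine failures.
classify_push() {
  local status="$1" log="$2"
  if [ "$status" -ne 0 ] && grep -Eiq -- "$TRANSIENT_RE" "$log"; then
    echo "transient=true"
  else
    echo "transient=false"
  fi
}

# Hypothetical wiring inside the push step (MIRROR_REMOTE and push.log
# are illustrative names):
#   set +e; git push "$MIRROR_REMOTE" 2>&1 | tee push.log; rc=${PIPESTATUS[0]}; set -e
#   classify_push "$rc" push.log >> "$GITHUB_OUTPUT"
#   exit "$rc"
```

Passing the exit status in, rather than classifying the log alone, keeps a clean push at transient=false even if its output happens to mention a retry.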

The run still goes red in the Actions tab on a transient failure (which is honest, since the demo URL is behind); only the channel page is suppressed.
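The alerting rule amounts to a small decision table. The sketch below encodes it as a bash function purely for illustration; in the workflow itself it would presumably be an if: expression on the Matrix-notify step, and the function name should_notify is hypothetical. The first argument uses the values GitHub Actions exposes as job.status (success, failure, cancelled); the second is the push step's transient output, empty when the push step never ran (for example after a pre-push step failure).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the notify decision; not taken from mirror-repo.yml.
set -euo pipefail

# should_notify JOB_STATUS [TRANSIENT] -> prints "notify" or "skip"
should_notify() {
  local job_status="$1" transient="${2:-}"
  case "$job_status" in
    success)
      echo "skip" ;;          # green run: nothing to page about
    cancelled)
      echo "notify" ;;        # cancellations (e.g. job timeout) still alert
    failure)
      if [ "$transient" = "true" ]; then
        echo "skip"           # gateway backpressure: run stays red, no page
      else
        echo "notify"         # genuine push failure or pre-push step failure
      fi ;;
    *)
      echo "notify" ;;        # unknown status: err on the side of alerting
  esac
}
```

Only one cell of the table suppresses a page for a red run: a failure whose push step reported transient=true.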

Note

The caller repos (freenet-core, freenet-stdlib) pin mirror-repo.yml to a fixed commit SHA, so the mirror transient-guard does not take effect for them until their pinned SHA is bumped to this commit. This PR stops the rescue spam on merge; stopping the per-repo mirror spam needs a follow-up SHA bump in each caller (or pausing their daily schedule:).
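The follow-up SHA bump in each caller is a one-line change to its uses: reference. A hypothetical helper for it is sketched below; the function name bump_mirror_pin and the assumption that the pin is written as mirror-repo.yml@ followed by a full 40-hex commit SHA are illustrative, not taken from the caller repos.

```shell
#!/usr/bin/env bash
# Hypothetical helper: repoint a caller's mirror-repo.yml pin to a new SHA.
# Assumes the pin looks like ".../mirror-repo.yml@<40-hex SHA>".
set -euo pipefail

# bump_mirror_pin FILE NEW_SHA
bump_mirror_pin() {
  local file="$1" new_sha="$2"
  # Refuse anything that is not a full lowercase 40-hex commit SHA.
  if ! printf '%s' "$new_sha" | grep -Eq '^[0-9a-f]{40}$'; then
    echo "not a full commit SHA: $new_sha" >&2
    return 1
  fi
  # Rewrite every "mirror-repo.yml@<sha>" in place, keeping FILE.bak as a backup.
  sed -E -i.bak "s#(mirror-repo\.yml@)[0-9a-f]{40}#\1${new_sha}#g" "$file"
}
```

Requiring a full SHA keeps the caller pinned to an immutable commit rather than drifting to a branch or tag.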

Testing

  • Both files parse cleanly with yamllint and with python -c yaml.safe_load.
  • The new push-classification bash was extracted and exercised standalone with shellcheck + behavior tests: transient-backpressure log → transient=true (alert suppressed); clean push → transient=false; genuine failure → transient=false (alert fires).

[AI-assisted - Claude]

🤖 Generated with Claude Code

Both demo-maintenance workflows have been paging #freenet-dev on nearly
every run for weeks. The cause is upstream Freenet gateway capacity, not
anything these workflows can fix: the gateway behind FREENET_GIT_WS_URL
returns "contract queue full, try again later" on the repo-state probe
and "put timed out" on the PUTs. The freenet-git-side robustness fixes
(#4291 mirror stabilization, #4253 queue-full amplification) already
shipped in v0.2.64/v0.2.69 and did not resolve it, so the alerts are
non-actionable noise that trains everyone to ignore the channel.

- rescue-demos: pause the schedule (workflow_dispatch retained). Every
  scheduled run is a guaranteed no-op plus two 🚨 alerts/day. Re-enable
  once the gateway can service rescue again (e.g. after the 0.2.79 #4499
  event-loop wedge fix propagates): dispatch manually, confirm green,
  uncomment the cron.

- mirror-repo (reusable): classify the push failure. Transient gateway
  backpressure now suppresses the Matrix alert (the next push / daily
  safety-net cron is the retry); genuine failures, job-timeout
  cancellations, and pre-push step failures still alert. One edit covers
  freenet-core and freenet-stdlib (once their pinned SHA is bumped to
  this commit) and the self-mirror.

The run still goes red in the Actions tab on a transient failure (honest:
the demo URL is behind); only the channel page is suppressed.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01P5syMzUfC5Zk4fv5ivaYrx
