feat(loops): anytime metrics — time-to-satisfactory from the waterfall, free by drewstone · Pull Request #253 · tangle-network/agent-runtime

drewstone · 2026-06-11T00:16:02Z

This answers 'time until satisfactory output, measuring the hill-climb at each step', and it comes free: every metric is derived from the waterfall's existing spans, with zero new instrumentation.

Standard vocabulary (per the PLAIN.md rule — no invented terms)

TTT / shots-to-target: elapsed time / number of attempts until best-so-far ≥ a satisficing target (Simon's term).
ERT: the COCO/BBOB benchmarking convention, Σ all wall-time (including failures) ÷ #successes. This is the honest all-in cost per success. The same construction applies over dollars.
Anytime curve + AUC: mean best-so-far score by shot index (the hill-climb), with a sparkline in the table.
Multi-target: targets: [0.5, 0.8, 1], each measured independently (the COCO multi-target convention), or per-task bars via targetFor, so that what counts as satisfactory can be defined per task.
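The definitions above can be sketched in a few lines. This is a minimal illustration, not the code merged in this PR: the `Shot` shape, `summarize`, and the sample data are all hypothetical, and ERT follows the formula as written here (the full wall time and spend of every task, divided by the number of tasks that reached the target).

```typescript
// Hypothetical shape of one scored attempt ("shot") recovered from trace spans.
interface Shot {
  score: number;     // score of this attempt, assumed to lie in [0, 1]
  elapsedMs: number; // wall time from task start to the end of this shot
  costUsd: number;   // spend attributed to this shot
}

interface TargetSummary {
  target: number;
  reached: number;              // tasks whose best-so-far reached the target
  tasks: number;
  medianTttMs: number | null;   // median time-to-target over successful tasks
  medianShots: number | null;   // median shots-to-target over successful tasks
  ertMs: number | null;         // null when no task reached the target
  usdPerSuccess: number | null;
}

function median(xs: number[]): number | null {
  if (xs.length === 0) return null;
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// One summary row per target; each target is measured independently.
function summarize(tasks: Shot[][], targets: number[]): TargetSummary[] {
  // All-in totals: every task is charged in full, whether or not it succeeded.
  let totalMs = 0;
  let totalUsd = 0;
  for (const shots of tasks) {
    if (shots.length > 0) totalMs += shots[shots.length - 1].elapsedMs;
    totalUsd += shots.reduce((acc, s) => acc + s.costUsd, 0);
  }
  return targets.map((target) => {
    const ttts: number[] = [];
    const shotCounts: number[] = [];
    for (const shots of tasks) {
      // Best-so-far first reaches the target at the first shot scoring >= target.
      const hit = shots.findIndex((s) => s.score >= target);
      if (hit >= 0) {
        ttts.push(shots[hit].elapsedMs);
        shotCounts.push(hit + 1);
      }
    }
    const n = ttts.length;
    return {
      target,
      reached: n,
      tasks: tasks.length,
      medianTttMs: median(ttts),
      medianShots: median(shotCounts),
      ertMs: n > 0 ? totalMs / n : null,
      usdPerSuccess: n > 0 ? totalUsd / n : null,
    };
  });
}

// Made-up data for two tasks with two shots each (not taken from the real run).
const demoTasks: Shot[][] = [
  [
    { score: 0.5, elapsedMs: 2000, costUsd: 0.0075 },
    { score: 1.0, elapsedMs: 5000, costUsd: 0.0075 },
  ],
  [
    { score: 0.5, elapsedMs: 3000, costUsd: 0.0075 },
    { score: 0.5, elapsedMs: 6000, costUsd: 0.0075 },
  ],
];
const demoRows = summarize(demoTasks, [0.5, 1]);
console.log(demoRows);
```

Because failures are charged in the numerator but not counted in the denominator, a stricter target can only raise ERT and $/success for the same set of runs.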

anytime metrics · satisficing targets [0.5, 1] · ERT = Σ all wall-time / #successes (COCO)
strategy            ≥tgt   reach   med-TTT   med-shots   ERT(all-in)   $/success   AUC   curve
refine              0.50    2/2      2.5s        1          5.5s       $0.0150    0.62   ▄▆
refine              1.00    1/2      5.0s        2         11.0s       $0.0300    0.62   ▄▆
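As a check on the sample rows, both are consistent with one shared all-in total divided by a different success count. The totals of 11.0 s and $0.0300 are inferred from the rows themselves and are not stated anywhere in the PR.

```latex
\mathrm{ERT}(0.5) = \frac{11.0\,\mathrm{s}}{2} = 5.5\,\mathrm{s},
\qquad
\mathrm{ERT}(1.0) = \frac{11.0\,\mathrm{s}}{1} = 11.0\,\mathrm{s}

\frac{\$0.0300}{2} = \$0.0150,
\qquad
\frac{\$0.0300}{1} = \$0.0300
```

The numerator stays fixed because failures are charged in full; only the number of successes changes between targets.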

Wired into steering-modes, which prints the table per arm. Tonight's run produces the curves per steering mode; a per-model comparison is simply arms with different WORKER_MODEL. Adds 3 tests (multi-target hits, ERT failure-charging, per-task bars). Suite: 788 ✓.

…l, free

Derived entirely from existing spans (no new instrumentation): per-task hill-climb curves (best-so-far score after each shot, with elapsed wall time and cumulative spend) and the standard anytime-optimization summary per (strategy, satisficing target): median time-to-target, shots-to-target, COCO ERT (Σ all wall-time including failures / #successes, the honest all-in cost per success), $/success, and the AUC of the anytime curve with a sparkline render.

Satisfaction follows the COCO/BBOB convention, a SET of satisficing targets measured independently (targets: [0.5, 0.8, 1]), or per-task bars via targetFor (task-generic satisfaction). steering-modes prints the table per arm; per-model comparison = arms with different WORKER_MODEL.
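A rough sketch of the anytime curve, its AUC, and the sparkline render follows. The function names are hypothetical, and three choices are assumptions rather than anything the PR confirms: a task with fewer shots carries its final best forward, AUC is taken as the mean of the curve points, and scores lie in [0, 1].

```typescript
// Best-so-far score after each shot for one task (monotone non-decreasing).
function bestSoFar(scores: number[]): number[] {
  let best = -Infinity;
  return scores.map((s) => {
    best = Math.max(best, s);
    return best;
  });
}

// Mean best-so-far by shot index across tasks.
// Assumption: a task that stopped early keeps its final best for later indices.
function anytimeCurve(taskScores: number[][]): number[] {
  const curves = taskScores.filter((t) => t.length > 0).map(bestSoFar);
  const len = Math.max(0, ...curves.map((c) => c.length));
  const out: number[] = [];
  for (let i = 0; i < len; i++) {
    let sum = 0;
    for (const c of curves) sum += c[Math.min(i, c.length - 1)];
    out.push(sum / curves.length);
  }
  return out;
}

// Assumption: AUC is the mean height of the curve, so it stays in [0, 1].
function curveAuc(curve: number[]): number {
  if (curve.length === 0) return 0;
  return curve.reduce((a, b) => a + b, 0) / curve.length;
}

// Map each curve point onto one of eight block characters.
const BARS = "▁▂▃▄▅▆▇█";
function sparkline(curve: number[]): string {
  return curve
    .map((v) => {
      const clamped = Math.min(1, Math.max(0, v));
      return BARS[Math.floor(clamped * (BARS.length - 1))];
    })
    .join("");
}

// Made-up scores for two tasks with two shots each (not taken from the real run).
const demoCurve = anytimeCurve([
  [0.5, 1.0],
  [0.5, 0.5],
]);
console.log(demoCurve, curveAuc(demoCurve), sparkline(demoCurve));
```

Using best-so-far instead of the raw per-shot score keeps the curve monotone, so a late regression in one shot never lowers the reported hill-climb.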

tangletools

✅ Auto-approved PR — `5fa9d3f3`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-11T00:16:09Z}

tangletools approved these changes Jun 11, 2026

drewstone merged commit a279d6d into main Jun 11, 2026
1 check passed

drewstone deleted the feat/anytime-metrics branch June 11, 2026 00:17

drewstone merged 1 commit into main from feat/anytime-metrics
