feat(loops): anytime metrics — time-to-satisfactory from the waterfall, free#253

Merged: drewstone merged 1 commit into main from feat/anytime-metrics on Jun 11, 2026

@drewstone

Contributor

This answers the question of 'time until satisfactory output, measuring the hill-climb at each step', and it comes for free: the metrics are derived entirely from the waterfall's existing spans, with zero new instrumentation.

Standard vocabulary (per the PLAIN.md rule — no invented terms)

  • TTT / shots-to-target: elapsed time / number of attempts until the best-so-far score is ≥ a satisficing target (Simon's term).
  • ERT: the COCO/BBOB benchmarking convention, Σ all wall-time (including failures) ÷ #successes. This is the honest all-in cost per success. The same construction over dollars gives $/success.
  • Anytime curve + AUC: mean best-so-far score by shot index (the hill-climb), with a sparkline in the table.
  • Multi-target: targets: [0.5, 0.8, 1] measured independently (the COCO multi-target convention), or per-task bars via targetFor, so satisfaction is generic to the task.
anytime metrics · satisficing targets [0.5, 1] · ERT = Σ all wall-time / #successes (COCO)
strategy            ≥tgt   reach   med-TTT   med-shots   ERT(all-in)   $/success   AUC   curve
refine              0.50    2/2      2.5s        1          5.5s       $0.0150    0.62   ▄▆
refine              1.00    1/2      5.0s        2         11.0s       $0.0300    0.62   ▄▆
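
A minimal sketch of how the TTT, shots-to-target, ERT and $/success columns could be computed from per-shot records. The `ShotRecord` shape, `summarizeTarget`, and the `demoTasks` data are hypothetical and not taken from the PR's code. The sample numbers are invented so that the results line up with the two `refine` rows above; they are not the actual run data. Following the table, ERT charges all wall-time, including shots taken after a target was already reached.

```typescript
// Hypothetical per-shot record, assumed to be already extracted from waterfall spans.
interface ShotRecord {
  score: number;   // score of this attempt
  wallSec: number; // wall-time of this attempt alone, in seconds
  costUsd: number; // spend of this attempt alone
}

interface TargetSummary {
  target: number;
  reached: number;              // tasks whose best-so-far hit the target
  tasks: number;                // total tasks
  medianTttSec: number | null;  // median time-to-target over successes
  medianShots: number | null;   // median shots-to-target over successes
  ertSec: number | null;        // all wall-time / #successes; null if no success
  usdPerSuccess: number | null; // all spend / #successes; null if no success
}

function medianOf(xs: number[]): number | null {
  if (xs.length === 0) return null;
  const sorted = xs.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// One inner array per task, shots in the order they ran.
function summarizeTarget(shotsByTask: ShotRecord[][], target: number): TargetSummary {
  const timesToTarget: number[] = [];
  const shotsToTarget: number[] = [];
  let totalWallSec = 0;
  let totalUsd = 0;

  for (const shots of shotsByTask) {
    let best = -Infinity;
    let elapsed = 0;
    let hit = false;
    for (let i = 0; i < shots.length; i++) {
      elapsed += shots[i].wallSec;
      totalWallSec += shots[i].wallSec; // every shot is charged, success or not
      totalUsd += shots[i].costUsd;
      best = Math.max(best, shots[i].score);
      if (!hit && best >= target) {
        hit = true;
        timesToTarget.push(elapsed);
        shotsToTarget.push(i + 1);
      }
    }
  }

  const successes = timesToTarget.length;
  return {
    target: target,
    reached: successes,
    tasks: shotsByTask.length,
    medianTttSec: medianOf(timesToTarget),
    medianShots: medianOf(shotsToTarget),
    ertSec: successes > 0 ? totalWallSec / successes : null,
    usdPerSuccess: successes > 0 ? totalUsd / successes : null,
  };
}

// Invented sample: two tasks, two shots each, 11 s and $0.03 in total.
const demoTasks: ShotRecord[][] = [
  [{ score: 0.5, wallSec: 2, costUsd: 0.0075 }, { score: 1.0, wallSec: 3, costUsd: 0.0075 }],
  [{ score: 0.5, wallSec: 3, costUsd: 0.0075 }, { score: 0.5, wallSec: 3, costUsd: 0.0075 }],
];

for (const target of [0.5, 1]) {
  console.log(summarizeTarget(demoTasks, target));
}
```

Because each target is summarized independently, the same 11 s of wall-time is divided by 2 successes at target 0.5 and by 1 success at target 1, which is why the ERT doubles between the two rows.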

Wired into steering-modes, which prints the table per arm: tonight's run produces the curves per steering mode, and a per-model comparison is a set of arms with different WORKER_MODEL values. Adds 3 tests (multi-target hits, ERT failure-charging, per-task bars). Suite 788 ✓.

…l, free

Derived entirely from existing spans (no new instrumentation): per-task
hill-climb curves (best-so-far score after each shot with elapsed wall and
cumulative spend) and the standard anytime-optimization summary per
(strategy, satisficing target): median time-to-target, shots-to-target,
COCO ERT (Σ all wall-time including failures / #successes, i.e. the honest
all-in cost per success), $/success, and the AUC of the anytime curve
with a sparkline render.
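
The hill-climb curve and its sparkline can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: the function names are hypothetical, scores are assumed to lie in [0, 1], a task with fewer shots carries its last best-so-far value forward, AUC is taken as the plain mean of the curve points, and the sparkline maps each value onto eight block characters by flooring.

```typescript
// Best-so-far score after each shot for one task (the hill-climb).
// Assumes scores are non-negative, so 0 is a safe starting value.
function bestSoFarCurve(scores: number[]): number[] {
  const out: number[] = [];
  let best = 0;
  for (const s of scores) {
    best = Math.max(best, s);
    out.push(best);
  }
  return out;
}

// Mean best-so-far by shot index across tasks.
// Assumption: a task with fewer shots keeps its final best-so-far value.
function meanAnytimeCurve(scoresByTask: number[][]): number[] {
  const curves = scoresByTask.filter((s) => s.length > 0).map(bestSoFarCurve);
  let width = 0;
  for (const c of curves) width = Math.max(width, c.length);
  const mean: number[] = [];
  for (let i = 0; i < width; i++) {
    let sum = 0;
    for (const c of curves) sum += c[Math.min(i, c.length - 1)];
    mean.push(sum / curves.length);
  }
  return mean;
}

// Assumption: AUC normalized as the mean of the curve points, so it stays in [0, 1].
function curveAuc(curve: number[]): number {
  if (curve.length === 0) return 0;
  let sum = 0;
  for (const v of curve) sum += v;
  return sum / curve.length;
}

// Assumption: values in [0, 1] are floored onto eight block heights.
function renderSparkline(curve: number[]): string {
  const bars = "▁▂▃▄▅▆▇█";
  return curve
    .map((v) => bars.charAt(Math.min(7, Math.max(0, Math.floor(v * 7)))))
    .join("");
}

// Invented scores for two tasks with two shots each.
const sampleCurve = meanAnytimeCurve([
  [0.5, 1.0],
  [0.5, 0.5],
]);
console.log(sampleCurve, curveAuc(sampleCurve), renderSparkline(sampleCurve));
```

Taking the running maximum means a later, worse shot never lowers the curve, which is what makes it an anytime (best-so-far) curve rather than a per-shot score trace.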

Satisfaction follows the COCO/BBOB convention — a SET of satisficing
targets measured independently (targets: [0.5, 0.8, 1]) — or per-task bars
via targetFor (task-generic satisfaction). steering-modes prints the table
per arm; per-model comparison = arms with different WORKER_MODEL.
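
The two ways of defining satisfaction could look roughly like this. Only the names `targets` and `targetFor` come from the PR text; their exact signatures, the `TaskBest` shape, the `reachCounts` helper, and the default of `[1]` when neither option is given are assumptions made for this sketch.

```typescript
// Hypothetical: the best score a task reached over all of its shots.
interface TaskBest {
  task: string;
  bestScore: number;
}

// Assumed option shape; only the two field names appear in the PR.
interface SatisfactionOptions {
  targets?: number[];                   // shared targets, each measured independently
  targetFor?: (task: string) => number; // per-task bar
}

interface ReachRow {
  label: string;
  reached: number;
  total: number;
}

function reachCounts(runs: TaskBest[], opts: SatisfactionOptions): ReachRow[] {
  const barFor = opts.targetFor;
  if (barFor) {
    // Per-task bars: every task is judged against its own threshold.
    let reached = 0;
    for (const r of runs) {
      if (r.bestScore >= barFor(r.task)) reached++;
    }
    return [{ label: "per-task", reached: reached, total: runs.length }];
  }
  // Shared targets: one independent row per target (default is an assumption).
  const targets = opts.targets ? opts.targets : [1];
  return targets.map((t) => {
    let reached = 0;
    for (const r of runs) {
      if (r.bestScore >= t) reached++;
    }
    return { label: t.toFixed(2), reached: reached, total: runs.length };
  });
}

// Invented results for two tasks.
const demoRuns: TaskBest[] = [
  { task: "A", bestScore: 1.0 },
  { task: "B", bestScore: 0.5 },
];

console.log(reachCounts(demoRuns, { targets: [0.5, 0.8, 1] }));
console.log(reachCounts(demoRuns, { targetFor: (task) => (task === "B" ? 0.5 : 1) }));
```

With shared targets, task B counts as a failure at 0.8 and 1; with a per-task bar of 0.5 for B, both tasks are satisfied, which is the sense in which satisfaction becomes task-generic.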

@tangletools tangletools left a comment

Contributor

✅ Auto-approved PR — 5fa9d3f3

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-11T00:16:09Z

@drewstone drewstone merged commit a279d6d into main Jun 11, 2026
1 check passed
@drewstone drewstone deleted the feat/anytime-metrics branch June 11, 2026 00:17