feat(loops): the cost waterfall + three steering modes from new hypercube corners by drewstone · Pull Request #250 · tangle-network/agent-runtime

drewstone · 2026-06-10T23:01:37Z

100% trajectory observability — `createWaterfallCollector`

Folds the lifecycle stream (every spawn/settle: shots, analysts, nested agents) into timed, billed spans. The sum of spans IS the run's cost story: a text waterfall (bars scaled to the window, per-span s / $ / tokens / score, by-kind rollups, totals) or structured rows for any chart. Attaches to runAgentic/runBenchmark hooks. With #249's critic billing, every router call in a trajectory is now visible and priced — nothing rides free.

Three steering modes (bench arms; AIME; gated vs the incumbent critic at equal budget)

mode	hypercube corner	the bet
`structural`	deterministic (the ddmin of steering)	a pure function detecting stuck loops / tool-error pileups / no-execution, templated correctives, zero LLM cost — if the critic can't beat this, it isn't earning its calls (the analyst-prompt GEPA null says nobody has checked)
`contrastive`	population information	hybrid strategies pay for multiple rollouts then discard the losers; the critic sees BOTH trajectories (never scores — firewall intact) and emits the difference that matters; the better attempt continues with the diff
`belief`	decision-theoretic (E8's cheap form)	the critic emits `VERDICT: CONTINUE\|RESTART\|STOP confidence=…` — value-of-continuation — and the strategy obeys: steering the BUDGET, not the content (the steerer-population winner's lineage, domain-generic)

Runner: steering-modes.mts — each mode its own arm on identical AIME tasks (the grid pattern), sample as the no-steering control, non-inferiority gates vs refine, per-arm waterfall + by-kind cost rollups (WATERFALL=1 prints the bars).

+3 waterfall tests (span summation/rollups, downed-span billing, reset). Suite 785 ✓ · typecheck ✓ · lint ✓ (1 pre-existing warning). Bench script compiles against the next dist refresh (cost run holds bench/node_modules).

…cube corners 100%% observability: createWaterfallCollector folds the lifecycle stream (every spawn/settle — shots, analysts, nested agents) into timed, billed spans; the sum of spans IS the run's cost story — text waterfall (bars, per-span s/$ /tokens/score, by-kind rollups, totals) or structured rows for any chart. Attaches to runAgentic/runBenchmark hooks. Three steering modes from unexplored corners (bench arms, AIME, gated non-inferiority vs the incumbent critic at equal budget): - structural — the DETERMINISTIC floor (the ddmin of steering): pure function detects stuck loops / tool-error pileups / no-execution, emits a templated corrective; zero LLM cost. If the critic can't beat this, it is not earning its calls. - contrastive — POPULATION-information steering: the critic sees two trajectories (never scores; firewall intact) and emits the difference that matters; the better attempt continues with the diff as steer. - belief — DECISION-theoretic steering (E8's cheap form): the critic emits VERDICT CONTINUE|RESTART|STOP + confidence; the strategy obeys — steering the budget allocation, not the message content.

tangletools

✅ Auto-approved PR — `b3224800`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-10T23:01:43Z}

tangletools approved these changes Jun 10, 2026

View reviewed changes

drewstone merged commit 7324cee into main Jun 10, 2026
1 check passed

drewstone deleted the feat/cost-waterfall-steering-modes branch June 10, 2026 23:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(loops): the cost waterfall + three steering modes from new hypercube corners#250

feat(loops): the cost waterfall + three steering modes from new hypercube corners#250
drewstone merged 1 commit into
mainfrom
feat/cost-waterfall-steering-modes

drewstone commented Jun 10, 2026

Uh oh!

tangletools left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

drewstone commented Jun 10, 2026

100% trajectory observability — createWaterfallCollector

Three steering modes (bench arms; AIME; gated vs the incumbent critic at equal budget)

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

✅ Auto-approved PR — b3224800

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

100% trajectory observability — `createWaterfallCollector`

✅ Auto-approved PR — `b3224800`