chore(bench): self-describing verdict tables — models + config in every banner and artifact #251

Merged: drewstone merged 1 commit into main from chore/self-describing-tables on Jun 10, 2026
@drewstone

Contributor

A pasted verdict table must carry its own provenance. Every verdict banner now prints the model identities (worker / analyst / author+fallback / compressor) plus domain, n, budget, and objective; every artifact (ablation-grid, steering-modes, flywheel-evolve, prompt-compression-gate) carries a models + config block.
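As a rough illustration of what a self-describing banner could look like, here is a minimal TypeScript sketch. The PR does not show the real code, so `ModelIdentities`, `RunConfig`, `formatVerdictBanner`, and `withProvenance` are hypothetical names, and the banner layout is an assumption; only the set of fields (worker / analyst / author+fallback / compressor, plus domain, n, budget, objective) comes from the description above.

```typescript
// Hypothetical shapes: the field set follows the PR description,
// but the names and types are assumptions, not the repo's real API.
interface ModelIdentities {
  worker: string;
  analyst: string;
  author: string;
  authorFallback?: string;
  compressor: string;
}

interface RunConfig {
  domain: string;
  n: number;
  budget: number; // unit is not stated in the PR; treated as an opaque number here
  objective?: string; // printed only where set
}

// Render a banner that carries its own provenance, so a pasted table
// can be read without the surrounding session.
function formatVerdictBanner(
  title: string,
  models: ModelIdentities,
  config: RunConfig,
): string {
  const author = models.authorFallback
    ? `${models.author} (fallback: ${models.authorFallback})`
    : models.author;
  const objective =
    config.objective !== undefined ? ` objective=${config.objective}` : "";
  return [
    `=== ${title} ===`,
    `models: worker=${models.worker} analyst=${models.analyst} ` +
      `author=${author} compressor=${models.compressor}`,
    `config: domain=${config.domain} n=${config.n} budget=${config.budget}${objective}`,
  ].join("\n");
}

// Attach the same provenance to an artifact object before it is written out.
function withProvenance<T extends object>(
  artifact: T,
  models: ModelIdentities,
  config: RunConfig,
): T & { models: ModelIdentities; config: RunConfig } {
  return { ...artifact, models, config };
}
```

Calling `formatVerdictBanner` with and without `objective` gives a three-line banner either way; the `objective=` field simply drops out when it is unset. Keeping the banner and the artifact block fed from the same two objects is one way to make sure they cannot disagree.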

The standing model map (cheap router models only — never CC models): EOPS runs = deepseek-v4-pro worker/author with flash fallback; AIME/math grid + steering modes = deepseek-v4-flash worker; compressor = deepseek-v4-flash; analyst defaults to the worker (the analystModel knob exists).
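The standing map and the analyst default can be sketched as a small lookup. This is an illustration only: `RunKind`, `STANDING_MODELS`, and `resolveStandingModels` are hypothetical names, `analystModel` is the knob named above, and reading "flash fallback" as `deepseek-v4-flash` is an assumption. The author model for the grid and steering-modes runs is not stated, so it is left unset here.

```typescript
// Hypothetical run kinds and map; only the model assignments come from the PR text.
type RunKind = "eops" | "aime-grid" | "steering-modes";

interface StandingEntry {
  worker: string;
  author?: string; // not stated for grid / steering modes, so optional
  authorFallback?: string; // "flash fallback" read as deepseek-v4-flash (assumption)
  compressor: string;
}

const STANDING_MODELS: Record<RunKind, StandingEntry> = {
  eops: {
    worker: "deepseek-v4-pro",
    author: "deepseek-v4-pro",
    authorFallback: "deepseek-v4-flash",
    compressor: "deepseek-v4-flash",
  },
  "aime-grid": { worker: "deepseek-v4-flash", compressor: "deepseek-v4-flash" },
  "steering-modes": { worker: "deepseek-v4-flash", compressor: "deepseek-v4-flash" },
};

// Resolve the models for a run. The analyst defaults to the worker
// unless the analystModel knob is set explicitly.
function resolveStandingModels(
  kind: RunKind,
  knobs: { analystModel?: string } = {},
): StandingEntry & { analyst: string } {
  const entry = STANDING_MODELS[kind];
  return { ...entry, analyst: knobs.analystModel ?? entry.worker };
}
```

With no knobs, `resolveStandingModels("eops").analyst` is the worker model; passing `{ analystModel: "..." }` overrides only the analyst and leaves the rest of the entry untouched.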

Bench-only change. The cost experiment currently running is unaffected, because tsx loaded its sources when that run started. Typecheck deltas beyond the known stale-dist class: none.

…ry banner and artifact

A pasted verdict table must carry its own provenance: every verdict banner
now prints worker/analyst/author/compressor models + domain/n/budget (and
objective where set), and every artifact (grid, steering-modes, evolve,
compression-gate) carries a models + config block. Tables become shareable
without the surrounding session.

@tangletools tangletools left a comment

Contributor

✅ Auto-approved PR — b345abe4

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-10T23:18:47Z

@drewstone drewstone merged commit 85de3c2 into main Jun 10, 2026
1 check passed
@drewstone drewstone deleted the chore/self-describing-tables branch June 10, 2026 23:20