tangle-network · drewstone · Jun 11, 2026 · Jun 11, 2026
diff --git a/docs/research/adaptive-computation-program.md b/docs/research/adaptive-computation-program.md
@@ -78,7 +78,17 @@ the default unfinished-items analyst in the existing steerer harness on EOPS. Sc
 criterion 1: decision regret, not prediction quality alone — a state that predicts
 better but does not choose better fails.
 
-Status: DESIGNED (the steerer-population fitness harness exists).
+Status: DESIGNED (the steerer-population fitness harness exists). **First arm RAN and was
+AUTOPSIED (2026-06-11): the −37.5pp result does NOT measure the corner — it measured a
+broken channel.** Ground truth (per-task artifact + a parser probe): the controller
+stopped after one shot on 14/16 unsolved tasks because its verdicts never existed —
+`critique()` routes through `observe()`'s findings-extraction protocol, which overrides
+the verdict-format instruction (probe: the analyst returned an ordinary recommendation,
+no `VERDICT:` line), so the body received nulls/plain steers, never a decision.
+Classification: infra/design-flaw, not real-result. **The corner is unmeasured.** Next
+arm requires a verdict-capable channel (a raw firewalled analyst call on the strategy
+ctx — trace-only input, raw text out — bypassing the findings schema) plus calibration;
+only then does the corner get a verdict.
 
 ## E1-coarse — the leakage-bounded author channel
 
@@ -111,7 +121,13 @@ explicit selection rules, fresh-slice promotion gate, band-aware holdout
 (`band.holdoutPoolN` — the estimand "paired lift on headroom tasks", pre-registered).
 Each run also feeds E2 for free (gzip-bits per authored artifact vs holdout gap).
 
-Status: RUNNING (n=24/budget-4 family; band + tool-selection arms land in the next run).
+Status: RUNNING → three family members completed (2026-06-10/11): powered n=24 HOLD;
+discriminating run (tool introspection) HOLD with the first train-side champion
+displacements + a REPRODUCIBLE certificate on the authored champion; cost-objective run
+#1 HOLD via the funnel misalignment (the search tie-band was stricter than the gate —
+fixed, the law recorded in HARNESS.md), rerun in flight. The replicated positives so far
+are cost-shaped (author at ~2.5× lower cost, ×3) and interaction-shaped (σ×κ promoted
+×2 on AIME). Live ledger: `.evolve/current.json` + the findings gist.
 
 ## Rejected without prejudice (named triggers, not dead)
 

diff --git a/docs/research/factorial-ablation-design.md b/docs/research/factorial-ablation-design.md
@@ -82,3 +82,38 @@ future claim a named cell instead of an ad-hoc run.
   leakage-bounded channel (`lossesDetail: 'binary'`) is the dose-control arm (E1-coarse).
 - **One-domain exposure**: EOPS-itsm is the workhorse; cross-domain cells (commit0, verifier-math)
   are the same grid at a different `ENV` — the transfer claim needs at least two.
+
+
+## The formal object (added 2026-06-11)
+
+A configuration is a point in a finite product lattice
+`c = (σ, α, γ, κ, m, d, b) ∈ Σ × A × Γ × K × M × D × B`; the response is vector-valued
+`J(c) = (score, cost, latency)`, estimated per cell from paired task-level deltas. The
+"hypercube" is the binary on/off projection of (σ, α, γ, κ) — our cell-naming scheme.
+In design-of-experiments language this is a fractional factorial; the estimands are main
+effects and two-way interactions as paired contrasts, each with its own promotion rule.
+
+**The steering subspace is a true {0,1}⁴**: who steers (LLM/deterministic) × what
+information (single trajectory/population) × what is steered (content/budget) × what form
+(advice/verdict). Measured corners (AIME n=16, 2026-06-11): `refine` = (LLM, single,
+content, advice) 50.0%; `structural` flips axis 1 → 43.8% (the deterministic floor
+captures ~⅔ of the lift; the LLM critic's marginal value is unproven at this n);
+`contrastive` flips axis 2 → 25.0% (significantly negative — context dilution);
+`belief` flips axes 3+4 → 12.5% — **autopsied: the channel was broken, not the idea**
+(the findings pipeline strips the verdict format; the controller never received a
+decision; corner UNMEASURED — see adaptive-computation-program.md E8). Twelve corners remain
+unexplored and enumerable (e.g. deterministic+budget = rule-based early stopping). Note
+the exploration shape: a Hamming ball of radius 1–2 around the incumbent — and the
+measured σ×κ interaction (neither factor promotes alone; the combination promoted, ×2)
+is the standing exhibit for factorial over one-factor-at-a-time designs.
+
+## Measured cells update (2026-06-11)
+
+- σ×κ (`steer+compress`): **PROMOTED non-inferior-and-cheaper ×2** (incl. on honest
+  critic billing; savings CI[+0.0006,+0.0061]; the compression call broke even after 1
+  task; Δlatency +38.8s CI[19.4,56.7] = steering's real price).
+- σ on stateless hard math: +18.7pp (50.0 vs 31.3) — refines the domain-boundary law
+  (carried messages behave like state on competition math).
+- Funnel-alignment law (from the cost run): the search-side champion tie-band must be no
+  stricter than the gate's tolerance, or promotable candidates die in search.
+- The model×cell matrix (8 workers × 4 cells, identical tasks) is the M-axis sweep.