Merged
76 changes: 76 additions & 0 deletions docs/PLAIN.md
@@ -0,0 +1,76 @@
# The system in plain language

> The translation layer. Internal docs use the project's own vocabulary; THIS page says the
> same things without it. If an explanation here contradicts a technical doc, the technical
> doc wins — then fix this page. Audience: a colleague meeting the project cold.

## Five sentences, no invented words

1. We have tasks with **automatic pass/fail checks** — tests you can run, answer keys you
can verify mechanically.
2. An AI attempts each task a fixed number of times under different **retry policies**:
"try 3 times, keep the best", "try, get feedback, try again", and so on.
3. We compare policies **fairly**: identical tasks, identical attempt budgets, paired
statistics, judged on fresh tasks that no tuning step ever saw.
4. The distinctive part: the AI also **writes new retry policies itself**, as short
programs, and they enter the same tournament under the same rules as human-written ones.
5. Every dollar and second is metered, so "better" can also mean "**equally good but
cheaper**" — and that claim is statistically testable, not vibes.
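
The first three sentences can be sketched in a few lines. This is a minimal illustration, not the project's API: `Task`, `Solver`, `TryOnce`, `RetryPolicy`, and `runPolicy` are hypothetical names, and it assumes a task counts as passed if any attempt within the budget passes its check. The runner hands the policy a `tryOnce` function that stops returning answers once the budget is spent, and it computes the result from the attempts it recorded itself rather than from anything the policy reports.

```typescript
// Hypothetical types for illustration only; none of these names come from the project.
type Task = { id: string; check: (answer: string) => boolean };
type Solver = (task: Task, attemptIndex: number, feedback?: string) => string;
type TryOnce = (feedback?: string) => string | null; // null once the budget is spent
type RetryPolicy = (task: Task, tryOnce: TryOnce) => void;

function runPolicy(
  policy: RetryPolicy,
  task: Task,
  solver: Solver,
  budget: number,
): { passed: boolean; attemptsUsed: number } {
  const recorded: string[] = [];
  const tryOnce: TryOnce = (feedback) => {
    if (recorded.length >= budget) return null; // overspending is refused by the runner
    const answer = solver(task, recorded.length, feedback);
    recorded.push(answer);
    return answer;
  };
  policy(task, tryOnce);
  // The result is recomputed from the recorded attempts, never self-reported by the policy.
  // Assumption: a task passes if any recorded attempt passes its check.
  const passed = recorded.some((answer) => task.check(answer));
  return { passed, attemptsUsed: recorded.length };
}

// "Try N times, keep the best": draw attempts until the runner says the budget is gone.
const tryUntilBudgetSpent: RetryPolicy = (_task, tryOnce) => {
  while (tryOnce() !== null) {
    // keep drawing
  }
};

// "Try, get feedback, try again": the feedback is built only from the previous answer,
// not from the check or the score.
const tryThenRefine: RetryPolicy = (_task, tryOnce) => {
  let previous = tryOnce();
  while (previous !== null) {
    previous = tryOnce(`Previous answer was "${previous}"; try a different approach.`);
  }
};

// A policy that asks for more than its share still ends up with exactly `budget` attempts.
const greedyPolicy: RetryPolicy = (_task, tryOnce) => {
  for (let i = 0; i < 10; i++) tryOnce();
};
```

Because every policy reaches the solver only through `tryOnce`, two policies run with the same `budget` are compared on equal attempts by construction.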

## The load-bearing core is six pieces

Task-with-check · retry policy · the tournament runner · the AI policy-writer · the
statistical promotion gate · crash-resume. Everything else is either a **fairness rule**
(added because a specific run produced a wrong number without it) or an **experiment on
the menu** (a configuration, not a machine part). Experiment configs are cheap; do not
mistake a long menu for a complicated machine.

## Translation table

| Project term | Plain English | Standard concept? |
|---|---|---|
| Environment | a task domain: open it, act on it with tools, check the result | RL environment / gym |
| shot | one attempt | — |
| steering / `refine` | feedback injected between attempts | self-refinement |
| the author / `authorStrategy` | the AI writes a new retry policy as a program | program synthesis |
| evolution / generations | rounds of: write candidates → tournament → keep the champion | evolutionary search |
| harness-verified scoring | never trust a policy's self-reported score; recompute it from the attempts the system actually ran | basic measurement hygiene |
| selector ≠ judge (the firewall) | the feedback-giver never sees the answer key or the score | no reward leakage |
| conserved budget pool | every policy gets exactly the same attempt budget; overspending is structurally impossible | compute-matched comparison |
| holdout / fresh slice | final judging happens on tasks no tuning step ever touched | train/test split |
| the gate / `promotionGate` | a seeded paired bootstrap must show the win is real before anything is declared better | standard inferential statistics |
| non-inferiority mode | prove "not worse on quality AND significantly cheaper" | clinical-trials statistics |
| band screen | drop questions every policy aces — they carry no information | item discrimination (psychometrics) |
| reproducer certificate | a fresh AI re-builds the winner from a ~64-word description; if the rebuild can't match it, the win was memorization, not method | description-length / compression test (arXiv:2606.11045) |
| κ compression / minimization | shorten the prompt; prove quality holds and cost drops | prompt compression (LLMLingua lineage); the every-Nth-character floor is delta debugging |
| waterfall | a per-step timeline of the run: what each step cost in seconds, dollars, tokens | distributed tracing |
| σ / α / γ / κ | the four independent on/off knobs: feedback, policy-writing, prompt optimization, prompt compression | factorial experimental design |
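
The "seeded paired bootstrap" row can be made concrete with a small sketch. It rests on assumptions: `pairedBootstrap` and its options are hypothetical names, the per-task scores are taken to be numbers (for example 0 or 1), and the promotion rule used here is "the lower end of the percentile interval for the mean paired difference is above zero". The project's actual gate may differ in all of these details.

```typescript
// Small seeded generator (mulberry32-style) so the same seed reproduces the same resamples.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: scores are per-task and index i refers to the same task in both arrays.
function pairedBootstrap(
  challenger: number[],
  champion: number[],
  resamples = 2000,
  seed = 1,
  alpha = 0.05,
): { meanDiff: number; lo: number; hi: number; promote: boolean } {
  if (challenger.length !== champion.length || challenger.length === 0) {
    throw new Error("paired scores must be non-empty and the same length");
  }
  const n = challenger.length;
  // Pairing: work on per-task differences, so task difficulty cancels out.
  const diffs = challenger.map((score, i) => score - champion[i]);
  const rand = mulberry32(seed);
  const means: number[] = [];
  for (let b = 0; b < resamples; b++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += diffs[Math.floor(rand() * n)];
    means.push(sum / n);
  }
  means.sort((x, y) => x - y);
  const lo = means[Math.floor((alpha / 2) * resamples)];
  const hi = means[Math.min(resamples - 1, Math.floor((1 - alpha / 2) * resamples))];
  const meanDiff = diffs.reduce((s, d) => s + d, 0) / n;
  // Assumed rule: promote only if the whole interval sits above zero.
  return { meanDiff, lo, hi, promote: lo > 0 };
}
```

Fixing the seed is what makes a promotion decision re-checkable: anyone rerunning the gate on the same paired scores gets the same interval.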

## For a game theorist, in one paragraph

A repeated tournament under mechanism-design constraints: entrants (retry policies) compete
under a hard budget; new entrants are generated by an oracle that observes only past
payoffs (never the scoring function); and the promotion rule is built to be
non-manipulable — entrants cannot misreport scores, cannot observe the test set, cannot
outspend rivals, and a declared winner must replicate from a compressed description of
itself. The research question: which entry-generation and feedback mechanisms produce
genuine improvements versus exploitation of the evaluation.

## What it has measured (plain claims, each gated)

- Feedback-between-attempts helps a lot on tasks with persistent state (+16.4pp), and
*hurts* on one-shot retrieval tasks: the sign of the effect depends on the domain.
- Tuning the feedback-giver's instructions with a state-of-the-art prompt optimizer
changed nothing (an exact tie on held-out tasks).
- Naively giving the AI a memory of its own past outputs made it *worse* (−11.6pp).
- The AI's self-written policies reliably match the best human-written policy's quality
at roughly 2.5× lower cost (replicated three times); they have not yet beaten it on
quality on held-out tasks.
- Compressing a verbose prompt to about a third of its length, combined with feedback,
kept quality and cut cost ~30% on a hard math benchmark; the change was promoted by the
"not worse AND cheaper" test.
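
The "not worse AND cheaper" test mentioned in the claims above can be sketched as a decision over two confidence intervals. The names `Interval` and `nonInferiorAndCheaper` are hypothetical, and the non-inferiority margin is an assumed parameter that would have to be fixed before looking at results; the source does not state the value the project uses. Both intervals are for paired differences, candidate minus baseline.

```typescript
// Hypothetical shape: a confidence interval for a paired difference (candidate - baseline).
type Interval = { lo: number; hi: number };

function nonInferiorAndCheaper(
  qualityDiff: Interval, // e.g. difference in pass rate
  costDiff: Interval, // e.g. difference in dollars per task
  margin: number, // assumed tolerance: the largest quality loss still counted as "not worse"
): boolean {
  if (margin < 0) throw new Error("margin must be non-negative");
  // Not worse: even the pessimistic end of the quality interval stays within the margin.
  const notWorse = qualityDiff.lo > -margin;
  // Significantly cheaper: even the pessimistic end of the cost interval is a saving.
  const cheaper = costDiff.hi < 0;
  return notWorse && cheaper;
}
```

Note the asymmetry: quality only has to clear a tolerance, while cost has to show a real reduction, so a candidate that is merely "about the same" on both counts is not promoted.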

## The honest weaknesses

Mostly one domain family per claim so far (cross-domain replication is configuration, not
new code); small holdouts (12–16 tasks) mean only effects ≳6pp are detectable; and the
homegrown vocabulary is heavier than the machine it names — hence this page.
1 change: 1 addition & 0 deletions docs/README.md
@@ -35,6 +35,7 @@ The package API and subsystems.
| Doc | Role | Purpose |
|---|---|---|
| [../README.md](../README.md) | API entry point | Install, the loop API, self-improvement framing, exported subpaths. |
| [PLAIN.md](./PLAIN.md) | the translation layer | The whole system in plain language — five sentences, the six-piece core, the project-term → plain-English table, the one-paragraph version for outside collaborators. Start HERE when introducing the project to anyone. |
| [glossary.md](./glossary.md) | canonical vocabulary | One definition per term (iteration/round/rollout/attempt, driver/worker/executor, TopologyMove, budget/spend, Scope.act + the coordination MCP), grounded to `file:line`; drifted synonyms flagged. Read when a term is ambiguous. |
| [execution-model.md](./execution-model.md) | the picture | The four diagrams: the unified `Executor` port (router/bridge/cli/sandbox/BYO) + two engines, driver vs worker, who gets which tools/MCPs, and the spawn mechanics. |
| [concepts.md](./concepts.md) | mental model | The product-API layer cake (chat turns, tasks, runs) — the onramp before the loop/strategy docs. |