Bench-runner LLM transport fix: HTTP/1.1 + no pool + 15s header timeout (LlmPolicy) #135

Merged: adithyn7 merged 2 commits into main from feat/bench-runner-llm-policy on Jun 12, 2026

@adithyn7 adithyn7 commented Jun 12, 2026

Summary

Bench-runner-only LLM transport fix for the chronic "error decoding response body" decode deaths on long Kimi/Fireworks runs. Introduces an LlmPolicy on LlmClientConfig mirroring the existing ToolPolicy pattern: default = byte-identical to today's desktop-app behavior (pooled HTTP/2 via shared client); bench-runner opts into LlmPolicy::bench() (HTTP/1.1, fresh TCP per request, 15s header-arrival timeout). Matches opencode's working transport shape.
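
A minimal sketch of what such a policy type could look like, using the three field names listed later in this description. The derive list and the `is_default` helper are assumptions for illustration; the actual definition in the PR may differ.

```rust
/// Transport policy for the LLM HTTP client (sketch, not the PR's actual code).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct LlmPolicy {
    /// Pin the client to HTTP/1.1 (no h2 ALPN negotiation).
    pub http1_only: bool,
    /// Open a fresh TCP+TLS connection per request instead of pooling.
    pub no_pool: bool,
    /// Abort if response headers have not arrived within this many milliseconds.
    pub header_timeout_ms: Option<u64>,
}

impl LlmPolicy {
    /// Strict policy for headless eval runs.
    pub fn bench() -> Self {
        Self {
            http1_only: true,
            no_pool: true,
            header_timeout_ms: Some(15_000),
        }
    }

    /// Hypothetical helper: true when the shared pooled client can be reused as-is.
    pub fn is_default(&self) -> bool {
        *self == Self::default()
    }
}
```

Because `Default` yields `false`, `false`, `None`, a caller that never sets a policy stays on the permissive path, which is how the desktop app would remain unchanged.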

Evidence (off-grid validation, chronic dead cell)

teleport-1b08 × kimi × ON — 6 consecutive deterministic decode deaths across eras with v0.1.5:

| Attempt (old binary) | Survived to | Outcome |
|---|---|---|
| g6/a0 | 20 turns | decode death |
| g6/a1 | 11 turns | decode death |
| g6/a2 | 6 turns | decode death |
| g6/a3 | 9 turns | decode death |
| g6/a4 (this PR) | 45 turns | done: real 3.1 KB Go patch, 12.2 min, $0.50 |

stderr was completely empty on the survivor — no [bench] llm-error retrying=… lines. The truncations didn't happen; they were prevented at the transport layer. Matches the hypothesis: opencode (HTTP/1.1) had zero decode deaths on the same router/account/model across ~16 heavy Kimi cells.

The three changes (all bench-only)

  1. http1_only: pin the LLM client to HTTP/1.1 (no h2 ALPN negotiation).
  2. no_pool: pool_max_idle_per_host(0) → fresh TCP+TLS connection per request, which kills connection-reuse / pool-poisoning failure modes.
  3. header_timeout_ms (Some(15_000)): abort the request if response headers don't arrive within 15 s. New retryable AgentError::HeaderTimeout joins the existing HttpError/ChunkTimeout retry arm.
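
The retry arm described in item 3 can be sketched roughly as follows. The variant payloads, the `Fatal` variant, and the `run_turn` helper are hypothetical stand-ins; only the three retryable variant names come from this PR.

```rust
/// Simplified error type; payloads are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentError {
    HttpError(String),
    ChunkTimeout,
    HeaderTimeout,
    /// Stand-in for every non-retryable failure (hypothetical variant).
    Fatal(String),
}

impl AgentError {
    /// Transport-level failures are retried; everything else ends the turn.
    pub fn is_retryable(&self) -> bool {
        matches!(
            self,
            AgentError::HttpError(_) | AgentError::ChunkTimeout | AgentError::HeaderTimeout
        )
    }
}

/// Run `attempt` at least once and at most `max_attempts` times,
/// retrying only when the error is retryable.
pub fn run_turn<T>(
    max_attempts: u32,
    mut attempt: impl FnMut(u32) -> Result<T, AgentError>,
) -> Result<T, AgentError> {
    let mut n = 0;
    loop {
        n += 1;
        match attempt(n) {
            Ok(value) => return Ok(value),
            Err(e) if e.is_retryable() && n < max_attempts => continue,
            Err(e) => return Err(e),
        }
    }
}
```

Adding `HeaderTimeout` to the `matches!` arm is the only change needed for a stalled request to be retried the same way as a mid-stream chunk timeout.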

Plus: AgentEvent::Error { retrying, message } mirrored to stderr so the grid's stderr_tail captures LLM retry attempts directly.
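
One way the stderr mirroring could look, as a sketch. The `[bench] llm-error retrying=` prefix is taken from this description; the `message=` suffix, the `Other` variant, and the `mirror_error` function are assumptions, and the real line format may differ.

```rust
use std::io::{self, Write};

pub enum AgentEvent {
    Error { retrying: bool, message: String },
    /// Stand-in for all other events (hypothetical variant).
    Other,
}

/// Write error events to `sink` as one line each; ignore everything else.
/// Taking a generic `Write` keeps the function testable without touching stderr.
pub fn mirror_error<W: Write>(sink: &mut W, event: &AgentEvent) -> io::Result<()> {
    if let AgentEvent::Error { retrying, message } = event {
        // Everything after `retrying=<value>` is an assumed format.
        writeln!(sink, "[bench] llm-error retrying={} message={}", retrying, message)?;
    }
    Ok(())
}
```

In the runner this would be called with `&mut std::io::stderr()`, so each retry attempt shows up in the captured stderr_tail.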

Ships as v0.1.6 (patch).

adithyn7 added 2 commits June 12, 2026 17:05
Introduce a LlmPolicy carried on LlmClientConfig so the LLM HTTP client can run
stricter under headless eval without changing the desktop app. Default policy is
permissive (byte-identical to current app behavior — pooled HTTP/2 via the shared
client); LlmPolicy::bench() enables:

- http1_only: pin the LLM client to HTTP/1.1 (no h2 negotiation)
- no_pool: fresh TCP+TLS per request (pool_max_idle_per_host(0))
- header_timeout_ms: 15s abort if response headers don't arrive

LlmClient::new builds a dedicated reqwest::Client when the policy deviates from
default; otherwise still clones the shared client. A new retryable AgentError
variant HeaderTimeout joins HttpError/ChunkTimeout in the existing turn-level
retry path. Matches opencode's working transport shape (HTTP/1.1, fresh sockets,
header timeout + SDK retries) which has zero decode deaths on the same router
that gave bench-runner deterministic mid-stream resets on long Kimi runs.

The eval harness sets LlmPolicy::bench() so the frozen binary uses the same
HTTP transport shape opencode does on the same router. Always on — part of
the harness identity, no CLI flag.

Also mirror AgentEvent::Error { retrying, message } to stderr so the grid's
stderr_tail captures LLM retry attempts directly. Without this the result JSON
only records the final outcome, leaving us blind to whether retries fired.

Validated off-grid on the chronic dead cell (teleport-1b08 × kimi × ON): old
binary died on decode at turns 6, 9, 11, 20 across 4 attempts; this binary
completed cleanly at 45 turns with empty stderr (no retries triggered — the
underlying truncations stopped happening at the transport layer).
@adithyn7 adithyn7 added the patch Patch version bump label Jun 12, 2026
@adithyn7 adithyn7 merged commit 6e1048e into main Jun 12, 2026
4 checks passed
@adithyn7 adithyn7 changed the title Bench-runner LLM transport fix: HTTP/1.1 + no pool + 15s header timeout (LlmPolicy) Bench-runner v0.1.7: HTTP/1.1 transport fix + tunable LLM/tool policy via CLI/env (LlmPolicy) Jun 13, 2026
@adithyn7 adithyn7 changed the title Bench-runner v0.1.7: HTTP/1.1 transport fix + tunable LLM/tool policy via CLI/env (LlmPolicy) Bench-runner LLM transport fix: HTTP/1.1 + no pool + 15s header timeout (LlmPolicy) Jun 13, 2026