From 208eb60724a260cfde067164a37e96825e9fcf43 Mon Sep 17 00:00:00 2001 From: MK Date: Tue, 16 Jun 2026 15:05:01 -0400 Subject: [PATCH] docs(sync-docs): audit seq emit invariant + counter install order MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Reflects PRs #173 and #174 (both merged to main) in the canonical audit doc and the comprehensive knowledge skill. docs/security/audit-logging.md — under "Sequence numbers", added: - Counter installation order — the SequenceCounter is installed by installSequenceCounterMiddleware (wraps the auth chain) so auth_verify / auth_fail land seq=1; the runner's request entry uses EnsureSequenceCounter to reuse the wrapper-installed counter and only allocate fresh on the --no-auth path. Pinned by TestAuthAudit_SeqStampedWhenCounterInstalled + TestEnsureSequenceCounter_ReusesExisting. - Emit invariant — per-invocation events MUST emit via EmitFromContext (or a typed helper); plain Emit skips the counter and the trace cross-link. The regression behind #173 (tool_exec + session_end) and #174 (auth_verify). A 4-row table names the sites that intentionally stay on plain Emit (egress proxy subprocess CONNECT, MCP startup events, scheduler tick, startup banners) and why. Issue #175 is named as the lint follow-up. .claude/skills/forge.md — three sections updated: - § 3 path trace: middleware order now shows installSequenceCounterMiddleware as outermost; runner request entry uses EnsureSequenceCounter. - § 12.4 FWS-8 section: added the Emit invariant paragraph naming the regression pins and exception list. - § 18 FWS-8 row: appended #173 / #174 follow-up summary + #175 lint tracker. No code change. TOC anchors unchanged. Broken-link sweep clean. --- .claude/skills/forge.md | 35 ++++++++++++++++++++++----- docs/security/audit-logging.md | 43 ++++++++++++++++++++++++++++++++++ 2 files changed, 72 insertions(+), 6 deletions(-) diff --git a/.claude/skills/forge.md b/.claude/skills/forge.md index 0f75e55..c8ea189 100644 --- a/.claude/skills/forge.md +++ b/.claude/skills/forge.md @@ -124,11 +124,16 @@ The path of a single A2A invocation: ``` 1. Inbound HTTP request → POST / with JSON-RPC envelope forge-cli/server/a2a_server.go:handleJSONRPC -2. Middleware chain: - auth (forge-core/auth/middleware.go) — verifies bearer - rate-limit — per-IP, FWS-10 +2. Middleware chain (outermost → innermost): + installSequenceCounterMiddleware — installs the + per-request SequenceCounter on r.Context() + BEFORE the auth chain runs so auth_verify / + auth_fail land seq=1 (FWS-8, fix #174) + auth (forge-core/auth/middleware.go) — verifies bearer; + OnAuth callback emits via EmitFromContext so + seq + trace_id + workflow tags are stamped + rate-limit — per-IP, FWS-10 CORS, security headers, request-size limits - audit middleware — auth_verify/auth_fail 3. Dispatch to method handler: tasks/send → runs the agent (long) tasks/cancel → signals an in-flight invocation (instant) @@ -137,7 +142,8 @@ The path of a single A2A invocation: 4. Request entry (forge-cli/runtime/runner.go): correlation_id generated task_id from params.ID - WithSequenceCounter installed (FWS-8) + EnsureSequenceCounter reuses the auth-installed counter + (or installs a fresh one on the --no-auth path) — FWS-8 LLMUsageAccumulator installed (FWS-3) CancellationRegistry registers a CancelCauseFunc (FWS-4) 5. AgentExecutor loop: @@ -552,6 +558,23 @@ fan-out across configured sinks. responsible for routing captured payloads to a store appropriate to their sensitivity — Forge does not redact. +**Emit invariant.** Every audit emission inside an A2A invocation +scope MUST go through `AuditLogger.EmitFromContext(ctx, ...)` (or one +of the typed helpers — `EmitLLMCall`, `EmitToolExec`, etc.). That's +the path that picks the per-invocation `SequenceCounter` off ctx and +stamps `seq` + `trace_id` + `span_id` + workflow tags. Plain `Emit` +emits raw — no seq, no trace link, no workflow tags. Regression pins: +`TestToolExecAudit_CarriesSequenceFromContext` (PR #173, +`tool_exec` + `session_end`), +`TestAuthAudit_SeqStampedWhenCounterInstalled` (PR #176, `auth_verify` +once the counter is installed by the middleware wrapper). Sites that +still call plain `Emit` are explicitly outside any invocation scope: +the egress proxy's subprocess CONNECT (no Go ctx tying back), +MCP-server startup (`mcp_server_started` / `_failed` / `_degraded`, +pre-invocation), and the scheduler tick (`schedule_fire` / +`schedule_complete`, runs on its own timer). Those events +intentionally have no `seq`. + **Sinks** (FWS-7): | Sink | Always on? | Notes | @@ -1009,7 +1032,7 @@ Every event also carries `schema_version: "1.0"` (FWS-8) and `seq` | **FWS-5** | #89 | Platform policy enforcement at runtime — workspace-level deploy-time bounds (egress / tools / models / sizes); `forge package` policy-ready manifests | `docs/security/platform-policy.md` | | **FWS-6** | #90 | Three-layer platform policy + channel scope — system / user / workspace layers compose by union + most-restrictive; `denied_channels` first-class; `forge channel disable/enable` edits the user layer | `docs/security/platform-policy.md` | | **FWS-7** | #95 | Audit event export capability — Unix Domain Socket sink + HTTP localhost fallback; fire-and-forget; `audit_export_status` periodic per-sink health | `docs/security/audit-logging.md` § Audit Event Export | -| **FWS-8** | #91 | Hardened audit emission — `schema_version` + monotonic `seq` per invocation; default metadata-only invariant pinned by regression test; opt-in `AuditPayloadCapture` with per-field byte caps | `docs/security/audit-logging.md` § Schema contract | +| **FWS-8** | #91 | Hardened audit emission — `schema_version` + monotonic `seq` per invocation; default metadata-only invariant pinned by regression test; opt-in `AuditPayloadCapture` with per-field byte caps. Follow-ups: #173 (PR closed the seq gap on `tool_exec` / `session_end` — 3 sites switched from plain `Emit` to `EmitFromContext`) and #174 (PR moved the `SequenceCounter` installation upstream of the auth middleware via a wrapper + `EnsureSequenceCounter` so `auth_verify` / `auth_fail` land seq=1). #175 tracks a follow-up lint to catch future `Emit`-instead-of-`EmitFromContext` drift. | `docs/security/audit-logging.md` § Schema contract / § Sequence numbers | | **FWS-9** | #100 | Ops logger output stream separation — stdout for `JSONLogger`, stderr stays for audit NDJSON | `docs/security/audit-logging.md` § Streams | | **FWS-10** | #110 | Rate-limit configurability + orchestration-friendly defaults + cancel exemption — `server.rate_limit:` yaml block + CLI flags + env; `tasks/cancel` exempt from the write bucket by default | `docs/reference/forge-yaml-schema.md` § `server.rate_limit` | | **OTel v1** | #108 | OpenTelemetry tracing — shipped across phases #101-#107 (PRs #122-#128). Tracer seam → OTLP provider → config resolver + CLI flags → span instrumentation across A2A/executor/LLM/tool → audit ↔ trace cross-link → end-to-end A2A propagation → build-time egress merge. Off by default; reuses the egress-enforced transport. | `docs/core-concepts/observability-tracing.md` | diff --git a/docs/security/audit-logging.md b/docs/security/audit-logging.md index 1f189b5..c0fd79d 100644 --- a/docs/security/audit-logging.md +++ b/docs/security/audit-logging.md @@ -490,6 +490,49 @@ start their own counters. Events emitted outside any invocation scope (`policy_loaded`, `agent_card_published`, `audit_export_status`) omit `seq` entirely. +#### Counter installation order + +The per-invocation `SequenceCounter` is installed on `r.Context()` by +`installSequenceCounterMiddleware`, which wraps the auth middleware so +the counter is already on context before the auth chain runs. This +puts `auth_verify` / `auth_fail` first in the sequence (`seq=1`) and +keeps the rest of the per-invocation events (`session_start`, +`guardrail_check`, `llm_call`, `tool_exec`, `invocation_complete`, +etc.) gap-free under the same `(correlation_id, task_id)` group. The +runner's request entry calls `coreruntime.EnsureSequenceCounter` — +which reuses the wrapper-installed counter when present and installs a +fresh one on the `--no-auth` path, so no embedder configuration loses +seq stamping. Pinned by `TestAuthAudit_SeqStampedWhenCounterInstalled` +and `TestEnsureSequenceCounter_ReusesExisting` (issue #174). + +#### Emit invariant + +The seq counter is picked up by `AuditLogger.EmitFromContext(ctx, ...)` +(and the typed helpers built on top of it — `EmitLLMCall`, +`EmitToolExec`, `EmitInvocationComplete`, `EmitInvocationCancelled`, +the egress and guardrail emit paths). Plain `AuditLogger.Emit` skips +the counter and the trace cross-link — so every audit emission that +happens inside an invocation scope MUST go through `EmitFromContext`. +This was the regression behind issues #173 (three sites — the +`BeforeToolExec` / `AfterToolExec` hook callbacks and the +outbound-guardrail-failure `session_end` emit — had drifted to plain +`Emit` and lost seq on `tool_exec` + that branch's `session_end`) and +#174 (the auth callback couldn't use `EmitFromContext` until the +counter was installed upstream of the auth middleware). Pinned by +`TestToolExecAudit_CarriesSequenceFromContext`. Sites that still call +plain `Emit` are explicitly outside any invocation scope and are +documented inline: + +| Site | Why plain `Emit` | +|---|---| +| Egress proxy `OnAttempt` with `source=proxy` | Subprocess HTTP `CONNECT` has no Go ctx tying back to the A2A request | +| MCP server startup events (`mcp_server_started` / `_failed` / `_degraded`) | Pre-invocation; no scope | +| Scheduler tick (`schedule_fire` / `schedule_complete` / `schedule_skip` / `schedule_modify`) | Runs on its own timer outside any A2A request | +| Startup banners (`policy_loaded`, `agent_card_published`, `audit_export_status`) | Pre-invocation; no scope | + +Issue #175 tracks a follow-up vet/lint pass to catch future +`Emit`-instead-of-`EmitFromContext` drift on per-invocation events. + ### Schema versioning policy | Change | Bumps version? |