feat(vis): full session debugging — tasks/cron, execution timeline, retries & tool progress #1210

Open
RealKai42 wants to merge 8 commits into main from kaiyi/lagos-v1

@RealKai42

Collaborator

Make apps/vis surface everything a kimi-code session persists, and turn it from a record viewer into an analysis tool.

What changed

1. Background tasks & cron (feat(vis): surface background tasks and cron jobs)

The visualizer read every wire/state/blob artifact but ignored the two on-demand families agent-core writes per session — neither reconstructable from the wire:

  • Tasks tab: tasks/<id>.json + output.log (process / agent / question kinds) with status, timing, kind-specific fields, and a progressively paged log viewer (byte-window paging via an exact nextOffset cursor).
  • Cron tab: cron/<id>.json (expression, prompt, recurring/one-shot, last-fired).
  • Read-only server readers mirror agent-core's on-disk layout, id guards, and legacy task normalization.
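The byte-window paging can be sketched as follows. This is a minimal illustration, not the code in this PR: `readLogPage`, `LogPage`, and every field except `nextOffset` are assumed names, and an in-memory `Uint8Array` stands in for `output.log`.

```typescript
// Hypothetical page shape; only `nextOffset` is named in the PR description.
interface LogPage {
  text: string; // decoded bytes of this window
  nextOffset: number; // exact byte offset to request for the next window
  eof: boolean; // true once the window reached the end of the log
}

// Return the byte window [offset, offset + limit) plus the cursor for the
// following window. `log` stands in for the contents of output.log.
function readLogPage(log: Uint8Array, offset: number, limit: number): LogPage {
  const start = Math.min(Math.max(0, offset), log.length);
  // Always advance by at least one byte so a caller loop cannot stall.
  const end = Math.min(start + Math.max(1, limit), log.length);
  const text = new TextDecoder("utf-8").decode(log.subarray(start, end));
  return { text, nextOffset: end, eof: end >= log.length };
}

// Usage: page through a small log in 8-byte windows.
const sampleLog = new TextEncoder().encode("line one\nline two\n");
const pages: string[] = [];
let cursor = 0;
for (;;) {
  const page = readLogPage(sampleLog, cursor, 8);
  pages.push(page.text);
  cursor = page.nextOffset;
  if (page.eof) break;
}
```

Because the cursor is a byte offset rather than a line or character index, consecutive windows neither overlap nor skip bytes. A multi-byte UTF-8 character split across two windows would decode with replacement characters in this sketch; a real viewer would have to handle that case separately.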

2. agent-core: persist step retries and tool progress (feat(agent-core): …)

Two transient signals were live-only, so nothing survived for post-hoc analysis. Both are additive optional wire fields — no protocol bump, existing records keep loading:

  • step.end.retries — recovered transient provider failures (via a new onRetry callback on chatWithRetry).
  • tool.result.progress — a bounded sparse-progress summary (updateCount / lastStatus / maxPercent); streamed stdout/stderr is deliberately excluded.
  • New public types: LoopStepRetryRecord, LoopToolProgressSummary. Includes a changeset.
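A rough sketch of the retry-collection idea, under stated assumptions: `withRetry` and `RetryRecord` are hypothetical stand-ins (the real `chatWithRetry` signature and the `LoopStepRetryRecord` fields are not shown in this PR text), and the helper is synchronous only to keep the example short.

```typescript
// Hypothetical record shape, for illustration only.
interface RetryRecord {
  attempt: number; // 1-based attempt that failed and was retried
  error: string; // message of the recovered failure
}

// Call `fn` up to `maxAttempts` times, reporting each recovered failure
// through the optional onRetry callback. The final failure is rethrown.
function withRetry<T>(
  fn: () => T,
  maxAttempts: number,
  onRetry?: (record: RetryRecord) => void,
): T {
  for (let attempt = 1; ; attempt++) {
    try {
      return fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      onRetry?.({ attempt, error: err instanceof Error ? err.message : String(err) });
    }
  }
}

// Usage: a provider call that fails twice, then succeeds.
const retries: RetryRecord[] = [];
let calls = 0;
const reply = withRetry(
  () => {
    calls++;
    if (calls < 3) throw new Error("503 overloaded");
    return "ok";
  },
  5,
  (record) => {
    retries.push(record);
  },
);

// Attach `retries` only when non-empty, so the field stays optional on the wire.
const stepEnd: { type: string; retries?: RetryRecord[] } =
  retries.length > 0 ? { type: "step.end", retries } : { type: "step.end" };
```

Omitting the field when nothing was retried is what keeps the change additive: records written before this PR and records from steps with no retries look the same to a reader.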

3. vis: execution-analysis timeline (feat(vis): add execution-analysis timeline …)

New Timeline tab folds the wire into turns → steps → tool calls client-side and derives what the flat list hides: per-turn/step/tool durations, per-turn token cost, a context-window-fill sparkline + cache-hit rate, tool usage stats, idle-gap detection, and a config-change timeline. Inline: Wire rows show tool call→result elapsed time and result truncation/size/retries/progress; the Issues drawer gains tool-error / truncation / filtered / max_tokens / retried categories; Tasks links agent tasks to the subagent's wire.
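The fold can be illustrated with a simplified sketch. The record types `turn.begin` / `turn.end`, the `ts`, `toolCallId`, and `toolName` fields, and the function `foldTurns` are all assumptions for illustration; the real wire schema is richer and also has a step level that is omitted here.

```typescript
// Hypothetical, simplified wire record; not the real schema.
interface WireRecord {
  type: "turn.begin" | "turn.end" | "tool.call" | "tool.result";
  ts: number; // timestamp in milliseconds
  toolCallId?: string;
  toolName?: string;
}

interface ToolSpan {
  name: string;
  elapsedMs: number; // tool.call to tool.result
}

interface TurnSummary {
  durationMs: number;
  tools: ToolSpan[];
}

// Fold a flat, time-ordered record list into per-turn summaries.
function foldTurns(records: WireRecord[]): TurnSummary[] {
  const turns: TurnSummary[] = [];
  const pending = new Map<string, { name: string; ts: number }>();
  let turnStart: number | null = null;
  let tools: ToolSpan[] = [];
  for (const r of records) {
    if (r.type === "turn.begin") {
      turnStart = r.ts;
      tools = [];
      pending.clear();
    } else if (r.type === "tool.call" && r.toolCallId !== undefined) {
      pending.set(r.toolCallId, { name: r.toolName ?? "unknown", ts: r.ts });
    } else if (r.type === "tool.result" && r.toolCallId !== undefined) {
      const call = pending.get(r.toolCallId);
      if (call) {
        tools.push({ name: call.name, elapsedMs: r.ts - call.ts });
        pending.delete(r.toolCallId);
      }
    } else if (r.type === "turn.end" && turnStart !== null) {
      turns.push({ durationMs: r.ts - turnStart, tools });
      turnStart = null;
    }
  }
  return turns;
}

// Usage: one turn containing one tool call.
const turns = foldTurns([
  { type: "turn.begin", ts: 0 },
  { type: "tool.call", ts: 100, toolCallId: "c1", toolName: "Read" },
  { type: "tool.result", ts: 350, toolCallId: "c1" },
  { type: "turn.end", ts: 1000 },
]);
```

A single pass over records already loaded in the browser is enough to derive durations, which is why no extra server round-trip is needed.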

Tests

  • agent-core full suite: 3248 passed, 0 regressions (core loop touched).
  • vis-server: 113 passed (added task/cron lib + route tests).
  • vis-web: vitest newly wired up; analysis + issues unit tests.
  • Typecheck clean across agent-core / vis-server / vis-web; vis-web builds; lint clean on changed files.

The visualizer read every wire/state/blob artifact a session persists but
ignored the two on-demand families agent-core also writes under the session
directory: background tasks (tasks/<id>.json + output.log) and cron jobs
(cron/<id>.json). Neither is reconstructable from the wire, so there was no
way to inspect what a session spawned in the background or scheduled.

Server:
- task-store / cron-store read-only readers mirroring agent-core's on-disk
  layout, id-validation guard, and legacy snake_case task normalization
- GET /:id/tasks, /:id/tasks/:taskId/output (byte-window paged via an exact
  nextOffset cursor), and /:id/cron routes
- re-export the public background-task types from agent-core; mirror the
  non-exported CronTask shape with a fixture-backed drift test

Web:
- Tasks tab: process/agent/question kinds with status, timing, kind-specific
  fields, raw JSON, and a progressively paged output.log viewer
- Cron tab: expression, prompt, recurring/one-shot, created/last-fired
- count badges on both tabs

Tests: +20 (lib + route), all 113 vis-server tests green; web typecheck and
build clean.

Two transient signals were only ever emitted as live-only loop events, so
nothing survived in the agent record for post-hoc analysis:

- step retries: chatWithRetry gains an onRetry callback; turn-step collects
  the recovered attempts and attaches them to step.end as an optional
  `retries` array (previously only the live `step.retrying` event).
- tool progress: tool-call distills a tool's sparse status/percent updates
  into a bounded `progress` summary (updateCount / lastStatus / maxPercent)
  on tool.result. Streamed stdout/stderr is excluded — it would bloat the
  wire and is already reflected in the result output.

Both are additive optional fields, so the wire protocol version is unchanged
and existing records keep loading. New public types: LoopStepRetryRecord,
LoopToolProgressSummary.

Turn the debugger from a flat record viewer into an analysis tool.

New Timeline tab: folds the wire into turns → steps → tool calls (client-side,
no extra round-trip) and derives the metrics the raw list hides — per-turn /
per-step / per-tool duration, per-turn token cost, a context-window fill
sparkline with cache-hit rate, a tool usage table, idle-gap detection, and a
config-change timeline.

Inline elsewhere:
- Wire rows show tool.call → tool.result elapsed time; tool.result detail
  shows truncation, output size, retries, and the progress summary.
- Issues drawer gains tool-error, truncation, filtered, max_tokens, and
  retried categories.
- Tasks tab links agent-kind tasks to the subagent's wire.

Wires up vitest for the web package and adds analysis/issues unit tests.

@changeset-bot

changeset-bot Bot commented Jun 29, 2026


🦋 Changeset detected

Latest commit: 162708e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@moonshot-ai/agent-core Patch

@pkg-pr-new

pkg-pr-new Bot commented Jun 29, 2026

pnpm dlx https://pkg.pr.new/@moonshot-ai/kimi-code@162708e
npx https://pkg.pr.new/@moonshot-ai/kimi-code@162708e

commit: 162708e

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ff2d7185ee

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread apps/vis/server/src/routes/tasks.ts Outdated
Comment thread apps/vis/server/src/routes/cron.ts Outdated

…ltering

A `/export-debug-zip` bundle is just `manifest.json` plus a flattened session
directory, which vis already knows how to read. Importing one therefore lights
up every existing tab for a session that lives on someone else's machine.

Server:
- zip-import: yauzl extraction with zip-slip path guards and entry-count /
  uncompressed-size caps for untrusted uploads.
- import-store: extract a bundle into <home>/imported/<imp_…>/, validate it
  has a main wire, and record an import-meta.json sidecar.
- session-store resolves imp_-prefixed ids against imported/, so wire /
  context / tasks / cron / blobs / logs all work on imported sessions; agent
  homedirs are re-derived locally (the bundle holds foreign absolute paths).
- POST /api/imports (raw zip body) and GET /api/sessions/:id/logs (structured
  log lines — also available for local sessions).

Web:
- session rail: import button + all/local/imported filter + imported badge.
- new Logs tab: virtualized, level filter, search, session/global toggle.
- manifest card atop the State tab for imported sessions.

SessionSummary/SessionDetail gain `imported` + `importMeta`. Tests cover
extraction, the zip-slip guard, list merge, reading an imported wire through
the existing route, and log parsing.
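A zip-slip path guard of the kind mentioned above can be sketched like this. `safeEntryPath` is a hypothetical helper, not the function in this PR, and the sketch covers only the path check; the entry-count and uncompressed-size caps are not shown.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// Resolve a zip entry name against the extraction root and return the target
// path, or null when the entry would land on or outside the root
// ("../" traversal or an absolute entry name).
function safeEntryPath(destRoot: string, entryName: string): string | null {
  const root = resolve(destRoot);
  const target = resolve(root, entryName);
  const rel = relative(root, target);
  const escapes =
    rel === "" || rel === ".." || rel.startsWith(".." + sep) || isAbsolute(rel);
  return escapes ? null : target;
}

// Usage: one legitimate entry and three hostile ones.
const okPath = safeEntryPath("/tmp/imp", "agents/main/wire.jsonl");
const traversal = safeEntryPath("/tmp/imp", "../../etc/passwd");
const absolute = safeEntryPath("/tmp/imp", "/etc/passwd");
const nested = safeEntryPath("/tmp/imp", "a/../../escape.txt");
```

Checking the resolved path rather than the raw entry name catches traversal hidden in the middle of a name, as in the last case.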

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2bb23c4b1f

Comment thread packages/agent-core/src/loop/tool-call.ts Outdated
Comment thread apps/vis/server/src/lib/log-reader.ts

…l status text

Addresses review feedback on the debug-tooling changes:

- Background tasks and cron jobs are persisted under each agent's homedir
  (<session>/agents/<id>/tasks and /cron), not the session root. The Tasks and
  Cron tabs read the session root, so they showed nothing for normal sessions.
  Both routes now aggregate across detail.agents homedirs; task entries carry
  the owning agentId. The route-test fixtures were writing to the wrong
  (session-root) location too — corrected to the real agents/main layout so
  they actually exercise the path.

- tool.result progress no longer keeps free-form status text, only updateCount
  and maxPercent. A tool's status string can contain sensitive data (e.g. an
  MCP OAuth authorization URL) that must not leak into persisted wire files or
  exported debug bundles.
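A sketch of what a status-free summary could look like, assuming a hypothetical `ProgressUpdate` input shape and a hypothetical `summarizeProgress` helper (the real `LoopToolProgressSummary` may differ):

```typescript
// Hypothetical live update emitted by a tool while it runs.
interface ProgressUpdate {
  status?: string; // free-form text; may contain sensitive data
  percent?: number;
}

// Bounded summary that is safe to persist: counts and percentages only.
interface ProgressSummary {
  updateCount: number;
  maxPercent?: number;
}

// Distill a tool's progress updates, deliberately dropping `status`.
function summarizeProgress(updates: ProgressUpdate[]): ProgressSummary | undefined {
  if (updates.length === 0) return undefined;
  let maxPercent: number | undefined;
  for (const u of updates) {
    if (typeof u.percent === "number" && Number.isFinite(u.percent)) {
      const clamped = Math.min(100, Math.max(0, u.percent));
      maxPercent = maxPercent === undefined ? clamped : Math.max(maxPercent, clamped);
    }
  }
  return maxPercent === undefined
    ? { updateCount: updates.length }
    : { updateCount: updates.length, maxPercent };
}

// Usage: the authorization URL in the status text never reaches the summary.
const summary = summarizeProgress([
  { status: "Open https://auth.example.com/authorize?code=secret", percent: 10 },
  { percent: 60 },
  { status: "done" },
]);
const persisted = JSON.stringify(summary);
```

Here the summary is built from an allow-list of numeric fields rather than by deleting `status` from a copy, so a future free-form field would not be persisted by accident.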

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c3d558a3a8

Comment thread apps/vis/server/src/routes/logs.ts Outdated
Comment thread apps/vis/server/src/lib/session-store.ts Outdated

The <main> flex child lacked min-w-0, so it defaulted to min-width:auto and
refused to shrink below its content's intrinsic width. Tabs that lay out in
normal flow with flex-wrap rows (the Timeline tab) then got unbounded width,
never wrapped, and blew the layout out to thousands of pixels wide. Adding
min-w-0 lets the column shrink to the available width so its content wraps,
truncates, or scrolls within its own container.

# Conflicts:
#	packages/agent-core/src/loop/tool-call.ts

…back

- Logs tab: for non-imported sessions the shared global log lives at
  <KIMI_CODE_HOME>/logs/kimi-code.log, not under the session dir (that path is
  only used inside exported bundles). The route now reads the home path for
  local sessions, so the global-log toggle works for them.
- Imported detail: a bundle's state.json is best-effort and may omit the
  agents map. When the inventory is empty, fall back to discovering agents
  from disk so routes that require an agent (wire/context) still resolve main.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 162708e90e


async function readManifest(dir: string): Promise<ImportManifest | null> {
  try {
    return JSON.parse(await readFile(join(dir, 'manifest.json'), 'utf8')) as ImportManifest;

P2: Validate imported manifest field types

For an imported zip with a syntactically valid but type-corrupt manifest.json (for example { "workspaceDir": 123 }), this cast accepts the value and listSessions later returns that non-string as workDir; the session rail calls s.workDir.split('/'), so merely opening the UI after importing the bundle can crash instead of treating the manifest field as absent. Coerce/validate manifest fields when reading them from the untrusted zip.
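One way to address this, sketched under assumptions: `parseManifest` is a hypothetical replacement for the bare cast, and `workspaceDir` is the only manifest field named in this thread, so it is the only one validated here.

```typescript
// Only the field mentioned in the review comment; the real manifest has more.
interface ImportManifest {
  workspaceDir?: string;
}

// Parse untrusted manifest JSON and keep only fields of the expected type.
// A field with the wrong type is treated as absent instead of passed through.
function parseManifest(json: string): ImportManifest | null {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return null; // not JSON at all
  }
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) return null;
  const workspaceDir = (raw as Record<string, unknown>).workspaceDir;
  const manifest: ImportManifest = {};
  if (typeof workspaceDir === "string") manifest.workspaceDir = workspaceDir;
  return manifest;
}

// Usage: the type-corrupt example from the comment, a valid one, and garbage.
const corrupt = parseManifest('{ "workspaceDir": 123 }');
const valid = parseManifest('{ "workspaceDir": "/home/u/project" }');
const garbage = parseManifest("not json");
```

With this shape, a consumer that calls `workDir.split('/')` sees either a string or `undefined`, never a number.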

      continue;
    }
    if (!isReadablePersistedTask(parsed)) continue;
    out.push(normalizePersistedTask(parsed));

P2: Skip malformed task records during normalization

When a local/imported task JSON has a valid safe filename and task_id but malformed legacy fields (for example stop_reason: 5 or subagent_type: 5), isReadablePersistedTask passes and this normalization path throws (trim on a non-string). That makes GET /:id/tasks fail and hides all remaining tasks for the session; validate the legacy field types or catch normalization failures and skip the bad record as the reader comments promise.
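The "catch and skip" option could look roughly like this. `normalizeLegacyTask` and `readTasks` are hypothetical stand-ins for `normalizePersistedTask` and the reader loop, and only the `task_id` and `stop_reason` fields from the comment are modeled.

```typescript
interface PersistedTask {
  taskId: string;
  stopReason?: string;
}

// Normalize a legacy snake_case record; throws on fields of the wrong type.
function normalizeLegacyTask(raw: Record<string, unknown>): PersistedTask {
  const taskId = raw.task_id;
  if (typeof taskId !== "string") throw new TypeError("task_id must be a string");
  const stop = raw.stop_reason;
  if (stop === undefined) return { taskId };
  if (typeof stop !== "string") throw new TypeError("stop_reason must be a string");
  return { taskId, stopReason: stop.trim() };
}

// Read every parsed task file, skipping records that fail normalization so a
// single bad file cannot hide the remaining tasks.
function readTasks(parsedFiles: unknown[]): PersistedTask[] {
  const out: PersistedTask[] = [];
  for (const parsed of parsedFiles) {
    if (typeof parsed !== "object" || parsed === null) continue;
    try {
      out.push(normalizeLegacyTask(parsed as Record<string, unknown>));
    } catch {
      continue; // malformed record: skip it, keep going
    }
  }
  return out;
}

// Usage: the middle record has stop_reason: 5, as in the comment.
const tasks = readTasks([
  { task_id: "t1", stop_reason: " done " },
  { task_id: "t2", stop_reason: 5 },
  { task_id: "t3" },
]);
```

Validating the field types up front in the readability check would work as well; the try/catch variant is shown because it also covers failure modes nobody has thought of yet.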
