Commit c84735c

Generalize process-to-session mapping for CLI agent adapters (#45)
* Add generalize session mapping plan
* Implement generalize session mapping
* Clean up dead code
* Update docs
* Fix lint
1 parent f8b2112 commit c84735c

24 files changed

Lines changed: 2418 additions & 2836 deletions
Lines changed: 283 additions & 0 deletions
@@ -0,0 +1,283 @@
1+
---
2+
phase: design
3+
title: Generalize Process-to-Session Mapping — Design
4+
description: Architecture for shared process detection, session matching, and per-agent adapters
5+
---
6+
7+
# System Design & Architecture
8+
9+
## Architecture Overview
10+
11+
```mermaid
graph TD
    subgraph "Shared Utilities"
        P["utils/process.ts<br/>ps aux | grep, lsof batch,<br/>ps lstart, enrichProcesses"]
        S["utils/session.ts<br/>stat birthtime"]
        M["utils/matching.ts<br/>1:1 greedy matching + agent naming"]
    end

    subgraph "Adapters (per-agent)"
        CA["ClaudeCodeAdapter"]
        XA["CodexAdapter"]
        GA["Future adapters..."]
    end

    CA -->|uses| P
    CA -->|uses| S
    CA -->|uses| M
    XA -->|uses| P
    XA -->|uses| S
    XA -->|uses| M
    GA -->|uses| P
    GA -->|uses| S
    GA -->|uses| M

    CA -->|implements| AI["AgentAdapter interface"]
    XA -->|implements| AI
    GA -->|implements| AI
```
39+
40+
Each adapter implements `AgentAdapter` (unchanged interface), owns its detection flow and session scanning, and calls shared utilities for OS-level commands and matching.
41+
42+
## Data Flow
43+
44+
```mermaid
sequenceDiagram
    participant A as Adapter
    participant P as utils/process
    participant S as utils/session
    participant M as utils/matching

    A->>P: listAgentProcesses('claude')
    Note right of P: ps aux | grep claude<br/>+ post-filter executable name
    P-->>A: ProcessInfo[] (pid, command, tty)

    A->>P: enrichProcesses(processes)
    Note right of P: batchGetProcessCwds (1 lsof)<br/>+ batchGetProcessStartTimes (1 ps lstart)<br/>→ populates cwd + startTime
    P-->>A: ProcessInfo[] (fully populated)

    A->>A: discoverSessions(processes)
    Note right of A: Adapter-specific:<br/>CWD → session dir path(s)<br/>Sets resolvedCwd on each SessionFile<br/>CodexAdapter also caches file content

    A->>S: batchGetSessionFileBirthtimes(dirs)
    Note right of S: stat -f '%B %N' (macOS)<br/>stat --format='%W %n' (Linux)<br/>Single call across all dirs
    S-->>A: SessionFile[]

    A->>M: matchProcessesToSessions(processes, sessions)
    Note right of M: Filter: process.cwd === session.resolvedCwd<br/>Filter: deltaMs <= 180s<br/>Filter: startTime must exist<br/>1:1 greedy by smallest delta
    M-->>A: MatchResult[]

    A->>A: parseSession / readSession per match
    Note right of A: Adapter-specific:<br/>Read JSONL for status/summary<br/>Only matched files<br/>CodexAdapter uses cached content

    A->>M: generateAgentName(cwd, pid)
    M-->>A: "folderName (pid)"

    A-->>A: AgentInfo[]
```
78+
79+
## Data Models
80+
81+
### ProcessInfo (existing, extended)
82+
83+
```typescript
interface ProcessInfo {
  pid: number;
  command: string;
  cwd: string;        // populated by enrichProcesses
  tty: string;
  startTime?: Date;   // populated by enrichProcesses
}
```
92+
93+
This adds `startTime?: Date` to the existing `ProcessInfo` in `AgentAdapter.ts`. It is a public type change, accepted because it is additive (the field is optional).
94+
95+
### SessionFile (new, shared)
96+
97+
```typescript
interface SessionFile {
  sessionId: string;    // filename without .jsonl
  filePath: string;     // full path
  projectDir: string;   // parent directory
  birthtimeMs: number;  // from stat (epoch seconds × 1000 → milliseconds)
  resolvedCwd: string;  // set by adapter: the CWD this session maps to
}
```
106+
107+
`resolvedCwd` is set by the adapter after calling `batchGetSessionFileBirthtimes()`. This keeps the CWD↔session mapping adapter-specific while allowing the shared matcher to compare `process.cwd === session.resolvedCwd` without callbacks or maps.
108+
109+
### MatchResult (new, shared)
110+
111+
```typescript
interface MatchResult {
  process: ProcessInfo;
  session: SessionFile;
  deltaMs: number;  // |process.startTime - session.birthtimeMs|
}
```
118+
119+
## Component Breakdown
120+
121+
### `utils/process.ts` — Shell command wrappers for process data
122+
123+
Extended from existing file. All `execSync` calls for process data live here.
124+
125+
| Function | Shell command | Returns |
|----------|---------------|---------|
| `listAgentProcesses(namePattern)` | `ps aux \| grep <pattern>` + post-filter executable basename | `ProcessInfo[]` (pid, command, tty; cwd/startTime empty) |
| `batchGetProcessCwds(pids)` | `lsof -a -d cwd -Fn -p PID1,PID2,...` | `Map<number, string>` |
| `batchGetProcessStartTimes(pids)` | `ps -o pid=,lstart= -p PID1,PID2,...` | `Map<number, Date>` |
| `enrichProcesses(processes)` | Calls `batchGetProcessCwds` + `batchGetProcessStartTimes` | `ProcessInfo[]` with cwd and startTime populated |
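As an illustration of the `batchGetProcessCwds` row, the field output of `lsof -Fn` can be turned into a `Map<number, string>` with a small line parser. This is a minimal sketch, not the code from this commit: the function name `parseLsofCwdOutput` is hypothetical, and it assumes only the standard `lsof -F` convention that each line starts with a field-type character (`p` for the PID, `n` for the name).

```typescript
// Hypothetical helper: parse `lsof -a -d cwd -Fn -p PID1,PID2,...` output.
// Field output is one field per line: "p<pid>" opens a process set and
// "n<path>" carries the path of the selected file (here, the cwd).
function parseLsofCwdOutput(output: string): Map<number, string> {
  const cwds = new Map<number, string>();
  let currentPid: number | undefined;

  for (const line of output.split("\n")) {
    if (line.startsWith("p")) {
      const pid = Number.parseInt(line.slice(1), 10);
      currentPid = Number.isNaN(pid) ? undefined : pid;
    } else if (line.startsWith("n") && currentPid !== undefined) {
      // Everything after the type character is the path, spaces included.
      cwds.set(currentPid, line.slice(1));
    }
    // Other field types (for example "fcwd") are ignored.
  }
  return cwds;
}
```

PIDs that `lsof` could not inspect simply do not appear in the map, which lines up with the partial-results behaviour described for `enrichProcesses`.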
131+
132+
Notes:
133+
- `listAgentProcesses` uses `grep` at shell level for performance, then post-filters by checking `path.basename(executable)` matches exactly (avoids matching `claude-helper`, `vscode-claude-extension`, or the grep process itself)
134+
- `enrichProcesses` is a convenience that calls both batch functions and merges results into each `ProcessInfo`. Returns partial results — if `lsof` fails for a PID, that process gets empty cwd; if `ps lstart` fails for a PID, that process gets no `startTime`
135+
- `batchGetProcessStartTimes` uses `lstart` format (full timestamp like `Thu Feb 5 16:00:57 2026`) instead of lossy `etime`
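To make the `lstart` note concrete, here is a hedged sketch of parsing `ps -o pid=,lstart=` output into a PID-to-`Date` map. The helper name `parseLstartOutput` is an assumption for illustration. It relies on `new Date(string)` accepting the `lstart` layout, which holds on Node/V8 but is not guaranteed by the ECMAScript specification, and it interprets the timestamp in local time.

```typescript
// Hypothetical helper: parse lines such as "  1234 Thu Feb  5 16:00:57 2026".
function parseLstartOutput(output: string): Map<number, Date> {
  const startTimes = new Map<number, Date>();

  for (const line of output.split("\n")) {
    // Leading PID, then the rest of the line is the lstart timestamp.
    const match = line.match(/^\s*(\d+)\s+(.+?)\s*$/);
    if (!match) continue;

    const pid = Number.parseInt(match[1], 10);
    // ps pads single-digit days with an extra space; collapse it first.
    const started = new Date(match[2].replace(/\s+/g, " "));
    if (!Number.isNaN(started.getTime())) {
      startTimes.set(pid, started);
    }
  }
  return startTimes;
}
```

Lines that do not parse are skipped rather than thrown on, so a process with an unreadable start time ends up without `startTime` and falls through to the process-only path.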
136+
137+
### `utils/session.ts` — Shell command wrappers for session files
138+
139+
New file.
140+
141+
| Function | Shell command | Returns |
|----------|---------------|---------|
| `batchGetSessionFileBirthtimes(dirs)` | `stat -f '%B %N' dir1/*.jsonl dir2/*.jsonl ...` (macOS) or `stat --format='%W %n' ...` (Linux) | `SessionFile[]` |
144+
145+
Notes:
146+
- Combines all directory globs into a single `stat` call
147+
- Uses `stat` instead of `ls -lU` — gives epoch seconds (exact, no parsing ambiguity)
148+
- Platform detection via `process.platform`
149+
- Returns empty array if directories don't exist, have no `.jsonl` files, or command fails
150+
- `resolvedCwd` is left empty — adapter must set it after calling this function
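The notes above imply a simple parse of `<epoch seconds> <path>` lines. The sketch below shows one way that could look; `parseStatOutput` is a hypothetical name, and skipping non-positive birthtimes is an added assumption (GNU `stat` prints `0` for `%W` when the birth time is unknown).

```typescript
interface SessionFile {
  sessionId: string;
  filePath: string;
  projectDir: string;
  birthtimeMs: number;
  resolvedCwd: string;
}

// Hypothetical helper: turn "<epochSeconds> <path>" lines into SessionFile[].
function parseStatOutput(output: string): SessionFile[] {
  const sessions: SessionFile[] = [];

  for (const line of output.split("\n")) {
    // Split on the first space only, so paths containing spaces survive.
    const spaceAt = line.indexOf(" ");
    if (spaceAt <= 0) continue;

    const epochSeconds = Number.parseInt(line.slice(0, spaceAt), 10);
    const filePath = line.slice(spaceAt + 1);
    // Assumption: treat a missing or zero birthtime as unusable.
    if (!Number.isFinite(epochSeconds) || epochSeconds <= 0) continue;
    if (!filePath.endsWith(".jsonl")) continue;

    const slashAt = filePath.lastIndexOf("/");
    sessions.push({
      sessionId: filePath.slice(slashAt + 1, -".jsonl".length),
      filePath,
      projectDir: filePath.slice(0, Math.max(slashAt, 0)),
      birthtimeMs: epochSeconds * 1000,
      resolvedCwd: "", // left empty; the adapter sets it afterwards
    });
  }
  return sessions;
}
```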
151+
152+
### `utils/matching.ts` — Shared matching algorithm and naming
153+
154+
New file.
155+
156+
| Function | Description |
|----------|-------------|
| `matchProcessesToSessions(processes, sessions)` | 1:1 greedy assignment by closest birthtimeMs |
| `generateAgentName(cwd, pid)` | Returns `basename(cwd) (pid)` |
160+
161+
#### Matching algorithm
162+
163+
```
Input:
  processes: ProcessInfo[]   (with cwd and startTime populated)
  sessions:  SessionFile[]   (with resolvedCwd set by adapter)

1. Filter processes: exclude any where startTime is undefined
   (→ these become process-only fallback in the adapter)

2. Build candidate pairs:
   for each process P, for each session S:
     if P.cwd === S.resolvedCwd:
       deltaMs = |P.startTime - S.birthtimeMs|
       if deltaMs <= 180_000 (3 minutes):
         add (P, S, deltaMs) to candidates

3. Sort candidates by deltaMs ascending (best matches first)

4. Greedy assign:
   matchedPids = Set()
   matchedSessionIds = Set()
   results = []

   for each (P, S, deltaMs) in candidates:
     if P.pid in matchedPids → skip
     if S.sessionId in matchedSessionIds → skip
     assign P ↔ S
     results.push({ process: P, session: S, deltaMs })

5. Return results
```
193+
194+
For unmatched processes (no session within tolerance, or no `startTime`), the adapter creates a process-only fallback `AgentInfo`.
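The pseudocode above translates fairly directly into TypeScript. The following is a sketch under the data models defined earlier, not the committed implementation. One guard is an addition of this sketch: processes with an empty `cwd` are skipped, since an empty `cwd` would otherwise compare equal to an unset `resolvedCwd`.

```typescript
interface ProcessInfo { pid: number; command: string; cwd: string; tty: string; startTime?: Date; }
interface SessionFile { sessionId: string; filePath: string; projectDir: string; birthtimeMs: number; resolvedCwd: string; }
interface MatchResult { process: ProcessInfo; session: SessionFile; deltaMs: number; }

const TOLERANCE_MS = 180_000; // 3 minutes, as in the design

function matchProcessesToSessions(processes: ProcessInfo[], sessions: SessionFile[]): MatchResult[] {
  // Steps 1 and 2: build candidate pairs with equal CWD and delta within tolerance.
  const candidates: MatchResult[] = [];
  for (const proc of processes) {
    if (!proc.startTime || !proc.cwd) continue; // empty-cwd guard is this sketch's assumption
    const startMs = proc.startTime.getTime();
    for (const session of sessions) {
      if (proc.cwd !== session.resolvedCwd) continue;
      const deltaMs = Math.abs(startMs - session.birthtimeMs);
      if (deltaMs <= TOLERANCE_MS) candidates.push({ process: proc, session, deltaMs });
    }
  }

  // Step 3: best (smallest delta) candidates first.
  candidates.sort((a, b) => a.deltaMs - b.deltaMs);

  // Step 4: greedy 1:1 assignment.
  const matchedPids = new Set<number>();
  const matchedSessionIds = new Set<string>();
  const results: MatchResult[] = [];
  for (const candidate of candidates) {
    if (matchedPids.has(candidate.process.pid)) continue;
    if (matchedSessionIds.has(candidate.session.sessionId)) continue;
    matchedPids.add(candidate.process.pid);
    matchedSessionIds.add(candidate.session.sessionId);
    results.push(candidate);
  }
  return results;
}
```

Because the candidate list is sorted once and each PID and session ID is consumed at most once, the result is deterministic for a given input and needs no second pass.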
195+
196+
### Per-adapter responsibilities
197+
198+
| Responsibility | Stays in adapter | Reason |
|---|---|---|
| `canHandle(command)` | Yes (interface contract) | Kept for interface, but `listAgentProcesses` already filters |
| Session dir scanning | Yes | Claude: `~/.claude/projects/<encoded>/`, Codex: `~/.codex/sessions/YYYY/MM/DD/` |
| CWD → session dir mapping | Yes | Adapter sets `resolvedCwd` on each SessionFile |
| Session parsing (`parseSession`/`readSession`) | Yes | JSONL schema differs per agent. CodexAdapter supports cached content to avoid double I/O. |
| `determineStatus(session)` | Yes | Entry types and status mapping differ |
| Summary extraction | Yes | Content structure differs |
206+
207+
#### Codex date-dir scanning
208+
209+
Codex stores sessions in `~/.codex/sessions/YYYY/MM/DD/*.jsonl`. The adapter will:
210+
1. Use process start times (from `enrichProcesses`) to determine date dirs
211+
2. Scan date directories around each process start date (±1 day window)
212+
3. Call `batchGetSessionFileBirthtimes(dateDirs)` once with all date directories
213+
4. Read each file once and cache content in `Map<string, string>` for later parsing
214+
5. Set `resolvedCwd` from the session_meta first line's `cwd` field
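Steps 1 and 2 can be sketched as a pure function that expands process start times into a deduplicated list of date directories. The name `codexDateDirs` and the `sessionsRoot` parameter are hypothetical, and the sketch assumes the `YYYY/MM/DD` directories follow local time, which the design does not state.

```typescript
// Hypothetical helper: date directories within ±1 day of each start time.
function codexDateDirs(sessionsRoot: string, startTimes: Date[]): string[] {
  const pad = (n: number): string => String(n).padStart(2, "0");
  const dirs = new Set<string>();

  for (const start of startTimes) {
    for (const offset of [-1, 0, 1]) {
      // The Date constructor normalizes overflow, so day 0 or day 32
      // rolls into the neighbouring month (and year) correctly.
      const day = new Date(start.getFullYear(), start.getMonth(), start.getDate() + offset);
      dirs.add(`${sessionsRoot}/${day.getFullYear()}/${pad(day.getMonth() + 1)}/${pad(day.getDate())}`);
    }
  }
  return Array.from(dirs);
}
```

The resulting list is what step 3 would pass to `batchGetSessionFileBirthtimes` in a single call. Directories that do not exist are harmless, since that function returns an empty array for them.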
215+
216+
## Design Decisions
217+
218+
### Adapter pattern over base class / plugin
219+
220+
- Adapters own their full flow and can diverge freely
221+
- Shared logic pulled in as utility functions, not inherited
222+
- No inversion of control — adapter calls utils, not the other way around
223+
224+
### birthtimeMs via `stat` over JSONL first-entry timestamp
225+
226+
- Zero file I/O for matching — `stat` gives epoch seconds directly
227+
- No date format parsing ambiguity (unlike `ls -lU` which shows `MMM DD HH:MM` lossy format)
228+
- OS-level timestamp, no app-level lag
229+
- Dry-run validated: 6/8 exact matches, 2/8 within 3min tolerance
230+
- Known limitation: session resumption without process restart (accepted)
231+
232+
### `stat` over `ls -lU`
233+
234+
- `ls -lU` date format is lossy — no seconds for recent files, no year for old files
235+
- `stat -f '%B %N'` (macOS) and `stat --format='%W %n'` (Linux) give epoch seconds
236+
- Exact timestamps, trivial to parse (split on space, `parseInt`)
237+
238+
### `resolvedCwd` on SessionFile over callback/map
239+
240+
- Adapter sets `resolvedCwd` after getting birthtimes, before calling matcher
241+
- Matcher compares `process.cwd === session.resolvedCwd` — pure, no adapter-specific logic
242+
- No callback indirection, no map lookup
243+
244+
### `enrichProcesses` convenience function
245+
246+
- Adapter calls `listAgentProcesses` then `enrichProcesses` — two calls instead of managing 3 separate maps
247+
- Returns partial results — if one PID fails, others still get populated
248+
- Processes without `startTime` are excluded from matching (→ process-only fallback)
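The merge step behind `enrichProcesses` can be pictured as combining the two batch maps per PID. The sketch below isolates that merge as a pure function so the partial-results behaviour is visible. `mergeEnrichment` is a hypothetical name, and returning new objects (rather than mutating) is a choice made here for clarity.

```typescript
interface ProcessInfo { pid: number; command: string; cwd: string; tty: string; startTime?: Date; }

// Hypothetical helper: merge batch lookup results into each ProcessInfo.
// A PID missing from either map keeps its previous cwd / gets no startTime.
function mergeEnrichment(
  processes: ProcessInfo[],
  cwds: Map<number, string>,
  startTimes: Map<number, Date>,
): ProcessInfo[] {
  return processes.map((proc) => ({
    ...proc,
    cwd: cwds.get(proc.pid) ?? proc.cwd,
    startTime: startTimes.get(proc.pid),
  }));
}
```

A process that comes back without `startTime` is still returned, so the adapter can report it through the process-only fallback instead of dropping it.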
249+
250+
### Greedy 1:1 over multi-pass modes
251+
252+
- Single greedy pass sorted by delta ascending
253+
- Simpler, deterministic, no pass-ordering side effects
254+
- Parent-child matching dropped — exact CWD match only
255+
256+
### Agent naming: `folderName (pid)`
257+
258+
- Deterministic, no JSONL parse needed
259+
- PID always included for uniqueness
260+
- Breaking change from slug-based naming — accepted
261+
262+
### Batched shell calls
263+
264+
- 1 `lsof` for all PIDs vs N per-PID calls
265+
- 1 `ps -o lstart` for all PIDs vs N `ps -o etime` calls
266+
- grep at shell level vs list-all-then-filter-in-code
267+
268+
### 3-minute tolerance
269+
270+
- Covers all observed deltas (23s to 2m24s) with margin
271+
- Beyond tolerance → process-only fallback (wrong match worse than no match)
272+
273+
### Error handling
274+
275+
- Shell command utils return partial results — if lsof fails for 1 of 5 PIDs, the other 4 still return
276+
- Future: `--verbose` mode will log matching details (which candidates were considered, why matches were rejected) to log files for debugging
277+
278+
## Non-Functional Requirements
279+
280+
- **Performance**: Detection < 500ms for 10 processes, 50 session files
281+
- **Correctness**: Identical output for non-edge-case scenarios
282+
- **Portability**: macOS and Linux (no Windows)
283+
- **Testability**: Shared utils independently testable — mock `execSync` at module level with `jest.mock`
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
1+
---
2+
phase: implementation
3+
title: Generalize Process-to-Session Mapping — Implementation
4+
description: Implementation notes for shared utilities and adapter refactoring
5+
---
6+
7+
# Implementation Guide
8+
9+
## Code Structure
10+
11+
```
packages/agent-manager/src/
├── adapters/
│   ├── AgentAdapter.ts          # Interface + ProcessInfo (added startTime?)
│   ├── ClaudeCodeAdapter.ts     # ~419 lines; session dir via path encoding
│   └── CodexAdapter.ts          # ~319 lines; session dir via date dirs
├── utils/
│   ├── process.ts               # Shell wrappers: ps aux, lsof, ps lstart, getProcessTty
│   ├── session.ts               # Shell wrappers: stat for birthtimes
│   ├── matching.ts              # 1:1 greedy matching + agent naming
│   └── index.ts                 # Re-exports
└── AgentManager.ts              # Orchestrates adapters
```
24+
25+
## Implementation Notes
26+
27+
### Shared Utilities
28+
29+
**`utils/process.ts`** — All `execSync` calls for process data:
30+
- `listAgentProcesses(namePattern)`: Uses `[c]laude` grep trick to avoid matching grep itself. Post-filters by `path.basename(executable)` for exact match. Input validated against `/^[a-zA-Z0-9_-]+$/` to prevent shell injection.
31+
- `batchGetProcessCwds(pids)`: Single `lsof -a -d cwd -Fn -p PID1,PID2,...`. Falls back to per-PID `pwdx` on Linux if lsof fails.
32+
- `batchGetProcessStartTimes(pids)`: Single `ps -o pid=,lstart=`. Parses full timestamp via `new Date(dateStr)`.
33+
- `enrichProcesses(processes)`: Convenience — calls both batch functions, populates in-place.
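The three safeguards described for `listAgentProcesses` (input validation, the bracket trick, and the basename post-filter) can be sketched as small pure helpers. All function names here are hypothetical. Taking the first whitespace-separated token of the command as the executable is a simplification that would misread executable paths containing spaces.

```typescript
import * as path from "path";

// Same character class as the validation regex quoted in the notes.
const SAFE_NAME_PATTERN = /^[a-zA-Z0-9_-]+$/;

// Hypothetical helper: reject anything that could carry shell metacharacters.
function isSafeNamePattern(namePattern: string): boolean {
  return SAFE_NAME_PATTERN.test(namePattern);
}

// Hypothetical helper: "claude" becomes "[c]laude". The regex still matches
// "claude", but grep's own command line contains "[c]laude", which it does not match.
function toBracketPattern(namePattern: string): string {
  return `[${namePattern[0]}]${namePattern.slice(1)}`;
}

// Hypothetical helper: exact match on the executable's basename.
function matchesExecutable(command: string, name: string): boolean {
  const executable = command.trim().split(/\s+/)[0] ?? "";
  return path.basename(executable) === name;
}
```

A caller would validate first, return `[]` on failure, and only then interpolate the bracket pattern into the `ps aux | grep` command line.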
34+
35+
**`utils/session.ts`** — Session file discovery:
36+
- `batchGetSessionFileBirthtimes(dirs)`: Combines all dir globs into single `stat` call. Uses `|| true` to handle empty globs gracefully.
37+
38+
**`utils/matching.ts`** — Matching algorithm:
39+
- `matchProcessesToSessions`: Builds candidate pairs (CWD match + within 3min tolerance), sorts by delta ascending, greedy 1:1 assign.
40+
- `generateAgentName(cwd, pid)`: Returns `basename(cwd) (pid)` or `unknown (pid)`.
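For reference, the naming rule fits in a few lines. This is a sketch consistent with the description above rather than the committed code. Falling back to `unknown` for both an empty `cwd` and a `cwd` with an empty basename (such as `/`) is an assumption.

```typescript
import * as path from "path";

// Sketch: "folderName (pid)", or "unknown (pid)" when no folder name is available.
function generateAgentName(cwd: string, pid: number): string {
  const folderName = cwd ? path.basename(cwd) : "";
  return `${folderName || "unknown"} (${pid})`;
}
```

Two agents running in the same folder still get distinct names because the PID is always part of the result.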
41+
42+
### Adapter-Specific Logic
43+
44+
**ClaudeCodeAdapter**:
45+
- Session dir: `~/.claude/projects/<encoded>/` where encoded = `cwd.replace(/\//g, '-')`
46+
- `discoverSessions`: Encodes each unique process CWD, checks if dir exists, calls `batchGetSessionFileBirthtimes`, sets `resolvedCwd` from dir-to-CWD mapping
47+
- `readSession(filePath, projectPath)`: Parses all JSONL lines for timestamps, slug, cwd, entry type, interruption state, user message text
48+
- Status: Based on `lastEntryType` (user/assistant/progress/thinking/system). No age-based override since process is confirmed running.
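The path encoding and the dir-to-CWD mapping used by `discoverSessions` can be illustrated as follows. Both helper names and the `projectsRoot` parameter are hypothetical, and dropping sessions whose directory has no known CWD is an assumption of this sketch.

```typescript
interface SessionFile { sessionId: string; filePath: string; projectDir: string; birthtimeMs: number; resolvedCwd: string; }

// Encode a CWD as described above: every "/" becomes "-".
function encodeClaudeProjectDir(projectsRoot: string, cwd: string): string {
  return `${projectsRoot}/${cwd.replace(/\//g, "-")}`;
}

// Build dir → CWD from the unique process CWDs, then stamp resolvedCwd
// on every session file found in one of those directories.
function assignResolvedCwd(sessions: SessionFile[], processCwds: string[], projectsRoot: string): SessionFile[] {
  const dirToCwd = new Map<string, string>();
  for (const cwd of processCwds) {
    if (cwd) dirToCwd.set(encodeClaudeProjectDir(projectsRoot, cwd), cwd);
  }

  const resolved: SessionFile[] = [];
  for (const session of sessions) {
    const cwd = dirToCwd.get(session.projectDir);
    if (cwd !== undefined) resolved.push({ ...session, resolvedCwd: cwd });
  }
  return resolved;
}
```

The mapping runs forward from known CWDs because the encoding is lossy: `/a/b-c` and `/a/b/c` both encode to `-a-b-c`, so a directory name cannot be decoded back to a unique path.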
49+
50+
**CodexAdapter**:
51+
- Session dir: `~/.codex/sessions/YYYY/MM/DD/`
52+
- `discoverSessions`: Scans ±1 day window around each process start time. Reads each file once into `contentCache: Map<string, string>`. Sets `resolvedCwd` from `session_meta` first line.
53+
- `parseSession(cachedContent, filePath)`: Uses cached content when available, falls back to disk read. Extracts session ID, project path, summary, timestamps, last payload type.
54+
- Status: Based on `lastPayloadType` and 5-minute idle threshold.
55+
56+
## Error Handling
57+
58+
- Shell command utils return partial results — if lsof/ps fails for one PID, others still return
59+
- Session file read failures are silently skipped (file may have been deleted between stat and read)
60+
- Adapters fall back to process-only AgentInfo for unmatched processes
61+
- `listAgentProcesses` rejects patterns with shell metacharacters (returns `[]`)
62+
63+
## Performance
64+
65+
- 1 `ps aux | grep` per adapter (not per process)
66+
- 1 `lsof` for all PIDs (not per PID)
67+
- 1 `ps -o lstart` for all PIDs
68+
- 1 `stat` per adapter across all session directories
69+
- JSONL files only read for matched sessions (CodexAdapter caches content from discovery phase)
70+
- Legacy `listProcesses`, `getProcessCwd`, `getSessionFileBirthtimes` removed — no consumers
71+
- `getProcessTty` kept — used by `TerminalFocusManager`
72+
73+
## Dead Code Removed
74+
75+
**agent-manager package:**
76+
- `utils/file.ts` — entire file (`readLastLines`, `readJsonLines`) — no production callers
77+
- `utils/process.ts`: `listProcesses`, `getProcessCwd`, `ListProcessesOptions` (deprecated, no callers)
78+
- `utils/session.ts`: `getSessionFileBirthtimes` (unused wrapper, all callers use the batch version)
79+
80+
**CLI package:**
81+
- `util/process.ts` — entire file (`listProcesses`, `getProcessCwd`, `getProcessTty`, `isProcessRunning`, `getProcessInfo`) — zero production imports
82+
- `util/file.ts` — entire file (`readLastLines`, `readJsonLines`, `fileExists`, `readJson`) — zero production imports
