| 1 | +--- |
| 2 | +phase: design |
| 3 | +title: Generalize Process-to-Session Mapping — Design |
| 4 | +description: Architecture for shared process detection, session matching, and per-agent adapters |
| 5 | +--- |
| 6 | + |
| 7 | +# System Design & Architecture |
| 8 | + |
| 9 | +## Architecture Overview |
| 10 | + |
| 11 | +```mermaid |
| 12 | +graph TD |
| 13 | + subgraph "Shared Utilities" |
| 14 | + P["utils/process.ts<br/>ps aux | grep, lsof batch,<br/>ps lstart, enrichProcesses"] |
| 15 | + S["utils/session.ts<br/>stat birthtime"] |
| 16 | + M["utils/matching.ts<br/>1:1 greedy matching + agent naming"] |
| 17 | + end |
| 18 | + |
| 19 | + subgraph "Adapters (per-agent)" |
| 20 | + CA["ClaudeCodeAdapter"] |
| 21 | + XA["CodexAdapter"] |
| 22 | + GA["Future adapters..."] |
| 23 | + end |
| 24 | + |
| 25 | + CA -->|uses| P |
| 26 | + CA -->|uses| S |
| 27 | + CA -->|uses| M |
| 28 | + XA -->|uses| P |
| 29 | + XA -->|uses| S |
| 30 | + XA -->|uses| M |
| 31 | + GA -->|uses| P |
| 32 | + GA -->|uses| S |
| 33 | + GA -->|uses| M |
| 34 | + |
| 35 | + CA -->|implements| AI["AgentAdapter interface"] |
| 36 | + XA -->|implements| AI |
| 37 | + GA -->|implements| AI |
| 38 | +``` |
| 39 | + |
| 40 | +Each adapter implements `AgentAdapter` (unchanged interface), owns its detection flow and session scanning, and calls shared utilities for OS-level commands and matching. |
| 41 | + |
| 42 | +## Data Flow |
| 43 | + |
| 44 | +```mermaid |
| 45 | +sequenceDiagram |
| 46 | + participant A as Adapter |
| 47 | + participant P as utils/process |
| 48 | + participant S as utils/session |
| 49 | + participant M as utils/matching |
| 50 | + |
| 51 | + A->>P: listAgentProcesses('claude') |
| 52 | + Note right of P: ps aux | grep claude<br/>+ post-filter executable name |
| 53 | + P-->>A: ProcessInfo[] (pid, command, tty) |
| 54 | + |
| 55 | + A->>P: enrichProcesses(processes) |
| 56 | + Note right of P: batchGetProcessCwds (1 lsof)<br/>+ batchGetProcessStartTimes (1 ps lstart)<br/>→ populates cwd + startTime |
| 57 | + P-->>A: ProcessInfo[] (fully populated) |
| 58 | + |
| 59 | + A->>A: discoverSessions(processes) |
| 60 | + Note right of A: Adapter-specific:<br/>CWD → session dir path(s)<br/>Sets resolvedCwd on each SessionFile<br/>CodexAdapter also caches file content |
| 61 | + |
| 62 | + A->>S: batchGetSessionFileBirthtimes(dirs) |
| 63 | + Note right of S: stat -f '%B %N' (macOS)<br/>stat --format='%W %n' (Linux)<br/>Single call across all dirs |
| 64 | + S-->>A: SessionFile[] |
| 65 | + |
| 66 | + A->>M: matchProcessesToSessions(processes, sessions) |
| 67 | + Note right of M: Filter: process.cwd === session.resolvedCwd<br/>Filter: deltaMs <= 180s<br/>Filter: startTime must exist<br/>1:1 greedy by smallest delta |
| 68 | + M-->>A: MatchResult[] |
| 69 | + |
| 70 | + A->>A: parseSession / readSession per match |
| 71 | + Note right of A: Adapter-specific:<br/>Read JSONL for status/summary<br/>Only matched files<br/>CodexAdapter uses cached content |
| 72 | + |
| 73 | + A->>M: generateAgentName(cwd, pid) |
| 74 | + M-->>A: "folderName (pid)" |
| 75 | + |
| 76 | + A-->>A: AgentInfo[] |
| 77 | +``` |
| 78 | + |
| 79 | +## Data Models |
| 80 | + |
| 81 | +### ProcessInfo (existing, extended) |
| 82 | + |
| 83 | +```typescript |
| 84 | +interface ProcessInfo { |
| 85 | + pid: number; |
| 86 | + command: string; |
| 87 | + cwd: string; // populated by enrichProcesses |
| 88 | + tty: string; |
| 89 | + startTime?: Date; // populated by enrichProcesses |
| 90 | +} |
| 91 | +``` |
| 92 | + |
| 93 | +This adds `startTime?: Date` to the existing `ProcessInfo` in `AgentAdapter.ts`. It is a public type change, accepted because it is additive (an optional field). |
| 94 | + |
| 95 | +### SessionFile (new, shared) |
| 96 | + |
| 97 | +```typescript |
| 98 | +interface SessionFile { |
| 99 | + sessionId: string; // filename without .jsonl |
| 100 | + filePath: string; // full path |
| 101 | + projectDir: string; // parent directory |
| 102 | + birthtimeMs: number; // from stat (epoch seconds × 1000 → milliseconds) |
| 103 | + resolvedCwd: string; // set by adapter: the CWD this session maps to |
| 104 | +} |
| 105 | +``` |
| 106 | + |
| 107 | +`resolvedCwd` is set by the adapter after calling `batchGetSessionFileBirthtimes()`. This keeps the CWD↔session mapping adapter-specific while allowing the shared matcher to compare `process.cwd === session.resolvedCwd` without callbacks or maps. |
| 108 | + |
| 109 | +### MatchResult (new, shared) |
| 110 | + |
| 111 | +```typescript |
| 112 | +interface MatchResult { |
| 113 | + process: ProcessInfo; |
| 114 | + session: SessionFile; |
| 115 | + deltaMs: number; // |process.startTime.getTime() - session.birthtimeMs| |
| 116 | +} |
| 117 | +``` |
| 118 | + |
| 119 | +## Component Breakdown |
| 120 | + |
| 121 | +### `utils/process.ts` — Shell command wrappers for process data |
| 122 | + |
| 123 | +Extended from existing file. All `execSync` calls for process data live here. |
| 124 | + |
| 125 | +| Function | Shell command | Returns | |
| 126 | +|----------|-------------|---------| |
| 127 | +| `listAgentProcesses(namePattern)` | `ps aux \| grep <pattern>` + post-filter executable basename | `ProcessInfo[]` (pid, command, tty — cwd/startTime empty) | |
| 128 | +| `batchGetProcessCwds(pids)` | `lsof -a -d cwd -Fn -p PID1,PID2,...` | `Map<number, string>` | |
| 129 | +| `batchGetProcessStartTimes(pids)` | `ps -o pid=,lstart= -p PID1,PID2,...` | `Map<number, Date>` | |
| 130 | +| `enrichProcesses(processes)` | Calls `batchGetProcessCwds` + `batchGetProcessStartTimes` | `ProcessInfo[]` with cwd and startTime populated | |
| 131 | + |
| 132 | +Notes: |
| 133 | +- `listAgentProcesses` uses `grep` at shell level for performance, then post-filters by checking `path.basename(executable)` matches exactly (avoids matching `claude-helper`, `vscode-claude-extension`, or the grep process itself) |
| 134 | +- `enrichProcesses` is a convenience that calls both batch functions and merges results into each `ProcessInfo`. Returns partial results — if `lsof` fails for a PID, that process gets empty cwd; if `ps lstart` fails for a PID, that process gets no `startTime` |
| 135 | +- `batchGetProcessStartTimes` uses `lstart` format (full timestamp like `Thu Feb 5 16:00:57 2026`) instead of lossy `etime` |
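
The two parsing steps in the notes above can be sketched as pure functions, kept separate from `execSync` so they are easy to test. This is a minimal sketch, not the actual implementation: the helper names `isExactExecutable` and `parseLstartOutput` are hypothetical, the executable is assumed to be the first whitespace-separated token of the command, and `lstart` is assumed to print English month names (C locale) with no time zone.

```typescript
// Hypothetical helpers; names and parsing rules are assumptions for illustration.
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

/** True when the first token of `command` has exactly `name` as its basename. */
function isExactExecutable(command: string, name: string): boolean {
  const executable = command.trim().split(/\s+/)[0] ?? "";
  // Dependency-free stand-in for path.basename (POSIX separators only).
  const base = executable.split("/").pop() ?? "";
  return base === name;
}

/** Parses `ps -o pid=,lstart=` output, e.g. "  4242 Thu Feb  5 16:00:57 2026". */
function parseLstartOutput(output: string): Map<number, Date> {
  const result = new Map<number, Date>();
  const pattern =
    /^(\d+)\s+\w{3}\s+(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\s+(\d{4})$/;
  for (const line of output.split("\n")) {
    const m = line.trim().match(pattern);
    if (!m) continue; // blank or unparseable line: that PID simply gets no entry
    const month = MONTHS.indexOf(m[2] ?? "");
    if (month < 0) continue;
    // lstart carries no time zone, so the fields are read as local time.
    result.set(
      Number(m[1]),
      new Date(Number(m[7]), month, Number(m[3]),
               Number(m[4]), Number(m[5]), Number(m[6])),
    );
  }
  return result;
}
```

Parsing the fields by hand avoids relying on how a given JavaScript engine interprets a non-ISO date string. PIDs missing from the returned map correspond to the "no `startTime`" case described above.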
| 136 | + |
| 137 | +### `utils/session.ts` — Shell command wrappers for session files |
| 138 | + |
| 139 | +New file. |
| 140 | + |
| 141 | +| Function | Shell command | Returns | |
| 142 | +|----------|-------------|---------| |
| 143 | +| `batchGetSessionFileBirthtimes(dirs)` | `stat -f '%B %N' dir1/*.jsonl dir2/*.jsonl ...` (macOS) or `stat --format='%W %n' ...` (Linux) | `SessionFile[]` | |
| 144 | + |
| 145 | +Notes: |
| 146 | +- Combines all directory globs into a single `stat` call |
| 147 | +- Uses `stat` instead of `ls -lU` — gives epoch seconds (exact, no parsing ambiguity) |
| 148 | +- Platform detection via `process.platform` |
| 149 | +- Returns empty array if directories don't exist, have no `.jsonl` files, or command fails |
| 150 | +- `resolvedCwd` is left empty — adapter must set it after calling this function |
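
As an illustration of how little parsing the `stat` output needs, here is a minimal sketch of a parser for lines of the form `<epochSeconds> <path>`. The function name `parseStatOutput` is hypothetical. Splitting on the first space only (so paths containing spaces survive) and skipping entries whose birth time is 0 (GNU `stat` prints 0 for `%W` when the birth time is unknown) are assumptions, not behavior stated in this design.

```typescript
interface SessionFile {
  sessionId: string;
  filePath: string;
  projectDir: string;
  birthtimeMs: number;
  resolvedCwd: string;
}

/** Parses lines of "<epochSeconds> <path>" into SessionFile records. */
function parseStatOutput(output: string): SessionFile[] {
  const sessions: SessionFile[] = [];
  for (const line of output.split("\n")) {
    const space = line.indexOf(" ");
    if (space <= 0) continue; // blank or malformed line
    const seconds = parseInt(line.slice(0, space), 10);
    const filePath = line.slice(space + 1);
    // Assumption: entries without a usable birth time are dropped.
    if (!Number.isFinite(seconds) || seconds <= 0) continue;
    if (!filePath.endsWith(".jsonl")) continue;
    const slash = filePath.lastIndexOf("/");
    sessions.push({
      sessionId: filePath.slice(slash + 1, -".jsonl".length),
      filePath,
      projectDir: slash >= 0 ? filePath.slice(0, slash) : ".",
      birthtimeMs: seconds * 1000, // epoch seconds to milliseconds
      resolvedCwd: "", // left empty; the adapter sets it afterwards
    });
  }
  return sessions;
}
```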
| 151 | + |
| 152 | +### `utils/matching.ts` — Shared matching algorithm and naming |
| 153 | + |
| 154 | +New file. |
| 155 | + |
| 156 | +| Function | Description | |
| 157 | +|----------|-------------| |
| 158 | +| `matchProcessesToSessions(processes, sessions)` | 1:1 greedy assignment by smallest `deltaMs` between process `startTime` and session `birthtimeMs` |
| 159 | +| `generateAgentName(cwd, pid)` | Returns `basename(cwd) (pid)` | |
| 160 | + |
| 161 | +#### Matching algorithm |
| 162 | + |
| 163 | +``` |
| 164 | +Input: |
| 165 | + processes: ProcessInfo[] (with cwd and startTime populated) |
| 166 | + sessions: SessionFile[] (with resolvedCwd set by adapter) |
| 167 | + |
| 168 | +1. Filter processes: exclude any where startTime is undefined |
| 169 | + (→ these become process-only fallback in the adapter) |
| 170 | + |
| 171 | +2. Build candidate pairs: |
| 172 | + for each process P, for each session S: |
| 173 | + if P.cwd === S.resolvedCwd: |
| 174 | + deltaMs = |P.startTime - S.birthtimeMs| |
| 175 | + if deltaMs <= 180_000 (3 minutes): |
| 176 | + add (P, S, deltaMs) to candidates |
| 177 | + |
| 178 | +3. Sort candidates by deltaMs ascending (best matches first) |
| 179 | + |
| 180 | +4. Greedy assign: |
| 181 | + matchedPids = Set() |
| 182 | + matchedSessionIds = Set() |
| 183 | + results = [] |
| 184 | + |
| 185 | + for each (P, S, deltaMs) in candidates: |
| 186 | + if P.pid in matchedPids → skip |
| 187 | + if S.sessionId in matchedSessionIds → skip |
| 188 | + assign P ↔ S |
| 189 | + results.push({ process: P, session: S, deltaMs }) |
| 190 | + |
| 191 | +5. Return results |
| 192 | +``` |
| 193 | + |
| 194 | +For unmatched processes (no session with the same CWD within tolerance, or no `startTime`), the adapter creates a process-only fallback `AgentInfo`. |
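
The pseudocode above translates almost line for line into TypeScript. The following is a sketch under the data models defined earlier, not the final implementation. The constant name `TOLERANCE_MS` is an assumption.

```typescript
interface ProcessInfo {
  pid: number;
  command: string;
  cwd: string;
  tty: string;
  startTime?: Date;
}

interface SessionFile {
  sessionId: string;
  filePath: string;
  projectDir: string;
  birthtimeMs: number;
  resolvedCwd: string;
}

interface MatchResult {
  process: ProcessInfo;
  session: SessionFile;
  deltaMs: number;
}

const TOLERANCE_MS = 180_000; // 3 minutes

function matchProcessesToSessions(
  processes: ProcessInfo[],
  sessions: SessionFile[],
): MatchResult[] {
  // Steps 1 and 2: build candidate pairs, skipping processes without startTime.
  const candidates: MatchResult[] = [];
  for (const proc of processes) {
    if (!proc.startTime) continue;
    const startMs = proc.startTime.getTime();
    for (const session of sessions) {
      if (proc.cwd !== session.resolvedCwd) continue;
      const deltaMs = Math.abs(startMs - session.birthtimeMs);
      if (deltaMs <= TOLERANCE_MS) {
        candidates.push({ process: proc, session, deltaMs });
      }
    }
  }

  // Step 3: best (smallest delta) candidates first.
  candidates.sort((a, b) => a.deltaMs - b.deltaMs);

  // Step 4: greedy 1:1 assignment; each pid and sessionId is used at most once.
  const matchedPids = new Set<number>();
  const matchedSessionIds = new Set<string>();
  const results: MatchResult[] = [];
  for (const candidate of candidates) {
    if (matchedPids.has(candidate.process.pid)) continue;
    if (matchedSessionIds.has(candidate.session.sessionId)) continue;
    matchedPids.add(candidate.process.pid);
    matchedSessionIds.add(candidate.session.sessionId);
    results.push(candidate);
  }
  return results;
}
```

Any process whose pid does not appear in the returned results is the adapter's cue to fall back to a process-only entry.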
| 195 | + |
| 196 | +### Per-adapter responsibilities |
| 197 | + |
| 198 | +| Responsibility | Stays in adapter | Reason | |
| 199 | +|---|---|---| |
| 200 | +| `canHandle(command)` | Yes (interface contract) | Kept for interface, but `listAgentProcesses` already filters | |
| 201 | +| Session dir scanning | Yes | Claude: `~/.claude/projects/<encoded>/`, Codex: `~/.codex/sessions/YYYY/MM/DD/` | |
| 202 | +| CWD → session dir mapping | Yes | Adapter sets `resolvedCwd` on each SessionFile | |
| 203 | +| Session parsing (`parseSession`/`readSession`) | Yes | JSONL schema differs per agent. CodexAdapter supports cached content to avoid double I/O. | |
| 204 | +| `determineStatus(session)` | Yes | Entry types and status mapping differ | |
| 205 | +| Summary extraction | Yes | Content structure differs | |
| 206 | + |
| 207 | +#### Codex date-dir scanning |
| 208 | + |
| 209 | +Codex stores sessions in `~/.codex/sessions/YYYY/MM/DD/*.jsonl`. The adapter will: |
| 210 | +1. Use process start times (from `enrichProcesses`) to determine date dirs |
| 211 | +2. Scan date directories around each process start date (±1 day window) |
| 212 | +3. Call `batchGetSessionFileBirthtimes(dateDirs)` once with all date directories |
| 213 | +4. Read each file once and cache content in `Map<string, string>` for later parsing |
| 214 | +5. Set `resolvedCwd` from the session_meta first line's `cwd` field |
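
Steps 1 and 2 above amount to expanding each process start time into three candidate directories. A minimal sketch follows. The function name `codexDateDirs` and its `sessionsRoot` parameter are hypothetical, and it assumes the `YYYY/MM/DD` directories follow the local calendar date, which this design does not state.

```typescript
/** Zero-pads a number to two digits. */
function pad2(n: number): string {
  return ("0" + n).slice(-2);
}

/** Formats a local calendar date as YYYY/MM/DD. */
function datePath(d: Date): string {
  return d.getFullYear() + "/" + pad2(d.getMonth() + 1) + "/" + pad2(d.getDate());
}

/**
 * Returns the de-duplicated, sorted date directories within one day
 * of each process start time.
 */
function codexDateDirs(sessionsRoot: string, startTimes: Date[]): string[] {
  const dirs = new Set<string>();
  for (const start of startTimes) {
    for (const offset of [-1, 0, 1]) {
      // Date normalizes out-of-range days, so month and year rollover is handled.
      const day = new Date(
        start.getFullYear(), start.getMonth(), start.getDate() + offset,
      );
      dirs.add(sessionsRoot + "/" + datePath(day));
    }
  }
  return Array.from(dirs).sort();
}
```

De-duplicating before the `stat` call matters when several processes started on the same day, since each directory should appear only once in the combined glob.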
| 215 | + |
| 216 | +## Design Decisions |
| 217 | + |
| 218 | +### Adapter pattern over base class / plugin |
| 219 | + |
| 220 | +- Adapters own their full flow and can diverge freely |
| 221 | +- Shared logic pulled in as utility functions, not inherited |
| 222 | +- No inversion of control — adapter calls utils, not the other way around |
| 223 | + |
| 224 | +### birthtimeMs via `stat` over JSONL first-entry timestamp |
| 225 | + |
| 226 | +- No file contents are read to obtain timestamps: `stat` gives epoch seconds directly |
| 227 | +- No date format parsing ambiguity (unlike `ls -lU` which shows `MMM DD HH:MM` lossy format) |
| 228 | +- OS-level timestamp, no app-level lag |
| 229 | +- Dry-run validated: 6/8 exact matches, 2/8 within 3min tolerance |
| 230 | +- Known limitation: session resumption without process restart (accepted) |
| 231 | + |
| 232 | +### `stat` over `ls -lU` |
| 233 | + |
| 234 | +- `ls -lU` date format is lossy — no seconds for recent files, no year for old files |
| 235 | +- `stat -f '%B %N'` (macOS) and `stat --format='%W %n'` (Linux) give epoch seconds |
| 236 | +- Exact timestamps, trivial to parse (split on space, `parseInt`) |
| 237 | + |
| 238 | +### `resolvedCwd` on SessionFile over callback/map |
| 239 | + |
| 240 | +- Adapter sets `resolvedCwd` after getting birthtimes, before calling matcher |
| 241 | +- Matcher compares `process.cwd === session.resolvedCwd` — pure, no adapter-specific logic |
| 242 | +- No callback indirection, no map lookup |
| 243 | + |
| 244 | +### `enrichProcesses` convenience function |
| 245 | + |
| 246 | +- Adapter calls `listAgentProcesses` then `enrichProcesses` — two calls instead of managing 3 separate maps |
| 247 | +- Returns partial results — if one PID fails, others still get populated |
| 248 | +- Processes without `startTime` are excluded from matching (→ process-only fallback) |
| 249 | + |
| 250 | +### Greedy 1:1 over multi-pass modes |
| 251 | + |
| 252 | +- Single greedy pass sorted by delta ascending |
| 253 | +- Simpler, deterministic, no pass-ordering side effects |
| 254 | +- Parent-child matching dropped — exact CWD match only |
| 255 | + |
| 256 | +### Agent naming: `folderName (pid)` |
| 257 | + |
| 258 | +- Deterministic, no JSONL parse needed |
| 259 | +- PID always included for uniqueness |
| 260 | +- Breaking change from slug-based naming — accepted |
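
A minimal sketch of the naming rule, assuming POSIX paths. The basename logic is a dependency-free stand-in for `path.basename`, and the `"unknown"` placeholder for an empty or root CWD is an assumption, since this design does not say what name a process without a CWD should get.

```typescript
/** Returns "folderName (pid)" for the given working directory and process id. */
function generateAgentName(cwd: string, pid: number): string {
  // Drop trailing slashes, then keep the last path segment.
  const trimmed = cwd.replace(/\/+$/, "");
  const folder = trimmed.slice(trimmed.lastIndexOf("/") + 1);
  // Assumption: fall back to a placeholder when cwd is empty or "/".
  return `${folder || "unknown"} (${pid})`;
}
```

For example, `generateAgentName("/Users/me/projects/my-app", 4242)` returns `"my-app (4242)"`.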
| 261 | + |
| 262 | +### Batched shell calls |
| 263 | + |
| 264 | +- 1 `lsof` for all PIDs vs N per-PID calls |
| 265 | +- 1 `ps -o lstart` for all PIDs vs N `ps -o etime` calls |
| 266 | +- grep at shell level vs list-all-then-filter-in-code |
| 267 | + |
| 268 | +### 3-minute tolerance |
| 269 | + |
| 270 | +- Covers all observed deltas (23s to 2m24s) with margin |
| 271 | +- Beyond tolerance → process-only fallback (wrong match worse than no match) |
| 272 | + |
| 273 | +### Error handling |
| 274 | + |
| 275 | +- Shell command utils return partial results — if lsof fails for 1 of 5 PIDs, the other 4 still return |
| 276 | +- Future: `--verbose` mode will log matching details (which candidates were considered, why matches were rejected) to log files for debugging |
| 277 | + |
| 278 | +## Non-Functional Requirements |
| 279 | + |
| 280 | +- **Performance**: Detection < 500ms for 10 processes, 50 session files |
| 281 | +- **Correctness**: Identical output for non-edge-case scenarios |
| 282 | +- **Portability**: macOS and Linux (no Windows) |
| 283 | +- **Testability**: Shared utils independently testable — mock `execSync` at module level with `jest.mock` |