fix(cli): reconcile hybrid stitch+stamp identity with shared LegacyIdentityStitch service#5607
Conversation
Two concurrent writers in the same millisecond shared a tmp path and raced the rename into ENOENT. Surfaced as a cross-file flake in the login/logout integration tests, but two real CLI processes could hit it the same way.
After the first authenticated API call, the user UUID is stashed in process memory and used as distinct_id on all subsequent capture events (stamping, zero extra PostHog events). The $create_alias merge and the telemetry.json write now only happen in persistent runtimes, and the gate lives inside the stitch owner on every surface (Go StitchLogin, legacy TS stitchLogin + platform-api layer, next TS login) so no call site can forget it. Logout clears both the in-memory and persisted identity on all surfaces, and the redundant $identify that GROWTH-890 missed in next/login is gone. This restores per-user attribution for cli_* events in CI/Docker/npx (lost to the #5366 gate) without re-introducing the identify/alias volume spike. See docs/adr/0013-hybrid-stitch-stamp-identity-attribution.md. GROWTH-891
Three fixes from an independent review of the hybrid stitch+stamp change: - Stamp the in-memory identity from the first authenticated response even when a persisted distinct_id already exists (stale disk identity + a different live token previously kept attributing events to the old user). Alias only the first identity a device ever sees — re-aliasing on re-login, or on the login command's direct StitchLogin call after the response hook already stitched, would merge a second user into the device's existing person graph. - Mark the legacy TS stitch attempt before its first yield point so concurrent authenticated responses cannot double-stitch. - Clear the telemetry identity on logout even when no token exists; a stale distinct_id can outlive the token.
Second-round review findings: - Login as A, logout, login as B re-aliased the same device — a device already merged into A's person graph. Logout now resets the identity entirely (new ResetIdentity / resetIdentity): forget the user AND rotate the device id, so the next login aliases a fresh device. Transient failure paths keep ClearDistinctID and preserve the device id. - A failed alias enqueue left the in-memory identity set, so the login command's follow-up StitchLogin treated the identity as already aliased and permanently skipped the history merge. The identity is now un-stashed on enqueue failure, keeping the first-identity gate retryable.
…red service develop's #5579 (db lint/advisors port) extracted the legacy identity stitch into a shared LegacyIdentityStitch service with a single per-command stitchAttempted guard — exactly the architecture we want. But it was a port of the pre-891 behavior: it only set a distinct_id when it aliased (persistent, first-login), never stamped in CI/Docker/npx, set stitchAttempted after the file-read yield, and read the now-removed runtime.distinctId. This merge keeps #5579's shared-service architecture and folds the hybrid stitch+stamp behavior into it: - Stamp the in-memory identity (runtime.identity) on the first authenticated response in every runtime, so captures in CI/Docker/npx carry the real user id; alias + telemetry.json write still only happen on a persistent machine. - Set stitchAttempted before the first yield (concurrency guard). - Stamp without aliasing when an identity already exists (avoids merging two person graphs). - stitchedDistinctId() now returns runtime.identity.current() so the post-run cli_command_executed event is attributed in every runtime. legacy-analytics resolves to runtime.identity.current() for distinctId while keeping develop's already-keyed groups map. Stitch behavior assertions moved to legacy-identity-stitch.integration.test.ts (added CI-stamp-no-alias, stale-identity-stamp-no-alias, concurrent-alias-once); the platform-api layer test keeps develop's service-mocked wiring. Fixed test runtimes still using the removed distinctId field to use makeTelemetryIdentity. GROWTH-891
Supabase CLI previewnpx --yes https://pkg.pr.new/supabase/cli/supabase@509c563f965335ba421b15cfdacda88491ffb44dPreview package for commit |
|
Heads up on CI: the only red check here is Codegen, and it's not from this PR — this branch touches zero codegen/API files.
Fix is upstream: add the discriminator |
Problem
The #5366 gate stopped the ephemeral-env
$identifyspike, but at the cost of attribution: in CI, Docker, andnpx supabase,cli_*events stay orphaned on throwaway device IDs and never link to the authenticated user. GROWTH-891 (#5559) fixes that with a hybrid stitch+stamp model.While #5559 was in review, #5579 (db lint/advisors port) landed on develop and independently extracted the legacy identity stitch into a shared
LegacyIdentityStitchservice — one per-commandstitchAttemptedguard so the advisor transports alias at most once. That's the architecture we want, but it's a port of the pre-891 behavior: it only stamps when it aliases (persistent, first login), so it doesn't restore CI/Docker/npx attribution; it setsstitchAttemptedafter the file-read yield; and it reads theruntime.distinctIdfield that 891 replaced with a mutable identity slot. Merging #5559 on top as-is would silently drop the attribution feature and reintroduce the race.This PR reconciles the two: keep #5579's shared-service architecture, fold the hybrid stitch+stamp behavior into it. Supersedes #5559.
Changes
LegacyIdentityStitchnow stamps everywhere. On the first authenticated response the user UUID is stamped intoruntime.identityin every runtime, so captures in CI/Docker/npx carry the real user. The$create_alias(pre-login history merge) and thetelemetry.jsonwrite still only happen on a persistent machine.stitchAttemptedis set before the first yield (no double-stitch race); when an identity already exists we stamp without aliasing (never merge two person graphs); alias fires at most once across all transports sharing the service.stitchedDistinctId()returnsruntime.identity.current()so the post-runcli_command_executedis attributed to the real user in every runtime, including steady state.legacy-analytics.layer.tsresolvesdistinctIdfrom the identity slot while keeping develop's already-keyedgroupsmap.legacy-identity-stitch.integration.test.ts(CI-stamp-no-alias, stale-identity-stamp-no-alias, concurrent-alias-once); the platform-api layer test keeps develop's service-mocked wiring. A few command test runtimes still using the removeddistinctIdfield were updated tomakeTelemetryIdentity.$identifyremoval, ADR 0013) along through the merge.Testing
Typecheck clean, full unit suite (1318) green, and the affected integration suites (identity-stitch, platform-api, login, logout, advisors, lint, services, gen/types, issue, linked-project-cache) pass under bun. Also ran an independent Codex review focused on the spike-regression risk — it confirmed no alias in ephemeral runtimes, alias-at-most-once across transports, the pre-yield race guard, and the no-cross-graph-merge invariant, with no findings.
GROWTH-891