Skip to content

feat(relay): enforce resource limits#2925

Draft
juliusmarminge wants to merge 63 commits into
mainfrom
codex/relay-resource-limits
Draft

feat(relay): enforce resource limits#2925
juliusmarminge wants to merge 63 commits into
mainfrom
codex/relay-resource-limits

Conversation

@juliusmarminge

@juliusmarminge juliusmarminge commented Jun 3, 2026

Copy link
Copy Markdown
Member

Summary

  • add per-user relay entitlements with default caps of 3 managed endpoints and 5 mobile devices, plus sparse expiring overrides and rate-limit tiers
  • enforce serialized durable quotas for managed endpoints, mobile devices, and active agent threads; add fanout guardrails and scheduled retention pruning
  • wire Alchemy Cloudflare RateLimit bindings for high-rate relay operations with standard, trusted, and blocked tiers
  • make managed endpoint cleanup retryable and reconcile orphaned or partially deprovisioned allocations on cron
  • expose typed 409 quota and 429 rate-limit responses through the relay contracts and client error messages

Storage model

relay_user_entitlements is the source of truth for user-specific overrides. Null fields inherit defaults, and expires_at supports temporary overrides. High-rate counters stay in Cloudflare RateLimit bindings rather than Postgres.

Verification

  • bun fmt
  • bun lint
  • bun lint:mobile
  • bun typecheck
  • bun run --cwd infra/relay test (134 tests)
  • git diff --check

Note

Enforce resource limits and rate limiting across relay API endpoints

  • Adds a RateLimits service backed by Cloudflare rate limit bindings with per-operation, per-tier limits; endpoints for token exchange, environment linking/connecting, mobile registration, and agent activity publishing are now rate limited.
  • Adds an Entitlements service that reads per-user overrides from a new relay_user_entitlements DB table, with defaults of 3 managed endpoints and 5 mobile devices per user.
  • Enforces quota limits on managed endpoint allocation (reserve), mobile device registration, and active agent threads per environment (capped at 50), failing with UserResourceQuotaExceeded or AgentActivityRowQuotaExceeded when exceeded.
  • Embeds a rate_limit_tier claim in issued DPoP access tokens and propagates it through RelayClientPrincipal for use in downstream rate limit checks.
  • Adds RelayQuotaExceededError (409) and RelayRateLimitedError (429) to the contracts and surfaces them through the API error mapper and client-facing error messages on mobile and web.
  • Adds a Maintenance service and reconcileDeprovisioning effect for periodic cleanup of expired DB rows and orphaned managed endpoints.
  • Risk: quota and rate limit checks add DB lookups and Cloudflare API calls to hot paths; Cloudflare failures fail open (allow), but DB failures for entitlements surface as internal errors.
📊 Macroscope summarized fab70d7. 19 files reviewed, 0 issues evaluated, 0 issues filtered, 0 comments posted (Automatic summaries will resume when PR exits draft mode or review begins).

🗂️ Filtered Issues

No issues evaluated.

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c41f07c8-9baa-404a-9fb8-240a310754be

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/relay-resource-limits

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added vouch:trusted PR author is trusted by repo permissions or the VOUCHED list. size:XXL 1,000+ changed lines (additions + deletions). labels Jun 3, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium

const unlinked = yield* links.revokeForUser({

In unlinkEnvironment, if credentials.revokeForEnvironmentPublicKey throws after links.revokeForUser succeeds, the managedEndpointProvider.deprovision call is never reached. This leaves orphaned managed endpoints and credentials while the link is already gone, returning an error for a partially-completed operation.

Consider wrapping revokeForEnvironmentPublicKey with Effect.ignore (and optionally Effect.tapError to log), similar to how deprovision is handled, so all cleanup steps run regardless of intermediate failures.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file infra/relay/src/http/Api.ts around line 595:

In `unlinkEnvironment`, if `credentials.revokeForEnvironmentPublicKey` throws after `links.revokeForUser` succeeds, the `managedEndpointProvider.deprovision` call is never reached. This leaves orphaned managed endpoints and credentials while the link is already gone, returning an error for a partially-completed operation.

Consider wrapping `revokeForEnvironmentPublicKey` with `Effect.ignore` (and optionally `Effect.tapError` to log), similar to how `deprovision` is handled, so all cleanup steps run regardless of intermediate failures.

Evidence trail:
infra/relay/src/http/Api.ts lines 585-621 at REVIEWED_COMMIT: line 595 `yield* links.revokeForUser(...)` (bare yield*, errors propagate), line 600-603 `yield* credentials.revokeForEnvironmentPublicKey(...)` (bare yield*, errors propagate), lines 604-618 `yield* managedEndpointProvider.deprovision(...).pipe(Effect.tapError(...), Effect.ignore)` (wrapped to ignore errors). The asymmetry means a failure at line 600 prevents line 604 from executing.

Comment thread infra/relay/src/environments/EnvironmentLinks.ts
@juliusmarminge juliusmarminge force-pushed the codex/relay-managed-tunnels-auth-infra branch from 802e800 to adcb3fb Compare June 3, 2026 17:45
@juliusmarminge juliusmarminge force-pushed the codex/relay-resource-limits branch from 68de4bd to ee22c80 Compare June 3, 2026 17:47
@juliusmarminge juliusmarminge force-pushed the codex/relay-managed-tunnels-auth-infra branch from adcb3fb to 6ccf377 Compare June 3, 2026 23:42
@juliusmarminge juliusmarminge force-pushed the codex/relay-resource-limits branch 4 times, most recently from 1f1c59f to fab70d7 Compare June 4, 2026 00:36
@juliusmarminge juliusmarminge force-pushed the codex/relay-managed-tunnels-auth-infra branch from a6d98e6 to eb6fed6 Compare June 4, 2026 06:02
@juliusmarminge juliusmarminge force-pushed the codex/relay-resource-limits branch from 4985921 to acb09c3 Compare June 4, 2026 06:10
@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

🚀 Expo continuous deployment is ready!

  • Project → t3-code
  • Platforms → android, ios
  • Scheme → t3code-preview
  🤖 Android 🍎 iOS
Fingerprint f71fb465ed4f1a9ca0b6bb93584f3ab85c87233f 4696b8290c05e331cddf0d4d8ac39ff107b8b1bd
Build Details Build Permalink
DetailsDistribution: INTERNAL
Build profile: preview:dev
Runtime version: f71fb465ed4f1a9ca0b6bb93584f3ab85c87233f
App version: 0.1.0
Git commit: e935dcb1e5e475789053d4833d59006ca390db19
Build Permalink
DetailsDistribution: INTERNAL
Build profile: preview:dev
Runtime version: 4696b8290c05e331cddf0d4d8ac39ff107b8b1bd
App version: 0.1.0
Git commit: e935dcb1e5e475789053d4833d59006ca390db19
Update Details Update Permalink
DetailsBranch: pr-2925
Runtime version: f71fb465ed4f1a9ca0b6bb93584f3ab85c87233f
Git commit: 04caeea06b2479b94167b2a248d137039d994828
Update Permalink
DetailsBranch: pr-2925
Runtime version: 4696b8290c05e331cddf0d4d8ac39ff107b8b1bd
Git commit: 04caeea06b2479b94167b2a248d137039d994828
Update QR

@juliusmarminge juliusmarminge force-pushed the codex/relay-managed-tunnels-auth-infra branch from eb6fed6 to 42279b9 Compare June 4, 2026 06:13
@juliusmarminge juliusmarminge force-pushed the codex/relay-resource-limits branch from acb09c3 to 6d91211 Compare June 4, 2026 06:15
@juliusmarminge juliusmarminge force-pushed the codex/relay-managed-tunnels-auth-infra branch from 42279b9 to 826c4ab Compare June 4, 2026 06:36
juliusmarminge and others added 13 commits June 3, 2026 23:48
Co-authored-by: codex <codex@users.noreply.github.com>
refactor: remove cloudflare settings from RelayConfigurationShape and related tests

test: clean up tests by removing cloudflare settings from configuration

feat: implement ManagedEndpointDnsClient for DNS operations with Cloudflare

refactor: simplify ManagedEndpointProvider by removing HTTP client dependencies

fix: update ManagedEndpointProvider to use new DNS client for CNAME record management

chore: adjust Api to use new DNS binding and remove deprecated cloudflare settings

refactor: streamline ManagedEndpointZone creation by removing unnecessary token generation
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
juliusmarminge and others added 23 commits June 3, 2026 23:48
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Persist per-user environment allocations and checkpoint Cloudflare resources so retries converge partial provisioning state.

Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
@juliusmarminge juliusmarminge force-pushed the codex/relay-managed-tunnels-auth-infra branch from 826c4ab to 974463e Compare June 4, 2026 06:55
juliusmarminge and others added 2 commits June 3, 2026 23:56
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
@juliusmarminge juliusmarminge force-pushed the codex/relay-resource-limits branch from 6d91211 to 46afa39 Compare June 4, 2026 06:57
@juliusmarminge juliusmarminge force-pushed the codex/relay-managed-tunnels-auth-infra branch from 974463e to 1bb0737 Compare June 4, 2026 21:11
Base automatically changed from codex/relay-managed-tunnels-auth-infra to main June 5, 2026 02:53
Comment on lines +409 to +415
if (props.href) {
return (
<Link href={props.href} asChild>
<Pressable>{content}</Pressable>
</Link>
);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium settings/index.tsx:409

When props.href is provided, disabled={true} is not passed to the Pressable (line 412), so the row remains interactive despite appearing visually disabled. The non-href branch at line 418 correctly passes disabled={props.disabled}. Consider passing the disabled prop to Pressable in the href branch to prevent navigation when disabled.

   if (props.href) {
     return (
       <Link href={props.href} asChild>
-        <Pressable>{content}</Pressable>
+        <Pressable disabled={props.disabled}>{content}</Pressable>
       </Link>
     );
   }
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @apps/mobile/src/app/settings/index.tsx around lines 409-415:

When `props.href` is provided, `disabled={true}` is not passed to the `Pressable` (line 412), so the row remains interactive despite appearing visually disabled. The non-href branch at line 418 correctly passes `disabled={props.disabled}`. Consider passing the `disabled` prop to `Pressable` in the href branch to prevent navigation when disabled.

Evidence trail:
apps/mobile/src/app/settings/index.tsx lines 374-422 at REVIEWED_COMMIT: SettingsRow component definition. Line 387 applies visual opacity for disabled. Line 409-414 (href branch) wraps Pressable in Link without disabled prop. Line 417-420 (non-href branch) passes disabled={props.disabled} to Pressable.

Comment on lines +144 to +157
export const acquireRelayClientForLink = Effect.fn("cloud.cli.acquire_relay_client_for_link")(
function* <ConfirmError, ConfirmContext>(
relayClient: RelayClient.RelayClientShape,
confirmInstall: (version: string) => Effect.Effect<boolean, ConfirmError, ConfirmContext>,
reportProgress: (event: RelayClientInstallProgressEvent) => Effect.Effect<void>,
) {
const executable = yield* relayClient.resolve;
if (executable.status === "available") {
return Option.some(executable);
}
if (executable.status === "unsupported") {
return Option.some(yield* relayClient.installWithProgress(reportProgress));
}
if (!(yield* confirmInstall(executable.version))) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium cli/cloud.ts:144

When executable.status === "unsupported", the code calls relayClient.installWithProgress without user confirmation (lines 154-156). Since "unsupported" means the platform/architecture cannot run the relay client, this installation will fail. The intended behavior is to return Option.none() after confirmation fails, like the "missing" case does.

-    if (executable.status === "unsupported") {
-      return Option.some(yield* relayClient.installWithProgress(reportProgress));
-    }
-    if (!(yield* confirmInstall(executable.version))) {
+    if (executable.status === "unsupported" || !(yield* confirmInstall(executable.version))) {
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @apps/server/src/cli/cloud.ts around lines 144-157:

When `executable.status === "unsupported"`, the code calls `relayClient.installWithProgress` without user confirmation (lines 154-156). Since "unsupported" means the platform/architecture cannot run the relay client, this installation will fail. The intended behavior is to return `Option.none()` after confirmation fails, like the "missing" case does.

Evidence trail:
apps/server/src/cli/cloud.ts:144-162 (REVIEWED_COMMIT) - the acquireRelayClientForLink function showing the unsupported branch at line 154-156 calling installWithProgress without confirmation. packages/shared/src/relayClient.ts:279-286 (REVIEWED_COMMIT) - resolve returns status 'unsupported' when releaseAsset is falsy. packages/shared/src/relayClient.ts:394-398 (REVIEWED_COMMIT) - installUnlocked checks !releaseAsset and yields RelayClientInstallError with reason 'unsupported_platform', confirming install will always fail when status was 'unsupported'.

Comment on lines +467 to +471
export function updatePrimaryCloudPreferences(input: {
readonly publishAgentActivity: boolean;
}): Effect.Effect<CloudLinkState, CloudEnvironmentLinkError, HttpClient.HttpClient> {
return Effect.gen(function* () {
const client = yield* makeEnvironmentHttpApiClient(resolvePrimaryEnvironmentHttpUrl("/"));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium cloud/linkEnvironment.ts:467

updatePrimaryCloudPreferences calls resolvePrimaryEnvironmentHttpUrl("/") without checking if a primary environment exists. When no target is configured, resolvePrimaryEnvironmentHttpUrl throws a plain Error, which becomes a defect rather than the declared CloudEnvironmentLinkError. The function should return a typed error instead, like readPrimaryCloudLinkState does at line 454.

  export function updatePrimaryCloudPreferences(input: {
    readonly publishAgentActivity: boolean;
  }): Effect.Effect<CloudLinkState, CloudEnvironmentLinkError, HttpClient.HttpClient> {
    return Effect.gen(function* () {
+      if (!readPrimaryCloudLinkTarget()) {
+        return yield* new CloudEnvironmentLinkError({
+          message: "Local environment is not ready yet.",
+        });
+      }
      const client = yield* makeEnvironmentHttpApiClient(resolvePrimaryEnvironmentHttpUrl("/"));
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @apps/web/src/cloud/linkEnvironment.ts around lines 467-471:

`updatePrimaryCloudPreferences` calls `resolvePrimaryEnvironmentHttpUrl("/")` without checking if a primary environment exists. When no target is configured, `resolvePrimaryEnvironmentHttpUrl` throws a plain `Error`, which becomes a defect rather than the declared `CloudEnvironmentLinkError`. The function should return a typed error instead, like `readPrimaryCloudLinkState` does at line 454.

Evidence trail:
apps/web/src/cloud/linkEnvironment.ts lines 467-482 (updatePrimaryCloudPreferences with no guard), apps/web/src/environments/primary/target.ts lines 135-150 (resolvePrimaryEnvironmentHttpUrl throws plain Error), apps/web/src/cloud/linkEnvironment.ts lines 448-464 (readPrimaryCloudLinkState with guard at line 454), apps/web/src/cloud/linkEnvironment.ts lines 484-500 (unlinkPrimaryEnvironmentFromCloud with guard at line 488-493), apps/web/src/cloud/linkEnvironment.ts lines 270-282 (readPrimaryCloudLinkTarget returns null when readPrimaryEnvironmentTarget returns null)

Comment on lines +184 to +185
yield* Layer.launch(HttpRouter.serve(routesLayer).pipe(Layer.provideMerge(serverLayer))).pipe(
Effect.forkScoped,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low app/DesktopCloudAuth.ts:184

startProtocolCallbackForwardServer returns successfully even when the HTTP server fails to start (e.g., port already in use). The Layer.launch effect is forked with Effect.forkScoped, which discards the fiber's result, so startup errors are silently swallowed and auth callbacks fail without any error being surfaced.

     yield* Layer.launch(HttpRouter.serve(routesLayer).pipe(Layer.provideMerge(serverLayer))).pipe(
-      Effect.forkScoped,
+      Effect.scoped,
     );
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @apps/desktop/src/app/DesktopCloudAuth.ts around lines 184-185:

`startProtocolCallbackForwardServer` returns successfully even when the HTTP server fails to start (e.g., port already in use). The `Layer.launch` effect is forked with `Effect.forkScoped`, which discards the fiber's result, so startup errors are silently swallowed and auth callbacks fail without any error being surfaced.

Evidence trail:
apps/desktop/src/app/DesktopCloudAuth.ts lines 160-188 (function definition with `never` error type and `forkScoped` discarding errors), lines 285-290 (caller site in `configure`). Effect version: 4.0.0-beta.73 per pnpm-workspace.yaml line 19.

Comment on lines +517 to +526
const reconcileDesiredCloudLinkWith = Effect.fn("environment.cloud.reconcileDesiredLinkWith")(
function* (dependencies: CloudHttpDependencies, localOrigin: string) {
const localUrl = yield* Effect.try({
try: () => new URL(localOrigin),
catch: () =>
new EnvironmentHttpBadRequestError({
message: "Could not resolve local environment origin.",
}),
});
if (localUrl.origin !== localOrigin) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low cloud/http.ts:517

When localOrigin includes an explicit default port (e.g., "http://127.0.0.1:80" or "https://127.0.0.1:443"), the check localUrl.origin !== localOrigin on line 526 incorrectly fails. new URL(...).origin strips default ports per the URL spec, so "http://127.0.0.1:80".origin returns "http://127.0.0.1", causing a mismatch that throws EnvironmentHttpBadRequestError even for valid origins. Consider normalizing the origin comparison to handle default ports correctly.

-    if (localUrl.origin !== localOrigin) {
+    const normalizedOrigin = `${localUrl.protocol}//${localUrl.host}`;
+    if (normalizedOrigin !== localOrigin) {
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @apps/server/src/cloud/http.ts around lines 517-526:

When `localOrigin` includes an explicit default port (e.g., `"http://127.0.0.1:80"` or `"https://127.0.0.1:443"`), the check `localUrl.origin !== localOrigin` on line 526 incorrectly fails. `new URL(...).origin` strips default ports per the URL spec, so `"http://127.0.0.1:80".origin` returns `"http://127.0.0.1"`, causing a mismatch that throws `EnvironmentHttpBadRequestError` even for valid origins. Consider normalizing the origin comparison to handle default ports correctly.

Evidence trail:
apps/server/src/cloud/http.ts lines 517-530 (REVIEWED_COMMIT): `reconcileDesiredCloudLinkWith` function with the `localUrl.origin !== localOrigin` check at line 526. apps/server/src/server.ts line 430 (REVIEWED_COMMIT): caller constructs `http://127.0.0.1:${address.port}` from the actual server port. WHATWG URL Standard: URL.origin strips default ports (80 for http, 443 for https).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XXL 1,000+ changed lines (additions + deletions). vouch:trusted PR author is trusted by repo permissions or the VOUCHED list.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant