Skip to content

refactor: decouple TelemetryClient from command_run logic#1246

Draft
Hweinstock wants to merge 12 commits into
aws:mainfrom
Hweinstock:decouple-telemetry
Draft

refactor: decouple TelemetryClient from command_run logic#1246
Hweinstock wants to merge 12 commits into
aws:mainfrom
Hweinstock:decouple-telemetry

Conversation

@Hweinstock
Copy link
Copy Markdown
Contributor

@Hweinstock Hweinstock commented May 14, 2026

Description

This PR addresses a few friction points for adding new metrics and attributes and cleans up the implementation to how its being used:

  1. TelemetryClient is now generic and does not contain command_run specific logic.
  2. Attributes are now global, such that all metric attribute names must be unique across metrics.
  3. Remove unused cancellation logic.

Adding a new metric after this PR

  1. Add attribute schemas to common-shapes.ts + ATTRIBUTES
  2. Add entry to METRICS in registry.ts with MetricAttrs branch for strong typing on emit.
  3. Call client.emit() or build higher-level utilities such as withCommandRunTelemetry.

Follow-Up

Now that attributes are global, we may want to prefix them to avoid ambiguity. This will be done as a follow-up.

Related Issue

Closes #

Documentation PR

N/A

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Other (please describe):

Testing

How have you tested the change?

  • I ran npm run test:unit and npm run test:integ
  • I ran npm run typecheck
  • I ran npm run lint
  • If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.

Hweinstock added 12 commits May 14, 2026 03:00
TelemetryClient is now a generic metric emitter (emit, flush, shutdown).
withCommandRun + recordCommandRun moved to cli-command-run.ts as standalone
functions. All sinks accept metricName as first arg to record().
- help: migrated to withCommandRunTelemetry
- dev browser mode: migrated to direct client.emit()
- withCommandRun and CANCELLED are now module-private
- Tests rewritten to go through withCommandRunTelemetry
…→deploy_mode

- Remove `& Record<string, unknown>` from emit() — typos in attrs are now compile errors
- MetricSink.record() uses MetricName instead of string
- Remove unused instrument field from registry (YAGNI)
- Rename deploy `mode` attr to `deploy_mode` to avoid collision with session Mode
- Registry schema includes all per-command attrs as optional for full type coverage
…ommandGroup enum

- Optional fields in registry derived automatically from COMMAND_SCHEMAS (eliminates 55 lines)
- CommandGroup reverted to type-level derivation from Command (zero maintenance)
- Removed CommandGroup z.enum from common-shapes.ts
…ependency

- Derive Command from keyof COMMAND_SCHEMAS (zero maintenance, as before)
- Remove Command z.enum from common-shapes.ts (~50 lines)
- Remove mode from ATTRIBUTES (resource attr, not metric attr)
- Remove unused barrel re-exports (CompositeSink, classifyError, isUserError, CommandGroup)
- Revert deploy/utils.ts to simple literal union (no zod dependency)
@github-actions github-actions Bot added size/l PR size: L agentcore-harness-reviewing AgentCore Harness review in progress labels May 14, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Package Tarball

aws-agentcore-0.13.1.tgz

How to install

npm install https://github.com/aws/agentcore-cli/releases/download/pr-1246-tarball/aws-agentcore-0.13.1.tgz

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label May 14, 2026
Copy link
Copy Markdown

@agentcore-cli-automation agentcore-cli-automation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup overall — splitting the generic emitter from the command-run logic is a good direction, and resilientParse is a reasonable improvement over silently dropping the whole metric.

A few issues I think should be addressed before merging:

1. Same callback can either throw or return a failure Result depending on whether telemetry is initialized (src/cli/telemetry/cli-command-run.ts:86 and 100–107)

When client is unavailable, withCommandRunTelemetry returns fn() directly — if fn() throws, the rejection propagates. When client is available, a thrown exception is caught and converted to { success: false, error }, so the caller never sees the throw.

That means the same callback has two different observable behaviors depending on whether telemetry happens to be initialized:

  • Without telemetry: await withCommandRunTelemetry(...) rejects on throw.
  • With telemetry: await withCommandRunTelemetry(...) resolves with a failure Result.

This is a footgun for callers (including the void withCommandRunTelemetry(...) calls in useDeployFlow.ts — those would emit unhandled rejections only when telemetry is disabled).

Options to fix:

  1. Make the no-client branch also catch and convert: if (!client) { try { return await fn(); } catch (e) { return { success: false, error: ... } as R; } }.
  2. Make the client branch propagate the throw instead of converting (and let call sites that expect a Result use the existing if (!result.success) throw result.error; pattern). This also more naturally aligns with runCliCommand, which re-throws.

2. Docstring on withCommandRunTelemetry contradicts the implementation (src/cli/telemetry/cli-command-run.ts:77-78)

The doc says: "If the callback throws, telemetry is recorded and the exception propagates." But lines 100–107 explicitly catch the exception and return a failure Result instead of re-throwing — and the test 'records failure and returns error result when callback throws' confirms the conversion is intentional. Whichever direction is taken to resolve issue #1, please update the docstring to match.

3. Stale comment on standardize() (src/cli/telemetry/schemas/common-shapes.ts:31-34, unchanged in this PR but inaccurate after the changes here)

It says invalid values "pass through to recordCommandRun, where COMMAND_SCHEMAS[command].parse(attrs) validates the full attr object in a try/catch — silently dropping the metric if any field is invalid." That's no longer true: recordCommandRun now uses resilientParse, which defaults invalid fields to 'unknown' rather than dropping the whole metric. This matters for anyone trying to understand whether passing an unvalidated string into standardize is safe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/l PR size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants