Skip to content

test(tracing): add round-trip tests for ag.metrics.duration.cumulativ…#4575

Open
justin212407 wants to merge 6 commits into
Agenta-AI:mainfrom
justin212407:test-tracing-fix
Open

test(tracing): add round-trip tests for ag.metrics.duration.cumulativ…#4575
justin212407 wants to merge 6 commits into
Agenta-AI:mainfrom
justin212407:test-tracing-fix

Conversation

@justin212407

@justin212407 justin212407 commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Summary:

Fixes #4172 - ag.metrics.duration.cumulative query↔ingest schema asymmetry that caused spans to be silently dropped on re-ingest.

During ingestion, parsing.py computed wall-clock duration and wrote it as a raw scalar float. The original Pydantic model (AgMetricEntryAttributes) declared cumulative as Optional[Metrics] where Metrics = Dict[str, NumericJson] - a dict, not a scalar. This meant any span queried via /api/tracing/spans/query and re-submitted to /api/tracing/spans/ingest failed Pydantic validation silently, returning {"count": 0, "links": []} with no error surfaced to the caller.

Per the team's documented decision in docs/design/homomorphic-tracing-io/findings.md, the canonical shape for duration.cumulative is a scalar float. The production fixes (AgScalarMetricEntryAttributes in tracing.py, _coerce_scalar_metric_value in attributes.py, scalar write in parsing.py) were already present in the codebase. This PR adds the missing test coverage for those changes.

Testing:

Verified locally

# New tests only
uv run pytest oss/tests/pytest/unit/tracing/utils/test_parsing.py \
  -v -k "duration_cumulative" --no-header
# 2 passed

# Full tracing suite
uv run pytest oss/tests/pytest/unit/tracing/ -v --no-header -p no:warnings
# 74 passed

# Full unit suite
uv run pytest oss/tests/pytest/unit/ -q --no-header -p no:warnings
# 525 passed

# Linting and formatting
uv run ruff check oss/src/core/tracing/ oss/tests/pytest/unit/tracing/
uv run ruff format --check oss/tests/pytest/unit/tracing/utils/test_parsing.py
# All checks passed

Added or updated tests:

Two new unit tests added to

  1. api/oss/tests/pytest/unit/tracing/utils/test_parsing.py:
    test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest - Verifies that a span with metrics.duration.cumulative set as a scalar float passes through the full parse_spans_from_request pipeline without error and the output value is a plain float, not a dict. Covers the canonical ingest path.

  2. test_parse_spans_from_request_normalizes_legacy_dict_duration_cumulative_to_scalar - Verifies backward compatibility: a span carrying the old dict shape duration.cumulative = {"total": 1922.653} (as written by older ingestion code
    or replayed from a query response) is normalized to the canonical scalar 1922.653 by _coerce_scalar_metric_value inside initialize_ag_attributes. No timestamps are supplied so the computed-duration override path is skipped,
    isolating the normalization behavior.

QA follow-up:

  • Verify that existing traces in production with duration.cumulative as a scalar are still returned correctly by /api/tracing/spans/query.
  • Verify that a full query→re-ingest round-trip (the exact repro from the issue) no longer returns {"count": 0, "links": []}.
  • Verify that traces ingested by older SDK versions (which may have written duration.cumulative = {"total": v}) are normalized correctly on the way through ingest without requiring a backfill.

Demo:

N/A - no UI changes.

Checklist:

  • I have included a video or screen recording for UI changes, or marked Demo as N/A.
  • Relevant tests pass locally.
  • Relevant linting and formatting pass locally.
  • I have signed the CLA, or I will sign it when the bot prompts me.

Contributor Resources:

…e scalar shape

Signed-off-by: justin212407 <charlesjustin2124@gmail.com>
Copilot AI review requested due to automatic review settings June 7, 2026 20:02
@dosubot dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jun 7, 2026
@vercel

vercel Bot commented Jun 7, 2026

Copy link
Copy Markdown

@justin212407 is attempting to deploy a commit to the agenta projects Team on Vercel.

A member of the Team first needs to authorize it.

@dosubot dosubot Bot added the tests label Jun 7, 2026
@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 6c97097f-8653-4ae8-a515-9eb95415bdfa

📥 Commits

Reviewing files that changed from the base of the PR and between 092b3d6 and d7a28b3.

📒 Files selected for processing (1)
  • api/oss/tests/pytest/unit/tracing/utils/test_parsing.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/oss/tests/pytest/unit/tracing/utils/test_parsing.py

📝 Walkthrough

Summary by CodeRabbit

  • Tests
    • Added unit tests verifying cumulative duration metrics are normalized: canonical scalar values remain scalars, legacy dict forms are coerced to canonical scalar, and when start/end timestamps are present the cumulative duration is derived from wall-clock time.

Walkthrough

Adds two pytest unit tests for parse_spans_from_request covering metric-duration normalization: one ensures a scalar cumulative duration is overwritten by computed wall-clock duration when timestamps exist; the other ensures legacy dict-shaped cumulative metrics are normalized to a scalar when timestamps are absent.

Changes

Metric duration ingestion normalization tests

Layer / File(s) Summary
Scalar and dict metric duration handling
api/oss/tests/pytest/unit/tracing/utils/test_parsing.py
Two tests: one verifies a scalar ag.metrics.duration.cumulative is overwritten by computed wall-clock duration when start_time/end_time exist and remains a numeric scalar; the other verifies legacy dict-shaped {"total": ...} cumulative metrics are normalized to a canonical scalar when timestamps are absent.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 60.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the main change: adding round-trip tests for ag.metrics.duration.cumulative normalization as part of the tracing fix.
Description check ✅ Passed The description comprehensively explains the issue (#4172), the fix, test coverage, verification steps, and clearly relates to the changeset of adding unit tests.
Linked Issues check ✅ Passed The PR adds test coverage for the production fixes already present in the codebase to address issue #4172, validating both scalar ingest path and legacy dict normalization as specified in the issue requirements.
Out of Scope Changes check ✅ Passed All changes are focused on adding unit test coverage for duration.cumulative normalization; no unrelated modifications to production code or other areas are present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds unit tests around parse_spans_from_request to ensure duration metric normalization and precedence rules are stable across ingest/parsing.

Changes:

  • Adds a test ensuring scalar duration.cumulative is accepted and overridden by computed duration when timestamps are present.
  • Adds a test ensuring legacy {"total": v} dict shape for duration.cumulative is normalized to a scalar when timestamps are absent.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

# The computed duration (1000ms from timestamps) overwrites the pre-set value;
# the key contract is that the result is a plain scalar, not a dict.
assert isinstance(metrics["duration"]["cumulative"], (int, float))
assert metrics["duration"]["cumulative"] == 1000.0

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - replaced bare float equality with pytest.approx(1000.0).

metrics = parsed[0].attributes["ag"]["metrics"]
# Legacy {"total": v} dict is collapsed to the canonical scalar by
# _coerce_scalar_metric_value inside initialize_ag_attributes.
assert metrics["duration"]["cumulative"] == 1922.653

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - added isinstance(metrics["duration"]["cumulative"], (int, float)) scalar contract check and switched to pytest.approx(1922.653).

assert len(response[0].span_id) == 16


def test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest():

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - renamed to test_ingest_accepts_scalar_duration_cumulative.

assert metrics["duration"]["cumulative"] == 1000.0


def test_parse_spans_from_request_normalizes_legacy_dict_duration_cumulative_to_scalar():

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - renamed to test_ingest_normalizes_legacy_dict_duration_cumulative_to_scalar

@mmabrouk mmabrouk requested a review from jp-agenta June 9, 2026 07:58
@junaway

junaway commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Hi @justin212407,

Could you please

  • fix the agents' comments, then
  • leave a clear 'Fixed' comment on each of them, then
  • run all tests locally, and then
    let us know when you're done ?

Copilot AI review requested due to automatic review settings June 13, 2026 07:27

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.

# The computed duration (1000ms from timestamps) overwrites the pre-set value;
# the key contract is that the result is a plain scalar, not a dict.
assert isinstance(metrics["duration"]["cumulative"], (int, float))
assert metrics["duration"]["cumulative"] == 1000.0
metrics = parsed[0].attributes["ag"]["metrics"]
# Legacy {"total": v} dict is collapsed to the canonical scalar by
# _coerce_scalar_metric_value inside initialize_ag_attributes.
assert metrics["duration"]["cumulative"] == 1922.653
Comment on lines +256 to +269
span = OTelSpan(
trace_id=TRACE_HEX,
span_id=SPAN_HEX,
span_name="root",
start_time=1_700_000_000,
end_time=1_700_000_001,
attributes={
"ag": {
"metrics": {
"duration": {"cumulative": 1922.653},
}
}
},
)
Comment on lines +286 to +297
span = OTelSpan(
trace_id=TRACE_HEX,
span_id=SPAN_HEX,
span_name="root",
attributes={
"ag": {
"metrics": {
"duration": {"cumulative": {"total": 1922.653}},
}
}
},
)
Signed-off-by: justin212407 <charlesjustin2124@gmail.com>
@justin212407

Copy link
Copy Markdown
Contributor Author

Hi @justin212407,

Could you please

  • fix the agents' comments, then
  • leave a clear 'Fixed' comment on each of them, then
  • run all tests locally, and then
    let us know when you're done ?

Tests pass locally: 2 new tests , full tracing suite 80 passed , full unit suite 737 passed , ruff check and format check clean

@justin212407 justin212407 requested a review from Copilot June 13, 2026 07:32

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment on lines +252 to +269
def test_ingest_accepts_scalar_duration_cumulative():
# Canonical scalar shape in attributes — the ingest pipeline accepts it without
# error. When start_time and end_time are present the computed wall-clock duration
# takes precedence (1_700_000_001 - 1_700_000_000 = 1s = 1000ms).
span = OTelSpan(
trace_id=TRACE_HEX,
span_id=SPAN_HEX,
span_name="root",
start_time=1_700_000_000,
end_time=1_700_000_001,
attributes={
"ag": {
"metrics": {
"duration": {"cumulative": 1922.653},
}
}
},
)
assert len(response[0].span_id) == 16


def test_ingest_accepts_scalar_duration_cumulative():

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These names were explicitly shortened in the previous review round after Copilot flagged the original test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest names as too long. Reverting to longer names would reintroduce the previous violation. Deferring to @junaway yo make the final call on naming convention.

assert metrics["duration"]["cumulative"] == pytest.approx(1000.0)


def test_ingest_normalizes_legacy_dict_duration_cumulative_to_scalar():
justin212407 and others added 2 commits June 13, 2026 15:23
Signed-off-by: justin212407 <charlesjustin2124@gmail.com>
Copilot AI review requested due to automatic review settings June 13, 2026 09:59

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.

assert len(response[0].span_id) == 16


def test_ingest_accepts_scalar_duration_cumulative():
assert metrics["duration"]["cumulative"] == pytest.approx(1000.0)


def test_ingest_normalizes_legacy_dict_duration_cumulative_to_scalar():
Comment on lines +261 to +262
start_time=1_700_000_000,
end_time=1_700_000_001,
# The computed duration (1000ms from timestamps) overwrites the pre-set value;
# the key contract is that the result is a plain scalar, not a dict.
assert isinstance(metrics["duration"]["cumulative"], (int, float))
assert metrics["duration"]["cumulative"] == pytest.approx(1000.0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XS This PR changes 0-9 lines, ignoring generated files. tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants