Eval regression detected (2026-05-01) #16

@github-actions

Regression Report: math-20

Baseline: 0.2.0-cerebras-llama3.1-8b
Current: ci-monitor
Status: REGRESSION DETECTED

Metrics

| Metric | Baseline | Current | Change |
|---|---|---|---|
| accuracy | 0.9000 | 0.8500 | -5.6% |
| avg_latency | 11.2776 | 3.6511 | -67.6% |
| total_tokens | 3194.0000 | 2037.0000 | -36.2% |
| avg_tool_accuracy | 0.0000 | 0.0000 | N/A |
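
For reference, the Change column is consistent with a plain relative change against the baseline, with N/A reported when the baseline is zero. The sketch below reproduces the figures above under that assumption. The function names `percent_change` and `format_change` are hypothetical and are not taken from the monitoring code.

```python
from typing import Optional


def percent_change(baseline: float, current: float) -> Optional[float]:
    """Relative change from baseline to current, in percent.

    Returns None when the baseline is zero, since the ratio is undefined
    (assumed to be why avg_tool_accuracy shows N/A).
    """
    if baseline == 0:
        return None
    return (current - baseline) / baseline * 100.0


def format_change(baseline: float, current: float) -> str:
    """Render the change as a signed percentage with one decimal, or N/A."""
    change = percent_change(baseline, current)
    return "N/A" if change is None else f"{change:+.1f}%"


if __name__ == "__main__":
    rows = [
        ("accuracy", 0.9000, 0.8500),
        ("avg_latency", 11.2776, 3.6511),
        ("total_tokens", 3194.0, 2037.0),
        ("avg_tool_accuracy", 0.0, 0.0),
    ]
    for name, baseline, current in rows:
        print(f"{name}: {format_change(baseline, current)}")
```

Note that the negative changes for avg_latency and total_tokens are improvements if lower is better for those metrics. Only the accuracy drop counts against the run.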

Alerts

  • [WARNING] math-20/accuracy: 0.9000 -> 0.8500 (-5.6%, threshold: 5%)
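
A minimal sketch of how a threshold check like this one might work, assuming the alert fires only when a metric moves in its "worse" direction by more than the threshold. That would explain why the 67.6% latency drop raised no alert, but the actual rule is not stated in this report. `check_regression` and its `higher_is_better` parameter are hypothetical names.

```python
from typing import Optional


def check_regression(
    metric: str,
    baseline: float,
    current: float,
    threshold_pct: float,
    higher_is_better: bool = True,
) -> Optional[str]:
    """Return a warning line if the metric worsened by more than threshold_pct.

    Assumption: the comparison is a strict "greater than" on the relative
    change, and a zero baseline is skipped because the ratio is undefined.
    """
    if baseline == 0:
        return None
    change = (current - baseline) / baseline * 100.0
    # Convert the signed change into "how much worse", per metric direction.
    worsened_by = -change if higher_is_better else change
    if worsened_by > threshold_pct:
        return (
            f"[WARNING] {metric}: {baseline:.4f} -> {current:.4f} "
            f"({change:+.1f}%, threshold: {threshold_pct:g}%)"
        )
    return None


if __name__ == "__main__":
    print(check_regression("math-20/accuracy", 0.9000, 0.8500, 5.0))
    # Latency fell, which is an improvement when lower is better: no alert.
    print(check_regression("math-20/avg_latency", 11.2776, 3.6511, 5.0,
                           higher_is_better=False))
```

With a 5% threshold the accuracy drop of about 5.56% is only just over the line, so a single additional wrong answer on a small eval set could be enough to trigger it.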
