Post sticky PR comment summarizing CI test job failures#1412
Post sticky PR comment summarizing CI test job failures#1412rgsl888prabhu wants to merge 3 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a workflow job that runs after tests and invokes ci/utils/pr_test_summary.py to aggregate test job outcomes, extract failures/crashes from logs, and post or update a sticky PR comment with results. ChangesPR Test Result Aggregation
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
| ) | ||
| lines += ["", "| Job | Logs |", "|-----|------|"] | ||
| for job in failed: | ||
| lines.append(f"| ❌ `{job['name']}` | [View logs]({job['html_url']}) |") |
There was a problem hiding this comment.
Surfacing the names of the tests that failed is pretty useful, how hard would it be to bring that back?
There was a problem hiding this comment.
Test names are already surfaced — _analyze_job_log parses the pytest short test summary info section from each failed job's log and includes individual test IDs in a collapsible dropdown per job. This was added in the second commit on this PR.
b88771d to
418777d
Compare
CI Test Summary✅ All 0 test job(s) passed. |
418777d to
9fbc08f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
ci/utils/pr_test_summary.py (1)
103-105: 💤 Low valueOptional: timestamp stripping heuristic could be more robust.
The current heuristic (split on "Z " and check length < 35) works for GitHub Actions timestamps but could incorrectly strip short prefixes in edge cases (e.g., "ABC Z content" → "content").
Since pytest output is predictable, the risk is low and the current implementation is acceptable. If you want perfect robustness, consider a regex like
r'^\d{4}-\d{2}-\d{2}T[\d:.]+Z '.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci/utils/pr_test_summary.py` around lines 103 - 105, The timestamp-stripping heuristic around variables raw, parts, and line can falsely strip non-timestamp prefixes; replace the split/length check with a strict regex match using r'^\d{4}-\d{2}-\d{2}T[\d:.]+Z ' to detect GHA timestamps, and only remove that prefix when the regex matches (e.g., use re.match on raw and slice off the matched span), keeping the rest of the logic that assigns to line unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ci/utils/pr_test_summary.py`:
- Around line 170-172: The current categorization in pr_test_summary.py filters
test_jobs into failed, passed, and skipped using explicit conclusion values and
ignores other conclusions (e.g., "timed_out", "cancelled"), causing incorrect
counts; update the logic around the failed/passed/skipped variables (those list
comprehensions that produce failed, passed, skipped from test_jobs) so that
passed only includes conclusion == "success", skipped only includes conclusion
== "skipped", and treat any other conclusion as failure (either by changing the
failed comprehension to include all non-success/non-skipped conclusions or by
computing failed = total_jobs - len(passed) - len(skipped)) to ensure totals
always sum to the number of test_jobs.
---
Nitpick comments:
In `@ci/utils/pr_test_summary.py`:
- Around line 103-105: The timestamp-stripping heuristic around variables raw,
parts, and line can falsely strip non-timestamp prefixes; replace the
split/length check with a strict regex match using r'^\d{4}-\d{2}-\d{2}T[\d:.]+Z
' to detect GHA timestamps, and only remove that prefix when the regex matches
(e.g., use re.match on raw and slice off the matched span), keeping the rest of
the logic that assigns to line unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ce8da740-23e0-441f-9cd4-7d56beab2bcd
📒 Files selected for processing (2)
.github/workflows/pr.yamlci/utils/pr_test_summary.py
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/pr.yaml
Posts a sticky comment on each PR listing which matrix test jobs failed, with direct links to their logs. Updates on re-run; clears to a pass message if all jobs succeed. Uses only actions/github-script (already in repo) and the built-in GITHUB_TOKEN with pull-requests: write — no S3, no AWS credentials, no new third-party actions.
9fbc08f to
c2950df
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ci/utils/pr_test_summary.py`:
- Around line 58-63: The GitHub API calls currently use urllib.request.urlopen
without a timeout; add a shared timeout constant (e.g., TIMEOUT_SECONDS = 10)
and pass it to every urlopen call (for example in _paginate and the other two
call sites that use urllib.request.urlopen) as urllib.request.urlopen(req,
timeout=TIMEOUT_SECONDS); update all occurrences to use this constant so every
HTTP request to the GitHub REST API has an explicit timeout.
- Around line 130-133: The parser only looks for lines starting with "FAILED "
so ERROR-summary lines are missed; update the extraction logic (the block that
currently checks if line.startswith("FAILED ")) to also handle lines starting
with "ERROR " (e.g., check startswith for both "FAILED " and "ERROR "), then
extract the test identifier the same way into the failed list (retain the
existing test_id = line[7:].split(" - ")[0].strip() logic or adjust slicing if
you normalize by splitting on the first space). Ensure this change is applied
where the failed list is appended so ERROR entries appear in the comment
details.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 50acfa7f-8113-4bc0-a79c-eecf289cb47e
📒 Files selected for processing (2)
.github/workflows/pr.yamlci/utils/pr_test_summary.py
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/pr.yaml
Two test failures (one Python, one C++) to verify the PR test summary comment correctly captures and displays failed test names per job. Remove before merging.
- Add HTTP timeout to all urlopen calls - Treat timed_out/cancelled jobs as failures in counts - Capture ERROR lines alongside FAILED in pytest summary parsing
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/tests/routing/unit_tests/breaks.cu`:
- Around line 433-437: The failing GoogleTest TEST(vehicle_breaks,
ci_summary_demo_failure) won't be picked up by ci/utils/pr_test_summary.py
because that parser only extracts IDs from pytest's short test summary; fix by
either removing or converting this intentional failure to a pytest-based test so
the per-test extraction is exercised, or update the parser
(ci/utils/pr_test_summary.py) to explicitly support GTest output (e.g., parse
GoogleTest's short summary or the XML produced via --gtest_output to extract
test names such as vehicle_breaks.ci_summary_demo_failure) and ensure the CI job
emits that GTest output for the parser to consume.
- Around line 433-437: Remove the intentional failing test TEST(vehicle_breaks,
ci_summary_demo_failure) from cpp/tests/routing/unit_tests/breaks.cu (or move it
to a throwaway branch) so CI no longer fails; alternatively, wrap it behind an
explicit opt-in guard (e.g., a preprocessor flag or test filter) so it never
runs in normal PR CI, ensuring the file only contains tests that validate
numerical correctness/regressions.
In `@python/cuopt/cuopt/tests/test_ci_summary_demo.py`:
- Around line 4-17: Remove the intentional failing demo tests from the pytest
suite: delete or relocate python/cuopt/cuopt/tests/test_ci_summary_demo.py
(which contains test_ci_summary_demo_failure and test_ci_summary_demo_failure_2)
out of the python/**/tests/** tree into an opt-in or demo-only path so they no
longer run in the default PR test matrix; if relocating, ensure the new path is
excluded from normal pytest discovery (or only run via an explicit demo CI job).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: edad9b32-c92d-4a15-a868-547e28b66120
📒 Files selected for processing (2)
cpp/tests/routing/unit_tests/breaks.cupython/cuopt/cuopt/tests/test_ci_summary_demo.py
| // Intentional failure to exercise the PR test summary comment feature. | ||
| TEST(vehicle_breaks, ci_summary_demo_failure) | ||
| { | ||
| EXPECT_TRUE(false) << "Intentional failure: remove after verifying PR summary comment."; | ||
| } |
There was a problem hiding this comment.
This C++ demo does not exercise per-test name extraction.
ci/utils/pr_test_summary.py only pulls test IDs out of pytest’s short test summary info, so this GoogleTest failure will show the job as failed but will not populate the per-job failed-test details. If C++ per-test names are part of the validation goal, the parser needs explicit GTest support first.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/tests/routing/unit_tests/breaks.cu` around lines 433 - 437, The failing
GoogleTest TEST(vehicle_breaks, ci_summary_demo_failure) won't be picked up by
ci/utils/pr_test_summary.py because that parser only extracts IDs from pytest's
short test summary; fix by either removing or converting this intentional
failure to a pytest-based test so the per-test extraction is exercised, or
update the parser (ci/utils/pr_test_summary.py) to explicitly support GTest
output (e.g., parse GoogleTest's short summary or the XML produced via
--gtest_output to extract test names such as
vehicle_breaks.ci_summary_demo_failure) and ensure the CI job emits that GTest
output for the parser to consume.
Remove this unconditional red test before merge.
This keeps the C++ test job failing on every run and does not validate solver behavior or a real regression. If you still need it for one-off validation, keep it on a throwaway branch or behind an explicit opt-in path that never runs in normal PR CI.
🧹 Proposed fix
-// Intentional failure to exercise the PR test summary comment feature.
-TEST(vehicle_breaks, ci_summary_demo_failure)
-{
- EXPECT_TRUE(false) << "Intentional failure: remove after verifying PR summary comment.";
-}As per coding guidelines, cpp/tests/** should validate numerical correctness/regressions rather than ship intentionally failing tests, and the PR objective already describes these as temporary demo failures to remove before merging.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Intentional failure to exercise the PR test summary comment feature. | |
| TEST(vehicle_breaks, ci_summary_demo_failure) | |
| { | |
| EXPECT_TRUE(false) << "Intentional failure: remove after verifying PR summary comment."; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/tests/routing/unit_tests/breaks.cu` around lines 433 - 437, Remove the
intentional failing test TEST(vehicle_breaks, ci_summary_demo_failure) from
cpp/tests/routing/unit_tests/breaks.cu (or move it to a throwaway branch) so CI
no longer fails; alternatively, wrap it behind an explicit opt-in guard (e.g., a
preprocessor flag or test filter) so it never runs in normal PR CI, ensuring the
file only contains tests that validate numerical correctness/regressions.
Source: Coding guidelines
| # Intentional failures to exercise the PR test summary comment feature. | ||
| # Remove after verifying the summary comment shows correct output. | ||
|
|
||
|
|
||
| def test_ci_summary_demo_failure(): | ||
| assert False, ( | ||
| "Intentional failure: remove after verifying PR summary comment." | ||
| ) | ||
|
|
||
|
|
||
| def test_ci_summary_demo_failure_2(): | ||
| raise RuntimeError( | ||
| "Intentional error: remove after verifying PR summary comment." | ||
| ) |
There was a problem hiding this comment.
Drop these demo failures from the checked-in pytest suite.
They keep conda-python-tests red on every run and do not cover cuOpt behavior, edge cases, or a regression. Please remove the file before merge, or move this validation to an opt-in/demo-only path outside the normal PR test matrix.
🧹 Proposed fix
-# Intentional failures to exercise the PR test summary comment feature.
-# Remove after verifying the summary comment shows correct output.
-
-
-def test_ci_summary_demo_failure():
- assert False, (
- "Intentional failure: remove after verifying PR summary comment."
- )
-
-
-def test_ci_summary_demo_failure_2():
- raise RuntimeError(
- "Intentional error: remove after verifying PR summary comment."
- )As per coding guidelines, python/**/tests/** should validate numerical correctness and regressions, and the PR objective says these failures are only temporary for feature verification.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Intentional failures to exercise the PR test summary comment feature. | |
| # Remove after verifying the summary comment shows correct output. | |
| def test_ci_summary_demo_failure(): | |
| assert False, ( | |
| "Intentional failure: remove after verifying PR summary comment." | |
| ) | |
| def test_ci_summary_demo_failure_2(): | |
| raise RuntimeError( | |
| "Intentional error: remove after verifying PR summary comment." | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/cuopt/cuopt/tests/test_ci_summary_demo.py` around lines 4 - 17, Remove
the intentional failing demo tests from the pytest suite: delete or relocate
python/cuopt/cuopt/tests/test_ci_summary_demo.py (which contains
test_ci_summary_demo_failure and test_ci_summary_demo_failure_2) out of the
python/**/tests/** tree into an opt-in or demo-only path so they no longer run
in the default PR test matrix; if relocating, ensure the new path is excluded
from normal pytest discovery (or only run via an explicit demo CI job).
Source: Coding guidelines
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ci/utils/pr_test_summary.py (1)
141-145:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep skipped-job counts in the green-path summary.
Line 202 now preserves skipped jobs correctly, but Line 144 still renders
All {len(passed)} test job(s) passed.wheneverfailedis empty. That makes the sticky comment wrong on green runs with skipped matrix jobs and misses the PR objective of reporting pass/fail/skip totals across the full matrix.💡 Minimal fix
def _build_body(failed, passed, skipped, job_analysis): lines = [_MARKER, "## CI Test Summary", ""] if not failed: - lines.append(f"✅ All {len(passed)} test job(s) passed.") + lines.append( + f"✅ 0 failed · {len(passed)} passed · {len(skipped)} skipped" + ) else: lines.append( f"**{len(failed)} failed** · {len(passed)} passed · {len(skipped)} skipped" )Also applies to: 200-202
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci/utils/pr_test_summary.py` around lines 141 - 145, The green-path summary in _build_body currently prints "All {len(passed)} test job(s) passed." and omits skipped counts; update the branch where failed is empty to include skipped in the message (e.g., "All X passed, Y skipped" or "X passed and Y skipped" using len(passed) and len(skipped)), and ensure any other place in _build_body that renders the all-passed message (see the same logic around the lines handling passed/skipped at the end of the function) is updated consistently so the sticky comment reports pass/fail/skip totals across the full matrix.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@ci/utils/pr_test_summary.py`:
- Around line 141-145: The green-path summary in _build_body currently prints
"All {len(passed)} test job(s) passed." and omits skipped counts; update the
branch where failed is empty to include skipped in the message (e.g., "All X
passed, Y skipped" or "X passed and Y skipped" using len(passed) and
len(skipped)), and ensure any other place in _build_body that renders the
all-passed message (see the same logic around the lines handling passed/skipped
at the end of the function) is updated consistently so the sticky comment
reports pass/fail/skip totals across the full matrix.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a3d89df2-775a-435d-9608-0e605c043ed8
📒 Files selected for processing (1)
ci/utils/pr_test_summary.py
Add a
pr-test-summaryjob that runs after all test jobs complete and posts a sticky comment on the PR listing which matrix jobs failed with direct links to their logs. The comment updates in-place on re-run.