feat: use graph-based compare instead of timestamp when from-tag-name is set#317
feat: use graph-based compare instead of timestamp when from-tag-name is set#317miroslavpojer wants to merge 5 commits into
Conversation
- Introduced a new compare mode in the release notes generator to handle branching release histories. - The compare mode activates when `from-tag-name` is provided, allowing for accurate commit selection based on the commit graph rather than timestamps. - Added functionality to extract PR numbers from commit messages in both squash and merge formats. - Updated `DataMiner` to utilize the GitHub Compare API for fetching commits unique to the current release. - Enhanced filtering logic to retain PRs and commits in compare mode, regardless of their merge dates. - Added comprehensive unit and integration tests to validate the new compare mode functionality. - Documented the purpose, usage, and configuration of the compare mode in the relevant documentation files.
WalkthroughThis PR implements Compare Mode: when ChangesCompare Mode Feature
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR adds a “compare mode” path to the release-notes mining pipeline so that when from-tag-name is provided, commits/PRs are selected via GitHub’s compare graph (ancestry-based) instead of a timestamp window—fixing incorrect results on parallel maintenance/develop branches and cherry-picks.
Changes:
- Added
compare_commit_shastoMinedDataas a sentinel for compare mode. - Implemented compare-mode mining via
repo.compare(from_tag, to_tag)and PR-number extraction from commit messages. - Updated filtering to bypass PR/commit timestamp reduction when compare mode is active; added unit + integration coverage and documentation.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| release_notes_generator/data/miner.py | Adds compare-mode mining and PR-number extraction helper. |
| release_notes_generator/data/filter.py | Skips PR/commit timestamp filtering when compare mode is detected. |
| release_notes_generator/model/mined_data.py | Adds compare_commit_shas sentinel field. |
| tests/unit/release_notes_generator/data/test_miner.py | Unit tests for PR extraction helper and compare-mode mining path. |
| tests/unit/release_notes_generator/data/test_filter.py | Unit tests for compare-mode filter bypass + timestamp regressions. |
| tests/integration/test_compare_mode.py | End-to-end integration coverage contrasting compare vs timestamp behavior. |
| docs/features/compare_mode.md | New documentation page explaining compare mode behavior and rationale. |
| docs/features/tag_range.md | Links tag-range feature docs to compare mode. |
| README.md | Adds Compare Mode to the feature tutorial index. |
| release_notes_generator/data/SPEC.md | Adds/updates spec describing compare-mode behavior and tests. |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
release_notes_generator/data/miner.py (1)
37-45: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winMove imports to the top; don't interleave the module constant.
from github.Commit import Commit as GithubCommit(Line 37) is placed after the first-party imports, and_PR_NUMBER_RE(Line 39) is defined in the middle of the import block, with more imports following on Lines 40-45. This trips Pylint (wrong-import-position/wrong-import-order).Group the
githubimport with the otherfrom github...imports (Lines 29-33) and define the constant after theloggerdeclaration (Line 47).♻️ Proposed reorganization
-from github.Commit import Commit as GithubCommit - -_PR_NUMBER_RE = re.compile(r"\(#(\d+)\)|Merge pull request #(\d+)") from release_notes_generator.model.record.issue_record import IssueRecord from release_notes_generator.model.mined_data import MinedData from release_notes_generator.model.record.pull_request_record import PullRequestRecord from release_notes_generator.utils.decorators import safe_call_decorator from release_notes_generator.utils.github_rate_limiter import GithubRateLimiter from release_notes_generator.utils.record_utils import get_id, parse_issue_idAdd the
githubimport alongside the existing ones (near Line 33):from github.Repository import Repository +from github.Commit import Commit as GithubCommitAnd define the constant after the logger (Line 47):
logger = logging.getLogger(__name__) + +_PR_NUMBER_RE = re.compile(r"\(#(\d+)\)|Merge pull request #(\d+)")As per coding guidelines: "Keep imports at top of file".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@release_notes_generator/data/miner.py` around lines 37 - 45, Move the github import and the module-level constant so all imports remain grouped at the top: relocate the line "from github.Commit import Commit as GithubCommit" to join the other from github... imports (near the existing github import group) and remove the inlined "_PR_NUMBER_RE" definition from the import block; then place the "_PR_NUMBER_RE = re.compile(...)" constant immediately after the logger declaration (the module-level logger variable) so imports are contiguous and the constant is defined after logger as requested.
🧹 Nitpick comments (1)
tests/unit/release_notes_generator/data/test_miner.py (1)
528-569: ⚖️ Poor tradeoffPrivate method exercised directly in tests.
These tests invoke
DataMiner._extract_pr_numbers_from_commits(...)directly. Consider exercising this logic through a public entry point (e.g. viamine_datain compare mode, which the later tests already drive), or extracting the parser into a module-level helper so it can be tested without reaching into a private member.As per coding guidelines: "Must not access private members (names starting with
_) of the class under test directly in tests".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/release_notes_generator/data/test_miner.py` around lines 528 - 569, Tests directly call the private method DataMiner._extract_pr_numbers_from_commits; instead, either exercise this behavior via the public DataMiner.mine_data (compare mode) or extract the PR-number parsing into a new module-level helper function (e.g., parse_pr_numbers_from_commits) and update tests to call that public helper; update references in tests from DataMiner._extract_pr_numbers_from_commits to the chosen public entry (mine_data with appropriate inputs) or the new helper to comply with the rule against accessing private members.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/features/compare_mode.md`:
- Around line 16-28: Update the fenced code blocks in
docs/features/compare_mode.md to include a language identifier (e.g., ```text)
so they no longer trigger MD040; specifically modify each triple-backtick block
showing the git graph snippets (the blocks starting with the
develop:/maintenance/v2.6.x lines and the other example block that starts with
"from-tag-name provided?") to use ```text instead of bare ```, ensuring all
occurrences noted in the comment (lines ~16–28 and ~103–118) are updated.
In `@release_notes_generator/data/filter.py`:
- Around line 96-109: This file fails Black formatting due to manually wrapped
logger.debug calls and comprehension wrapping around the pulls and commits
filtering; run Black (e.g., black release_notes_generator/data/filter.py or the
repo root) to auto-format the logger.debug invocations and the
commits_dict/pulls_dict comprehensions so they match the project's Black rules,
or manually reflow the logger.debug argument lines and the commits_dict
comprehension used with data.commits and data.since (and the earlier pulls_dict
usage with data.pull_requests) to conform to Black's preferred line breaks.
In `@release_notes_generator/data/miner.py`:
- Around line 74-75: Run Black on the repository (or this file) to fix
formatting; specifically reformat the multi-line logger.info call that passes
ActionInputs.get_from_tag_name() and ActionInputs.get_tag_name() so it matches
Black's expected layout (e.g. a single-line call or Black's preferred wrapped
form), and do the same for the other similar logger.info call later in the file
that wraps ActionInputs methods; search for logger.info and
ActionInputs.get_from_tag_name()/get_tag_name() to locate and reformat the
offending invocations.
- Around line 460-464: The PR-number extraction currently scans the entire
commit message via _PR_NUMBER_RE.finditer(message), which picks up references
from the body; limit the search to the commit subject only by extracting the
first line (e.g. subject = commit.commit.message.splitlines()[0] or
.partition('\n')[0]) and run _PR_NUMBER_RE.finditer(subject) so
pr_numbers.add(int(number_str)) only collects PRs referenced in the subject
(affects the loop handling commit, message, _PR_NUMBER_RE, and pr_numbers).
- Around line 76-81: The call to self._safe_call(repo.compare) can return None,
so before doing list(comparison.commits) guard against comparison being None
(similar to how get_release/get_repo are handled): check if comparison is falsy,
log or warn about the failed compare (include context like
ActionInputs.get_from_tag_name()/get_tag_name()), and return or populate data
with empty compare_commits/compare_commit_shas and data.commits as appropriate
to avoid the AttributeError; update the logic around comparison,
compare_commits, data.compare_commit_shas, and data.commits to handle the None
case.
- Around line 79-81: The code currently converts comparison.commits into
compare_commits and assumes it's complete; add a guard that compares
len(compare_commits) against comparison.total_commits (e.g., if
len(compare_commits) < comparison.total_commits:) and handle the PyGithub
pagination cap case by logging or raising a warning/error and avoiding silent
truncation before setting data.compare_commit_shas and data.commits (referencing
compare_commits, comparison.total_commits, data.compare_commit_shas, and
data.commits).
In `@release_notes_generator/data/SPEC.md`:
- Around line 14-26: The fenced code blocks in SPEC.md (the commit log block
starting with ``` and the git command block using ``` around git log --oneline
--right-only --cherry-pick v2.6.4...v2.6.5) are missing language identifiers and
trigger MD040; update those fences to include appropriate languages (e.g.,
change the commit-log fence to ```text and the git command fence to ```bash) so
the Markdown linter passes.
---
Outside diff comments:
In `@release_notes_generator/data/miner.py`:
- Around line 37-45: Move the github import and the module-level constant so all
imports remain grouped at the top: relocate the line "from github.Commit import
Commit as GithubCommit" to join the other from github... imports (near the
existing github import group) and remove the inlined "_PR_NUMBER_RE" definition
from the import block; then place the "_PR_NUMBER_RE = re.compile(...)" constant
immediately after the logger declaration (the module-level logger variable) so
imports are contiguous and the constant is defined after logger as requested.
---
Nitpick comments:
In `@tests/unit/release_notes_generator/data/test_miner.py`:
- Around line 528-569: Tests directly call the private method
DataMiner._extract_pr_numbers_from_commits; instead, either exercise this
behavior via the public DataMiner.mine_data (compare mode) or extract the
PR-number parsing into a new module-level helper function (e.g.,
parse_pr_numbers_from_commits) and update tests to call that public helper;
update references in tests from DataMiner._extract_pr_numbers_from_commits to
the chosen public entry (mine_data with appropriate inputs) or the new helper to
comply with the rule against accessing private members.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e9419181-1673-4e3c-a740-cbcd97701a09
📒 Files selected for processing (10)
README.mddocs/features/compare_mode.mddocs/features/tag_range.mdrelease_notes_generator/data/SPEC.mdrelease_notes_generator/data/filter.pyrelease_notes_generator/data/miner.pyrelease_notes_generator/model/mined_data.pytests/integration/test_compare_mode.pytests/unit/release_notes_generator/data/test_filter.pytests/unit/release_notes_generator/data/test_miner.py
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/unit/release_notes_generator/data/test_miner.py (1)
723-744: ⚡ Quick winConsider covering the
comparison is Noneguard.The
repo.compare→Nonepath (miner.py Lines 82-88) that logs an error and callssys.exit(1)is not exercised by any unit test. A small test patchingmock_repo.compare.return_value = Noneand assertingsys.exitwould lock in that error-exit behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/release_notes_generator/data/test_miner.py` around lines 723 - 744, Add a unit test that exercises the repo.compare → None branch: in the test (similar to test_mine_data_compare_mode_warns_on_retrieval_cap_without_total) patch mock_repo.compare.return_value = None, patch sys.exit (or mock it) and patch the error logger used in release_notes_generator.data.miner, then call miner.mine_data() and assert that the error log was emitted and sys.exit was invoked; reference the miner.mine_data method and the repo.compare behavior so the test covers the comparison is None guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/unit/release_notes_generator/data/test_miner.py`:
- Around line 702-744: Add a one-line docstring to each new test function to
state its scenario: for
test_mine_data_compare_mode_warns_on_total_commits_overflow add a docstring like
"Warns when compare mode reports a larger total_commits than retrieved commits
(overflow case)"; for
test_mine_data_compare_mode_warns_on_retrieval_cap_without_total add a docstring
like "Warns when retrieval reaches the internal cap (10,000) but no
total_commits value is provided"; place each docstring as the first statement
inside the respective functions
(test_mine_data_compare_mode_warns_on_total_commits_overflow and
test_mine_data_compare_mode_warns_on_retrieval_cap_without_total).
---
Nitpick comments:
In `@tests/unit/release_notes_generator/data/test_miner.py`:
- Around line 723-744: Add a unit test that exercises the repo.compare → None
branch: in the test (similar to
test_mine_data_compare_mode_warns_on_retrieval_cap_without_total) patch
mock_repo.compare.return_value = None, patch sys.exit (or mock it) and patch the
error logger used in release_notes_generator.data.miner, then call
miner.mine_data() and assert that the error log was emitted and sys.exit was
invoked; reference the miner.mine_data method and the repo.compare behavior so
the test covers the comparison is None guard.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7600d2f0-a0a7-43e0-9c0f-36bd6fca4832
📒 Files selected for processing (4)
docs/features/compare_mode.mdrelease_notes_generator/data/filter.pyrelease_notes_generator/data/miner.pytests/unit/release_notes_generator/data/test_miner.py
✅ Files skipped from review due to trivial changes (1)
- docs/features/compare_mode.md
🚧 Files skipped from review as they are similar to previous changes (1)
- release_notes_generator/data/filter.py
| def test_mine_data_compare_mode_warns_on_total_commits_overflow(mocker, mock_repo): | ||
| commit_mock = mocker.Mock() | ||
| commit_mock.sha = "abc123" | ||
| commit_mock.commit.message = "Fix service access role (#1363)" | ||
| warning_mock = mocker.patch("release_notes_generator.data.miner.logger.warning") | ||
|
|
||
| miner = _make_compare_miner( | ||
| mocker, | ||
| mock_repo, | ||
| compare_commits=[commit_mock], | ||
| total_commits=10_001, | ||
| ) | ||
| miner.mine_data() | ||
|
|
||
| warning_mock.assert_called_once_with( | ||
| "Compare mode: retrieved %d commit(s) but comparison reports %d total; results may be truncated.", | ||
| 1, | ||
| 10_001, | ||
| ) | ||
|
|
||
|
|
||
| def test_mine_data_compare_mode_warns_on_retrieval_cap_without_total(mocker, mock_repo): | ||
| commits = [] | ||
| for i in range(10_000): | ||
| commit_mock = mocker.Mock() | ||
| commit_mock.sha = f"sha{i}" | ||
| commit_mock.commit.message = "Commit without PR ref" | ||
| commits.append(commit_mock) | ||
|
|
||
| warning_mock = mocker.patch("release_notes_generator.data.miner.logger.warning") | ||
|
|
||
| miner = _make_compare_miner( | ||
| mocker, | ||
| mock_repo, | ||
| compare_commits=commits, | ||
| ) | ||
| miner.mine_data() | ||
|
|
||
| warning_mock.assert_called_once_with( | ||
| "Compare mode: retrieved %d commit(s); comparison ranges over %d commits may be truncated.", | ||
| 10_000, | ||
| 10_000, | ||
| ) |
There was a problem hiding this comment.
Add scenario docstrings to the new tests.
test_mine_data_compare_mode_warns_on_total_commits_overflow and test_mine_data_compare_mode_warns_on_retrieval_cap_without_total have no docstring describing their scenario.
As per coding guidelines: "each test must state its scenario in the docstring".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/release_notes_generator/data/test_miner.py` around lines 702 -
744, Add a one-line docstring to each new test function to state its scenario:
for test_mine_data_compare_mode_warns_on_total_commits_overflow add a docstring
like "Warns when compare mode reports a larger total_commits than retrieved
commits (overflow case)"; for
test_mine_data_compare_mode_warns_on_retrieval_cap_without_total add a docstring
like "Warns when retrieval reaches the internal cap (10,000) but no
total_commits value is provided"; place each docstring as the first statement
inside the respective functions
(test_mine_data_compare_mode_warns_on_total_commits_overflow and
test_mine_data_compare_mode_warns_on_retrieval_cap_without_total).
Problem
When generating release notes for a maintenance branch (
v2.6.x) while a parallel developmentbranch (
v2.7.x) is also active, the existing timestamp approach returns commits and PRs from bothbranches.
repo.get_commits(since=timestamp)has no concept of branch ancestry — it returnseverything in the time window regardless of which branch it lives on.
Example: generating notes for
v2.6.5(previousv2.6.4) picks upv2.7.0andv2.7.1PRsthat happened to be merged in the same window. The generated release notes are incorrect.
Cherry-picks compound the problem: the fetched PR object has
merged_atequal to the originaldevelop merge date, so timestamp filtering can silently drop the cherry-picked fix from the
maintenance release notes entirely.
Solution
When
from-tag-nameis explicitly provided, userepo.compare(from_tag, to_tag)(GitHub CompareAPI) instead of
repo.get_commits(since=…). This is a graph-based operation — it returns onlycommits reachable from
to_tagbut not fromfrom_tag, independent of timestamps or branches.PR numbers are then extracted from those commit messages and fetched individually by number.
FilterByReleasedetects compare mode via a sentinel field (MinedData.compare_commit_shas) andskips its timestamp check for PRs and commits — the set is already exact.
data.sinceis still derived from the previous release in compare mode; it is only used for issuefetching and release-note extraction date-gating — not for PR/commit filtering.
Timestamp mode is unchanged when
from-tag-nameis absent.Changes
release_notes_generator/model/mined_data.pycompare_commit_shas: set[str]sentinel fieldrelease_notes_generator/data/miner.pymine_data();_extract_pr_numbers_from_commits()static helperrelease_notes_generator/data/filter.pycompare_commit_shasis non-emptytests/unit/release_notes_generator/data/test_miner.pytests/unit/release_notes_generator/data/test_filter.pytests/integration/test_compare_mode.pydocs/features/compare_mode.mddocs/features/tag_range.mdREADME.mdrelease_notes_generator/data/SPEC.mdTest coverage
Release Notes
from-tag-nameis set, commits are selected via the GitHub Compare API instead of a timestamp window, giving correct results for branching release histories (maintenance + develop in parallel).from-tag-nameis absent.Related
Closes #316
Summary by CodeRabbit
New Features
Documentation