commit-reach: terminate merge-base walk when one side is exhausted#2149
commit-reach: terminate merge-base walk when one side is exhausted#2149spkrka wants to merge 8 commits into
Conversation
7d5b1bb to
3e1315e
Compare
|
/preview |
|
Preview email sent as pull.2149.git.1781946989.gitgitgadget@gmail.com |
|
/submit |
|
Submitted as pull.2149.git.1781951820.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
This patch series was integrated into seen via git@418052d. |
|
/submit |
|
Submitted as pull.2149.v2.git.1782303254.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
This branch is now known as |
f574f35 to
4b9f192
Compare
|
/submit |
|
Submitted as pull.2149.v3.git.1782479286.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
Add a technical document describing the paint_down_to_common() algorithm used for merge-base computation, covering the paint walk, generation number regions, and termination conditions. Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add test cases to t6600-test-reach.sh that exercise edge cases in the side-exhaustion optimization for paint_down_to_common(): - in_merge_bases_many:self: commit is both A and one of the X inputs - get_merge_bases_many:duplicate-twos: duplicate entries in X list - get_merge_bases_many:pending-stale: STALE transition on an already-painted commit (ps-* diamond topology) - get_merge_bases_many:infinity-both-sides: both tips outside the commit-graph with non-monotonic dates (pi-* topology) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add t6099 to test the case where multiple merge-base candidates exist and one is an ancestor of another. This exercises the side-exhaustion optimization in paint_down_to_common together with the remove_redundant safety net in get_merge_bases_many_0. Add a mixed finite/INFINITY test to t6600 where one tip is outside the commit-graph (INFINITY generation) and the other is inside. This exercises the region transition: the walk starts in the INFINITY region where side-exhaustion is disabled, then crosses into the finite region where it can fire. Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add a step counter and trace2_data_intmax() call so that the number of commits visited during the paint walk is observable via GIT_TRACE2_EVENT. This provides a way to measure the impact of future optimizations without relying on wall-clock benchmarks alone. Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add a paint_state struct for use by paint_down_to_common() that wraps a prio_queue with per-side commit counters. Each non-stale queued commit occupies exactly one counter bucket based on its paint flags: PARENT1-only, PARENT2-only, or both sides (a pending merge-base candidate). The counters are maintained by paint_count_update() which adjusts the appropriate bucket by a signed delta. An exhaustive switch on the paint+stale bits documents all valid flag combinations in one place. Convert paint_down_to_common() to use paint_state. The loop now drains the queue via paint_queue_get() which returns NULL when all counters reach zero, replacing the old pointer-based termination (max_nonstale). This is equivalent behavior -- both conditions detect that no non-stale entries remain. paint_queue_get() uses a "pop first" form: it dequeues a commit, then checks the counters. This means the loop exits one iteration earlier than the old code in some topologies (the popped stale commit is never processed), so a few step counts drop by one. The existing nonstale_queue is left in place for ahead_behind(). Signed-off-by: Kristofer Karlsson <krka@spotify.com>
nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became unused after the previous commit. The core nonstale_queue functions remain in use by ahead_behind(). Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Add an early termination check to paint_down_to_common() using the
per-side counters introduced earlier. Once the walk enters the
finite-generation region, terminate early when one side's exclusive
count drops to zero -- no new merge-base can form without both paint
sides meeting.
The check also waits for pending_merge_bases to reach zero, ensuring
all merge-base candidates have been dequeued and recorded before
exiting.
The INFINITY gate ensures correctness: commits without a commit-graph
entry have GENERATION_NUMBER_INFINITY and are ordered by commit date,
which is not topologically reliable. The optimization only fires
once the walk enters the finite-generation region where ordering
guarantees hold.
Step counts measured with trace2 on git.git with commit-graph:
merge-base --all v2.0.0 v2.55.0-rc1:
before: 72264 steps after: 44589 steps
merge-base --all v2.55.0-rc1 v2.55.0-rc1~5:
before: 110 steps after: 7 steps
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Consolidate the min_generation termination condition into paint_queue_get(), alongside the existing stale-entry and side-exhaustion checks. Move last_gen into struct paint_state so that commit_graph_generation() is called exactly once per dequeued commit and the result is shared across all termination checks and the monotonicity BUG assertion. Signed-off-by: Kristofer Karlsson <krka@spotify.com>
|
/submit |
|
Submitted as pull.2149.v4.git.1782649547.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
commit-reach: terminate merge-base walk when one paint side is exhausted
Optimize paint_down_to_common() for merge-base queries that hit
large one-sided histories.
When the walk from one side reaches a commit with a very low
generation number that the other side never paints, the walk is
forced to drain most of the graph. A common trigger is a
repository import that grafts a separate history with its own root,
but any merge that introduces a low-generation commit never painted
by the other side has the same effect.
A new merge-base candidate can only be discovered when exclusive
PARENT1 and PARENT2 paint meet. This series teaches
paint_down_to_common() to stop as soon as one side has no exclusive
commits left in the queue; once one side is exhausted, no further
candidates can appear.
In the RFC thread [1], Derrick Stolee provided a criss-cross
counterexample that sharpened the halt condition, and Elijah Newren
independently discovered the same optimization and shared an
implementation in PR #2150 [2]. Patches 2-3 incorporate test
cases from Elijah's branch.
This series implements the optimization only after the walk enters
the finite-generation region, where generation ordering guarantees
that paint on visited commits is final.
Patch layout:
1/8 Documentation/technical: add paint-down-to-common doc
2/8 t6600: add test cases for side-exhaustion edge cases
3/8 t6099, t6600: add side-exhaustion regression tests
4/8 commit-reach: add trace2 instrumentation to paint_down_to_common()
5/8 commit-reach: introduce struct paint_state with per-side counters
6/8 commit-reach: remove unused nonstale_queue dedup wrappers
7/8 commit-reach: terminate merge-base walk when one paint side is exhausted
8/8 commit-reach: move min_generation check into paint_queue_get()
Benchmarks
Step counts are deterministic (measured via trace2_data_intmax added
in patch 4). Wall-clock times are best-of-11 runs.
2.6M-commit monorepo with commit-graph:
git.git (88k commits, commit-graph):
Changes since v3:
Fixed BUG assertion that was accidentally made unconditional
in v3: restored the min_generation guard so it only fires
when generation-based ordering is active.
Moved generation cutoff and single-result termination
conditions into the documentation in patch 1/8, since they
describe existing behavior.
Renamed paint_state counter fields for clarity: p1_count ->
parent1_count, p2_count -> parent2_count, pending_merge_bases
-> mb_candidate_count. Changed counter types from int to
size_t. (Suggested by Rene Scharfe.)
Changes since v2:
New patch 8/8: moved the min_generation termination check and
the last_gen monotonicity assertion into paint_queue_get(),
consolidating halt conditions. commit_graph_generation() is
now called once per dequeued commit and shared across all
checks.
Moved all halt conditions inside paint_queue_get() with the
"pop first" form: pop, check, then
decrement counters. This keeps the optimization commit's
diff minimal (just inserting the new checks between pop and
decrement).
Shortened the doc comment on paint_queue_get() to describe
what it does rather than how. Inline comments on each
return NULL explain the specific halt condition.
Replaced the manual commit-graph setup in the step-count test
with run_all_modes, which now sets GIT_TRACE2_EVENT per mode
and produces trace-mode-{none,full,half,no-gdat}.txt files.
Added a test_paint_down_steps helper for concise 4-mode step
assertions with diagnostic output on mismatch (prints
"expected X, got Y" instead of a silent grep failure).
Added step-count assertions to the single-walk edge-case
tests: in_merge_bases_many:self, pending-stale,
infinity-both-sides, mixed-finite-infinity.
Included step counts alongside wall-clock times in the
benchmark tables.
Changes since v1:
Reordered patches: documentation first (describing the existing
algorithm), tests before code changes, so they demonstrate
passing with old logic first.
Dropped the ahead_behind decoupling patch. paint_state is now a
NEW struct alongside nonstale_queue instead of replacing it.
ahead_behind() is completely untouched.
Removed nonstale_queue_put_dedup() and nonstale_queue_get_dedup()
(dead code after the conversion) in a separate commit.
Renamed: struct paint_queue -> paint_state, field pq -> queue,
paint_count_add/remove -> paint_count_update (single function
with signed delta parameter).
Split the old paint_count_transition (which handled both old
and new flags in one call) into separate remove/add calls with
a signed delta. This eliminates the need for the case 0
handler (which tracked "not in the queue") and allows an
exhaustive switch on (PARENT1 | PARENT2 | STALE) that
documents all valid flag combinations, with BUG() in default.
Added trace2_data_intmax() instrumentation to report the number
of commits visited per paint walk (separate commit), with
deterministic step-count assertions in t6600.
Expanded switch statements to multi-line format per .clang-format.
Used !count style throughout instead of count == 0.
Updated technical documentation alongside code changes.
[1] https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u
[2] #2150
CC: Derrick Stolee stolee@gmail.com
CC: Elijah Newren newren@gmail.com