fix(bazel): pre-subscribe AXL build_events() via local() sink#1077
Open
gregmagolan wants to merge 2 commits into
Open
fix(bazel): pre-subscribe AXL build_events() via local() sink#1077gregmagolan wants to merge 2 commits into
gregmagolan wants to merge 2 commits into
Conversation
✨ Aspect Workflows Tasks📅 Wed May 13 04:05:49 UTC 2026 ✅ 16 successful tasks
Powered by Aspect Build · GitHub · X · LinkedIn · YouTube |
b2e1ca0 to
f1b6c61
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f1b6c61af8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
f1b6c61 to
2057863
Compare
2057863 to
e9ee487
Compare
…calls
Per Codex review: the prior implementation unconditionally `.take()`'d
the pre-subscribed `early_event_subscriber` on every call and errored
when it was None, regressing the historical "each call returns a new
iterator" contract. AXL code that hands a fresh iterator to each
subroutine (e.g. helper functions that each obtain their own
`build_events()` handle) aborted at runtime.
Fix preserves both ends of the contract:
- **First call**: returns the race-free pre-subscribed
`CappedSubscriber` (registered inside `Build::spawn` before bazel
opens the BEP FIFO). No change.
- **Subsequent calls**: fall back to a fresh
`event_stream.subscribe()` wrapped in
`CappedSubscriber::from_unbounded(...)`. Normal late-subscriber
semantics — these iterators only see events broadcast after the
fresh subscribe call. The primary consumer (first caller) is past
the race window, so the extra iterators getting late-subscribe
semantics is acceptable.
- **No Local sink configured at all**: still errors. The
explicit-intent gate ("declare a `bazel.build_events.local(...)`
to opt in to local iteration") is preserved via a new
`had_local_sink: bool` field on `Build`. Without it the runtime
won't subscribe behind the caller's back, because every undrained
subscription is buffered memory.
Implementation:
- `CappedSubscriber.queued` is now `Option<Arc<AtomicUsize>>`. Plain
unbounded subscribers (the late-subscribe path) hold `None` and
skip the decrement on `recv` / `try_recv` — they're on the
broadcaster's `Subscription::Unbounded` path, so there's no
shared counter to track.
- New `CappedSubscriber::from_unbounded(rx)` constructor.
- `Build.had_local_sink` driven by `local_sink_cap.is_some()` at
spawn time, so the gating signal survives the first `.take()`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e4514bd to
e4e1c9d
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes an intermittent late-subscribe race in
ctx.bazel.build(...)'sevent stream. The AXL-facing receiver returned by
build.build_events()was created lazily, only when the AXL taskcalled the method. On a warm bazel daemon with a fully cached build,
BES events stream within milliseconds — fast enough that any work
between
ctx.bazel.build(...)returning and the subsequentbuild.build_events()(atask_updateemit, abuildkite-agent annotatespawn, the gRPC sink's connect attempt) could let theearly burst (
build_started,target_completed,named_set_of_files) flow before the AXL subscriber registered.Observed symptoms:
runnable.determine_entrypoint()returnedNone→Failed to determine the formatter entrypoint.onaspect format(sistererrors on
aspect gazelle/aspect lint). Lint's SARIFcollector missed reports → "0 findings" on runs that should have
flagged.
first non-burst event arrived.
escapes local testing.
Two fixes together:
1. Explicit local-iteration intent
New
BuildEventSink::Localvariant + AXL factorybazel.build_events.local(buffer_cap=10000). Callers declarelocal-iteration intent up front instead of relying on the runtime
to guess from a subsequent
build.build_events()call:Build::spawnregisters the local subscriber inside the spawn,before bazel opens the BEP FIFO and before remote gRPC sinks touch
the network. The first call to
build.build_events()returns thepre-subscribed receiver; events broadcast in between were already
buffered.
If no
Localsink is configured,build.build_events()errors atcall time — the runtime won't subscribe behind the caller's back,
because every undrained subscription is buffered memory.
2. Capped buffer (memory bound)
A misbehaving task that subscribes but never drains would otherwise
grow the channel without bound. New
CappedSubscriber<T>wraps aReceiver+ sharedArc<AtomicUsize>. The broadcaster's subscriberlist becomes
Vec<Subscription<T>>withUnbounded(existing —tracing sink, gRPC sinks) and
Capped { tx, queued, cap }variants.On each send the broadcaster checks the capped entries; when
queued >= capit drops the subscription and the AXL iterator seesDisconnectedon its next recv. Default cap is 10000 events.Built-in tasks updated
Every built-in task that iterates BES events now prepends
bazel.build_events.local()to its sink list:build/test(vialib/bazel_runner.axl) — gated onneed_bes(anybazel_trait.build_eventhandler orlifecycle.task_updatehandler registered).lint,format,gazelle,delivery— always (each callsbuild.build_events()unconditionally).Docs
DEVELOPMENT.mdupdated: the broadcaster section now describes theexplicit-intent contract, the buffer-cap semantics, and the
late-subscribe escape hatch. The contributor checklist is updated to
remind authors of new tasks to include
bazel.build_events.local()when they iterate.
Relationship to
HANG_ANALYSIS.mdand PR #1071HANG_ANALYSIS.mdcovers a separate GHA-cancel hang with threeroot causes. PR #1071 (
fix(http): bound read and connect timeouts on the runtime HTTP client, already landed) handled one of them. Theremaining two (signal-handler / process-registry plumbing for
subprocess cleanup on cancel) are in a separate change set and not in
this PR. The two surface areas overlap (BES event-stream
plumbing / cancellation cleanup) but the fixes are independent.
Changes are visible to end-users: yes
Suggested release notes
bazel.build_events.local(buffer_cap=10000)AXL factory.Declare it explicitly in the
build_events=[...]list ofctx.bazel.build(...)/ctx.bazel.test(...)when your taskiterates the BES stream via
build.build_events(). The runtimepre-registers your receiver before bazel opens the BEP FIFO,
eliminating the warm-daemon late-subscribe race that caused
intermittent
Failed to determine the formatter entrypoint.andsimilar errors.
ctx.bazel.build(build_events = True)continues towork — it now expands to
[bazel.build_events.local()].build.build_events()now errors when called against a build thatwas configured with remote BES sinks but no local sink. Callers
that need to iterate locally must opt in via
bazel.build_events.local(...). Existing callers using theTrueshorthand are unaffected.
Test plan
normal-drain, overflow-drop, and capped-coexists-with-unbounded
paths.
suite (PR comment, BK annotation, template, lint, format, gazelle,
delivery) — 727 AXL tests + 7 snapshot suites pass.
aspect build,aspect test,aspect lint,aspect format,aspect gazelle,aspect deliveryagainst a warm daemon and confirmed entrypoint resolution, SARIF
collection, and live status-surface phase tags consistently see
the early-burst events.