PHOENIX-7871 :- Fail-fast batched mutation block via DNRIOE inheritance #2491
Open
lokiore wants to merge 1 commit into
690919c to 51f950b
tkhurana
reviewed
May 29, 2026
MutationBlockedIOException and StaleClusterRoleRecordException now extend DoNotRetryIOException (instead of IOException) so HBase's RPC retry layers fail fast on the first hit instead of consuming the per-call retry budget.

Empirical verification (debugger + testMutationBlockedFailsFastWithDNRIOE passing with the Phoenix-side intercept disabled via config flag) confirms that ProtobufUtil.toException rehydrates the wire payload as a real MutationBlockedIOException instance, so the instanceof DoNotRetryIOException check at AsyncRequestFutureImpl.manageError:749 returns true post-inheritance and HBase's batched-RPC retry layer fails fast on the first hit. No Phoenix-side intercept is needed.

The IT testMutationBlockedFailsFastViaInheritanceAlone verifies the inheritance-alone path under elevated retry settings: the assertion is timing-bound (<10s) and would fail if HBase started retrying despite DNRIOE.

Generated-by: Claude Code (Opus 4.7)
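The fail-fast behavior this commit message describes can be illustrated with a toy retry loop. This is only a sketch under stated assumptions, not HBase's actual logic: the real AsyncRequestFutureImpl.manageError is considerably more involved, the DoNotRetryIOException class below is a local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException, and RetrySketch and attemptsUntilGivingUp are hypothetical names.

```java
import java.io.IOException;

// Local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException so the
// sketch compiles without HBase on the classpath (assumption for illustration).
class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(String message) {
        super(message);
    }
}

// Hypothetical toy retry loop; it is NOT the HBase implementation.
class RetrySketch {
    // Simulates a call that fails the same way on every attempt and returns
    // how many attempts were made before the loop gave up.
    static int attemptsUntilGivingUp(IOException failure, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                throw failure; // every attempt hits the same failure
            } catch (DoNotRetryIOException e) {
                break; // marker type seen: stop immediately, do not retry
            } catch (IOException e) {
                // plain IOException: treated as retryable, so loop again
                // (a real client would also back off between attempts)
            }
        }
        return attempts;
    }
}
```

With a retry budget of 16, a plain IOException consumes all 16 attempts in this sketch, while an exception that extends the do-not-retry marker type stops after the first attempt.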
51f950b to 0620c86
tkhurana
approved these changes
May 29, 2026
What changes were proposed in this pull request?
The two HA failover signaling exception classes, MutationBlockedIOException and StaleClusterRoleRecordException, now extend org.apache.hadoop.hbase.DoNotRetryIOException instead of java.io.IOException. This is sufficient to deliver fail-fast on the batched mutation path; no Phoenix-side intercept is needed.

JIRA: https://issues.apache.org/jira/browse/PHOENIX-7871
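A minimal sketch of the shape of the inheritance change, assuming simplified single-argument constructors (the real Phoenix classes may carry different constructors and fields). The DoNotRetryIOException class here is a local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException so the example compiles without HBase; the real class is also an IOException subtype. InheritanceSketch, classify, and catchSite are hypothetical names used only for illustration.

```java
import java.io.IOException;

// Local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException
// (assumption for illustration; the real class also descends from IOException).
class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(String message) {
        super(message);
    }
}

// Hypothetical minimal shape of the changed class: the parent type moves
// from IOException to DoNotRetryIOException.
class MutationBlockedIOException extends DoNotRetryIOException {
    MutationBlockedIOException(String message) {
        super(message);
    }
}

class InheritanceSketch {
    // How a retry layer might classify a failure by its type.
    static String classify(IOException e) {
        return (e instanceof DoNotRetryIOException) ? "fail-fast" : "retryable";
    }

    // An existing catch (IOException) site keeps matching after the change.
    static String catchSite() {
        try {
            throw new MutationBlockedIOException("mutations blocked during failover");
        } catch (IOException e) {
            return "caught as IOException: " + classify(e);
        }
    }
}
```

The point of the sketch is that the exception gains the do-not-retry marker type while remaining an IOException, so callers that catch IOException need no changes.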
Why are the changes needed?
Pre-fix, the HA failover exceptions extended IOException. When the server-side mutation-block gate fired, HBase's RPC retry layers had no signal to fail fast: the failure was wrapped in RetriesExhaustedWithDetailsException only after the client retry budget was exhausted (default 16 retries × per-retry timeout = tens of seconds). For mutations issued during a brief mutation-block window or against a now-STANDBY cluster, this manifested as multi-second tail latencies that should have been single-RTT failures.

With the inheritance change, rehydration of the server-side exception (via ProtobufUtil.toException) delivers a real MutationBlockedIOException instance to HBase's batched-RPC retry layer (AsyncRequestFutureImpl.manageError); the instanceof DoNotRetryIOException check at line 749 returns true post-inheritance, so no retries fire and the failure surfaces immediately.

An earlier draft of this PR included a Phoenix-side intercept in MutationState.send (a findHaFailoverCauseInChain helper + a config flag phoenix.ha.failfast.failover.exceptions.enabled). That intercept was empirically redundant: it never fired in practice because HBase already short-circuits via the inheritance path. This is the slimmed-down version: inheritance-only.

Does this PR introduce any user-facing change?
No
The exception-inheritance change is internal: both exception classes still satisfy instanceof IOException, so any existing catch (IOException) site continues to match. The user-visible behavior change is reduced tail latency on the failover path, which is a strict improvement.

How was this patch tested?
New unit tests (HaFailoverExceptionInheritanceTest, 3 tests, all PASS) verify the inheritance change:
- mutationBlockedExtendsDoNotRetryIOException
- staleClusterRoleRecordExtendsDoNotRetryIOException
- staleClusterRoleRecordTwoArgConstructorPreserved

IT (IndexRegionObserverMutationBlockingIT, 7 tests, all PASS):
- testMutationBlockedFailsFastWithDNRIOE: wall-clock <10s assertion under default retry config
- testMutationBlockedFailsFastUnderElevatedRetries: wall-clock <10s assertion under hbase.client.retries.number=16 + hbase.client.pause=100
- testMutationBlockedFailsFastViaInheritanceAlone: crisp empirical proof of inheritance-alone fail-fast under elevated retry config

Local mvn output:
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)