PHOENIX-7871 :- Fail-fast batched mutation block via DNRIOE inheritance #2491

Open
lokiore wants to merge 1 commit into apache:PHOENIX-7562-feature-new from lokiore:PHOENIX-7871-dnrioe-fail-fast

lokiore commented May 28, 2026

What changes were proposed in this pull request?

The two HA failover signaling exception classes — MutationBlockedIOException and StaleClusterRoleRecordException — now extend org.apache.hadoop.hbase.DoNotRetryIOException instead of java.io.IOException. This is sufficient to deliver fail-fast on the batched mutation path; no Phoenix-side intercept is needed.

JIRA: https://issues.apache.org/jira/browse/PHOENIX-7871

Why are the changes needed?

Pre-fix, the HA failover exceptions extended IOException. When the server-side mutation-block gate fired, HBase's RPC retry layers had no signal to fail fast — the failure was wrapped in RetriesExhaustedWithDetailsException only after the client retry budget was exhausted (default 16 retries × per-retry timeout = tens of seconds). For mutations issued during a brief mutation-block window or against a now-STANDBY cluster, this manifested as multi-second tail latencies that should have been single-RTT failures.

With the inheritance change, ProtobufUtil.toException rehydrates the server-side failure on the client as a real MutationBlockedIOException instance and hands it to HBase's batched-RPC retry layer (AsyncRequestFutureImpl.manageError). The instanceof DoNotRetryIOException check at line 749 now returns true, so no retries fire and the failure surfaces immediately.
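
The effect of that instanceof check can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: the nested DoNotRetryIOException and MutationBlockedIOException classes are local stand-ins for the real HBase and Phoenix classes, and attemptsUntilGiveUp is a hypothetical simplification, not the logic of AsyncRequestFutureImpl.manageError.

```java
import java.io.IOException;

// Simplified model of a client retry loop. NOT HBase's actual retry code.
class RetrySketch {
    // Local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException.
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg) { super(msg); }
    }

    // Local stand-in for Phoenix's exception after this PR: it now inherits
    // from the do-not-retry marker instead of extending IOException directly.
    static class MutationBlockedIOException extends DoNotRetryIOException {
        MutationBlockedIOException(String msg) { super(msg); }
    }

    // Hypothetical functional interface for a single RPC attempt.
    interface Attempt { void run() throws IOException; }

    // Returns how many attempts were made before succeeding or giving up.
    static int attemptsUntilGiveUp(Attempt attempt, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                attempt.run();
                return attempts; // success
            } catch (IOException e) {
                if (e instanceof DoNotRetryIOException) {
                    return attempts; // fail fast: the marker type forbids retrying
                }
                // Otherwise treat as retriable; a real client would back off here.
            }
        }
        return attempts; // retry budget exhausted
    }

    public static void main(String[] args) {
        int failFast = attemptsUntilGiveUp(
            () -> { throw new MutationBlockedIOException("mutations blocked"); }, 16);
        int retried = attemptsUntilGiveUp(
            () -> { throw new IOException("transient failure"); }, 16);
        System.out.println("DoNotRetryIOException subclass attempts: " + failFast);
        System.out.println("plain IOException attempts: " + retried);
    }
}
```

In this model the do-not-retry subclass gives up after one attempt, while a plain IOException consumes the whole budget of 16, which is the pre-fix behavior described above.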

An earlier draft of this PR included a Phoenix-side intercept in MutationState.send (a findHaFailoverCauseInChain helper + a config flag phoenix.ha.failfast.failover.exceptions.enabled). That intercept was empirically redundant — it never fired in practice because HBase already short-circuits via the inheritance path. This is the slimmed-down version: inheritance-only.

Does this PR introduce any user-facing change?

No

The exception-inheritance change is internal — both exception classes still satisfy instanceof IOException, so any existing catch (IOException) site continues to match. The user-visible behavior change is reduced tail latency on the failover path, which is a strict improvement.
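
The compatibility claim follows from ordinary Java subtyping and can be checked with a minimal sketch. The nested classes below are local stand-ins (the real StaleClusterRoleRecordException constructors may differ), and legacyHandler is a hypothetical caller that only knows about IOException.

```java
import java.io.IOException;

class CatchCompatSketch {
    // Local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException.
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg) { super(msg); }
    }

    // Local stand-in for the Phoenix exception after the inheritance change.
    // The single-String constructor is an assumption made for illustration.
    static class StaleClusterRoleRecordException extends DoNotRetryIOException {
        StaleClusterRoleRecordException(String msg) { super(msg); }
    }

    // Hypothetical pre-existing call site that catches only IOException.
    static String legacyHandler(IOException toThrow) {
        try {
            throw toThrow;
        } catch (IOException e) {
            // Still reached: the new parent class is itself an IOException.
            return "caught " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        IOException e = new StaleClusterRoleRecordException("stale record");
        System.out.println(e instanceof IOException);           // true
        System.out.println(e instanceof DoNotRetryIOException); // true
        System.out.println(legacyHandler(e));
    }
}
```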

How was this patch tested?

New unit tests (HaFailoverExceptionInheritanceTest, 3 tests, all PASS) verify the inheritance change:

  • mutationBlockedExtendsDoNotRetryIOException
  • staleClusterRoleRecordExtendsDoNotRetryIOException
  • staleClusterRoleRecordTwoArgConstructorPreserved

IT (IndexRegionObserverMutationBlockingIT, 7 tests, all PASS):

  • 4 baseline mutation-blocking tests (data table with index, allowed-when-not-blocked, transition, system HA group exemption)
  • testMutationBlockedFailsFastWithDNRIOE — wall-clock <10s assertion under default retry config
  • testMutationBlockedFailsFastUnderElevatedRetries — wall-clock <10s assertion under hbase.client.retries.number=16 + hbase.client.pause=100
  • testMutationBlockedFailsFastViaInheritanceAlone — crisp empirical proof of inheritance-alone fail-fast under elevated retry config
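
The wall-clock assertions in these ITs follow a common pattern: time the failing operation and fail the test if it takes longer than the bound. A minimal sketch of that pattern follows; it assumes nothing about the actual IT code, and the assertFailsWithin helper and Op interface are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

class WallClockBoundSketch {
    // Hypothetical stand-in for the mutation being issued against a blocked cluster.
    interface Op { void run() throws IOException; }

    // Runs op, expects it to fail with an IOException, and returns the elapsed
    // milliseconds. Throws AssertionError if op succeeds or the failure takes
    // boundMillis or longer (which would suggest retries fired).
    static long assertFailsWithin(Op op, long boundMillis) {
        long start = System.nanoTime();
        try {
            op.run();
        } catch (IOException expected) {
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (elapsedMillis >= boundMillis) {
                throw new AssertionError(
                    "failure took " + elapsedMillis + " ms, bound was " + boundMillis + " ms");
            }
            return elapsedMillis;
        }
        throw new AssertionError("expected the operation to fail");
    }

    public static void main(String[] args) {
        long elapsed = assertFailsWithin(
            () -> { throw new IOException("mutations blocked"); }, 10_000);
        System.out.println("failed within bound after " + elapsed + " ms");
    }
}
```

A timing bound like this is an indirect check: it passes as long as the failure arrives quickly, so the bound has to sit well below the time a full retry budget would consume for the test to be meaningful.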

Local mvn output:

[INFO] --- failsafe:3.2.2:integration-test (default-cli) @ phoenix-core ---
[INFO] Running org.apache.phoenix.end2end.index.IndexRegionObserverMutationBlockingIT
[INFO] Running org.apache.phoenix.exception.HaFailoverExceptionInheritanceTest
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.034 s
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 107.3 s
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0
[INFO] BUILD SUCCESS
[INFO] Total time:  01:55 min

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

lokiore force-pushed the PHOENIX-7871-dnrioe-fail-fast branch from 690919c to 51f950b on May 29, 2026 21:11
lokiore changed the title from "PHOENIX-7871 :- Fail-fast batched mutation block via DNRIOE inheritance + Phoenix-side intercept" to "PHOENIX-7871 :- Fail-fast batched mutation block via DNRIOE inheritance" on May 29, 2026
MutationBlockedIOException and StaleClusterRoleRecordException now extend
DoNotRetryIOException (instead of IOException) so HBase's RPC retry layers
fail-fast on the first hit instead of consuming the per-call retry budget.

Empirical verification (debugger + testMutationBlockedFailsFastWithDNRIOE
passing with the Phoenix-side intercept disabled via config flag) confirms
that ProtobufUtil.toException rehydrates the wire payload as a real
MutationBlockedIOException instance, so AsyncRequestFutureImpl.manageError:749
instanceof DoNotRetryIOException returns true post-inheritance and HBase's
batched-RPC retry layer fails fast on the first hit. No Phoenix-side
intercept needed.

The IT testMutationBlockedFailsFastViaInheritanceAlone verifies the
inheritance-alone path under elevated retry settings: assertion is
timing-bound (<10s) which would fail if HBase started retrying despite
DNRIOE.

Generated-by: Claude Code (Opus 4.7)
lokiore force-pushed the PHOENIX-7871-dnrioe-fail-fast branch from 51f950b to 0620c86 on May 29, 2026 23:08