HBASE-30169 Split completed parent region added to RIT list on host RS and master crash#8264
HBASE-30169 Split completed parent region added to RIT list on host RS and master crash#8264Umeshkumar9414 wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes an assignment tracking bug where split/merged parent regions could be incorrectly added to the master’s Regions-In-Transition (RIT) list during RegionServer crash handling, especially across master restarts. This improves correctness of assignment state tracking and adds regression coverage for the crash + master restart scenario.
Changes:
- Skip adding split/merged (and replica) regions to RIT during
regionCrashed, and refactor shared guards into helpers. - Remove the broken
AssignmentManager.isRegionInTransition(RegionInfo)API and replace test checks with an encoded-name-based lookup helper. - Add a regression test that splits a region, expires the hosting RS, restarts the master, and verifies the split parent is not in RIT.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java | Updates tests to use the new encoded-name-based RIT check utility. |
| hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestRegionSplit.java | Adds regression test for split-parent not appearing in RIT after RS+master crash/restart; modernizes a few assertions. |
| hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/AssignmentTestingUtil.java | Introduces encoded-name-based isRegionInTransition helper and updates existing wait helper to use it. |
| hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionInTransitionTracker.java | Adds split/merged guard in regionCrashed, extracts helper predicates, and removes the public containsKey-based RIT query method. |
| hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java | Removes the now-deleted isRegionInTransition(RegionInfo) API. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
5bded49 to
a6fa265
Compare
|
Failing test is related to compaction. Unrelated to PR. Can look seperatly later. |
| * {@link AssignmentManager#markRegionAsSplit(RegionInfo, ServerName, RegionInfo, RegionInfo)} for | ||
| * more details on why we need to test two conditions. | ||
| */ | ||
| static boolean isSplitOrMerged(RegionStateNode regionStateNode) { |
There was a problem hiding this comment.
An AI assisted review finds:
This check isSplitOrMerged has been moved out of regionCrashed , where it formerly guarded calls to both visitRegionState and markRegionsAsCrashed. After this refactor, now only visitRegionState is guarded. A split parent in SPLIT state whose location is still equal to the crashed server (the parent stays in regionStates in this state until the catalog janitor cleans it) will now be processed in regionCrashed when this didn't used to be possible.
Recommendation:
Add & !AssignmentManagerUtil.isSplitOrMerged(node) to the if (crashedServerName.equals(node.getRegionLocation())) branch in markRegionsAsCrashed, just like the same check guard visitRegionState.
with encoded-name-based lookup in test utility.