HBASE-30142 Resolve NPE while running recoverUnknown command because null RegionLocation#8192
HBASE-30142 Resolve NPE while running recoverUnknown command because null RegionLocation#8192Umeshkumar9414 wants to merge 3 commits into
Conversation
wchevreuil
left a comment
There was a problem hiding this comment.
Makes sense. Should we add UT for this?
6e21a3a to
314f55e
Compare
@wchevreuil Added an test. Let me know if it looks good. |
There was a problem hiding this comment.
Pull request overview
Fixes HBASE-30142 by ensuring recoverUnknown/scheduleSCPsForUnknownServers does not treat regions with a null ServerName (i.e., null region location during transitions/failed-open handling) as “unknown servers,” avoiding spurious SCP scheduling and potential NPEs.
Changes:
- Update
ServerManager#isServerUnknownto returnfalsefornullserver names. - Add a new master-level mini-cluster test that drives a region into
ABNORMALLY_CLOSEDwith anulllocation and verifiesscheduleSCPsForUnknownServersschedules no SCPs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java |
Adjusts “unknown server” classification to ignore null server names and updates Javadoc. |
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRecoverUnknownWithNullRegionLocation.java |
Adds regression coverage for recoverUnknown when a region’s location is null but state is non-OFFLINE. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * any more, for example, a very old previous instance). A null serverName should not be | ||
| * considered unknown. We set regionLocation null before finding the assign candidate (in-between | ||
| * region transition) or while marking it OFFLINE/FAILED_OPEN For more info regarding null-check | ||
| * in this refer HBASE-30142 |
| assertTrue(node.isInState(RegionState.State.ABNORMALLY_CLOSED), | ||
| "regionClosedAbnormally must move state to ABNORMALLY_CLOSED"); | ||
| assertNull(node.getRegionLocation(), "regionClosedAbnormally must null out the location"); | ||
|
|
||
| MasterProtos.ScheduleSCPsForUnknownServersResponse response = | ||
| master.getMasterRpcServices().scheduleSCPsForUnknownServers(null, | ||
| MasterProtos.ScheduleSCPsForUnknownServersRequest.newBuilder().build()); |
There was a problem hiding this comment.
This is a good suggestion.
314f55e to
4488d62
Compare
apurtell
left a comment
There was a problem hiding this comment.
Test classification needs a change, otherwise seems ok
| * {@code shouldSubmitSCP(null)} then dereferenced the null and crashed. | ||
| */ | ||
| @Tag(MasterTests.TAG) | ||
| @Tag(SmallTests.TAG) |
There was a problem hiding this comment.
I think this needs to be @Tag(MediumTests.TAG) because it runs a minicluster.
MasterRpcServices.scheduleSCPsForUnknownServers() has no null guard before calling isServerUnknown(). With the old code, regions with null serverName (OFFLINE/FAILED_OPEN in-between state) would spuriously trigger a ServerCrashProcedure. The fix prevents that.