forked from zstackio/zstack
[ZSTAC-83890][5.4.8] HA pre-fence leftover qemu on suspect host via sibling #3899
Open: MatheMatrix wants to merge 2 commits into 5.4.8 from sync/yingzhe.hu/fix/ZSTAC-83890@@3
50 changes: 50 additions & 0 deletions
header/src/main/java/org/zstack/header/vm/FenceVmOnHostMsg.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,50 @@ | ||
| package org.zstack.header.vm; | ||
|
|
||
| import org.zstack.header.host.HostMessage; | ||
| import org.zstack.header.message.NeedReplyMessage; | ||
|
|
||
| /** | ||
| * ZSTAC-83890 / TIC-5513 | ||
| * | ||
| * Sent from {@code VmInstanceBase.handle(HaStartVmInstanceMsg)} during HA pre-fence. Routed to a | ||
| * vetted Connected KVM sibling host's KVMHost service (the "peer"). The peer is picked at the | ||
| * management node from {@code siblingHostUuids} (hinted by HA decision) so that this message never | ||
| * has to be delivered to the suspect host whose service is by definition unreliable / Disconnected. | ||
| * | ||
| * The peer's {@code KVMHost.handle(FenceVmOnHostMsg)} loads the suspect host's SSH credentials from | ||
| * its KVMHostVO and asks its local kvmagent to SSH the suspect and force-destroy any leftover qemu | ||
| * of {@code vmUuid}, so the new VM start during HA is safe against split-brain even if Ceph | ||
| * watchers were transiently emptied by an OSD watch_ping timeout. | ||
| */ | ||
| public class FenceVmOnHostMsg extends NeedReplyMessage implements HostMessage { | ||
| /** Routing target: the peer (sibling) KVM host that will execute the fence. */ | ||
| private String hostUuid; | ||
| /** The suspect host whose qemu must be killed; supplied by the management node. */ | ||
| private String suspectHostUuid; | ||
| private String vmUuid; | ||
|
|
||
| @Override | ||
| public String getHostUuid() { | ||
| return hostUuid; | ||
| } | ||
|
|
||
| public void setHostUuid(String hostUuid) { | ||
| this.hostUuid = hostUuid; | ||
| } | ||
|
|
||
| public String getSuspectHostUuid() { | ||
| return suspectHostUuid; | ||
| } | ||
|
|
||
| public void setSuspectHostUuid(String suspectHostUuid) { | ||
| this.suspectHostUuid = suspectHostUuid; | ||
| } | ||
|
|
||
| public String getVmUuid() { | ||
| return vmUuid; | ||
| } | ||
|
|
||
| public void setVmUuid(String vmUuid) { | ||
| this.vmUuid = vmUuid; | ||
| } | ||
| } |
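
The Javadoc above says the peer is chosen on the management node from `siblingHostUuids` and must be a Connected sibling, never the suspect host itself. A minimal sketch of that selection follows. The class name `PeerPickSketch`, the method `pickPeer`, and the `connectedHostUuids` set are hypothetical and do not come from the PR; the real vetting may check more than connection status.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch; none of these names exist in the ZStack code base. */
final class PeerPickSketch {
    /**
     * Returns the first sibling that is not the suspect host and is
     * currently Connected, or empty if no sibling qualifies.
     */
    static Optional<String> pickPeer(List<String> siblingHostUuids,
                                     Set<String> connectedHostUuids,
                                     String suspectHostUuid) {
        return siblingHostUuids.stream()
                // the suspect host is unreliable by definition, so never route to it
                .filter(uuid -> !uuid.equals(suspectHostUuid))
                // only a Connected host can execute the fence
                .filter(connectedHostUuids::contains)
                .findFirst();
    }
}
```

An empty result means no sibling can execute the fence; how HA should behave in that case is not stated in this diff.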
14 changes: 14 additions & 0 deletions
header/src/main/java/org/zstack/header/vm/FenceVmOnHostReply.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| package org.zstack.header.vm; | ||
|
|
||
| import org.zstack.header.message.MessageReply; | ||
|
|
||
| /** | ||
| * ZSTAC-83890 - reply to {@link FenceVmOnHostMsg}. | ||
| * | ||
| * If a sibling kvmagent confirmed the suspect qemu is gone (or could not even reach the suspect | ||
| * host), the reply is a plain success and HA-start is allowed to proceed. If the sibling reported | ||
| * that qemu is still alive on the suspect host, the reply is a failure with a descriptive error, | ||
| * and HA-start is refused to prevent split-brain. | ||
| */ | ||
| public class FenceVmOnHostReply extends MessageReply { | ||
| } |
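
The reply contract in the Javadoc above maps three observations to two results: qemu confirmed gone and suspect host unreachable both yield a plain success, while qemu still alive yields a failure. A minimal sketch of that mapping follows. The enum `FenceOutcome`, the method `toReplyError`, and the error wording are hypothetical; the PR only shows that `FenceVmOnHostReply` extends `MessageReply`.

```java
import java.util.Optional;

/** Hypothetical sketch; none of these names exist in the ZStack code base. */
final class FenceReplySketch {
    /** What the sibling's kvmagent observed about the suspect host (assumed enum). */
    enum FenceOutcome { QEMU_GONE, SUSPECT_UNREACHABLE, QEMU_STILL_ALIVE }

    /**
     * An empty result models a plain success reply (HA start may proceed).
     * A present value models the error carried by a failed reply.
     */
    static Optional<String> toReplyError(FenceOutcome outcome, String vmUuid, String suspectHostUuid) {
        switch (outcome) {
            case QEMU_GONE:
            case SUSPECT_UNREACHABLE:
                // both cases are treated as safe to continue, per the Javadoc
                return Optional.empty();
            case QEMU_STILL_ALIVE:
                // refusing HA start here is what prevents split-brain
                return Optional.of(String.format(
                        "qemu of vm[uuid:%s] is still alive on suspect host[uuid:%s]",
                        vmUuid, suspectHostUuid));
            default:
                throw new IllegalStateException("unknown outcome: " + outcome);
        }
    }
}
```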
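Do not persist the VM as `Stopped` before it has actually been started.

This code updates the database first and then calls `startVm()`. On entry, `startVm()` calls `refreshVO()` and then treats the already rewritten state as its own `originState`. As a result, if the subsequent allocation, pre-start, or start step fails, the rollback only returns to `Stopped` / `hostUuid = null`, not to the original state the VM had before entering HA. This change directly alters the HA failure semantics of 5.4.8, and it also bypasses the state-event and extension-point callbacks of `changeVmStateInDb(...)`.

Suggestion: store the original `state` / `hostUuid` / `lastHostUuid` in the flow data and restore them on failure, or defer this database write until the HA start has actually succeeded.

As per coding guidelines: "Backward compatibility principle ... existing behavior should not be changed directly ... switch control, etc."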
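
The first suggestion in the review comment above (snapshot the original fields, restore them on failure) can be sketched roughly as follows. Everything here is hypothetical: `VmRecord` stands in for the persisted VM record, `StartAction` stands in for the allocation, pre-start, and start steps, and the `"Unknown"` state in the usage note is only an example value. The sketch does not use ZStack's flow-chain API or `changeVmStateInDb(...)`.

```java
/** Hypothetical sketch of the snapshot-and-restore idea; not ZStack code. */
final class HaStateSnapshotSketch {
    /** Minimal stand-in for the persisted VM record (assumed fields). */
    static final class VmRecord {
        String state;
        String hostUuid;
        String lastHostUuid;

        VmRecord(String state, String hostUuid, String lastHostUuid) {
            this.state = state;
            this.hostUuid = hostUuid;
            this.lastHostUuid = lastHostUuid;
        }
    }

    /** Immutable copy of the fields, taken before HA modifies the record. */
    static final class Snapshot {
        private final String state;
        private final String hostUuid;
        private final String lastHostUuid;

        Snapshot(VmRecord vm) {
            this.state = vm.state;
            this.hostUuid = vm.hostUuid;
            this.lastHostUuid = vm.lastHostUuid;
        }

        void restore(VmRecord vm) {
            vm.state = state;
            vm.hostUuid = hostUuid;
            vm.lastHostUuid = lastHostUuid;
        }
    }

    /** Stand-in for the allocation / pre-start / start steps. */
    interface StartAction {
        void run(VmRecord vm) throws Exception;
    }

    /** Returns true if the start succeeded; on failure the pre-HA fields are restored. */
    static boolean haStart(VmRecord vm, StartAction start) {
        Snapshot original = new Snapshot(vm);
        // the rewrite the reviewer objects to: Stopped with no host
        vm.state = "Stopped";
        vm.hostUuid = null;
        try {
            start.run(vm);
            return true;
        } catch (Exception e) {
            // roll back to the state before HA, not to the rewritten one
            original.restore(vm);
            return false;
        }
    }
}
```

With a record created as `new VmRecord("Unknown", "host-a", "host-z")` and a start action that throws, the record ends up with its original three values again rather than `Stopped` with a null host.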