Skip to content

[ZSTAC-83890][5.4.8] HA pre-fence leftover qemu on suspect host via sibling#3899

Open
MatheMatrix wants to merge 1 commit into5.4.8from
sync/yingzhe.hu/fix/ZSTAC-83890@@3
Open

[ZSTAC-83890][5.4.8] HA pre-fence leftover qemu on suspect host via sibling#3899
MatheMatrix wants to merge 1 commit into5.4.8from
sync/yingzhe.hu/fix/ZSTAC-83890@@3

Conversation

@MatheMatrix
Copy link
Copy Markdown
Owner

Background

TIC-5513: Ceph OSD osd_client_watch_timeout (default 30s) evicts the watcher when watch_ping is delayed (e.g. source host OOM / storage net blocked). rbd status briefly returns Watchers: none even though qemu is still alive. HA's second watcher check hits this transient empty window -> VM starts on a new host -> split-brain after the source host recovers.

Fix

Add HaPreFenceVmExtensionPoint, called by VmInstanceBase.handle(HaStartVmInstanceMsg) before BeforeHaStartVmInstanceExtensionPoint. KVM impl KvmHaPreFenceVmExtension:

  1. Read vm.hostUuid (HA flow has not yet nulled it - this is the suspect host).
  2. Pick a sibling KVM host. Prefer HaStartVmInstanceMsg.siblingHostUuids (HA decision already vetted them via HaKvmHostSiblingChecker); fallback to fresh Connected+Enabled+KVM same-cluster query.
  3. Send KVMHostAsyncHttpCallMsg with FenceVmFromPeerCmd to that sibling.
  4. Sibling kvmagent SSHes the suspect host: virsh destroy + pkill -9 -f qemu.*vmUuid + pgrep verify.
  5. Verdict:
    • qemuConfirmedDead -> HA proceeds
    • targetHostUnreachable -> HA proceeds (host truly down)
    • qemuStillAlive -> HA refused (refuse to split-brain)
    • no sibling available -> HA refused (fail-safe)

Companion MRs

  • premium: yingzhe.hu/premium fix/ZSTAC-83890@@3 -> 5.4.8 (sibling propagation chain + integration test HaPreFenceVmFromPeerCase)
  • zstack-utility: yingzhe.hu/zstack-utility fix/ZSTAC-83890@@3 -> 5.4.8 (agent endpoint /ha/fence/vm/from/peer)

Refs

Resolves: ZSTAC-83890
Refs: TIC-5513

sync from gitlab !9783

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

在 HA 启动路径中添加 pre-fence:新增消息/agent 命令,KVM 主机实现 peer-SSH fencing,并在 VmInstanceBase 的 HA 启动 flow-chain 中插入 pre-fence 步骤、扩展聚合、状态更新与启动步骤,记录并上报调度结果。

变更说明

HA 启动前主机隔离

层级 / 文件 概述
消息和Agent命令数据契约
header/src/main/java/org/zstack/header/vm/HaStartVmInstanceMsg.java, header/src/main/java/org/zstack/header/vm/FenceVmOnHostMsg.java, header/src/main/java/org/zstack/header/vm/FenceVmOnHostReply.java, plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerCmd.java, plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerRsp.java, simulator/simulatorImpl/src/main/java/org/zstack/simulator/kvm/KVMSimulatorController.java
HaStartVmInstanceMsg 添加 siblingHostUuids;新增 FenceVmOnHostMsg / FenceVmOnHostReply;新增 agent 命令 FenceVmFromPeerCmd 与响应 FenceVmFromPeerRsp,含 qemu 生死与可达性字段;模拟器新增对应端点返回 qemu 已死 的模拟响应。
KVM 隔离处理实现
plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java
KVMHost 新增对 FenceVmOnHostMsg 的 dispatch 与 handle(FenceVmOnHostMsg) 实现:通过 pickFencePeerHost 选 peer,并向 peer 发送 FenceVmFromPeerCmd(经 KVMHostAsyncHttpCallMsg),解析 FenceVmFromPeerRsp 判定是否可继续 HA 启动(含 split-brain 防护)。
HA 启动流程集成
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
将 HA 启动重构为 FlowChain:在 whetherStartVm 后创建并开始 VmSchedHistoryRecorder;新增 ha-pre-fence-vm 步骤条件发送 FenceVmOnHostMsg;聚合执行 BeforeHaStartVmInstanceExtensionPoint;标记 VM 为 Stopped 并保存 lastHostUuid(如适用);调用 startVm(...) 并在链终端根据成功或失败结束 recorder 并回复。

Sequence Diagram(s)

sequenceDiagram
  participant HA as HA Starter
  participant Target as Target KVMHost
  participant Peer as Peer KVMHost
  participant Agent as Peer Agent
  HA->>Target: FenceVmOnHostMsg(hostUuid, vmUuid, siblings)
  Target->>Target: pickFencePeerHost(hintedSiblings, cluster, exclude)
  alt no peer
    Target->>HA: FenceVmOnHostReply(error)
  else peer found
    Target->>Peer: KVMHostAsyncHttpCallMsg(FenceVmFromPeerCmd)
    Peer->>Agent: FenceVmFromPeerCmd
    Agent->>Agent: SSH fence suspect host, check qemu
    Agent->>Peer: FenceVmFromPeerRsp(qemuConfirmedDead/qemuStillAlive/targetUnreachable)
    Peer->>Target: response
    alt qemuConfirmedDead or targetUnreachable
      Target->>HA: FenceVmOnHostReply(success)
    else qemuStillAlive or ambiguous
      Target->>HA: FenceVmOnHostReply(failure)
    end
  end
Loading
sequenceDiagram
  participant Msg as HaStartVmInstanceMsg
  participant Base as VmInstanceBase
  participant Recorder as VmSchedHistoryRecorder
  participant Fence as Pre-Fence Step
  participant Ext as BeforeHaStart Extensions
  participant DB as VM DB
  participant Starter as startVm()
  Msg->>Base: 处理 HA 启动请求
  Base->>Base: whetherStartVm()
  Base->>Recorder: recorder.begin()
  Base->>Fence: ha-pre-fence-vm (FenceVmOnHostMsg)
  alt pre-fence 失败
    Fence->>Base: fail
    Base->>Recorder: recorder.end(error)
    Base->>Msg: reply error
  else pre-fence 成功
    Fence->>Base: continue
    Base->>Ext: 执行扩展(聚合错误)
    alt 扩展失败
      Ext->>Base: fail
      Base->>Recorder: recorder.end(error)
      Base->>Msg: reply error
    else 扩展成功
      Ext->>Base: continue
      Base->>DB: 标记 VM 为 Stopped, 清 hostUuid, 设置 lastHostUuid
      Base->>Starter: startVm(...)
      alt 启动成功
        Starter->>Base: VM started
        Base->>Recorder: recorder.end(startedHostUuid)
        Base->>Msg: reply success (inventory)
      else 启动失败
        Starter->>Base: error
        Base->>Recorder: recorder.end(error)
        Base->>Msg: reply error
      end
    end
  end
Loading

🎯 4 (复杂) | ⏱️ ~45 分钟

🐰 我是小兔,蹦进代码园,
前置隔离声声唤,同行主机来相援。
SSH 清理残留 qemu,心中无惧防分裂。
流程连成链条走,起停有据写成篇。
🐇✨


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
Title check ❌ Error PR标题不符合指定格式要求(必须为[scope]: ),且超过72字符限制(共75字符)。 请重新格式化标题为[scope]: 格式并确保不超过72字符,例如:'fix[ha]: Pre-fence leftover qemu on suspect host'
Docstring Coverage ⚠️ Warning Docstring coverage is 2.63% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description check ✅ Passed PR描述详细说明了背景(Ceph OSD watch timeout问题)、修复方案(HaPreFenceVmExtensionPoint)及具体实现步骤,与代码变更相关性强。
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/yingzhe.hu/fix/ZSTAC-83890@@3

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
plugin/kvm/src/main/java/org/zstack/kvm/ha/KvmHaPreFenceVmExtension.java (1)

151-151: 💤 Low value

注释中存在拼写错误

"Silbling" 应为 "Sibling"。根据代码规范,注释应使用正确的英文。

📝 建议修复
-        // Fresh fallback - mirrors HaKvmHostSiblingChecker.findSilblingFromSameCluster.
+        // Fresh fallback - mirrors HaKvmHostSiblingChecker.findSiblingFromSameCluster.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/kvm/src/main/java/org/zstack/kvm/ha/KvmHaPreFenceVmExtension.java` at
line 151, The comment in KvmHaPreFenceVmExtension that reads "Silbling" is
misspelled; update the comment to use "Sibling" and, if it references
HaKvmHostSiblingChecker.findSilblingFromSameCluster, correct the referenced
method name in the comment to HaKvmHostSiblingChecker.findSiblingFromSameCluster
so the comment matches proper English and the actual identifier.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java`:
- Around line 980-1028: preFenceHaStartVm currently treats any
HaPreFenceVmExtensionPoint failure as a hard rejection for all HA-starts; to
preserve backward compatibility, gate this strict behavior behind a
capability/global config or enable it only when the request explicitly indicates
peer-fence context (e.g. msg.getSiblingHostUuids() is non-empty) and the
environment reports peer-fence support via pluginRgty/Capability check; update
preFenceHaStartVm to (1) consult a new GlobalConfig or capability flag (or a
PluginRegistry capability query) before invoking/acting on
HaPreFenceVmExtensionPoint results, (2) fall back to next.run() when the gate is
disabled or siblingHostUuids is null/empty, and (3) only treat extension
failures as fatal when the gate is enabled and peer-fence capability is
confirmed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerRsp.java`:
- Around line 10-17: The three primitive boolean fields in FenceVmFromPeerRsp
(qemuConfirmedDead, qemuStillAlive, targetHostUnreachable) cannot express
"absent" from older agents; change their types to boxed Boolean to allow null,
update constructors/serialization metadata if needed, and then update the
consumer logic in KvmHaPreFenceVmExtension (the decision branch around
qemuStillAlive and related checks) to treat null as "absent" and enforce a
strict rule: accept only when exactly one of the three Boolean fields is
Boolean.TRUE, otherwise follow the safe/reject branch. Ensure any usages assume
possible nulls and handle NPEs accordingly.

---

Nitpick comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/ha/KvmHaPreFenceVmExtension.java`:
- Line 151: The comment in KvmHaPreFenceVmExtension that reads "Silbling" is
misspelled; update the comment to use "Sibling" and, if it references
HaKvmHostSiblingChecker.findSilblingFromSameCluster, correct the referenced
method name in the comment to HaKvmHostSiblingChecker.findSiblingFromSameCluster
so the comment matches proper English and the actual identifier.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: a23821f8-76e1-41b8-b17f-b013056adf9d

📥 Commits

Reviewing files that changed from the base of the PR and between 6de8314 and 7395a66.

⛔ Files ignored due to path filters (1)
  • conf/springConfigXml/Kvm.xml is excluded by !**/*.xml
📒 Files selected for processing (6)
  • compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
  • header/src/main/java/org/zstack/header/vm/HaPreFenceVmExtensionPoint.java
  • header/src/main/java/org/zstack/header/vm/HaStartVmInstanceMsg.java
  • plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerCmd.java
  • plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerRsp.java
  • plugin/kvm/src/main/java/org/zstack/kvm/ha/KvmHaPreFenceVmExtension.java

Comment thread compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java Outdated
Comment thread plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerRsp.java Outdated
@MatheMatrix MatheMatrix force-pushed the sync/yingzhe.hu/fix/ZSTAC-83890@@3 branch from 7395a66 to c7d5bb3 Compare May 8, 2026 09:01
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
header/src/main/java/org/zstack/header/vm/FenceVmOnHostMsg.java (1)

21-45: ⚡ Quick win

建议避免 siblingHostUuids 可变引用外泄。

当前 setSiblingHostUuids()getSiblingHostUuids() 直接传递内部 List 引用;调用方在发送消息后若继续修改原列表,消息内容会被“隐式改写”。建议在 Line 44 做 defensive copy,并在 Line 40 返回不可变列表(同时给空默认值,减少下游空判)。

可参考的修改
+import java.util.ArrayList;
+import java.util.Collections;
 import java.util.List;
@@
-    private List<String> siblingHostUuids;
+    private List<String> siblingHostUuids = Collections.emptyList();
@@
     public List<String> getSiblingHostUuids() {
-        return siblingHostUuids;
+        return siblingHostUuids;
     }
@@
     public void setSiblingHostUuids(List<String> siblingHostUuids) {
-        this.siblingHostUuids = siblingHostUuids;
+        this.siblingHostUuids = siblingHostUuids == null
+                ? Collections.emptyList()
+                : Collections.unmodifiableList(new ArrayList<>(siblingHostUuids));
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@header/src/main/java/org/zstack/header/vm/FenceVmOnHostMsg.java` around lines
21 - 45, Avoid exposing a mutable reference for siblingHostUuids: in
setSiblingHostUuids(List<String>) make a defensive copy of the incoming list
before assigning to the siblingHostUuids field (handle null by assigning an
empty list), and in getSiblingHostUuids() return an unmodifiable/immutable view
(or a new copy) instead of the internal list so callers cannot mutate the
internal state; update references to siblingHostUuids, setSiblingHostUuids, and
getSiblingHostUuids accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java`:
- Around line 982-996: The code in VmInstanceBase unconditionally sends
FenceVmOnHostMsg when self.getHostUuid() != null, but only KVMHost implements
handling; modify the sending logic in the block around FenceVmOnHostMsg so you
first check the host's hypervisor/capability (e.g. via HostVO/host type or a
capability check method) and only construct/send FenceVmOnHostMsg for KVM hosts
(or hosts that advertise the peer-fence capability); for other hypervisors, skip
the fence step and call trigger.next() (or short-circuit a successful reply) to
preserve existing HA behavior and avoid sending FenceVmOnHostMsg to unsupported
hosts.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java`:
- Around line 1002-1042: The callback in KVMHost that builds and sends
FenceVmFromPeerCmd must not transmit host SSH credentials or raw remote
stdout/stderr; remove targetHostUsername/targetHostPassword (and
targetHostSshPort if not required) from FenceVmFromPeerCmd and instead pass a
credential reference or one‑time token (e.g., a credentialId or ephemeral key)
and ensure the peer agent uses server-side managed credentials; also sanitize
the error detail logic in the bus.send callback (where FenceVmFromPeerRsp is
inspected) so reply.setError never contains rsp.stdout/rsp.stderr or raw
rsp.getError()—replace them with a short redacted summary or hashed fingerprint
and log full outputs only to a secure local audit with restricted access.

---

Nitpick comments:
In `@header/src/main/java/org/zstack/header/vm/FenceVmOnHostMsg.java`:
- Around line 21-45: Avoid exposing a mutable reference for siblingHostUuids: in
setSiblingHostUuids(List<String>) make a defensive copy of the incoming list
before assigning to the siblingHostUuids field (handle null by assigning an
empty list), and in getSiblingHostUuids() return an unmodifiable/immutable view
(or a new copy) instead of the internal list so callers cannot mutate the
internal state; update references to siblingHostUuids, setSiblingHostUuids, and
getSiblingHostUuids accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 1e80c1a9-73e4-4f26-9d44-ab1b63bfc09b

📥 Commits

Reviewing files that changed from the base of the PR and between 7395a66 and c7d5bb3.

📒 Files selected for processing (7)
  • compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
  • header/src/main/java/org/zstack/header/vm/FenceVmOnHostMsg.java
  • header/src/main/java/org/zstack/header/vm/FenceVmOnHostReply.java
  • header/src/main/java/org/zstack/header/vm/HaStartVmInstanceMsg.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java
  • plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerCmd.java
  • plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerRsp.java
✅ Files skipped from review due to trivial changes (2)
  • header/src/main/java/org/zstack/header/vm/FenceVmOnHostReply.java
  • header/src/main/java/org/zstack/header/vm/HaStartVmInstanceMsg.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerRsp.java
  • plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerCmd.java

Comment thread compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
Comment on lines +1002 to +1042
FenceVmFromPeerCmd cmd = new FenceVmFromPeerCmd();
cmd.vmUuid = msg.getVmUuid();
cmd.targetHostUuid = suspectHostUuid;
cmd.targetHostIp = self.getManagementIp();
cmd.targetHostUsername = getSelf().getUsername();
cmd.targetHostPassword = getSelf().getPassword();
cmd.targetHostSshPort = getSelf().getPort() != null ? getSelf().getPort() : 22;
cmd.sshTimeoutSec = 20;

KVMHostAsyncHttpCallMsg fmsg = new KVMHostAsyncHttpCallMsg();
fmsg.setHostUuid(peerHostUuid);
fmsg.setPath(FenceVmFromPeerCmd.FENCE_VM_FROM_PEER_PATH);
fmsg.setCommand(cmd);
bus.makeTargetServiceIdByResourceUuid(fmsg, HostConstant.SERVICE_ID, peerHostUuid);

logger.info(String.format("[HA pre-fence] vm[%s] ask sibling[%s] to fence suspect host[%s ip=%s]",
msg.getVmUuid(), peerHostUuid, suspectHostUuid, self.getManagementIp()));

bus.send(fmsg, new CloudBusCallBack(msg) {
@Override
public void run(MessageReply r) {
if (!r.isSuccess()) {
reply.setError(operr("HA-start vm[%s]: failed to ask sibling[%s] to fence suspect host[%s]: %s. " +
"Refuse to start to prevent split-brain.",
msg.getVmUuid(), peerHostUuid, suspectHostUuid, r.getError().getDetails()));
bus.reply(msg, reply);
return;
}
FenceVmFromPeerRsp rsp = ((KVMHostAsyncHttpCallReply) r).toResponse(FenceVmFromPeerRsp.class);
if (!rsp.isSuccess() && !rsp.qemuStillAlive) {
reply.setError(operr("HA-start vm[%s]: sibling[%s] failed to fence suspect host[%s]: %s. " +
"Refuse to start to prevent split-brain.",
msg.getVmUuid(), peerHostUuid, suspectHostUuid, rsp.getError()));
bus.reply(msg, reply);
return;
}
if (rsp.qemuStillAlive) {
reply.setError(operr("HA-start vm[%s]: qemu is still alive on suspect host[%s ip=%s] after sibling[%s] " +
"force-destroy attempt. Refuse to start to prevent split-brain. stdout=[%s] stderr=[%s]",
msg.getVmUuid(), suspectHostUuid, self.getManagementIp(), peerHostUuid,
rsp.stdout, rsp.stderr));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

不要把宿主机 SSH 凭据和远端原始输出透传到 sibling fencing 链路

这里把 suspect host 的 username/password 放进 FenceVmFromPeerCmd,随后又把 rsp.stdout/stderr 直接拼进 error detail。这样会把宿主机凭据和远端脚本原始输出暴露给 CloudBus、kvmagent HTTP 以及异常/审计日志链路,明显扩大敏感信息落盘面。建议改成管理节点代执行,或只下发一次性凭据/私钥引用,并在返回给上层的错误里仅保留脱敏摘要。

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 1002 -
1042, The callback in KVMHost that builds and sends FenceVmFromPeerCmd must not
transmit host SSH credentials or raw remote stdout/stderr; remove
targetHostUsername/targetHostPassword (and targetHostSshPort if not required)
from FenceVmFromPeerCmd and instead pass a credential reference or one‑time
token (e.g., a credentialId or ephemeral key) and ensure the peer agent uses
server-side managed credentials; also sanitize the error detail logic in the
bus.send callback (where FenceVmFromPeerRsp is inspected) so reply.setError
never contains rsp.stdout/rsp.stderr or raw rsp.getError()—replace them with a
short redacted summary or hashed fingerprint and log full outputs only to a
secure local audit with restricted access.

@MatheMatrix MatheMatrix force-pushed the sync/yingzhe.hu/fix/ZSTAC-83890@@3 branch from c7d5bb3 to 7b832c5 Compare May 8, 2026 10:49
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@simulator/simulatorImpl/src/main/java/org/zstack/simulator/kvm/KVMSimulatorController.java`:
- Around line 759-767: The fenceVmFromPeer handler currently skips deserializing
the request and storing the command, making it hard to unit-test; modify the
method fenceVmFromPeer to deserialize the request body into a FenceVmFromPeerCmd
using JSONObjectUtil.toObject(entity.getBody(), FenceVmFromPeerCmd.class),
ensure KVMSimulatorConfig has a public List<FenceVmFromPeerCmd>
fenceVmFromPeerCmds field, add the deserialized cmd to
config.fenceVmFromPeerCmds, then construct FenceVmFromPeerRsp as before (set
qemuConfirmedDead and success) and call replyer.reply(entity, rsp); keep the
rest of the reply behavior unchanged so tests can assert the stored command.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 1bcd0710-b547-43c3-9a4b-c49692dcd006

📥 Commits

Reviewing files that changed from the base of the PR and between 7b832c5 and 150fc60.

📒 Files selected for processing (1)
  • simulator/simulatorImpl/src/main/java/org/zstack/simulator/kvm/KVMSimulatorController.java

Comment on lines +759 to +767
@RequestMapping(value = FenceVmFromPeerCmd.FENCE_VM_FROM_PEER_PATH, method = RequestMethod.POST)
public @ResponseBody String fenceVmFromPeer(HttpServletRequest req) {
HttpEntity<String> entity = restf.httpServletRequestToHttpEntity(req);
FenceVmFromPeerRsp rsp = new FenceVmFromPeerRsp();
rsp.qemuConfirmedDead = true;
rsp.setSuccess(true);
replyer.reply(entity, rsp);
return null;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

建议遵循模拟器处理器的既定模式以提升可测试性

当前实现与本文件中其他模拟器处理器不一致,缺少以下标准模式:

  1. 未反序列化命令:其他处理器(如 hardenVmConsolecheckVmState)都会通过 JSONObjectUtil.toObject 反序列化请求体,以便单元测试可以验证请求参数。
  2. 未存储命令到 config:其他处理器将命令存储到 config 中(例如 config.hardenVmConsoleCmds.add(cmd)),使得测试可以断言该端点被正确调用。

这会降低该端点的可测试性。

♻️ 建议的重构方案

首先,在 KVMSimulatorConfig.java 中添加存储字段(如果尚未添加):

public List<FenceVmFromPeerCmd> fenceVmFromPeerCmds = new ArrayList<>();

然后修改处理器:

 `@RequestMapping`(value = FenceVmFromPeerCmd.FENCE_VM_FROM_PEER_PATH, method = RequestMethod.POST)
 public `@ResponseBody` String fenceVmFromPeer(HttpServletRequest req) {
     HttpEntity<String> entity = restf.httpServletRequestToHttpEntity(req);
+    FenceVmFromPeerCmd cmd = JSONObjectUtil.toObject(entity.getBody(), FenceVmFromPeerCmd.class);
+    config.fenceVmFromPeerCmds.add(cmd);
     FenceVmFromPeerRsp rsp = new FenceVmFromPeerRsp();
     rsp.qemuConfirmedDead = true;
     rsp.setSuccess(true);
     replyer.reply(entity, rsp);
     return null;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@simulator/simulatorImpl/src/main/java/org/zstack/simulator/kvm/KVMSimulatorController.java`
around lines 759 - 767, The fenceVmFromPeer handler currently skips
deserializing the request and storing the command, making it hard to unit-test;
modify the method fenceVmFromPeer to deserialize the request body into a
FenceVmFromPeerCmd using JSONObjectUtil.toObject(entity.getBody(),
FenceVmFromPeerCmd.class), ensure KVMSimulatorConfig has a public
List<FenceVmFromPeerCmd> fenceVmFromPeerCmds field, add the deserialized cmd to
config.fenceVmFromPeerCmds, then construct FenceVmFromPeerRsp as before (set
qemuConfirmedDead and success) and call replyer.reply(entity, rsp); keep the
rest of the reply behavior unchanged so tests can assert the stored command.

@MatheMatrix MatheMatrix force-pushed the sync/yingzhe.hu/fix/ZSTAC-83890@@3 branch 2 times, most recently from d06a0cd to cb6369c Compare May 8, 2026 13:52
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java`:
- Around line 1051-1079: The change persists VmInstanceVO state/hostUuid to
Stopped before calling startVm(), which causes startVm()'s refreshVO() to treat
the modified state as originState and breaks HA-failure semantics and
changeVmStateInDb callbacks; fix by reverting the
"mark-vm-stopped-and-save-last-host" flow: instead of updating the DB before
start, capture and store the original VmInstanceVO fields (state, hostUuid,
lastHostUuid) into the flow data map (inside the first NoRollbackFlow) and
either (a) perform the UpdateQuery that sets state=Stopped/hostUuid=null only
after startVm() success (move the DB update into the success() callback of
startVm) or (b) if you must keep the DB update early, ensure failures restore
the original values by invoking changeVmStateInDb or executing an UpdateQuery
using the saved originals in the failure path (trigger.fail/trigger.rollback).
Reference: the NoRollbackFlow named "mark-vm-stopped-and-save-last-host",
startVm(...), changeVmStateInDb(...), and VmInstanceVO/VmInstanceVO_.uuid to
locate the code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 7939f7e6-3080-4a7d-879f-6b8807cda191

📥 Commits

Reviewing files that changed from the base of the PR and between 150fc60 and cb6369c.

📒 Files selected for processing (8)
  • compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
  • header/src/main/java/org/zstack/header/vm/FenceVmOnHostMsg.java
  • header/src/main/java/org/zstack/header/vm/FenceVmOnHostReply.java
  • header/src/main/java/org/zstack/header/vm/HaStartVmInstanceMsg.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java
  • plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerCmd.java
  • plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerRsp.java
  • simulator/simulatorImpl/src/main/java/org/zstack/simulator/kvm/KVMSimulatorController.java
🚧 Files skipped from review as they are similar to previous changes (5)
  • plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerRsp.java
  • header/src/main/java/org/zstack/header/vm/FenceVmOnHostReply.java
  • plugin/kvm/src/main/java/org/zstack/kvm/ha/FenceVmFromPeerCmd.java
  • header/src/main/java/org/zstack/header/vm/FenceVmOnHostMsg.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java

Comment on lines +1051 to +1079
fchain.then(new NoRollbackFlow() {
String __name__ = "mark-vm-stopped-and-save-last-host";

@Override
public void run(FlowTrigger trigger, Map data) {
logger.debug(String.format("HaStartVmJudger[%s] says the VM[uuid:%s, name:%s] is qualified for HA start, now we are starting it",
judger.getClass(), self.getUuid(), self.getName()));
UpdateQuery sql = SQL.New(VmInstanceVO.class)
.eq(VmInstanceVO_.uuid, self.getUuid())
.set(VmInstanceVO_.state, VmInstanceState.Stopped)
.set(VmInstanceVO_.hostUuid, null);

if (self.getHostUuid() != null) {
sql.set(VmInstanceVO_.lastHostUuid, self.getHostUuid());
}

sql.update();
trigger.next();
}
});

startVm(msg, new Completion(msg, chain) {
fchain.then(new NoRollbackFlow() {
String __name__ = "start-ha-vm";

@Override
public void run(FlowTrigger trigger, Map data) {
startVm(msg, new Completion(trigger) {
@Override
public void success() {
reply.setInventory(getSelfInventory());
bus.reply(msg, reply);
recorder.end(reply.getInventory().getHostUuid());
chain.next();
trigger.next();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

不要在真正启动前先把 VM 落库成 Stopped

这里先改库,再调用 startVm();而 startVm() 一进来会 refreshVO(),随后把已经被改写后的状态当作自己的 originState。这样一来,只要后面的分配、pre-start 或 start 失败,回滚就只会回到 Stopped/hostUuid = null,而不是 HA 进入前的原始状态。这个改动会直接改写 5.4.8 的 HA 失败语义,而且还绕过了 changeVmStateInDb(...) 的状态事件/扩展点回调。

建议把原始 state/hostUuid/lastHostUuid 存进 flow data 并在失败时恢复,或者把这次落库延后到 HA start 真正成功之后。

As per coding guidelines: “向后兼容原则……不应直接改动已有行为……开关控制等”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines
1051 - 1079, The change persists VmInstanceVO state/hostUuid to Stopped before
calling startVm(), which causes startVm()'s refreshVO() to treat the modified
state as originState and breaks HA-failure semantics and changeVmStateInDb
callbacks; fix by reverting the "mark-vm-stopped-and-save-last-host" flow:
instead of updating the DB before start, capture and store the original
VmInstanceVO fields (state, hostUuid, lastHostUuid) into the flow data map
(inside the first NoRollbackFlow) and either (a) perform the UpdateQuery that
sets state=Stopped/hostUuid=null only after startVm() success (move the DB
update into the success() callback of startVm) or (b) if you must keep the DB
update early, ensure failures restore the original values by invoking
changeVmStateInDb or executing an UpdateQuery using the saved originals in the
failure path (trigger.fail/trigger.rollback). Reference: the NoRollbackFlow
named "mark-vm-stopped-and-save-last-host", startVm(...),
changeVmStateInDb(...), and VmInstanceVO/VmInstanceVO_.uuid to locate the code.

@MatheMatrix MatheMatrix force-pushed the sync/yingzhe.hu/fix/ZSTAC-83890@@3 branch 2 times, most recently from 436a16b to 746506a Compare May 9, 2026 05:57
@zstack-robot-2
Copy link
Copy Markdown
Collaborator

Comment from yaohua.wu:

Review: MR !9783 — ZSTAC-83890

关联 MR

  • premium !13780:HA 决策侧把已 vetted 的 sibling list 透传给 HaStartVmInstanceMsg.siblingHostUuids,并补集成测试 HaPreFenceVmFromPeerCase
  • zstack-utility !7019:kvmagent /ha/fence/vm/from/peer 端点(SSH 嫌疑 host 执行 virsh destroy + pkill -9 + pgrep 校验)

总体评价

本仓承担 ZSTAC-83890 fix 的核心架构:

  • 协议层:FenceVmOnHostMsg/ReplyKVMHost.FenceVmFromPeerCmd/RspKVM_HA_FENCE_VM_FROM_PEER_PATHHaStartVmInstanceMsg.siblingHostUuids
  • 业务层:VmInstanceBase.handle(HaStartVmInstanceMsg) 改造为 4-step FlowChain(ha-pre-fence-vm → before-ha-start-vm-instance → mark-vm-stopped → start-ha-vm)
  • KVM 实现:KVMHost.handle(FenceVmOnHostMsg) + pickFencePeerHost sibling 选取

设计合理:opt-in(仅当 siblingHostUuids 非空才触发)、KVM-only gate(非 KVM 直接 trigger.next)、ambiguous → refuse(保守防脑裂)、敏感字段 @NoLogging、boxed Boolean 处理 gray version、Serializable 标注便于跨节点。前几轮 CodeRabbit 提的核心问题(KVM-only gate、密码 @NoLogging、Serializable)都已修。但仍有几条需要关注。

🔴 Critical(关联仓 zstack-utility !7019)

本仓 KVMHost.handle(FenceVmOnHostMsg) 把 verdict 解析得相当严谨:verdicts != 1 → refuseqemuStillAlive → refusetargetHostUnreachable → 允许 HAqemuConfirmedDead → 允许 HA

但 zstack-utility !7019 的 fence_vm_from_peer 实现有一条 pkill self-match bug:远端命令 pkill -9 -f 'qemu.*<uuid>' 会把自己所在的 bash -c '<remote_cmd>' 父 shell 一起 SIGKILL(因为 bash 的 cmdline 字面包含 qemu.*<uuid>',被 pkill -f 的正则匹配上)。几乎所有真实 SSH 路径都返回 rc=137 → Python 端 targetHostUnreachable=True,绕过了 qemuStillAlive 这条唯一真正防脑裂的分支。

本仓侧没问题、但作为整套 fix 的一部分,utility 那条不修,本仓的 ambiguous-refuse 防护也起不到作用:utility 永远把 qemu D-state(唯一会触发 stillAlive 的真实场景)误报为 unreachable,本仓直接放行 HA → 脑裂。详见 MR !7019 的 Critical 评论。

🟡 Warning

  1. KVMHost.java 第 ~1006 行 cmd.sshTimeoutSec = 20:硬编码。建议挂 GlobalConfig(如 HaGlobalConfig.HA_PRE_FENCE_SSH_TIMEOUT,默认 20s),便于运维侧调。20s 在 rbd 卡 IO 时偏紧(详见 utility !7019 Warning <fix>[kvm]: filter out stopped VM when invoke getVirtualizerInfo #5)。

  2. pickFencePeerHost fallback 路径 Collections.shuffle(candidates) 后取第一个:非确定性、没有重试。如果第一个 sibling 自身网络抖动导致 fence cmd 投递失败,本仓直接走 transport-error 分支 → refuse → HA 失败。当前实现要等下一轮 host check 周期才会重选 sibling,会拉长 RTO。建议在投递失败(transport error,非 stillAlive)时至少试一次列表里的下一个 sibling。

  3. VmInstanceBase.handle(HaStartVmInstanceMsg) 第 ~952 行 final HaStartVmJudger judger:改 final 是因为下游 lambda 引用,OK。但 新结构里 judger 实际只在 mark-vm-stopped-and-save-last-host flow 的 logger.debug(...) 里被读 —— 仅日志而已。原本就是这样吗?如果 judger.judge(...) 应该真正参与 HA 决策,那这次 FlowChain 改造可能漏了一个步骤;如果 judger 历来就只是 logging 用,那建议直接 inline 成 getClass() 字符串、并把变量重命名("judger" 这名字读起来像决策器,但代码里不参与决策)。请确认是否仅日志用途。

  4. FlowChain 失败处理的状态语义fchain.error(...) 把 errCode set 进 reply 然后 bus.reply(msg, reply) + recorder.withFailReason(...) + chain.next()mark-vm-stopped-and-save-last-host 是后置 step:HA-pre-fence 失败(refuse)→ 跳过 mark-stopped → error handler → reply.setError → chain.next。这意味着 VM 状态保持 Running/hostUuid=<suspect> —— 行为本身是对的(HA 拒绝就不该改状态),但用户体验:MN 这边 vm 还是 Running on suspect host(虽然 ping 失败已经触发 HA);suspect host 现在 Disconnected,VM 在 DB 里却是 Running,看上去会迷惑。建议在 reply.setError 文案附一句 "vm[X] kept in Running state pending suspect host recovery / manual intervention"。

  5. rolling-upgrade 行为:fallback path pickFencePeerHost 选到旧 kvmagent(5.4.7,无 /ha/fence/vm/from/peer 端点)时,agent 返回 404 / 端点未知,KVMHostAsyncHttpCallReply.toResponse(FenceVmFromPeerRsp.class) 通常会返回 success=false + 全 null 的 rsp,被本仓判 verdicts != 1 → ambiguous → refuse。rolling upgrade 期间所有 HA 都会被 refuse,这是保守的,但运维必须知道:升级 zstack OSS 之前要先把所有 KVM host 的 kvmagent 升到 5.4.8;否则 HA 完全停摆。Release notes / 升级文档里要明确这条。建议在 KVMHost.handle(FenceVmOnHostMsg) 检测到 404 时输出一条特殊的 logger.warn(例如 agent endpoint not found, kvmagent likely older than 5.4.8),便于现场快速排查。

  6. VmInstanceBase 第 ~990 行 bus.makeTargetServiceIdByResourceUuid(fmsg, HostConstant.SERVICE_ID, suspectHostUuid):suspect host 当前 Disconnected,但 KVMHost service 是 MN 侧的 in-memory Java service,bus.send 是 MN 内部消息,不依赖 host 网络可达 → 没问题。但建议加一行注释说明,免得 reviewer 误以为这是远程消息(看到 hostUuid + Disconnected 状态会本能怀疑)。

  7. FenceVmFromPeerCmd.targetHostUuid 字段:被 cmd 携带但 utility 侧 Python 代码完全不用(只用 ip/user/pass/port)。本意应该是 audit / 日志 trace 用?建议在 utility !7019 那边把 vm_uuid + targetHostUuid 一起记进 logger.info,跨 MN/agent 关联日志。

🟢 Suggestions

  1. KVMHost.FenceVmFromPeerCmd / FenceVmFromPeerRsp 在 KVMHost.java 里作为 inner class(commit 436a16b inline 进来)。从其他 plugin / 测试引用 KVMHost.FenceVmFromPeerCmd —— OK 可工作,但 KVMHost.java 已经 7000+ 行,inner class 形式让维护更累。如果未来再加 fence 相关字段,建议拆出 kvm/ha/FenceVmFromPeerProtocol.java 之类的 package(恢复到 inline 之前的分文件结构,但放进 ha 子 package 更合理)。

  2. mark-vm-stopped-and-save-last-host step 是同步 DB 操作 + log,没必要包成独立 NoRollbackFlow。可以 inline 进 start-ha-vm step 起头,少一层嵌套;或者保留但在 step name 里说明 "synchronous prep"。

  3. KVMHost.handle(FenceVmOnHostMsg) 的 logger.info 里 peerHostUuid / suspectHostUuid 反复出现,可以封一个 prefix [HA pre-fence vm=X suspect=Y peer=Z] MDC 模式,看 log 更顺;和 utility 侧 [HA pre-fence] 前缀也对得起来便于 grep。

  4. KVMSimulatorController.fenceVmFromPeer 第 759 行:CodeRabbit 提了一条建议(反序列化 cmd 并存到 config 里便于测试 assert),作者标 resolved 但 diff 没改。premium 的集成测试用 AtomicReference<FenceVmFromPeerCmd> capturedCmd 自己 hook,可以工作;但如果将来想从 OSS 侧也写测试,simulator 里没存 cmd 就要重新加 hook。建议补上反序列化 + config.fenceVmFromPeerCmds.add(cmd) 的标准 simulator 模式。

Coverage 评估

  • 协议设计:✅ msg/reply/cmd/rsp 命名规范,@GrayVersion("5.4.8") 全员标注,@NoLogging 把密码挡掉
  • KVM-only 门控:✅ !KVM_HYPERVISOR_TYPE.equals(self.getHypervisorType()) 跳过
  • opt-in 默认:✅ siblingHostUuids 为空时跳过 pre-fence,老链路(非 HaKvmWorker)零行为变化
  • ambiguous-refuse:✅ verdicts != 1 / !rsp.isSuccess() 都走 refuse 分支
  • 安全:✅ password @NoLogging、stdout/stderr 仅记长度不进 error chain
  • 测试:⚠️ OSS 自己没集成测试(依赖 premium 的 HaPreFenceVmFromPeerCase),且该测试通过 env.simulator 跳过了 utility 的真实 SSH,整套 fix 的端到端契约没有真正被测过

Verdict

REVISION_REQUIRED —— 本仓自身实现合理,但 关联仓 zstack-utility !7019 有 Critical。三个 MR 必须一起 merge,utility 那条 pkill self-match bug 不修,本仓的 refuse 分支永远走不到、整套 fix 等于失效。本仓侧建议处理 Warning 里的 sshTimeoutSec 配置化、retry-on-different-sibling、rolling-upgrade 文档说明三条。


🤖 Robot Reviewer

ZSTAC-83890 / TIC-5513: Before allowing HA to start a VM on a new host,
ask a sibling KVM host to SSH into the suspect host and force-destroy
any leftover qemu process (virsh destroy + pkill). This prevents
split-brain when the suspect host is alive but unreachable from MN.

- KVMHost handles FenceVmOnHostMsg, picks a Connected sibling, sends
  FenceVmFromPeerCmd to the sibling's kvmagent via KVMHostAsyncHttpCallMsg
- kvmagent (ha_plugin.py) SSHes to target host, runs virsh destroy +
  pkill; returns qemuConfirmedDead / qemuStillAlive / targetHostUnreachable
- FenceVmFromPeerCmd/Rsp defined as KVMHost inner static classes;
  path constant KVM_HA_FENCE_VM_FROM_PEER_PATH in KVMConstant
- @nologging on targetHostPassword; implements Serializable for
  CloudBusGsonSkipLoggerCase

Resolves: ZSTAC-83890

Change-Id: Ibcd7b318f57248795006079c7059911a2ecde333
@MatheMatrix MatheMatrix force-pushed the sync/yingzhe.hu/fix/ZSTAC-83890@@3 branch from 746506a to 2cb4fef Compare May 9, 2026 07:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants