
Initial aarch64/KVM support #1474

Merged: syntactically merged 29 commits into main from lm/aarch64 on Jun 26, 2026

@syntactically

Member

Initial aarch64 support.

Tested:

  • Primarily on Linux 6.18 KVM, running nested inside a Hypervisor.framework VM on macOS build 25F5042g
  • Secondarily on Linux 6.8.12-tegra on Nvidia Jetson T5000
  • Tertiary testing, done while debugging, on a variety of other available Arm platforms

There are still some unsupported features (debug/trace/crashdump, etc.), and further API surface is needed to allow architecture-independent clients like Hyperlight-Wasm to access all of the virtual memory features they need; this is largely elaborated on in the commit messages.

Copilot AI left a comment

Contributor


Pull request overview

This PR introduces initial aarch64 support backed by KVM, extending Hyperlight’s VM, guest runtime, and memory layout to run the existing sandbox model on arm64 (including MMIO-based “outb” equivalents, paging, and exception handling).

Changes:

  • Add an aarch64 KVM VirtualMachine implementation, including vCPU reset support and MMIO-exit handling via an IO page.
  • Implement aarch64 guest runtime pieces (paging ops, exception vectors/handler, stack init, guest exits) and adjust test guests for aarch64.
  • Refactor layout constants (MAX_GPA/MAX_GVA → SCRATCH_TOP_*) and map the IO page into snapshots where applicable; update build tooling (Justfile/Nix/libc build lists) for aarch64 targets.

Reviewed changes

Copilot reviewed 49 out of 49 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/tests/rust_guests/simpleguest/src/main.rs | Make the Rust test guest work on aarch64 (undef instruction, executable buffers, IO-page “outb” substitute, icache sync). |
| src/tests/rust_guests/dummyguest/src/main.rs | Add aarch64 halt/MMIO paths and switch entrypoint ABI. |
| src/hyperlight_libc/build.rs | Add aarch64 include paths and file lists to the picolibc build. |
| src/hyperlight_libc/build_files.rs | Introduce aarch64 libc/libm file lists. |
| src/hyperlight_host/tests/integration_test.rs | Adjust assertions and gate x86_64-only tests for architecture differences. |
| src/hyperlight_host/src/sandbox/uninitialized_evolve.rs | Update VM initialization call signature (page size argument removed). |
| src/hyperlight_host/src/sandbox/snapshot/mod.rs | Map IO page into guest page tables and switch to SCRATCH_TOP_* bounds. |
| src/hyperlight_host/src/sandbox/initialized_multi_use.rs | Adapt tests/fixtures for aarch64 instruction encodings and exception strings. |
| src/hyperlight_host/src/mem/mgr.rs | Use SCRATCH_TOP_GVA for crashdump region sizing. |
| src/hyperlight_host/src/hypervisor/virtual_machine/mod.rs | Add ResetVcpuError and vCPU reset capability hooks to the VM trait. |
| src/hyperlight_host/src/hypervisor/virtual_machine/kvm/x86_64.rs | Switch CPUID max-phys-address computation to SCRATCH_TOP_GPA. |
| src/hyperlight_host/src/hypervisor/virtual_machine/kvm/aarch64.rs | Implement aarch64 KVM backend: map/unmap, run loop, MMIO → VmExit translation, regs/fpu/sregs, vCPU reset. |
| src/hyperlight_host/src/hypervisor/regs/aarch64/special_regs.rs | Define aarch64 special register structure and default register programming values. |
| src/hyperlight_host/src/hypervisor/regs/aarch64/mod.rs | Wire up real aarch64 common regs/fpu/special-reg modules and KVM reg IDs. |
| src/hyperlight_host/src/hypervisor/regs/aarch64/kvm_reg.rs | Define KVM register ID constants and typed get/set helpers for aarch64. |
| src/hyperlight_host/src/hypervisor/regs/aarch64/fpu.rs | Adds an (apparently redundant) CommonFpu definition file. |
| src/hyperlight_host/src/hypervisor/regs/aarch64/common_regs.rs | Define CommonRegisters for aarch64 (x0-x30, sp, pc, pstate). |
| src/hyperlight_host/src/hypervisor/regs/aarch64/common_fpu.rs | Define CommonFpu for aarch64 (v0-v31, fpsr/fpcr). |
| src/hyperlight_host/src/hypervisor/mod.rs | Update internal test setup to match new initialization signature and layout constants. |
| src/hyperlight_host/src/hypervisor/hyperlight_vm/x86_64.rs | Remove page_size parameter from initialise path and store page size in the VM struct. |
| src/hyperlight_host/src/hypervisor/hyperlight_vm/mod.rs | Plumb ResetVcpuError and add vm_can_reset_vcpu tracking in HyperlightVm. |
| src/hyperlight_host/src/hypervisor/hyperlight_vm/aarch64.rs | Implement aarch64 HyperlightVm lifecycle: create VM, init regs, dispatch calls, reset vCPU, root PT access. |
| src/hyperlight_guest/src/layout.rs | Switch scratch metadata pointers from MAX_GVA to SCRATCH_TOP_GVA. |
| src/hyperlight_guest/src/arch/amd64/prim_alloc.rs | Switch allocator ceiling to SCRATCH_TOP_GPA. |
| src/hyperlight_guest/src/arch/aarch64/prim_alloc.rs | Implement aarch64 physical page allocator using atomic add and scratch bounds checks. |
| src/hyperlight_guest/src/arch/aarch64/layout.rs | Implement scratch size/base getters and move main stack to lower half for aarch64. |
| src/hyperlight_guest/src/arch/aarch64/exit.rs | Implement aarch64 “out32” via IO-page MMIO writes. |
| src/hyperlight_guest_bin/src/paging.rs | Split paging implementation into per-arch modules and re-export shared API. |
| src/hyperlight_guest_bin/src/lib.rs | Make paging available on all arches; add shared init module. |
| src/hyperlight_guest_bin/src/init.rs | Add shared stack init logic used by multiple architectures. |
| src/hyperlight_guest_bin/src/arch/amd64/paging.rs | Move existing amd64 paging implementation under arch-specific module. |
| src/hyperlight_guest_bin/src/arch/amd64/init.rs | Use shared stack init and update scratch-top constant usage. |
| src/hyperlight_guest_bin/src/arch/aarch64/paging.rs | Add aarch64 paging implementation (volatile PTE ops, barriers, TTBR0 root). |
| src/hyperlight_guest_bin/src/arch/aarch64/mod.rs | Implement aarch64 entrypoint, dispatch stub (with optional TLBI), VBAR init, and stack pivoting. |
| src/hyperlight_guest_bin/src/arch/aarch64/exception/types.rs | Define saved exception context and exception origin/type enums. |
| src/hyperlight_guest_bin/src/arch/aarch64/exception/mod.rs | Add aarch64 exception module plumbing. |
| src/hyperlight_guest_bin/src/arch/aarch64/exception/handle.rs | Implement aarch64 exception decoding + handling for stack growth and CoW faults; emit abort diagnostics. |
| src/hyperlight_guest_bin/src/arch/aarch64/exception/entry.rs | Provide AArch64 vector table and context save/restore in global asm. |
| src/hyperlight_common/src/vmem.rs | Refactor shared UpdateParent* infrastructure and export aarch64 MAIR attribute index. |
| src/hyperlight_common/src/layout.rs | Replace MAX_GPA/MAX_GVA exports with SCRATCH_TOP_* and add io_page. |
| src/hyperlight_common/src/arch/i686/layout.rs | Rename max layout constants to scratch-top equivalents. |
| src/hyperlight_common/src/arch/amd64/vmem.rs | Move UpdateParentTable/Root definitions into shared vmem module and tighten visibility. |
| src/hyperlight_common/src/arch/amd64/layout.rs | Rename scratch-top constants and add io_page() stub returning None. |
| src/hyperlight_common/src/arch/aarch64/vmem.rs | Implement aarch64 page table encode/decode and virt→phys walking logic. |
| src/hyperlight_common/src/arch/aarch64/layout.rs | Define aarch64 scratch and IO page locations and scratch sizing logic. |
| Justfile | Add arch-selectable guest build outputs via HYPERLIGHT_TARGET. |
| flake.nix | Expand supported platforms (aarch64-linux) and adjust cargo-hyperlight source pinning/build. |
| docs/paging-development-notes.md | Document aarch64 addressing assumptions (48-bit, lower-half/TTBR0). |
| c.just | Make C guest compilation/linking target-arch aware. |

Comment thread src/hyperlight_host/src/hypervisor/hyperlight_vm/aarch64.rs Outdated
Comment thread src/hyperlight_host/src/hypervisor/regs/aarch64/kvm_reg.rs Outdated
Comment thread src/hyperlight_host/src/hypervisor/regs/aarch64/fpu.rs Outdated
Comment thread src/tests/rust_guests/dummyguest/src/main.rs
Comment thread src/tests/rust_guests/dummyguest/src/main.rs
Comment thread src/hyperlight_guest/src/arch/aarch64/exit.rs
Comment thread src/hyperlight_guest/src/arch/aarch64/prim_alloc.rs
Comment thread src/hyperlight_host/src/hypervisor/virtual_machine/kvm/aarch64.rs
Comment thread src/hyperlight_host/src/hypervisor/virtual_machine/kvm/aarch64.rs
@syntactically added the kind/enhancement label (For PRs adding features, improving functionality, docs, tests, etc.) on May 27, 2026
@ludfjig

ludfjig commented May 29, 2026

Contributor

This is great and the commits are very clean. I'm not an expert on the aarch64 specific stuff, so I'll trust you on most of those. Some other stuff:

  • FPCR and FPSR both seem to be declared with offset 0xd4
  • Off-by-one in host MMIO decode in src/hyperlight_host/src/hypervisor/virtual_machine/kvm/aarch64.rs: addr > io_page_gpa excludes port 0; should be >=.
  • Off-by-one in guest exit in src/hyperlight_guest/src/arch/aarch64/exit.rs: port > PAGE_SIZE / 8 allows one write past the IO page; should be >=.
  • kvm-ioctls has VmFd::enable_cap(&cap); can we use that instead of the raw KVM_CAP_ARM_NISV_TO_USER ioctls?
  • Unused file fpu.rs (with typo PartialEq1), we use common_fpu.rs I believe.
  • Are both the surrounding run_immediate_exist around the vcpu_init required in reset_vcpu, or is the first one sufficient?
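
The two off-by-one points in the list above can be illustrated with a minimal sketch. It assumes a 4 KiB IO page in which each port occupies one 8-byte slot (inferred from the `PAGE_SIZE / 8` bound); `mmio_port`, `port_offset`, and `SLOT` are hypothetical names for illustration, not the functions or constants in this PR.

```rust
/// Assumed IO page size; the PR's actual constant may differ.
const PAGE_SIZE: u64 = 4096;
/// Assumed width of one port slot in the IO page, in bytes.
const SLOT: u64 = 8;

/// Host side (hypothetical): map a faulting guest-physical address to a port.
/// The lower bound is inclusive, so `addr == io_page_gpa` decodes to port 0.
fn mmio_port(addr: u64, io_page_gpa: u64) -> Option<u64> {
    // checked_sub rejects addr < io_page_gpa without excluding equality.
    let offset = addr.checked_sub(io_page_gpa)?;
    if offset < PAGE_SIZE && offset % SLOT == 0 {
        Some(offset / SLOT)
    } else {
        None
    }
}

/// Guest side (hypothetical): byte offset of a port's slot inside the IO page.
/// `port == PAGE_SIZE / SLOT` is rejected because its slot would start one
/// slot past the end of the page.
fn port_offset(port: u64) -> Option<u64> {
    if port >= PAGE_SIZE / SLOT {
        None
    } else {
        Some(port * SLOT)
    }
}
```

With a strict `>` in either check, the host would drop port 0 and the guest would accept the first out-of-range port, which are the two cases the review points at.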

I also have a suggestion for style on kvm_reg.rs and kvm/aarch64.rs (feel free to ignore if you don't agree).

Style patch:
diff --git i/src/hyperlight_host/src/hypervisor/regs/aarch64/kvm_reg.rs w/src/hyperlight_host/src/hypervisor/regs/aarch64/kvm_reg.rs
index c6d5a51e9..96968172c 100644
--- i/src/hyperlight_host/src/hypervisor/regs/aarch64/kvm_reg.rs
+++ w/src/hyperlight_host/src/hypervisor/regs/aarch64/kvm_reg.rs
@@ -7,19 +7,7 @@ use kvm_bindings::{
 };
 use kvm_ioctls::VcpuFd;
 
-enum Size {
-    U32,
-    U64,
-    U128,
-}
-const fn size_kvm_bits(s: Size) -> u64 {
-    match s {
-        Size::U32 => KVM_REG_SIZE_U32,
-        Size::U64 => KVM_REG_SIZE_U64,
-        Size::U128 => KVM_REG_SIZE_U128,
-    }
-}
-const fn kvm_sys_reg(op0: u8, op1: u8, crn: u8, crm: u8, op2: u8, s: Size) -> u64 {
+const fn kvm_sys_reg(op0: u8, op1: u8, crn: u8, crm: u8, op2: u8, size: u64) -> u64 {
     KVM_REG_ARM64
         | (KVM_REG_ARM64_SYSREG as u64)
         | (((op0 as u64) << KVM_REG_ARM64_SYSREG_OP0_SHIFT) & KVM_REG_ARM64_SYSREG_OP0_MASK as u64)
@@ -27,98 +15,54 @@ const fn kvm_sys_reg(op0: u8, op1: u8, crn: u8, crm: u8, op2: u8, s: Size) -> u6
         | (((crn as u64) << KVM_REG_ARM64_SYSREG_CRN_SHIFT) & KVM_REG_ARM64_SYSREG_CRN_MASK as u64)
         | (((crm as u64) << KVM_REG_ARM64_SYSREG_CRM_SHIFT) & KVM_REG_ARM64_SYSREG_CRM_MASK as u64)
         | (((op2 as u64) << KVM_REG_ARM64_SYSREG_OP2_SHIFT) & KVM_REG_ARM64_SYSREG_OP2_MASK as u64)
-        | size_kvm_bits(s)
+        | size
 }
 macro_rules! decl_sys_reg {
     ($name:ident, $op0:expr, $op1:expr, $crn:expr, $crm:expr, $op2:expr, $size:ident) => {
-        pub const $name: u64 = kvm_sys_reg($op0, $op1, $crn, $crm, $op2, Size::$size);
+        pub const $name: u64 = kvm_sys_reg($op0, $op1, $crn, $crm, $op2, $size);
     };
 }
-decl_sys_reg!(TTBR0_EL1, 0b11, 0b000, 0b0010, 0b0000, 0b000, U64);
-decl_sys_reg!(TCR_EL1, 0b11, 0b000, 0b0010, 0b0000, 0b010, U64);
-decl_sys_reg!(MAIR_EL1, 0b11, 0b000, 0b1010, 0b0010, 0b000, U64);
-decl_sys_reg!(SCTLR_EL1, 0b11, 0b000, 0b0001, 0b0000, 0b000, U64);
-decl_sys_reg!(CPACR_EL1, 0b11, 0b000, 0b0001, 0b0000, 0b010, U64);
-decl_sys_reg!(VBAR_EL1, 0b11, 0b000, 0b1100, 0b0000, 0b000, U64);
+decl_sys_reg!(TTBR0_EL1, 0b11, 0b000, 0b0010, 0b0000, 0b000, KVM_REG_SIZE_U64);
+decl_sys_reg!(TCR_EL1,   0b11, 0b000, 0b0010, 0b0000, 0b010, KVM_REG_SIZE_U64);
+decl_sys_reg!(MAIR_EL1,  0b11, 0b000, 0b1010, 0b0010, 0b000, KVM_REG_SIZE_U64);
+decl_sys_reg!(SCTLR_EL1, 0b11, 0b000, 0b0001, 0b0000, 0b000, KVM_REG_SIZE_U64);
+decl_sys_reg!(CPACR_EL1, 0b11, 0b000, 0b0001, 0b0000, 0b010, KVM_REG_SIZE_U64);
+decl_sys_reg!(VBAR_EL1,  0b11, 0b000, 0b1100, 0b0000, 0b000, KVM_REG_SIZE_U64);
 
-const fn kvm_core_reg(offset: u8, s: Size) -> u64 {
-    KVM_REG_ARM64 | 0x10_0000u64 | offset as u64 | size_kvm_bits(s)
+const fn kvm_core_reg(offset: u8, size: u64) -> u64 {
+    KVM_REG_ARM64 | 0x10_0000u64 | offset as u64 | size
 }
-macro_rules! decl_core_reg {
-    ($name:ident, $offset:expr, $size:ident) => {
-        pub const $name: u64 = kvm_core_reg($offset, Size::$size);
-    };
-}
-decl_core_reg!(X0, 0x00, U64);
-decl_core_reg!(X1, 0x02, U64);
-decl_core_reg!(X2, 0x04, U64);
-decl_core_reg!(X3, 0x06, U64);
-decl_core_reg!(X4, 0x08, U64);
-decl_core_reg!(X5, 0x0A, U64);
-decl_core_reg!(X6, 0x0C, U64);
-decl_core_reg!(X7, 0x0E, U64);
-decl_core_reg!(X8, 0x10, U64);
-decl_core_reg!(X9, 0x12, U64);
-decl_core_reg!(X10, 0x14, U64);
-decl_core_reg!(X11, 0x16, U64);
-decl_core_reg!(X12, 0x18, U64);
-decl_core_reg!(X13, 0x1A, U64);
-decl_core_reg!(X14, 0x1C, U64);
-decl_core_reg!(X15, 0x1E, U64);
-decl_core_reg!(X16, 0x20, U64);
-decl_core_reg!(X17, 0x22, U64);
-decl_core_reg!(X18, 0x24, U64);
-decl_core_reg!(X19, 0x26, U64);
-decl_core_reg!(X20, 0x28, U64);
-decl_core_reg!(X21, 0x2A, U64);
-decl_core_reg!(X22, 0x2C, U64);
-decl_core_reg!(X23, 0x2E, U64);
-decl_core_reg!(X24, 0x30, U64);
-decl_core_reg!(X25, 0x32, U64);
-decl_core_reg!(X26, 0x34, U64);
-decl_core_reg!(X27, 0x36, U64);
-decl_core_reg!(X28, 0x38, U64);
-decl_core_reg!(X29, 0x3A, U64);
-decl_core_reg!(X30, 0x3C, U64);
-decl_core_reg!(SP, 0x3E, U64);
-decl_core_reg!(PC, 0x40, U64);
-decl_core_reg!(PSTATE, 0x42, U64);
-decl_core_reg!(SP_EL1, 0x44, U64);
-// ignore the other SPSRs that are just for AA32-compat
-decl_core_reg!(V0, 0x54, U128);
-decl_core_reg!(V1, 0x58, U128);
-decl_core_reg!(V2, 0x5c, U128);
-decl_core_reg!(V3, 0x60, U128);
-decl_core_reg!(V4, 0x64, U128);
-decl_core_reg!(V5, 0x68, U128);
-decl_core_reg!(V6, 0x6c, U128);
-decl_core_reg!(V7, 0x70, U128);
-decl_core_reg!(V8, 0x74, U128);
-decl_core_reg!(V9, 0x78, U128);
-decl_core_reg!(V10, 0x7c, U128);
-decl_core_reg!(V11, 0x80, U128);
-decl_core_reg!(V12, 0x84, U128);
-decl_core_reg!(V13, 0x88, U128);
-decl_core_reg!(V14, 0x8c, U128);
-decl_core_reg!(V15, 0x90, U128);
-decl_core_reg!(V16, 0x94, U128);
-decl_core_reg!(V17, 0x98, U128);
-decl_core_reg!(V18, 0x9c, U128);
-decl_core_reg!(V19, 0xa0, U128);
-decl_core_reg!(V20, 0xa4, U128);
-decl_core_reg!(V21, 0xa8, U128);
-decl_core_reg!(V22, 0xac, U128);
-decl_core_reg!(V23, 0xb0, U128);
-decl_core_reg!(V24, 0xb4, U128);
-decl_core_reg!(V25, 0xb8, U128);
-decl_core_reg!(V26, 0xbc, U128);
-decl_core_reg!(V27, 0xc0, U128);
-decl_core_reg!(V28, 0xc4, U128);
-decl_core_reg!(V29, 0xc8, U128);
-decl_core_reg!(V30, 0xcc, U128);
-decl_core_reg!(V31, 0xd0, U128);
-decl_core_reg!(FPSR, 0xd4, U32);
-decl_core_reg!(FPCR, 0xd4, U32);
+
+/// KVM register IDs for the 31 general-purpose registers X0..X30.
+/// Each consecutive u64 register is 2 u32-slots apart in `kvm_regs`.
+pub const X: [u64; 31] = {
+    let mut r = [0u64; 31];
+    let mut i = 0;
+    while i < 31 {
+        r[i] = kvm_core_reg((i * 2) as u8, KVM_REG_SIZE_U64);
+        i += 1;
+    }
+    r
+};
+pub const SP: u64 = kvm_core_reg(0x3E, KVM_REG_SIZE_U64);
+pub const PC: u64 = kvm_core_reg(0x40, KVM_REG_SIZE_U64);
+pub const PSTATE: u64 = kvm_core_reg(0x42, KVM_REG_SIZE_U64);
+pub const SP_EL1: u64 = kvm_core_reg(0x44, KVM_REG_SIZE_U64);
+// The other SPSRs are AA32-compat only.
+
+/// KVM register IDs for the 32 NEON/FP registers V0..V31.
+/// Each consecutive u128 register is 4 u32-slots apart in `kvm_regs`.
+pub const V: [u64; 32] = {
+    let mut r = [0u64; 32];
+    let mut i = 0;
+    while i < 32 {
+        r[i] = kvm_core_reg((0x54 + i * 4) as u8, KVM_REG_SIZE_U128);
+        i += 1;
+    }
+    r
+};
+pub const FPSR: u64 = kvm_core_reg(0xd4, KVM_REG_SIZE_U32);
+pub const FPCR: u64 = kvm_core_reg(0xd4, KVM_REG_SIZE_U32);
 
 pub(crate) fn get_reg_bytes<const N: usize, E>(
     fd: &VcpuFd,
diff --git i/src/hyperlight_host/src/hypervisor/virtual_machine/kvm/aarch64.rs w/src/hyperlight_host/src/hypervisor/virtual_machine/kvm/aarch64.rs
index d20b6fd8b..da28d3081 100644
--- i/src/hyperlight_host/src/hypervisor/virtual_machine/kvm/aarch64.rs
+++ w/src/hyperlight_host/src/hypervisor/virtual_machine/kvm/aarch64.rs
@@ -53,7 +53,7 @@ pub(crate) struct KvmVm {
 }
 
 impl KvmVm {
-    pub(self) fn vcpu_init(&mut self) -> Result<(), HypervisorError> {
+    fn vcpu_init(&mut self) -> Result<(), HypervisorError> {
         let mut kvi = kvm_bindings::kvm_vcpu_init::default();
         self.vm_fd.get_preferred_target(&mut kvi)?;
         self.vcpu_fd.vcpu_init(&kvi)?;
@@ -295,178 +295,63 @@ impl VirtualMachine for KvmVm {
     }
 
     fn regs(&self) -> std::result::Result<CommonRegisters, RegisterError> {
-        use crate::hypervisor::regs::kvm_reg::get_reg;
+        use crate::hypervisor::regs::kvm_reg::{PC, PSTATE, SP, X, get_reg_bytes};
         fn err(e: kvm_ioctls::Error) -> RegisterError {
             RegisterError::GetSregs(e.into())
         }
+        let mut x = [0u64; 31];
+        for (i, &id) in X.iter().enumerate() {
+            x[i] = u64::from_ne_bytes(get_reg_bytes::<8, _>(&self.vcpu_fd, id, err)?);
+        }
         Ok(CommonRegisters {
-            x: [
-                get_reg!(&self.vcpu_fd, err, X0, u64)?,
-                get_reg!(&self.vcpu_fd, err, X1, u64)?,
-                get_reg!(&self.vcpu_fd, err, X2, u64)?,
-                get_reg!(&self.vcpu_fd, err, X3, u64)?,
-                get_reg!(&self.vcpu_fd, err, X4, u64)?,
-                get_reg!(&self.vcpu_fd, err, X5, u64)?,
-                get_reg!(&self.vcpu_fd, err, X6, u64)?,
-                get_reg!(&self.vcpu_fd, err, X7, u64)?,
-                get_reg!(&self.vcpu_fd, err, X8, u64)?,
-                get_reg!(&self.vcpu_fd, err, X9, u64)?,
-                get_reg!(&self.vcpu_fd, err, X10, u64)?,
-                get_reg!(&self.vcpu_fd, err, X11, u64)?,
-                get_reg!(&self.vcpu_fd, err, X12, u64)?,
-                get_reg!(&self.vcpu_fd, err, X13, u64)?,
-                get_reg!(&self.vcpu_fd, err, X14, u64)?,
-                get_reg!(&self.vcpu_fd, err, X15, u64)?,
-                get_reg!(&self.vcpu_fd, err, X16, u64)?,
-                get_reg!(&self.vcpu_fd, err, X17, u64)?,
-                get_reg!(&self.vcpu_fd, err, X18, u64)?,
-                get_reg!(&self.vcpu_fd, err, X19, u64)?,
-                get_reg!(&self.vcpu_fd, err, X20, u64)?,
-                get_reg!(&self.vcpu_fd, err, X21, u64)?,
-                get_reg!(&self.vcpu_fd, err, X22, u64)?,
-                get_reg!(&self.vcpu_fd, err, X23, u64)?,
-                get_reg!(&self.vcpu_fd, err, X24, u64)?,
-                get_reg!(&self.vcpu_fd, err, X25, u64)?,
-                get_reg!(&self.vcpu_fd, err, X26, u64)?,
-                get_reg!(&self.vcpu_fd, err, X27, u64)?,
-                get_reg!(&self.vcpu_fd, err, X28, u64)?,
-                get_reg!(&self.vcpu_fd, err, X29, u64)?,
-                get_reg!(&self.vcpu_fd, err, X30, u64)?,
-            ],
-            sp: get_reg!(&self.vcpu_fd, err, SP, u64)?,
-            pc: get_reg!(&self.vcpu_fd, err, PC, u64)?,
-            pstate: get_reg!(&self.vcpu_fd, err, PSTATE, u64)?,
+            x,
+            sp: u64::from_ne_bytes(get_reg_bytes::<8, _>(&self.vcpu_fd, SP, err)?),
+            pc: u64::from_ne_bytes(get_reg_bytes::<8, _>(&self.vcpu_fd, PC, err)?),
+            pstate: u64::from_ne_bytes(get_reg_bytes::<8, _>(&self.vcpu_fd, PSTATE, err)?),
         })
     }
 
     fn set_regs(&self, regs: &CommonRegisters) -> std::result::Result<(), RegisterError> {
-        use crate::hypervisor::regs::kvm_reg::set_reg;
+        use crate::hypervisor::regs::kvm_reg::{PC, PSTATE, SP, X, set_reg_bytes};
         fn err(e: kvm_ioctls::Error) -> RegisterError {
             RegisterError::SetSregs(e.into())
         }
-        set_reg!(&self.vcpu_fd, err, X0, u64, regs.x[0])?;
-        set_reg!(&self.vcpu_fd, err, X1, u64, regs.x[1])?;
-        set_reg!(&self.vcpu_fd, err, X2, u64, regs.x[2])?;
-        set_reg!(&self.vcpu_fd, err, X3, u64, regs.x[3])?;
-        set_reg!(&self.vcpu_fd, err, X4, u64, regs.x[4])?;
-        set_reg!(&self.vcpu_fd, err, X5, u64, regs.x[5])?;
-        set_reg!(&self.vcpu_fd, err, X6, u64, regs.x[6])?;
-        set_reg!(&self.vcpu_fd, err, X7, u64, regs.x[7])?;
-        set_reg!(&self.vcpu_fd, err, X8, u64, regs.x[8])?;
-        set_reg!(&self.vcpu_fd, err, X9, u64, regs.x[9])?;
-        set_reg!(&self.vcpu_fd, err, X10, u64, regs.x[10])?;
-        set_reg!(&self.vcpu_fd, err, X11, u64, regs.x[11])?;
-        set_reg!(&self.vcpu_fd, err, X12, u64, regs.x[12])?;
-        set_reg!(&self.vcpu_fd, err, X13, u64, regs.x[13])?;
-        set_reg!(&self.vcpu_fd, err, X14, u64, regs.x[14])?;
-        set_reg!(&self.vcpu_fd, err, X15, u64, regs.x[15])?;
-        set_reg!(&self.vcpu_fd, err, X16, u64, regs.x[16])?;
-        set_reg!(&self.vcpu_fd, err, X17, u64, regs.x[17])?;
-        set_reg!(&self.vcpu_fd, err, X18, u64, regs.x[18])?;
-        set_reg!(&self.vcpu_fd, err, X19, u64, regs.x[19])?;
-        set_reg!(&self.vcpu_fd, err, X20, u64, regs.x[20])?;
-        set_reg!(&self.vcpu_fd, err, X21, u64, regs.x[21])?;
-        set_reg!(&self.vcpu_fd, err, X22, u64, regs.x[22])?;
-        set_reg!(&self.vcpu_fd, err, X23, u64, regs.x[23])?;
-        set_reg!(&self.vcpu_fd, err, X24, u64, regs.x[24])?;
-        set_reg!(&self.vcpu_fd, err, X25, u64, regs.x[25])?;
-        set_reg!(&self.vcpu_fd, err, X26, u64, regs.x[26])?;
-        set_reg!(&self.vcpu_fd, err, X27, u64, regs.x[27])?;
-        set_reg!(&self.vcpu_fd, err, X28, u64, regs.x[28])?;
-        set_reg!(&self.vcpu_fd, err, X29, u64, regs.x[29])?;
-        set_reg!(&self.vcpu_fd, err, X30, u64, regs.x[30])?;
-        set_reg!(&self.vcpu_fd, err, SP, u64, regs.sp)?;
-        set_reg!(&self.vcpu_fd, err, PC, u64, regs.pc)?;
-        set_reg!(&self.vcpu_fd, err, PSTATE, u64, regs.pstate)?;
-
+        for (i, &id) in X.iter().enumerate() {
+            set_reg_bytes::<8, _>(&self.vcpu_fd, err, id, regs.x[i].to_ne_bytes())?;
+        }
+        set_reg_bytes::<8, _>(&self.vcpu_fd, err, SP, regs.sp.to_ne_bytes())?;
+        set_reg_bytes::<8, _>(&self.vcpu_fd, err, PC, regs.pc.to_ne_bytes())?;
+        set_reg_bytes::<8, _>(&self.vcpu_fd, err, PSTATE, regs.pstate.to_ne_bytes())?;
         Ok(())
     }
 
     fn fpu(&self) -> Result<CommonFpu, RegisterError> {
         use crate::hypervisor::regs::CommonFpu;
-        use crate::hypervisor::regs::kvm_reg::get_reg;
+        use crate::hypervisor::regs::kvm_reg::{FPCR, FPSR, V, get_reg_bytes};
         fn err(e: kvm_ioctls::Error) -> RegisterError {
             RegisterError::GetFpu(e.into())
         }
+        let mut v = [0u128; 32];
+        for (i, &id) in V.iter().enumerate() {
+            v[i] = u128::from_ne_bytes(get_reg_bytes::<16, _>(&self.vcpu_fd, id, err)?);
+        }
         Ok(CommonFpu {
-            v: [
-                get_reg!(&self.vcpu_fd, err, V0, u128)?,
-                get_reg!(&self.vcpu_fd, err, V1, u128)?,
-                get_reg!(&self.vcpu_fd, err, V2, u128)?,
-                get_reg!(&self.vcpu_fd, err, V3, u128)?,
-                get_reg!(&self.vcpu_fd, err, V4, u128)?,
-                get_reg!(&self.vcpu_fd, err, V5, u128)?,
-                get_reg!(&self.vcpu_fd, err, V6, u128)?,
-                get_reg!(&self.vcpu_fd, err, V7, u128)?,
-                get_reg!(&self.vcpu_fd, err, V8, u128)?,
-                get_reg!(&self.vcpu_fd, err, V9, u128)?,
-                get_reg!(&self.vcpu_fd, err, V10, u128)?,
-                get_reg!(&self.vcpu_fd, err, V11, u128)?,
-                get_reg!(&self.vcpu_fd, err, V12, u128)?,
-                get_reg!(&self.vcpu_fd, err, V13, u128)?,
-                get_reg!(&self.vcpu_fd, err, V14, u128)?,
-                get_reg!(&self.vcpu_fd, err, V15, u128)?,
-                get_reg!(&self.vcpu_fd, err, V16, u128)?,
-                get_reg!(&self.vcpu_fd, err, V17, u128)?,
-                get_reg!(&self.vcpu_fd, err, V18, u128)?,
-                get_reg!(&self.vcpu_fd, err, V19, u128)?,
-                get_reg!(&self.vcpu_fd, err, V20, u128)?,
-                get_reg!(&self.vcpu_fd, err, V21, u128)?,
-                get_reg!(&self.vcpu_fd, err, V22, u128)?,
-                get_reg!(&self.vcpu_fd, err, V23, u128)?,
-                get_reg!(&self.vcpu_fd, err, V24, u128)?,
-                get_reg!(&self.vcpu_fd, err, V25, u128)?,
-                get_reg!(&self.vcpu_fd, err, V26, u128)?,
-                get_reg!(&self.vcpu_fd, err, V27, u128)?,
-                get_reg!(&self.vcpu_fd, err, V28, u128)?,
-                get_reg!(&self.vcpu_fd, err, V29, u128)?,
-                get_reg!(&self.vcpu_fd, err, V30, u128)?,
-                get_reg!(&self.vcpu_fd, err, V31, u128)?,
-            ],
-            fpsr: get_reg!(&self.vcpu_fd, err, FPSR, u32)?,
-            fpcr: get_reg!(&self.vcpu_fd, err, FPCR, u32)?,
+            v,
+            fpsr: u32::from_ne_bytes(get_reg_bytes::<4, _>(&self.vcpu_fd, FPSR, err)?),
+            fpcr: u32::from_ne_bytes(get_reg_bytes::<4, _>(&self.vcpu_fd, FPCR, err)?),
         })
     }
 
     fn set_fpu(&self, fpu: &CommonFpu) -> Result<(), RegisterError> {
-        use crate::hypervisor::regs::kvm_reg::set_reg;
+        use crate::hypervisor::regs::kvm_reg::{FPCR, FPSR, V, set_reg_bytes};
         fn err(e: kvm_ioctls::Error) -> RegisterError {
             RegisterError::SetFpu(e.into())
         }
-        set_reg!(&self.vcpu_fd, err, V0, u128, fpu.v[0])?;
-        set_reg!(&self.vcpu_fd, err, V1, u128, fpu.v[1])?;
-        set_reg!(&self.vcpu_fd, err, V2, u128, fpu.v[2])?;
-        set_reg!(&self.vcpu_fd, err, V3, u128, fpu.v[3])?;
-        set_reg!(&self.vcpu_fd, err, V4, u128, fpu.v[4])?;
-        set_reg!(&self.vcpu_fd, err, V5, u128, fpu.v[5])?;
-        set_reg!(&self.vcpu_fd, err, V6, u128, fpu.v[6])?;
-        set_reg!(&self.vcpu_fd, err, V7, u128, fpu.v[7])?;
-        set_reg!(&self.vcpu_fd, err, V8, u128, fpu.v[8])?;
-        set_reg!(&self.vcpu_fd, err, V9, u128, fpu.v[9])?;
-        set_reg!(&self.vcpu_fd, err, V10, u128, fpu.v[10])?;
-        set_reg!(&self.vcpu_fd, err, V11, u128, fpu.v[11])?;
-        set_reg!(&self.vcpu_fd, err, V12, u128, fpu.v[12])?;
-        set_reg!(&self.vcpu_fd, err, V13, u128, fpu.v[13])?;
-        set_reg!(&self.vcpu_fd, err, V14, u128, fpu.v[14])?;
-        set_reg!(&self.vcpu_fd, err, V15, u128, fpu.v[15])?;
-        set_reg!(&self.vcpu_fd, err, V16, u128, fpu.v[16])?;
-        set_reg!(&self.vcpu_fd, err, V17, u128, fpu.v[17])?;
-        set_reg!(&self.vcpu_fd, err, V18, u128, fpu.v[18])?;
-        set_reg!(&self.vcpu_fd, err, V19, u128, fpu.v[19])?;
-        set_reg!(&self.vcpu_fd, err, V20, u128, fpu.v[20])?;
-        set_reg!(&self.vcpu_fd, err, V21, u128, fpu.v[21])?;
-        set_reg!(&self.vcpu_fd, err, V22, u128, fpu.v[22])?;
-        set_reg!(&self.vcpu_fd, err, V23, u128, fpu.v[23])?;
-        set_reg!(&self.vcpu_fd, err, V24, u128, fpu.v[24])?;
-        set_reg!(&self.vcpu_fd, err, V25, u128, fpu.v[25])?;
-        set_reg!(&self.vcpu_fd, err, V26, u128, fpu.v[26])?;
-        set_reg!(&self.vcpu_fd, err, V27, u128, fpu.v[27])?;
-        set_reg!(&self.vcpu_fd, err, V28, u128, fpu.v[28])?;
-        set_reg!(&self.vcpu_fd, err, V29, u128, fpu.v[29])?;
-        set_reg!(&self.vcpu_fd, err, V30, u128, fpu.v[30])?;
-        set_reg!(&self.vcpu_fd, err, V31, u128, fpu.v[31])?;
-        set_reg!(&self.vcpu_fd, err, FPSR, u32, fpu.fpsr)?;
-        set_reg!(&self.vcpu_fd, err, FPCR, u32, fpu.fpcr)?;
+        for (i, &id) in V.iter().enumerate() {
+            set_reg_bytes::<16, _>(&self.vcpu_fd, err, id, fpu.v[i].to_ne_bytes())?;
+        }
+        set_reg_bytes::<4, _>(&self.vcpu_fd, err, FPSR, fpu.fpsr.to_ne_bytes())?;
+        set_reg_bytes::<4, _>(&self.vcpu_fd, err, FPCR, fpu.fpcr.to_ne_bytes())?;
         Ok(())
     }
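
The core-register offsets used in the patch above (and the duplicated 0xd4 for FPSR and FPCR noted in the review) can be cross-checked against the register layout. The following is a minimal sketch, assuming the layout of `struct kvm_regs` from the Linux arm64 UAPI headers, mirrored here by hand; the Rust struct names and the `core_slots` helper are illustrative and are not part of the PR or of kvm-bindings.

```rust
use std::mem::offset_of;

/// Hand-written mirror of `struct user_pt_regs` (assumed layout).
#[allow(dead_code)]
#[repr(C)]
struct UserPtRegs {
    regs: [u64; 31],
    sp: u64,
    pc: u64,
    pstate: u64,
}

/// One 128-bit vector register, forced to 16-byte alignment like `__uint128_t`.
#[allow(dead_code)]
#[repr(C, align(16))]
struct Vreg([u8; 16]);

/// Hand-written mirror of `struct user_fpsimd_state` (assumed layout).
#[allow(dead_code)]
#[repr(C)]
struct UserFpsimdState {
    vregs: [Vreg; 32],
    fpsr: u32,
    fpcr: u32,
    reserved: [u32; 2],
}

/// Hand-written mirror of `struct kvm_regs` (assumed layout).
#[allow(dead_code)]
#[repr(C)]
struct KvmRegs {
    regs: UserPtRegs,
    sp_el1: u64,
    elr_el1: u64,
    spsr: [u64; 5],
    fp_regs: UserFpsimdState,
}

/// Core-register IDs count offsets in 32-bit slots, not bytes.
const fn slot(byte_offset: usize) -> u64 {
    (byte_offset / 4) as u64
}

/// Slot indices of (SP_EL1, V0, FPSR, FPCR) under the assumed layout.
fn core_slots() -> (u64, u64, u64, u64) {
    let fp = offset_of!(KvmRegs, fp_regs);
    (
        slot(offset_of!(KvmRegs, sp_el1)),
        slot(fp + offset_of!(UserFpsimdState, vregs)),
        slot(fp + offset_of!(UserFpsimdState, fpsr)),
        slot(fp + offset_of!(UserFpsimdState, fpcr)),
    )
}
```

Under these assumptions `core_slots()` evaluates to `(0x44, 0x54, 0xd4, 0xd5)`: SP_EL1 and V0 agree with the constants in the patch, while FPSR and FPCR land in adjacent slots rather than sharing 0xd4.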

Comment thread src/hyperlight_common/src/arch/aarch64/vmem.rs Outdated
andreiltd previously approved these changes Jun 8, 2026

@andreiltd andreiltd left a comment

Member


Similar to Ludvig, I don't have a ton of experience with the code that this PR touches but this is a huge improvement and I trust your expertise here.

@simongdavies

Member

Great work !!

I too have no experience with aarch64. I did, however, run this through a review locally using a parallel review skill. Other than the things above, this is what it came up with. I don't know if these are valid; apologies if this is noise:

  • TCR_EL1_TG1_4K uses reserved encoding 0b00 << 30
    For TG1, 4 KiB is 0b10; 0b00 is reserved. Architecturally invalid even though TTBR1 is unused. Fix: 0b10 << 30, ideally also set EPD1 to disable TTBR1 walks.
  • IO page mapped as Normal cacheable memory
    One reviewer flags this as blocking/warning (speculative reads → spurious MMIO exits, needs Device-nGnRE in MAIR). Another reviewer explicitly investigated and concluded it's not a bug: the IO-page GPA has no stage-2 memslot, so accesses translation-fault with ISV valid and decode correctly.
  • EL1t exception handler uses SP_EL0, not the EL1 exception stack
    One reviewer found that the guest runs in EL1t, so same-EL faults enter the CurrentSP0 vector using the main guest stack; a stack-growth/page fault could therefore re-fault while saving the frame. Another reviewer viewed the stack handler as sound (it relies on eret context synchronization). The author needs to confirm the exception-stack story, since this is a potential reentrancy hazard.
  • TCR_EL1 page-table-walk cacheability (IRGN/ORGN/SH = 0)
    Walks default to Non-cacheable/Non-shareable while the guest updates tables cacheably with only a dmb st — risk of the MMU reading stale PTEs.
  • SCTLR_EL1.I (I-cache) left disabled
  • 8-byte vs 4-byte MMIO payload convention
    aarch64 issues 64-bit stores where x86 writes 4 bytes; MMIO analysis suggests the data path copes, but it's a convention mismatch worth tightening.
  • PSTATE bit 21 (SS) asymmetry between initialise and dispatch_call_from_host
    likely harmless with debug off, but looks unintentional.
  • dummyguest test bugs
    hardcoded 0xffff_ffff_e000 no longer matches the new lower-half VA layout (0x0000_ffff_ffff_e000); and dubious ldr {0:x}, [..] with a u8 out-binding.
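
To make the TCR_EL1 points in the list above concrete, here is a minimal sketch of how the suggested fields could be composed. The bit positions and encodings follow the architectural TCR_EL1 definition (TG0 at [15:14] with 4 KiB = 0b00, TG1 at [31:30] with 4 KiB = 0b10, EPD1 at bit 23); the constant names and `tcr_el1_sketch` are hypothetical, other fields such as T1SZ and IPS are deliberately omitted, and this is not the value the PR actually programs.

```rust
// Field positions in TCR_EL1 (names are illustrative).
const TCR_T0SZ_SHIFT: u64 = 0;
const TCR_IRGN0_SHIFT: u64 = 8;
const TCR_ORGN0_SHIFT: u64 = 10;
const TCR_SH0_SHIFT: u64 = 12;
const TCR_TG0_SHIFT: u64 = 14;
const TCR_EPD1: u64 = 1 << 23; // disable table walks via TTBR1_EL1
const TCR_TG1_SHIFT: u64 = 30;

// Field encodings. Note that TG0 and TG1 encode 4 KiB differently.
const TG0_4K: u64 = 0b00;
const TG1_4K: u64 = 0b10;
const RGN_WB_RA_WA: u64 = 0b01; // Write-Back Read-Allocate Write-Allocate
const SH_INNER: u64 = 0b11; // Inner Shareable

/// Compose an illustrative TCR_EL1 value for a TTBR0-only, 4 KiB-granule
/// setup with cacheable, inner-shareable table walks.
fn tcr_el1_sketch(t0sz: u64) -> u64 {
    ((t0sz & 0x3f) << TCR_T0SZ_SHIFT)
        | (RGN_WB_RA_WA << TCR_IRGN0_SHIFT)
        | (RGN_WB_RA_WA << TCR_ORGN0_SHIFT)
        | (SH_INNER << TCR_SH0_SHIFT)
        | (TG0_4K << TCR_TG0_SHIFT)
        | TCR_EPD1
        | (TG1_4K << TCR_TG1_SHIFT)
}
```

For a 48-bit lower-half address space, `tcr_el1_sketch(16)` would be the natural call, since T0SZ = 64 - 48.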

I also separately had Claude implement an mshv implementation based on this PR and eventually it was able to pass all tests.

This was never actually architecture-dependent, since it used the
virtual memory APIs from hyperlight_guest.  Extract it, so that it can
be used by aarch64 init in the near future.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
This will be useful on multiple architectures.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
This includes adding the concept of a fixed "IO page" to the memory
layout, to be used on architectures where MMIO is convenient. The IO
page is above the scratch region at the top of memory, and is mapped
in the guest, but not in the host, so that accesses to it will trigger
vmexits.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
…specific module

This changes hyperlight_guest_bin::paging to be a reexport of the
interface of an architecture-specific implementation, much like
hyperlight_common::vmem.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
@syntactically

Member Author

While rebasing this on main, I ran into a couple of issues:

  • There was a bunch of architecture-specific stuff in the snapshot file serde wrappers. I pretty much just got rid of them in d8c8f3b, and the tests seem to pass. @ludfjig mentioned that this was intended to make backwards compatibility for format changes easier, but I think if we do do that we should probably do it just by tweaking the Serialize/Deserialize implementations. Mentioning this just in case anyone else objects.
  • The page table walker for aarch64 was based on the multi-space walker API, because I would like to migrate the amd64 one to that as I have some use cases for it in mind. I reverted the portion of Remove i686-guest, nanvix-unstable and guest-counter features #1525 that removed that API. @simongdavies Since you removed it in the first place, are you OK with that, or would you rather that it be re-added later with a more concrete use case?

The fixup! commits were getting a little bit unwieldy, so the tip of this branch now has them autosquash'd. For reviewers who would like to see deltas, I did first push the un-squash'd version of the branch, so take a look at the log of 1bccef1.

Apart from that, I addressed most of the comments in fixup commits a bit ago without commenting on each one. A couple of comments could do with more of an explanation though:

kvm-ioctls has VmFd::enable_cap(&cap) , can we use that instead of the raw KVM_CAP_ARM_NISV_TO_USER ioctls?

Unfortunately it's cfg-gated to not compile on aarch64 for some reason :(
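
For context, a raw `KVM_ENABLE_CAP` request can be prepared without the wrapper along the following lines. This is only a sketch: the struct mirrors `struct kvm_enable_cap` from `<linux/kvm.h>`, the capability number 177 for `KVM_CAP_ARM_NISV_TO_USER` should be checked against your kernel headers, and the actual `ioctl` call on the VM file descriptor is deliberately left out. The helper names `iow` and `nisv_to_user_request` are made up for the example.

```rust
use std::mem::size_of;

/// Mirrors `struct kvm_enable_cap` from <linux/kvm.h>.
#[repr(C)]
pub struct KvmEnableCap {
    pub cap: u32,
    pub flags: u32,
    pub args: [u64; 4],
    pub pad: [u8; 64],
}

/// ioctl "type" byte used by all KVM ioctls.
const KVMIO: u64 = 0xAE;
/// Capability number; verify against your kernel headers.
pub const KVM_CAP_ARM_NISV_TO_USER: u32 = 177;

/// Linux `_IOW(type, nr, size)` encoding as used on aarch64 and x86_64.
pub const fn iow(ty: u64, nr: u64, size: u64) -> u64 {
    const IOC_WRITE: u64 = 1;
    (IOC_WRITE << 30) | (size << 16) | (ty << 8) | nr
}

/// Request number for KVM_ENABLE_CAP, i.e. `_IOW(KVMIO, 0xa3, struct kvm_enable_cap)`.
pub const KVM_ENABLE_CAP: u64 = iow(KVMIO, 0xa3, size_of::<KvmEnableCap>() as u64);

/// Builds the argument that would be passed to the VM-level ioctl.
pub fn nisv_to_user_request() -> KvmEnableCap {
    KvmEnableCap {
        cap: KVM_CAP_ARM_NISV_TO_USER,
        flags: 0,
        args: [0; 4],
        pad: [0; 64],
    }
}
```

With the capability enabled, data aborts that arrive without valid instruction syndrome information are reported to userspace instead of being rejected inside the kernel.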

Are both the surrounding run_immediate_exist around the vcpu_init required in reset_vcpu, or is the first one sufficient?

I'm not 100% sure without diving into the KVM implementation of stuff again to figure out if it is doing odd things, so I'd rather be safe than sorry and revisit if it becomes a perf opportunity.

IO page mapped as Normal cacheable memory

Indeed, the attributes only really matter when the access touches the real memory subsystem. Before the final attributes for a putative access could be computed, they would need to be merged with the attributes from the Stage 2 table, whose absence would cause a translation fault.

EL1t exception handler uses SP_EL0, not the EL1 exception stack

The intended discipline here is that code outside the exception handler runs in EL1t on the main guest stack (which is always in SP_EL0), and the exception handler runs in EL1h on the exception stack (which is always in SP_EL1). The hardware automatically sets SPsel when an exception is taken.

Like on x86, the exception handler is explicitly not intended to be reentrant.
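
The stack discipline above can be modelled in a few lines. The mode encodings are the architectural `PSTATE.M[3:0]` values for EL1 (`0b0100` for EL1t, `0b0101` for EL1h); the type and function names are invented for this sketch and are not part of the PR.

```rust
/// Which stack pointer register the current mode uses.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StackPointer {
    SpEl0,
    SpEl1,
}

/// PSTATE.M[3:0] encodings for EL1; bit 0 is SPSel.
pub const MODE_EL1T: u64 = 0b0100;
pub const MODE_EL1H: u64 = 0b0101;

/// Maps an EL1 mode to the stack pointer it selects; `None` for other modes.
pub fn el1_stack_pointer(mode: u64) -> Option<StackPointer> {
    match mode & 0xf {
        MODE_EL1T => Some(StackPointer::SpEl0), // main guest stack
        MODE_EL1H => Some(StackPointer::SpEl1), // exception stack
        _ => None,
    }
}

/// On exception entry to EL1 the hardware sets SPSel to 1, so the handler
/// starts in EL1h whichever EL1 mode was interrupted.
pub fn mode_on_exception_entry(_interrupted_mode: u64) -> u64 {
    MODE_EL1H
}
```

Under this model, ordinary guest code never touches `SP_EL1`, so the exception stack is always in a known state when a handler starts, provided handlers do not nest.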

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
This does not yet support the debug registers, and the
architecture-independent interface will have to be rationalised in the
future, since the xsave operations are unlikely to be useful on aarch64.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
This is very similar to the amd64 one, but implemented with different
assembly instructions.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
These just implement the bare minimum for stack lazy-allocation and
CoW.  There is no extension mechanism for other crates (analogous to
the raw exception handler table that is presently exported on amd64)
yet; we will shortly need to add architecture-independent interfaces
to allow other crates (e.g. hyperlight-wasm) to register
specially-behaving memory ranges and intercept other errors.
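
To make the two handled cases concrete, here is a minimal sketch of how a data abort might be classified from `ESR_EL1` and `FAR_EL1`. The field positions (EC in bits 31:26, WnR in bit 6, DFSC in bits 5:0) are architectural; the `FaultAction` type, the `classify` function, and the idea of passing the stack range as a parameter are assumptions of this example rather than the PR's actual handler.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq, Eq)]
pub enum FaultAction {
    /// Translation fault inside the stack range: map a fresh page.
    LazyAllocate,
    /// Write permission fault: copy the page and remap it writable.
    CopyOnWrite,
    /// Anything else is treated as fatal in this sketch.
    Unhandled,
}

/// Exception class for a data abort taken without a change in exception level.
const EC_DATA_ABORT_SAME_EL: u64 = 0x25;

pub fn classify(esr: u64, far: u64, stack: Range<u64>) -> FaultAction {
    let ec = (esr >> 26) & 0x3f;
    if ec != EC_DATA_ABORT_SAME_EL {
        return FaultAction::Unhandled;
    }
    let dfsc = esr & 0x3f;
    let is_write = (esr >> 6) & 1 == 1; // WnR bit
    // The top four DFSC bits give the fault kind; the low two give the level.
    match dfsc >> 2 {
        0b0001 if stack.contains(&far) => FaultAction::LazyAllocate,
        0b0011 if is_write => FaultAction::CopyOnWrite,
        _ => FaultAction::Unhandled,
    }
}
```

An extension mechanism of the kind mentioned above could slot in at the `Unhandled` arm, consulting registered ranges before giving up.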

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
…stubs

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
There are a number of features not yet supported (e.g. debugging,
trace collection etc); however, this should implement a
minimal coherent subset of Hyperlight functionality.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Either by disabling them, or by rewriting them to do a similar thing
on aarch64.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
@ludfjig

ludfjig commented Jun 25, 2026

Contributor
  • There was a bunch of architecture-specific stuff in the snapshot file serde wrappers. I pretty much just got rid of them in d8c8f3b, and the tests seem to pass. @ludfjig mentioned that this was intended to make backwards compatibility for format changes easier, but I think if we do do that we should probably do it just by tweaking the Serialize/Deserialize implementations. Mentioning this just in case anyone else objects.

Sorry about that, and thanks for fixing it

  • The page table walker for aarch64 was based on the multi-space walker API, because I would like to migrate the amd64 one to that as I have some use cases for it in mind. I reverted the portion of Remove i686-guest, nanvix-unstable and guest-counter features #1525 that removed that API. @simongdavies Since you removed it in the first place, are you OK with that, or would you rather that it be re-added later with a more concrete use case?

This was actually my suggestion to Simon on that PR, he didn't do this initially. I thought we didn't need it anymore without i686, whoops sorry

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
This makes the logic in libc build.rs consistent with that in cargo-hyperlight.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
This is required to have a version with support for building aarch64
guests.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
This had resulted in a whole bunch of x86-64-specific code in the
snapshot-to-file case.  This commit replaces those mirror structs with
serde derives in the architecture-specific files that define the
system register structure for the machine.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
This is only true on x86_64.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
…ce-dump

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
danbugs
danbugs previously approved these changes Jun 25, 2026
Previously, target selection was in the calling workflow, but the
architecture selection was in the dep_fuzzing workflow, making it
difficult to skip unsupported targets on some architectures.

Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
@syntactically syntactically merged commit ea98d03 into main Jun 26, 2026
98 of 100 checks passed
@syntactically syntactically deleted the lm/aarch64 branch June 26, 2026 00:21

Labels

kind/enhancement For PRs adding features, improving functionality, docs, tests, etc.

Projects

Archived in project


7 participants