Commit feb06d2

Merge tag 'hyperv-next-signed-20251207' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - Enhancements to Linux as the root partition for Microsoft Hypervisor:
     - Support a new mode called L1VH, which allows Linux to drive the
       hypervisor running the Azure Host directly
     - Support for MSHV crash dump collection
     - Allow Linux's memory management subsystem to better manage guest
       memory regions
     - Fix issues that prevented a clean shutdown of the whole system on
       bare metal and nested configurations
     - ARM64 support for the MSHV driver
     - Various other bug fixes and cleanups

 - Add support for Confidential VMBus for Linux guest on Hyper-V

 - Secure AVIC support for Linux guests on Hyper-V

 - Add the mshv_vtl driver to allow Linux to run as the secure kernel in
   a higher virtual trust level for Hyper-V

* tag 'hyperv-next-signed-20251207' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (58 commits)
  mshv: Cleanly shutdown root partition with MSHV
  mshv: Use reboot notifier to configure sleep state
  mshv: Add definitions for MSHV sleep state configuration
  mshv: Add support for movable memory regions
  mshv: Add refcount and locking to mem regions
  mshv: Fix huge page handling in memory region traversal
  mshv: Move region management to mshv_regions.c
  mshv: Centralize guest memory region destruction
  mshv: Refactor and rename memory region handling functions
  mshv: adjust interrupt control structure for ARM64
  Drivers: hv: use kmalloc_array() instead of kmalloc()
  mshv: Add ioctl for self targeted passthrough hvcalls
  Drivers: hv: Introduce mshv_vtl driver
  Drivers: hv: Export some symbols for mshv_vtl
  static_call: allow using STATIC_CALL_TRAMP_STR() from assembly
  mshv: Extend create partition ioctl to support cpu features
  mshv: Allow mappings that overlap in uaddr
  mshv: Fix create memory region overlap check
  mshv: add WQ_PERCPU to alloc_workqueue users
  Drivers: hv: Use kmalloc_array() instead of kmalloc()
  ...
2 parents c2f2b01 + 615a6e7 commit feb06d2

42 files changed

Lines changed: 5003 additions & 692 deletions

Documentation/virt/hyperv/coco.rst

Lines changed: 138 additions & 1 deletion
@@ -178,7 +178,7 @@ These Hyper-V and VMBus memory pages are marked as decrypted:
 
 * VMBus monitor pages
 
-* Synthetic interrupt controller (synic) related pages (unless supplied by
+* Synthetic interrupt controller (SynIC) related pages (unless supplied by
   the paravisor)
 
 * Per-cpu hypercall input and output pages (unless running with a paravisor)
@@ -232,6 +232,143 @@ with arguments explicitly describing the access. See
 _hv_pcifront_read_config() and _hv_pcifront_write_config() and the
 "use_calls" flag indicating to use hypercalls.
 
+Confidential VMBus
+------------------
+The confidential VMBus enables the confidential guest not to interact with
+the untrusted host partition and the untrusted hypervisor. Instead, the guest
+relies on the trusted paravisor to communicate with the devices processing
+sensitive data. The hardware (SNP or TDX) encrypts the guest memory and the
+register state while measuring the paravisor image using the platform security
+processor to ensure trusted and confidential computing.
+
+Confidential VMBus provides a secure communication channel between the guest
+and the paravisor, ensuring that sensitive data is protected from hypervisor-
+level access through memory encryption and register state isolation.
+
+Confidential VMBus is an extension of Confidential Computing (CoCo) VMs
+(a.k.a. "Isolated" VMs in Hyper-V terminology). Without Confidential VMBus,
+guest VMBus device drivers (the "VSC"s in VMBus terminology) communicate
+with VMBus servers (the VSPs) running on the Hyper-V host. The
+communication must be through memory that has been decrypted so the
+host can access it. With Confidential VMBus, one or more of the VSPs reside
+in the trusted paravisor layer in the guest VM. Since the paravisor layer also
+operates in encrypted memory, the memory used for communication with
+such VSPs does not need to be decrypted and thereby exposed to the
+Hyper-V host. The paravisor is responsible for communicating securely
+with the Hyper-V host as necessary.
+
+The data is transferred directly between the VM and a vPCI device (a.k.a.
+a PCI pass-thru device, see :doc:`vpci`) that is directly assigned to VTL2
+and that supports encrypted memory. In such a case, neither the host partition
+nor the hypervisor has any access to the data. The guest needs to establish
+a VMBus connection only with the paravisor for the channels that process
+sensitive data, and the paravisor abstracts the details of communicating
+with the specific devices away providing the guest with the well-established
+VSP (Virtual Service Provider) interface that has had support in the Hyper-V
+drivers for a decade.
+
+In the case the device does not support encrypted memory, the paravisor
+provides bounce-buffering, and although the data is not encrypted, the backing
+pages aren't mapped into the host partition through SLAT. While not impossible,
+it becomes much more difficult for the host partition to exfiltrate the data
+than it would be with a conventional VMBus connection where the host partition
+has direct access to the memory used for communication.
+
+Here is the data flow for a conventional VMBus connection (`C` stands for the
+client or VSC, `S` for the server or VSP, the `DEVICE` is a physical one, might
+be with multiple virtual functions)::
+
+  +---- GUEST ----+       +----- DEVICE ----+        +----- HOST -----+
+  |               |       |                 |        |                |
+  |               |       |                 |        |                |
+  |               |       |                 ==========                |
+  |               |       |                 |        |                |
+  |               |       |                 |        |                |
+  |               |       |                 |        |                |
+  +----- C -------+       +-----------------+        +------- S ------+
+         ||                                                   ||
+         ||                                                   ||
+  +------||------------------ VMBus --------------------------||------+
+  |                     Interrupts, MMIO                              |
+  +-------------------------------------------------------------------+
+
+and the Confidential VMBus connection::
+
+  +---- GUEST --------------- VTL0 ------+               +-- DEVICE --+
+  |                                      |               |            |
+  | +- PARAVISOR --------- VTL2 -----+   |               |            |
+  | |     +-- VMBus Relay ------+    ====+================            |
+  | |     |   Interrupts, MMIO  |    |   |               |            |
+  | |     +-------- S ----------+    |   |               +------------+
+  | |               ||               |   |
+  | +---------+     ||               |   |
+  | |  Linux  |     ||    OpenHCL    |   |
+  | |  kernel |     ||               |   |
+  | +---- C --+-----||---------------+   |
+  |       ||        ||                   |
+  +-------++------- C -------------------+               +------------+
+          ||                                             |    HOST    |
+          ||                                             +---- S -----+
+  +-------||----------------- VMBus ---------------------------||-----+
+  |                     Interrupts, MMIO                              |
+  +-------------------------------------------------------------------+
+
+An implementation of the VMBus relay that offers the Confidential VMBus
+channels is available in the OpenVMM project as a part of the OpenHCL
+paravisor. Please refer to
+
+  * https://openvmm.dev/, and
+  * https://github.com/microsoft/openvmm
+
+for more information about the OpenHCL paravisor.
+
+A guest that is running with a paravisor must determine at runtime if
+Confidential VMBus is supported by the current paravisor. The x86_64-specific
+approach relies on the CPUID Virtualization Stack leaf; the ARM64 implementation
+is expected to support the Confidential VMBus unconditionally when running
+ARM CCA guests.
+
+Confidential VMBus is a characteristic of the VMBus connection as a whole,
+and of each VMBus channel that is created. When a Confidential VMBus
+connection is established, the paravisor provides the guest the message-passing
+path that is used for VMBus device creation and deletion, and it provides a
+per-CPU synthetic interrupt controller (SynIC) just like the SynIC that is
+offered by the Hyper-V host. Each VMBus device that is offered to the guest
+indicates the degree to which it participates in Confidential VMBus. The offer
+indicates if the device uses encrypted ring buffers, and if the device uses
+encrypted memory for DMA that is done outside the ring buffer. These settings
+may be different for different devices using the same Confidential VMBus
+connection.
+
+Although these settings are separate, in practice it'll always be encrypted
+ring buffer only, or both encrypted ring buffer and external data. If a channel
+is offered by the paravisor with confidential VMBus, the ring buffer can always
+be encrypted since it's strictly for communication between the VTL2 paravisor
+and the VTL0 guest. However, other memory regions are often used for e.g. DMA,
+so they need to be accessible by the underlying hardware, and must be
+unencrypted (unless the device supports encrypted memory). Currently, there are
+not any VSPs in OpenHCL that support encrypted external memory, but future
+versions are expected to enable this capability.
+
+Because some devices on a Confidential VMBus may require decrypted ring buffers
+and DMA transfers, the guest must interact with two SynICs -- the one provided
+by the paravisor and the one provided by the Hyper-V host when Confidential
+VMBus is not offered. Interrupts are always signaled by the paravisor SynIC,
+but the guest must check for messages and for channel interrupts on both SynICs.
+
+In the case of a confidential VMBus, regular SynIC access by the guest is
+intercepted by the paravisor (this includes various MSRs such as the SIMP and
+SIEFP, as well as hypercalls like HvPostMessage and HvSignalEvent). If the
+guest actually wants to communicate with the hypervisor, it has to use special
+mechanisms (GHCB page on SNP, or tdcall on TDX). Messages can be of either
+kind: with confidential VMBus, messages use the paravisor SynIC, and if the
+guest chose to communicate directly to the hypervisor, they use the hypervisor
+SynIC. For interrupt signaling, some channels may be running on the host
+(non-confidential, using the VMBus relay) and use the hypervisor SynIC, and
+some on the paravisor and use its SynIC. The RelIDs are coordinated by the
+OpenHCL VMBus server and are guaranteed to be unique regardless of whether
+the channel originated on the host or the paravisor.
+
 load_unaligned_zeropad()
 ------------------------
 When transitioning memory between encrypted and decrypted, the caller of

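Note on the coco.rst addition: the new text describes two per-channel settings in each offer (encrypted ring buffer, encrypted external DMA memory) and says the guest deals with two SynICs. The sketch below is a userspace model of those rules only, not code from this merge. Every name prefixed `demo_` is hypothetical, and treating "encrypted ring buffer" as the marker for a paravisor-served channel is an assumption drawn from the documentation text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-channel settings carried in a VMBus offer. */
struct demo_channel_offer {
	bool confidential_ring_buffer;   /* ring buffer may stay encrypted */
	bool confidential_external_data; /* DMA memory outside the ring may stay encrypted */
};

/* The two SynICs a guest on a Confidential VMBus has to watch. */
enum demo_synic { DEMO_SYNIC_HYPERVISOR, DEMO_SYNIC_PARAVISOR };

/*
 * Per the documentation, encrypted external data only appears together
 * with an encrypted ring buffer, so "external only" is rejected here.
 */
static bool demo_offer_is_valid(const struct demo_channel_offer *o)
{
	return o->confidential_ring_buffer || !o->confidential_external_data;
}

/* True when the guest must decrypt the ring buffer so the host can access it. */
static bool demo_ring_needs_decryption(const struct demo_channel_offer *o)
{
	return !o->confidential_ring_buffer;
}

/* True when DMA memory outside the ring buffer must be decrypted. */
static bool demo_external_needs_decryption(const struct demo_channel_offer *o)
{
	return !o->confidential_external_data;
}

/*
 * Assumption: a channel with an encrypted ring buffer is served by the
 * paravisor and uses its SynIC; any other channel is host-served and
 * uses the hypervisor SynIC.
 */
static enum demo_synic demo_channel_synic(const struct demo_channel_offer *o)
{
	assert(o != NULL && demo_offer_is_valid(o));
	return o->confidential_ring_buffer ? DEMO_SYNIC_PARAVISOR
					   : DEMO_SYNIC_HYPERVISOR;
}
```

With this model, a channel offered with only the ring-buffer setting keeps its ring encrypted but still decrypts external DMA buffers, which matches the case the documentation says is the only one OpenHCL currently offers.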
MAINTAINERS

Lines changed: 3 additions & 0 deletions
@@ -11705,6 +11705,7 @@ M: "K. Y. Srinivasan" <kys@microsoft.com>
 M:	Haiyang Zhang <haiyangz@microsoft.com>
 M:	Wei Liu <wei.liu@kernel.org>
 M:	Dexuan Cui <decui@microsoft.com>
+M:	Long Li <longli@microsoft.com>
 L:	linux-hyperv@vger.kernel.org
 S:	Supported
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git
@@ -11722,6 +11723,7 @@ F: arch/x86/kernel/cpu/mshyperv.c
 F:	drivers/clocksource/hyperv_timer.c
 F:	drivers/hid/hid-hyperv.c
 F:	drivers/hv/
+F:	drivers/infiniband/hw/mana/
 F:	drivers/input/serio/hyperv-keyboard.c
 F:	drivers/iommu/hyperv-iommu.c
 F:	drivers/net/ethernet/microsoft/
@@ -11740,6 +11742,7 @@ F: include/hyperv/hvhdk_mini.h
 F:	include/linux/hyperv.h
 F:	include/net/mana
 F:	include/uapi/linux/hyperv.h
+F:	include/uapi/rdma/mana-abi.h
 F:	net/vmw_vsock/hyperv_transport.c
 F:	tools/hv/
 

arch/x86/hyperv/Makefile

Lines changed: 15 additions & 1 deletion
@@ -1,8 +1,22 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y := hv_init.o mmu.o nested.o irqdomain.o ivm.o
 obj-$(CONFIG_X86_64) += hv_apic.o
-obj-$(CONFIG_HYPERV_VTL_MODE) += hv_vtl.o
+obj-$(CONFIG_HYPERV_VTL_MODE) += hv_vtl.o mshv_vtl_asm.o
+
+$(obj)/mshv_vtl_asm.o: $(obj)/mshv-asm-offsets.h
+
+$(obj)/mshv-asm-offsets.h: $(obj)/mshv-asm-offsets.s FORCE
+	$(call filechk,offsets,__MSHV_ASM_OFFSETS_H__)
 
 ifdef CONFIG_X86_64
 obj-$(CONFIG_PARAVIRT_SPINLOCKS) += hv_spinlock.o
+
+ifdef CONFIG_MSHV_ROOT
+CFLAGS_REMOVE_hv_trampoline.o += -pg
+CFLAGS_hv_trampoline.o += -fno-stack-protector
+obj-$(CONFIG_CRASH_DUMP) += hv_crash.o hv_trampoline.o
+endif
 endif
+
+targets += mshv-asm-offsets.s
+clean-files += mshv-asm-offsets.h

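Note on the Makefile change: the new rules build `mshv-asm-offsets.h` from `mshv-asm-offsets.s` through kbuild's `filechk,offsets` helper, presumably so that `mshv_vtl_asm.o` can refer to C structure field offsets from assembly, as other asm-offsets headers in the kernel do. The userspace sketch below only illustrates the idea of turning `offsetof()` values into `#define` lines. The structure and the `demo_` names are hypothetical and do not come from the mshv_vtl sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register-save area, NOT the structure used by mshv_vtl. */
struct demo_cpu_context {
	uint64_t rax;
	uint64_t rcx;
	uint64_t rdx;
};

/*
 * Write one "#define NAME offset" line into buf, which is the general
 * shape of an entry in a generated offsets header. Returns the number
 * of characters snprintf() would have written.
 */
static int demo_emit_offset(char *buf, size_t len, const char *name, size_t off)
{
	assert(buf != NULL && name != NULL);
	return snprintf(buf, len, "#define %s %zu", name, off);
}
```

Calling `demo_emit_offset(buf, sizeof(buf), "DEMO_CTX_RCX", offsetof(struct demo_cpu_context, rcx))` yields a line that assembly code could include instead of hard-coding the offset.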
arch/x86/hyperv/hv_apic.c

Lines changed: 8 additions & 0 deletions
@@ -53,6 +53,11 @@ static void hv_apic_icr_write(u32 low, u32 id)
 	wrmsrq(HV_X64_MSR_ICR, reg_val);
 }
 
+void hv_enable_coco_interrupt(unsigned int cpu, unsigned int vector, bool set)
+{
+	apic_update_vector(cpu, vector, set);
+}
+
 static u32 hv_apic_read(u32 reg)
 {
 	u32 reg_val, hi;
@@ -293,6 +298,9 @@ static void hv_send_ipi_self(int vector)
 
 void __init hv_apic_init(void)
 {
+	if (cc_platform_has(CC_ATTR_SNP_SECURE_AVIC))
+		return;
+
 	if (ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) {
 		pr_info("Hyper-V: Using IPI hypercalls\n");
 		/*