@@ -147,10 +147,29 @@ described as 'basic' will be available.
147147The new VM has no virtual cpus and no memory.
148148You probably want to use 0 as machine type.
149149
150+ X86:
151+ ^^^^
152+
153+ Supported X86 VM types can be queried via KVM_CAP_VM_TYPES.
154+
155+ S390:
156+ ^^^^^
157+
150158In order to create user controlled virtual machines on S390, check
151159KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
152160privileged user (CAP_SYS_ADMIN).
153161
162+ MIPS:
163+ ^^^^^
164+
165+ To use hardware assisted virtualization on MIPS (VZ ASE) rather than
166+ the default trap & emulate implementation (which changes the virtual
167+ memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
168+ flag KVM_VM_MIPS_VZ.
169+
170+ ARM64:
171+ ^^^^^^
172+
154173On arm64, the physical address size for a VM (IPA Size limit) is limited
155174to 40bits by default. The limit can be configured if the host supports the
156175extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
@@ -547,7 +566,7 @@ ioctl is useful if the in-kernel PIC is not used.
547566PPC:
548567^^^^
549568
550- Queues an external interrupt to be injected. This ioctl is overleaded
569+ Queues an external interrupt to be injected. This ioctl is overloaded
551570with 3 different irq values:
552571
553572a) KVM_INTERRUPT_SET
@@ -998,7 +1017,7 @@ be set in the flags field of this ioctl:
9981017The KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL flag requests KVM to generate
9991018the contents of the hypercall page automatically; hypercalls will be
10001019intercepted and passed to userspace through KVM_EXIT_XEN. In this
1001- ase , all of the blob size and address fields must be zero.
1020+ case, all of the blob size and address fields must be zero.
10021021
10031022The KVM_XEN_HVM_CONFIG_EVTCHN_SEND flag indicates to KVM that userspace
10041023will always use the KVM_XEN_HVM_EVTCHN_SEND ioctl to deliver event
@@ -1103,7 +1122,7 @@ Other flags returned by ``KVM_GET_CLOCK`` are accepted but ignored.
11031122:Extended by: KVM_CAP_INTR_SHADOW
11041123:Architectures: x86, arm64
11051124:Type: vcpu ioctl
1106- :Parameters: struct kvm_vcpu_event (out)
1125+ :Parameters: struct kvm_vcpu_events (out)
11071126:Returns: 0 on success, -1 on error
11081127
11091128X86:
@@ -1226,7 +1245,7 @@ directly to the virtual CPU).
12261245:Extended by: KVM_CAP_INTR_SHADOW
12271246:Architectures: x86, arm64
12281247:Type: vcpu ioctl
1229- :Parameters: struct kvm_vcpu_event (in)
1248+ :Parameters: struct kvm_vcpu_events (in)
12301249:Returns: 0 on success, -1 on error
12311250
12321251X86:
@@ -3115,7 +3134,7 @@ as follow::
31153134 };
31163135
31173136An entry with a "page_shift" of 0 is unused. Because the array is
3118- organized in increasing order, a lookup can stop when encoutering
3137+ organized in increasing order, a lookup can stop when encountering
31193138such an entry.
31203139
31213140The "slb_enc" field provides the encoding to use in the SLB for the
@@ -3509,7 +3528,7 @@ Possible features:
35093528 - KVM_RUN and KVM_GET_REG_LIST are not available;
35103529
35113530 - KVM_GET_ONE_REG and KVM_SET_ONE_REG cannot be used to access
3512- the scalable archietctural SVE registers
3531+ the scalable architectural SVE registers
35133532 KVM_REG_ARM64_SVE_ZREG(), KVM_REG_ARM64_SVE_PREG() or
35143533 KVM_REG_ARM64_SVE_FFR;
35153534
@@ -4455,7 +4474,7 @@ This will have undefined effects on the guest if it has not already
44554474placed itself in a quiescent state where no vcpu will make MMU enabled
44564475memory accesses.
44574476
4458- On succsful completion, the pending HPT will become the guest's active
4477+ On successful completion, the pending HPT will become the guest's active
44594478HPT and the previous HPT will be discarded.
44604479
44614480On failure, the guest will still be operating on its previous HPT.
@@ -5070,7 +5089,7 @@ before the vcpu is fully usable.
50705089
50715090Between KVM_ARM_VCPU_INIT and KVM_ARM_VCPU_FINALIZE, the feature may be
50725091configured by use of ioctls such as KVM_SET_ONE_REG. The exact configuration
5073- that should be performaned and how to do it are feature-dependent.
5092+ that should be performed and how to do it are feature-dependent.
50745093
50755094Other calls that depend on a particular feature being finalized, such as
50765095KVM_RUN, KVM_GET_REG_LIST, KVM_GET_ONE_REG and KVM_SET_ONE_REG, will fail with
@@ -5178,6 +5197,24 @@ Valid values for 'action'::
51785197 #define KVM_PMU_EVENT_ALLOW 0
51795198 #define KVM_PMU_EVENT_DENY 1
51805199
5200+ Via this API, KVM userspace can also control the behavior of the VM's fixed
5201+ counters (if any) by configuring the "action" and "fixed_counter_bitmap" fields.
5202+
5203+ Specifically, KVM uses the following pseudo-code when determining whether to
5204+ allow the guest FixCtr[i] to count its pre-defined fixed event::
5205+
5206+ FixCtr[i]_is_allowed = (action == ALLOW) && (bitmap & BIT(i)) ||
5207+ (action == DENY) && !(bitmap & BIT(i));
5208+ FixCtr[i]_is_denied = !FixCtr[i]_is_allowed;
5209+
5210+ KVM always consumes fixed_counter_bitmap; it's userspace's responsibility to
5211+ ensure fixed_counter_bitmap is set correctly, e.g. if userspace wants to define
5212+ a filter that only affects general purpose counters.
5213+
5214+ Note, the "events" field also applies to fixed counters' hardcoded event_select
5215+ and unit_mask values. "fixed_counter_bitmap" has higher priority than "events"
5216+ if there is a contradiction between the two.
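The fixed-counter pseudo-code above can be written as a small C predicate. This is only a sketch for reasoning about filter configurations: `fixed_counter_is_allowed` is a hypothetical helper, not a KVM function, and treating an unknown action as "denied" is an assumption made here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values of 'action' as listed in this section. */
#define KVM_PMU_EVENT_ALLOW 0
#define KVM_PMU_EVENT_DENY  1

/*
 * Hypothetical helper (not part of KVM): returns true if fixed counter
 * 'idx' may count under the given action and fixed_counter_bitmap.
 */
static bool fixed_counter_is_allowed(uint32_t action, uint32_t bitmap,
				     unsigned int idx)
{
	/* Bits beyond the 32-bit bitmap are treated as clear. */
	bool bit_set = idx < 32 && (bitmap & (UINT32_C(1) << idx)) != 0;

	if (action == KVM_PMU_EVENT_ALLOW)
		return bit_set;
	if (action == KVM_PMU_EVENT_DENY)
		return !bit_set;

	/* Assumption of this sketch: an unknown action allows nothing. */
	return false;
}
```

Note the consequence for filters that leave the bitmap at zero: with ALLOW every fixed counter is denied, and with DENY every fixed counter is allowed.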
5217+
518152184.121 KVM_PPC_SVM_OFF
51825219---------------------
51835220
@@ -5529,7 +5566,7 @@ KVM_XEN_ATTR_TYPE_EVTCHN
55295566 from the guest. A given sending port number may be directed back to
55305567 a specified vCPU (by APIC ID) / port / priority on the guest, or to
55315568 trigger events on an eventfd. The vCPU and priority can be changed
5532- by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call, but but other
5569+ by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call, but other
55335570 fields cannot change for a given sending port. A port mapping is
55345571 removed by using KVM_XEN_EVTCHN_DEASSIGN in the flags field. Passing
55355572 KVM_XEN_EVTCHN_RESET in the flags field removes all interception of
@@ -6174,6 +6211,137 @@ to know what fields can be changed for the system register described by
61746211``op0, op1, crn, crm, op2``. KVM rejects ID register values that describe a
61756212superset of the features supported by the system.
61766213
6214+ 4.140 KVM_SET_USER_MEMORY_REGION2
6215+ ---------------------------------
6216+
6217+ :Capability: KVM_CAP_USER_MEMORY2
6218+ :Architectures: all
6219+ :Type: vm ioctl
6220+ :Parameters: struct kvm_userspace_memory_region2 (in)
6221+ :Returns: 0 on success, -1 on error
6222+
6223+ KVM_SET_USER_MEMORY_REGION2 is an extension to KVM_SET_USER_MEMORY_REGION that
6224+ allows mapping guest_memfd memory into a guest. All fields shared with
6225+ KVM_SET_USER_MEMORY_REGION behave identically. Userspace can set KVM_MEM_PRIVATE in
6226+ flags to have KVM bind the memory region to a given guest_memfd range of
6227+ [guest_memfd_offset, guest_memfd_offset + memory_size). The target guest_memfd
6228+ must point at a file created via KVM_CREATE_GUEST_MEMFD on the current VM, and
6229+ the target range must not be bound to any other memory region. All standard
6230+ bounds checks apply (use common sense).
6231+
6232+ ::
6233+
6234+ struct kvm_userspace_memory_region2 {
6235+ __u32 slot;
6236+ __u32 flags;
6237+ __u64 guest_phys_addr;
6238+ __u64 memory_size; /* bytes */
6239+ __u64 userspace_addr; /* start of the userspace allocated memory */
6240+ __u64 guest_memfd_offset;
6241+ __u32 guest_memfd;
6242+ __u32 pad1;
6243+ __u64 pad2[14];
6244+ };
6245+
6246+ A KVM_MEM_PRIVATE region _must_ have a valid guest_memfd (private memory) and
6247+ userspace_addr (shared memory). However, "valid" for userspace_addr simply
6248+ means that the address itself must be a legal userspace address. The backing
6249+ mapping for userspace_addr is not required to be valid/populated at the time of
6250+ KVM_SET_USER_MEMORY_REGION2, e.g. shared memory can be lazily mapped/allocated
6251+ on-demand.
6252+
6253+ When mapping a gfn into the guest, KVM selects shared vs. private, i.e. consumes
6254+ userspace_addr vs. guest_memfd, based on the gfn's KVM_MEMORY_ATTRIBUTE_PRIVATE
6255+ state. At VM creation time, all memory is shared, i.e. the PRIVATE attribute
6256+ is '0' for all gfns. Userspace can control whether memory is shared/private by
6257+ toggling KVM_MEMORY_ATTRIBUTE_PRIVATE via KVM_SET_MEMORY_ATTRIBUTES as needed.
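As a rough illustration of how userspace might populate the structure, the sketch below fills a local mirror of the layout shown above. `struct example_memory_region2` and `example_fill_region2` are hypothetical names; real code would use `struct kvm_userspace_memory_region2` and the flag values from `<linux/kvm.h>`, and would pass the result to the ioctl. Only wrap-around is checked here; all other validation is left to KVM.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the layout documented above (illustrative only). */
struct example_memory_region2 {
	uint32_t slot;
	uint32_t flags;
	uint64_t guest_phys_addr;
	uint64_t memory_size;		/* bytes */
	uint64_t userspace_addr;	/* shared memory backing */
	uint64_t guest_memfd_offset;	/* private memory backing */
	uint32_t guest_memfd;
	uint32_t pad1;
	uint64_t pad2[14];
};

/*
 * Hypothetical helper: fills 'r' and zeroes the padding. 'flags' is
 * supplied by the caller (e.g. KVM_MEM_PRIVATE from <linux/kvm.h>).
 * Returns false if either range would wrap around.
 */
static bool example_fill_region2(struct example_memory_region2 *r,
				 uint32_t slot, uint32_t flags,
				 uint64_t gpa, uint64_t size, uint64_t hva,
				 uint32_t gmem_fd, uint64_t gmem_offset)
{
	if (gpa + size < gpa || gmem_offset + size < gmem_offset)
		return false;

	memset(r, 0, sizeof(*r));	/* pad1 and pad2 stay zero */
	r->slot = slot;
	r->flags = flags;
	r->guest_phys_addr = gpa;
	r->memory_size = size;
	r->userspace_addr = hva;
	r->guest_memfd = gmem_fd;
	r->guest_memfd_offset = gmem_offset;
	return true;
}
```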
6258+
6259+ 4.141 KVM_SET_MEMORY_ATTRIBUTES
6260+ -------------------------------
6261+
6262+ :Capability: KVM_CAP_MEMORY_ATTRIBUTES
6263+ :Architectures: x86
6264+ :Type: vm ioctl
6265+ :Parameters: struct kvm_memory_attributes (in)
6266+ :Returns: 0 on success, <0 on error
6267+
6268+ KVM_SET_MEMORY_ATTRIBUTES allows userspace to set memory attributes for a range
6269+ of guest physical memory.
6270+
6271+ ::
6272+
6273+ struct kvm_memory_attributes {
6274+ __u64 address;
6275+ __u64 size;
6276+ __u64 attributes;
6277+ __u64 flags;
6278+ };
6279+
6280+ #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
6281+
6282+ The address and size must be page aligned. The supported attributes can be
6283+ retrieved via ioctl(KVM_CHECK_EXTENSION) on KVM_CAP_MEMORY_ATTRIBUTES. If
6284+ executed on a VM, KVM_CAP_MEMORY_ATTRIBUTES precisely returns the attributes
6285+ supported by that VM. If executed at system scope, KVM_CAP_MEMORY_ATTRIBUTES
6286+ returns all attributes supported by KVM. The only attribute defined at this
6287+ time is KVM_MEMORY_ATTRIBUTE_PRIVATE, which marks the associated gfn as being
6288+ guest private memory.
6289+
6290+ Note, there is no "get" API. Userspace is responsible for explicitly tracking
6291+ the state of a gfn/page as needed.
6292+
6293+ The "flags" field is reserved for future extensions and must be '0'.
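The constraints above (page alignment, supported attributes, zero flags) can be checked in userspace before issuing the ioctl. The following is a minimal sketch: `struct example_memory_attributes` mirrors the documented layout, `example_attrs_request_ok` is a hypothetical helper, and the caller is assumed to pass the host page size (a power of two) and the mask obtained from KVM_CHECK_EXTENSION.

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)

/* Local mirror of the layout documented above (illustrative only). */
struct example_memory_attributes {
	uint64_t address;
	uint64_t size;
	uint64_t attributes;
	uint64_t flags;
};

/*
 * Hypothetical pre-check: 'supported' is the mask returned for
 * KVM_CAP_MEMORY_ATTRIBUTES, 'page_size' must be a power of two.
 */
static bool example_attrs_request_ok(const struct example_memory_attributes *a,
				     uint64_t supported, uint64_t page_size)
{
	if (a->flags != 0)				/* reserved, must be 0 */
		return false;
	if ((a->address | a->size) & (page_size - 1))	/* page aligned */
		return false;
	if (a->address + a->size < a->address)		/* no wrap-around */
		return false;
	if (a->attributes & ~supported)			/* only known bits */
		return false;
	return true;
}
```

Because there is no "get" API, a VMM would typically record each accepted request in its own bookkeeping alongside a check like this.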
6294+
6295+ 4.142 KVM_CREATE_GUEST_MEMFD
6296+ ----------------------------
6297+
6298+ :Capability: KVM_CAP_GUEST_MEMFD
6299+ :Architectures: none
6300+ :Type: vm ioctl
6301+ :Parameters: struct kvm_create_guest_memfd (in)
6302+ :Returns: A file descriptor on success, <0 on error
6303+
6304+ KVM_CREATE_GUEST_MEMFD creates an anonymous file and returns a file descriptor
6305+ that refers to it. guest_memfd files are roughly analogous to files created
6306+ via memfd_create(), e.g. guest_memfd files live in RAM, have volatile storage,
6307+ and are automatically released when the last reference is dropped. Unlike
6308+ "regular" memfd_create() files, guest_memfd files are bound to their owning
6309+ virtual machine (see below), cannot be mapped, read, or written by userspace,
6310+ and cannot be resized (guest_memfd files do however support PUNCH_HOLE).
6311+
6312+ ::
6313+
6314+ struct kvm_create_guest_memfd {
6315+ __u64 size;
6316+ __u64 flags;
6317+ __u64 reserved[6];
6318+ };
6319+
6320+ #define KVM_GUEST_MEMFD_ALLOW_HUGEPAGE (1ULL << 0)
6321+
6322+ Conceptually, the inode backing a guest_memfd file represents physical memory,
6323+ i.e. is coupled to the virtual machine as a thing, not to a "struct kvm". The
6324+ file itself, which is bound to a "struct kvm", is that instance's view of the
6325+ underlying memory, e.g. effectively provides the translation of guest addresses
6326+ to host memory. This allows for use cases where multiple KVM structures are
6327+ used to manage a single virtual machine, e.g. when performing intrahost
6328+ migration of a virtual machine.
6329+
6330+ KVM currently only supports mapping guest_memfd via KVM_SET_USER_MEMORY_REGION2,
6331+ and more specifically via the guest_memfd and guest_memfd_offset fields in
6332+ "struct kvm_userspace_memory_region2", where guest_memfd_offset is the offset
6333+ into the guest_memfd instance. For a given guest_memfd file, there can be at
6334+ most one mapping per page, i.e. binding multiple memory regions to a single
6335+ guest_memfd range is not allowed (any number of memory regions can be bound to
6336+ a single guest_memfd file, but the bound ranges must not overlap).
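The "at most one mapping per page" rule amounts to an interval-overlap test on the ranges bound to one guest_memfd file. A VMM that tracks its own bindings could check a candidate range as sketched below. The names are hypothetical, ranges are treated as half-open, and sizes are assumed to be nonzero and free of wrap-around.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One bound range [offset, offset + size) within a guest_memfd file. */
struct example_gmem_binding {
	uint64_t offset;
	uint64_t size;	/* assumed nonzero */
};

/* Two half-open ranges overlap if each starts before the other ends. */
static bool example_ranges_overlap(const struct example_gmem_binding *a,
				   const struct example_gmem_binding *b)
{
	return a->offset < b->offset + b->size &&
	       b->offset < a->offset + a->size;
}

/* Returns true if 'candidate' overlaps none of the 'n' existing bindings. */
static bool example_binding_is_free(const struct example_gmem_binding *bound,
				    size_t n,
				    const struct example_gmem_binding *candidate)
{
	for (size_t i = 0; i < n; i++) {
		if (example_ranges_overlap(&bound[i], candidate))
			return false;
	}
	return true;
}
```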
6337+
6338+ If KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is set in flags, KVM will attempt to allocate
6339+ and map hugepages for the guest_memfd file. This is currently best effort. If
6340+ KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is set, the size must be aligned to the maximum
6341+ transparent hugepage size supported by the kernel.
6342+
6343+ See KVM_SET_USER_MEMORY_REGION2 for additional details.
6344+
617763455. The kvm_run structure
61786346========================
61796347
@@ -6806,6 +6974,30 @@ array field represents return values. The userspace should update the return
68066974values of SBI call before resuming the VCPU. For more details on RISC-V SBI
68076975spec refer, https://github.com/riscv/riscv-sbi-doc.
68086976
6977+ ::
6978+
6979+ /* KVM_EXIT_MEMORY_FAULT */
6980+ struct {
6981+ #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3)
6982+ __u64 flags;
6983+ __u64 gpa;
6984+ __u64 size;
6985+ } memory_fault;
6986+
6987+ KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that
6988+ could not be resolved by KVM. The 'gpa' and 'size' (in bytes) describe the
6989+ guest physical address range [gpa, gpa + size) of the fault. The 'flags' field
6990+ describes properties of the faulting access that are likely pertinent:
6991+
6992+ - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred
6993+ on a private memory access. When clear, indicates the fault occurred on a
6994+ shared access.
6995+
6996+ Note! KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that it
6997+ accompanies a return code of '-1', not '0'! errno will always be set to EFAULT
6998+ or EHWPOISON when KVM exits with KVM_EXIT_MEMORY_FAULT; userspace should assume
6999+ kvm_run.exit_reason is stale/undefined for all other error numbers.
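Because this exit is reported together with a failing KVM_RUN, a run loop has to look at the return value, errno, and exit_reason together. The sketch below shows one way to express that check. The helper names are hypothetical, the numeric value of KVM_EXIT_MEMORY_FAULT is passed in by the caller (from `<linux/kvm.h>`) rather than assumed, and the fallback definition of EHWPOISON only exists so the sketch builds on systems whose errno.h lacks it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* assumption: fallback for non-Linux builds only */
#endif

#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3)

/*
 * Hypothetical helper: returns true if the memory fault information in
 * kvm_run can be trusted after a KVM_RUN call that returned 'run_ret'
 * with errno 'run_errno'. 'memory_fault_reason' is the value of
 * KVM_EXIT_MEMORY_FAULT taken from <linux/kvm.h>.
 */
static bool example_memory_fault_info_valid(int run_ret, int run_errno,
					    uint32_t exit_reason,
					    uint32_t memory_fault_reason)
{
	if (run_ret != -1)
		return false;
	if (run_errno != EFAULT && run_errno != EHWPOISON)
		return false;	/* exit_reason is stale/undefined here */
	return exit_reason == memory_fault_reason;
}

/* True if the faulting access was to private memory. */
static bool example_fault_is_private(uint64_t flags)
{
	return (flags & KVM_MEMORY_EXIT_FLAG_PRIVATE) != 0;
}
```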
7000+
68097001::
68107002
68117003 /* KVM_EXIT_NOTIFY */
@@ -7840,6 +8032,27 @@ This capability is aimed to mitigate the threat that malicious VMs can
78408032cause CPU stuck (due to event windows don't open up) and make the CPU
78418033unavailable to host or other VMs.
78428034
8035+ 7.34 KVM_CAP_MEMORY_FAULT_INFO
8036+ ------------------------------
8037+
8038+ :Architectures: x86
8039+ :Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
8040+
8041+ The presence of this capability indicates that KVM_RUN will fill
8042+ kvm_run.memory_fault if KVM cannot resolve a guest page fault VM-Exit, e.g. if
8043+ there is a valid memslot but no backing VMA for the corresponding host virtual
8044+ address.
8045+
8046+ The information in kvm_run.memory_fault is valid if and only if KVM_RUN returns
8047+ an error with errno=EFAULT or errno=EHWPOISON *and* kvm_run.exit_reason is set
8048+ to KVM_EXIT_MEMORY_FAULT.
8049+
8050+ Note: Userspaces which attempt to resolve memory faults so that they can retry
8051+ KVM_RUN are encouraged to guard against repeatedly receiving the same
8052+ error/annotated fault.
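One possible way to guard against receiving the same annotated fault repeatedly is to remember the last fault and cap the number of consecutive identical retries. This is a sketch of that idea only: the structure, the helper, and the choice to compare gpa and flags are all assumptions of this example, not something KVM prescribes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU bookkeeping; zero-initialize before first use. */
struct example_fault_guard {
	uint64_t gpa;
	uint64_t flags;
	unsigned int repeats;	/* consecutive repeats of the same fault */
	bool seen;
};

/*
 * Records a fault and returns true if userspace should try to resolve it
 * and re-enter KVM_RUN, or false once the same fault has been seen more
 * than 'max_repeats' times in a row.
 */
static bool example_should_retry(struct example_fault_guard *g,
				 uint64_t gpa, uint64_t flags,
				 unsigned int max_repeats)
{
	if (g->seen && g->gpa == gpa && g->flags == flags) {
		g->repeats++;
	} else {
		g->seen = true;
		g->gpa = gpa;
		g->flags = flags;
		g->repeats = 0;
	}
	return g->repeats <= max_repeats;
}
```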
8053+
8054+ See KVM_EXIT_MEMORY_FAULT for more information.
8055+
784380568. Other capabilities.
78448057======================
78458058
@@ -8578,6 +8791,19 @@ block sizes is exposed in KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES as a
8578879164-bit bitmap (each bit describing a block size). The default value is
857987920, to disable the eager page splitting.
85808793
8794+ 8.41 KVM_CAP_VM_TYPES
8795+ ---------------------
8796+
8797+ :Capability: KVM_CAP_VM_TYPES
8798+ :Architectures: x86
8799+ :Type: system ioctl
8800+
8801+ This capability returns a bitmap of supported VM types. The 1-setting of bit @n
8802+ means the VM type with value @n is supported. Possible values of @n are::
8803+
8804+ #define KVM_X86_DEFAULT_VM 0
8805+ #define KVM_X86_SW_PROTECTED_VM 1
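Testing whether a given VM type is available then reduces to a bit test on the value returned for KVM_CAP_VM_TYPES. A minimal sketch, assuming the capability value is the plain `int` returned by the KVM_CHECK_EXTENSION ioctl and that a negative value means the query failed (`example_vm_type_supported` is a hypothetical helper):

```c
#include <stdbool.h>

#define KVM_X86_DEFAULT_VM	0
#define KVM_X86_SW_PROTECTED_VM	1

/*
 * Hypothetical helper: 'cap_value' is the result of querying
 * KVM_CAP_VM_TYPES; bit n set means VM type n is supported.
 */
static bool example_vm_type_supported(int cap_value, unsigned int type)
{
	/* A negative value signals an error; bit 31 is the sign bit. */
	if (cap_value < 0 || type >= 31)
		return false;
	return ((cap_value >> type) & 1) != 0;
}
```

A VMM would typically run this check before passing the type to KVM_CREATE_VM.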
8806+
858188079. Known KVM API problems
85828808=========================
85838809