
Commit c8c655c

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:

 "s390:

   - More phys_to_virt conversions

   - Improvement of AP management for VSIE (nested virtualization)

  ARM64:

   - Numerous fixes for the pathological lock inversion issue that
     plagued KVM/arm64 since... forever.

   - New framework allowing SMCCC-compliant hypercalls to be forwarded
     to userspace, hopefully paving the way for some more features being
     moved to VMMs rather than being implemented in the kernel.

   - Large rework of the timer code to allow a VM-wide offset to be
     applied to both virtual and physical counters, as well as a
     per-timer, per-vcpu offset that complements the global one. This
     last part allows the NV timer code to be implemented on top.

   - A small set of fixes to make sure that we don't change anything
     affecting the EL1&0 translation regime just after having taken an
     exception to EL2, until we have executed a DSB. This ensures that
     speculative walks started in EL1&0 have completed.

   - The usual selftest fixes and improvements.

  x86:

   - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is
     enabled, and by giving the guest control of CR0.WP when EPT is
     enabled on VMX (VMX-only because SVM doesn't support per-bit
     controls)

   - Add CR0/CR4 helpers to query single bits, and clean up related code
     where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long"
     return as a bool

   - Move AMD_PSFD to cpufeatures.h and purge KVM's definition

   - Avoid unnecessary writes+flushes when the guest is only adding new
     PTEs

   - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s
     optimizations when emulating invalidations

   - Clean up the range-based flushing APIs

   - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a
     single A/D bit using a LOCK AND instead of XCHG, and skip all of
     the "handle changed SPTE" overhead associated with writing the
     entire entry

   - Track the number of "tail" entries in a pte_list_desc to avoid
     having to walk (potentially) all descriptors during insertion and
     deletion, which gets quite expensive if the guest is spamming
     fork()

   - Disallow virtualizing legacy LBRs if architectural LBRs are
     available; the two are mutually exclusive in hardware

   - Disallow writes to immutable feature MSRs (notably
     PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features

   - Overhaul the vmx_pmu_caps selftest to better validate
     PERF_CAPABILITIES

   - Apply PMU filters to emulated events and add test coverage to the
     pmu_event_filter selftest

   - AMD SVM:

       - Add support for virtual NMIs

       - Fixes for edge cases related to virtual interrupts

   - Intel AMX:

       - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if
         XTILE_DATA is not being reported due to userspace not opting
         in via prctl()

   - Fix a bug in emulation of ENCLS in compatibility mode

   - Allow emulation of NOP and PAUSE for L2

   - AMX selftests improvements

   - Misc cleanups

  MIPS:

   - Constify MIPS's internal callbacks (a leftover from the hardware
     enabling rework that landed in 6.3)

  Generic:

   - Drop unnecessary casts from "void *" throughout kvm_main.c

   - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the
     struct size by 8 bytes on 64-bit kernels by utilizing a padding
     hole

  Documentation:

   - Fix goof introduced by the conversion to rST"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits)
  KVM: s390: pci: fix virtual-physical confusion on module unload/load
  KVM: s390: vsie: clarifications on setting the APCB
  KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA
  KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state
  KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init()
  KVM: selftests: Test the PMU event "Instructions retired"
  KVM: selftests: Copy full counter values from guest in PMU event filter test
  KVM: selftests: Use error codes to signal errors in PMU event filter test
  KVM: selftests: Print detailed info in PMU event filter asserts
  KVM: selftests: Add helpers for PMC asserts in PMU event filter test
  KVM: selftests: Add a common helper for the PMU event filter guest code
  KVM: selftests: Fix spelling mistake "perrmited" -> "permitted"
  KVM: arm64: vhe: Drop extra isb() on guest exit
  KVM: arm64: vhe: Synchronise with page table walker on MMU update
  KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc()
  KVM: arm64: nvhe: Synchronise with page table walker on TLBI
  KVM: arm64: Handle 32bit CNTPCTSS traps
  KVM: arm64: nvhe: Synchronise with page table walker on vcpu run
  KVM: arm64: vgic: Don't acquire its_lock before config_lock
  KVM: selftests: Add test to verify KVM's supported XCR0
  ...
2 parents d75439d + b3c9805 commit c8c655c

111 files changed: 4008 additions & 1854 deletions


Documentation/virt/kvm/api.rst

Lines changed: 70 additions & 5 deletions
@@ -5645,7 +5645,8 @@ with the KVM_XEN_VCPU_GET_ATTR ioctl.
 	};
 
 Copies Memory Tagging Extension (MTE) tags to/from guest tag memory. The
-``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned. The ``addr``
+``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned.
+``length`` must not be bigger than 2^31 - PAGE_SIZE bytes. The ``addr``
 field must point to a buffer which the tags will be copied to or from.
 
 ``flags`` specifies the direction of copy, either ``KVM_ARM_TAGS_TO_GUEST`` or
@@ -6029,6 +6030,44 @@ delivery must be provided via the "reg_aen" struct.
 The "pad" and "reserved" fields may be used for future extensions and should be
 set to 0s by userspace.
 
+4.138 KVM_ARM_SET_COUNTER_OFFSET
+--------------------------------
+
+:Capability: KVM_CAP_COUNTER_OFFSET
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: struct kvm_arm_counter_offset (in)
+:Returns: 0 on success, < 0 on error
+
+This capability indicates that userspace is able to apply a single VM-wide
+offset to both the virtual and physical counters as viewed by the guest
+using the KVM_ARM_SET_COUNTER_OFFSET ioctl and the following data structure:
+
+::
+
+	struct kvm_arm_counter_offset {
+		__u64 counter_offset;
+		__u64 reserved;
+	};
+
+The offset describes a number of counter cycles that are subtracted from
+both virtual and physical counter views (similar to the effects of the
+CNTVOFF_EL2 and CNTPOFF_EL2 system registers, but only global). The offset
+always applies to all vcpus (already created or created after this ioctl)
+for this VM.
+
+It is userspace's responsibility to compute the offset based, for example,
+on previous values of the guest counters.
+
+Any value other than 0 for the "reserved" field may result in an error
+(-EINVAL) being returned. This ioctl can also return -EBUSY if any vcpu
+ioctl is issued concurrently.
+
+Note that using this ioctl results in KVM ignoring subsequent userspace
+writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG
+interface. No error will be returned, but the resulting offset will not be
+applied.
+
 5. The kvm_run structure
 ========================
 
@@ -6218,15 +6257,40 @@ to the byte array.
 		__u64 nr;
 		__u64 args[6];
 		__u64 ret;
-		__u32 longmode;
-		__u32 pad;
+		__u64 flags;
 	} hypercall;
 
-Unused. This was once used for 'hypercall to userspace'. To implement
-such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
+
+It is strongly recommended that userspace use ``KVM_EXIT_IO`` (x86) or
+``KVM_EXIT_MMIO`` (all except s390) to implement functionality that
+requires a guest to interact with host userspace.
 
 .. note:: KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.
 
+For arm64:
+----------
+
+SMCCC exits can be enabled depending on the configuration of the SMCCC
+filter. See ``KVM_ARM_SMCCC_FILTER`` in Documentation/virt/kvm/devices/vm.rst
+for more details.
+
+``nr`` contains the function ID of the guest's SMCCC call. Userspace is
+expected to use the ``KVM_GET_ONE_REG`` ioctl to retrieve the call
+parameters from the vCPU's GPRs.
+
+Definition of ``flags``:
+
+- ``KVM_HYPERCALL_EXIT_SMC``: Indicates that the guest used the SMC
+  conduit to initiate the SMCCC call. If this bit is 0 then the guest
+  used the HVC conduit for the SMCCC call.
+
+- ``KVM_HYPERCALL_EXIT_16BIT``: Indicates that the guest used a 16bit
+  instruction to initiate the SMCCC call. If this bit is 0 then the
+  guest used a 32bit instruction. An AArch64 guest always has this
+  bit set to 0.
+
+At the point of exit, PC points to the instruction immediately following
+the trapping instruction.
+
 ::
 
 		/* KVM_EXIT_TPR_ACCESS */
@@ -7266,6 +7330,7 @@ and injected exceptions.
 will clear DR6.RTM.
 
 7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
+--------------------------------------
 
 :Architectures: x86, arm64, mips
 :Parameters: args[0] whether feature should be enabled or not

Documentation/virt/kvm/devices/vm.rst

Lines changed: 79 additions & 0 deletions
@@ -321,3 +321,82 @@ Allows userspace to query the status of migration mode.
 	     if it is enabled
 :Returns: -EFAULT if the given address is not accessible from kernel space;
 	  0 in case of success.
+
+6. GROUP: KVM_ARM_VM_SMCCC_CTRL
+===============================
+
+:Architectures: arm64
+
+6.1. ATTRIBUTE: KVM_ARM_VM_SMCCC_FILTER (w/o)
+---------------------------------------------
+
+:Parameters: Pointer to a ``struct kvm_smccc_filter``
+
+:Returns:
+
+   ======  ===========================================
+   EEXIST  Range intersects with a previously inserted
+           or reserved range
+   EBUSY   A vCPU in the VM has already run
+   EINVAL  Invalid filter configuration
+   ENOMEM  Failed to allocate memory for the in-kernel
+           representation of the SMCCC filter
+   ======  ===========================================
+
+Requests the installation of an SMCCC call filter described as follows::
+
+    enum kvm_smccc_filter_action {
+        KVM_SMCCC_FILTER_HANDLE = 0,
+        KVM_SMCCC_FILTER_DENY,
+        KVM_SMCCC_FILTER_FWD_TO_USER,
+    };
+
+    struct kvm_smccc_filter {
+        __u32 base;
+        __u32 nr_functions;
+        __u8  action;
+        __u8  pad[15];
+    };
+
+The filter is defined as a set of non-overlapping ranges. Each
+range defines an action to be applied to SMCCC calls within the range.
+Userspace can insert multiple ranges into the filter by using
+successive calls to this attribute.
+
+The default configuration of KVM is such that all implemented SMCCC
+calls are allowed. Thus, the SMCCC filter can be defined sparsely
+by userspace, only describing ranges that modify the default behavior.
+
+The range expressed by ``struct kvm_smccc_filter`` is
+[``base``, ``base + nr_functions``). The range is not allowed to wrap,
+i.e. userspace cannot rely on ``base + nr_functions`` overflowing.
+
+The SMCCC filter applies to both SMC and HVC calls initiated by the
+guest. The SMCCC filter gates the in-kernel emulation of SMCCC calls
+and as such takes effect before other interfaces that interact with
+SMCCC calls (e.g. hypercall bitmap registers).
+
+Actions:
+
+- ``KVM_SMCCC_FILTER_HANDLE``: Allows the guest SMCCC call to be
+  handled in-kernel. It is strongly recommended that userspace *not*
+  explicitly describe the allowed SMCCC call ranges.
+
+- ``KVM_SMCCC_FILTER_DENY``: Rejects the guest SMCCC call in-kernel
+  and returns to the guest.
+
+- ``KVM_SMCCC_FILTER_FWD_TO_USER``: The guest SMCCC call is forwarded
+  to userspace with an exit reason of ``KVM_EXIT_HYPERCALL``.
+
+The ``pad`` field is reserved for future use and must be zero. KVM may
+return ``-EINVAL`` if the field is nonzero.
+
+KVM reserves the 'Arm Architecture Calls' range of function IDs and
+will reject attempts to define a filter for any portion of these ranges:
+
+	===========	===============
+	Start		End (inclusive)
+	===========	===============
+	0x8000_0000	0x8000_FFFF
+	0xC000_0000	0xC000_FFFF
+	===========	===============

Documentation/virt/kvm/locking.rst

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ The acquisition orders for mutexes are as follows:
 - kvm->mn_active_invalidate_count ensures that pairs of
   invalidate_range_start() and invalidate_range_end() callbacks
   use the same memslots array. kvm->slots_lock and kvm->slots_arch_lock
-  are taken on the waiting side in install_new_memslots, so MMU notifiers
+  are taken on the waiting side when modifying memslots, so MMU notifiers
   must not take either kvm->slots_lock or kvm->slots_arch_lock.
 
 For SRCU:

arch/arm64/include/asm/kvm_host.h

Lines changed: 26 additions & 3 deletions
@@ -16,6 +16,7 @@
 #include <linux/types.h>
 #include <linux/jump_label.h>
 #include <linux/kvm_types.h>
+#include <linux/maple_tree.h>
 #include <linux/percpu.h>
 #include <linux/psci.h>
 #include <asm/arch_gicv3.h>
@@ -199,6 +200,9 @@ struct kvm_arch {
 	/* Mandated version of PSCI */
 	u32 psci_version;
 
+	/* Protects VM-scoped configuration data */
+	struct mutex config_lock;
+
 	/*
 	 * If we encounter a data abort without valid instruction syndrome
 	 * information, report this to user space. User space can (and
@@ -221,7 +225,12 @@ struct kvm_arch {
 #define KVM_ARCH_FLAG_EL1_32BIT				4
 	/* PSCI SYSTEM_SUSPEND enabled for the guest */
 #define KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED		5
-
+	/* VM counter offset */
+#define KVM_ARCH_FLAG_VM_COUNTER_OFFSET			6
+	/* Timer PPIs made immutable */
+#define KVM_ARCH_FLAG_TIMER_PPIS_IMMUTABLE		7
+	/* SMCCC filter initialized for the VM */
+#define KVM_ARCH_FLAG_SMCCC_FILTER_CONFIGURED		8
 	unsigned long flags;
 
 	/*
@@ -242,6 +251,7 @@ struct kvm_arch {
 
 	/* Hypercall features firmware registers' descriptor */
 	struct kvm_smccc_features smccc_feat;
+	struct maple_tree smccc_filter;
 
 	/*
 	 * For an untrusted host VM, 'pkvm.handle' is used to lookup
@@ -365,6 +375,10 @@ enum vcpu_sysreg {
 	TPIDR_EL2,	/* EL2 Software Thread ID Register */
 	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
 	SP_EL2,		/* EL2 Stack Pointer */
+	CNTHP_CTL_EL2,
+	CNTHP_CVAL_EL2,
+	CNTHV_CTL_EL2,
+	CNTHV_CVAL_EL2,
 
 	NR_SYS_REGS	/* Nothing after this line! */
 };
@@ -522,6 +536,7 @@ struct kvm_vcpu_arch {
 
 	/* vcpu power state */
 	struct kvm_mp_state mp_state;
+	spinlock_t mp_state_lock;
 
 	/* Cache some mmu pages needed inside spinlock regions */
 	struct kvm_mmu_memory_cache mmu_page_cache;
@@ -939,6 +954,9 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
 
 int __init kvm_sys_reg_table_init(void);
 
+bool lock_all_vcpus(struct kvm *kvm);
+void unlock_all_vcpus(struct kvm *kvm);
+
 /* MMIO helpers */
 void kvm_mmio_write_buf(void *buf, unsigned int len, unsigned long data);
 unsigned long kvm_mmio_read_buf(const void *buf, unsigned int len);
@@ -1022,8 +1040,10 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
 int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 			       struct kvm_device_attr *attr);
 
-long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
-				struct kvm_arm_copy_mte_tags *copy_tags);
+int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
+			       struct kvm_arm_copy_mte_tags *copy_tags);
+int kvm_vm_ioctl_set_counter_offset(struct kvm *kvm,
+				    struct kvm_arm_counter_offset *offset);
 
 /* Guest/host FPSIMD coordination helpers */
 int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
@@ -1078,6 +1098,9 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 	(system_supports_32bit_el0() &&				\
 	 !static_branch_unlikely(&arm64_mismatched_32bit_el0))
 
+#define kvm_vm_has_ran_once(kvm)				\
+	(test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &(kvm)->arch.flags))
+
 int kvm_trng_call(struct kvm_vcpu *vcpu);
 #ifdef CONFIG_KVM
 extern phys_addr_t hyp_mem_base;

arch/arm64/include/asm/kvm_mmu.h

Lines changed: 4 additions & 0 deletions
@@ -63,13 +63,15 @@
  * specific registers encoded in the instructions).
  */
 .macro kern_hyp_va reg
+#ifndef __KVM_VHE_HYPERVISOR__
 alternative_cb ARM64_ALWAYS_SYSTEM, kvm_update_va_mask
 	and	\reg, \reg, #1		/* mask with va_mask */
 	ror	\reg, \reg, #1		/* rotate to the first tag bit */
 	add	\reg, \reg, #0		/* insert the low 12 bits of the tag */
 	add	\reg, \reg, #0, lsl 12	/* insert the top 12 bits of the tag */
 	ror	\reg, \reg, #63		/* rotate back */
 alternative_cb_end
+#endif
 .endm
 
 /*
@@ -127,6 +129,7 @@ void kvm_apply_hyp_relocations(void);
 
 static __always_inline unsigned long __kern_hyp_va(unsigned long v)
 {
+#ifndef __KVM_VHE_HYPERVISOR__
 	asm volatile(ALTERNATIVE_CB("and %0, %0, #1\n"
 				    "ror %0, %0, #1\n"
 				    "add %0, %0, #0\n"
@@ -135,6 +138,7 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
 				    ARM64_ALWAYS_SYSTEM,
 				    kvm_update_va_mask)
 		     : "+r" (v));
+#endif
 	return v;
 }

arch/arm64/include/asm/sysreg.h

Lines changed: 3 additions & 0 deletions
@@ -388,6 +388,7 @@
 
 #define SYS_CNTFRQ_EL0			sys_reg(3, 3, 14, 0, 0)
 
+#define SYS_CNTPCT_EL0			sys_reg(3, 3, 14, 0, 1)
 #define SYS_CNTPCTSS_EL0		sys_reg(3, 3, 14, 0, 5)
 #define SYS_CNTVCTSS_EL0		sys_reg(3, 3, 14, 0, 6)
 
@@ -400,7 +401,9 @@
 
 #define SYS_AARCH32_CNTP_TVAL		sys_reg(0, 0, 14, 2, 0)
 #define SYS_AARCH32_CNTP_CTL		sys_reg(0, 0, 14, 2, 1)
+#define SYS_AARCH32_CNTPCT		sys_reg(0, 0, 0, 14, 0)
 #define SYS_AARCH32_CNTP_CVAL		sys_reg(0, 2, 0, 14, 0)
+#define SYS_AARCH32_CNTPCTSS		sys_reg(0, 8, 0, 14, 0)
 
 #define __PMEV_op2(n)			((n) & 0x7)
 #define __CNTR_CRm(n)			(0x8 | (((n) >> 3) & 0x3))
