Commit f587661

oupton authored and Marc Zyngier committed
KVM: arm64: Don't split hugepages outside of MMU write lock

It is possible to take a stage-2 permission fault on a page larger than
PAGE_SIZE. For example, when running a guest backed by 2M HugeTLB, KVM
eagerly maps at the largest possible block size. When dirty logging is
enabled on a memslot, KVM does *not* eagerly split these 2M stage-2
mappings and instead clears the write bit on the pte.

Since dirty logging is always performed at PAGE_SIZE granularity, KVM
lazily splits these 2M block mappings down to PAGE_SIZE in the stage-2
fault handler. This operation must be done under the write lock. Since
commit f783ef1 ("KVM: arm64: Add fast path to handle permission
relaxation during dirty logging"), the stage-2 fault handler
conditionally takes the read lock on permission faults with dirty
logging enabled. To that end, it is possible to split a 2M block mapping
while only holding the read lock.

The problem is demonstrated by running kvm_page_table_test with 2M
anonymous HugeTLB, which splats like so:

  WARNING: CPU: 5 PID: 15276 at arch/arm64/kvm/hyp/pgtable.c:153 stage2_map_walk_leaf+0x124/0x158

  [...]

  Call trace:
  stage2_map_walk_leaf+0x124/0x158
  stage2_map_walker+0x5c/0xf0
  __kvm_pgtable_walk+0x100/0x1d4
  __kvm_pgtable_walk+0x140/0x1d4
  __kvm_pgtable_walk+0x140/0x1d4
  kvm_pgtable_walk+0xa0/0xf8
  kvm_pgtable_stage2_map+0x15c/0x198
  user_mem_abort+0x56c/0x838
  kvm_handle_guest_abort+0x1fc/0x2a4
  handle_exit+0xa4/0x120
  kvm_arch_vcpu_ioctl_run+0x200/0x448
  kvm_vcpu_ioctl+0x588/0x664
  __arm64_sys_ioctl+0x9c/0xd4
  invoke_syscall+0x4c/0x144
  el0_svc_common+0xc4/0x190
  do_el0_svc+0x30/0x8c
  el0_svc+0x28/0xcc
  el0t_64_sync_handler+0x84/0xe4
  el0t_64_sync+0x1a4/0x1a8

Fix the issue by only acquiring the read lock if the guest faulted on a
PAGE_SIZE granule w/ dirty logging enabled. Add a WARN to catch locking
bugs in future changes.

Fixes: f783ef1 ("KVM: arm64: Add fast path to handle permission relaxation during dirty logging")
Cc: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220401194652.950240-1-oupton@google.com
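
The lock-selection rule described in the commit message can be sketched as a standalone predicate. This is a userspace illustration only, not kernel code: the `sketch_` names, the enum, and the 4K page size are assumptions made for the example, and the real decision is made inline in user_mem_abort().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's values; 4K pages are assumed. */
#define SKETCH_PAGE_SIZE	((size_t)4096)
#define SKETCH_BLOCK_SIZE	((size_t)2 * 1024 * 1024)

enum sketch_fault {
	SKETCH_FAULT_TRANSLATION,
	SKETCH_FAULT_PERMISSION,
};

/*
 * Returns true when the fault may be handled under the read lock:
 * dirty logging is on, the guest took a write permission fault, and the
 * faulting mapping is already PAGE_SIZE, so no block split is needed.
 */
static bool sketch_use_read_lock(bool logging_active, enum sketch_fault status,
				 bool write_fault, size_t fault_granule)
{
	if (!logging_active)
		return false;

	return status == SKETCH_FAULT_PERMISSION && write_fault &&
	       fault_granule == SKETCH_PAGE_SIZE;
}
```

Under these assumptions, a write permission fault on a 4K granule selects the read lock, while the same fault on a 2M block falls back to the write lock, which is the case the original predicate missed.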
1 parent 73b725c commit f587661

1 file changed

Lines changed: 7 additions & 4 deletions

arch/arm64/kvm/mmu.c

@@ -1079,7 +1079,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
-	bool logging_perm_fault = false;
+	bool use_read_lock = false;
 	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
 	unsigned long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
@@ -1114,7 +1114,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (logging_active) {
 		force_pte = true;
 		vma_shift = PAGE_SHIFT;
-		logging_perm_fault = (fault_status == FSC_PERM && write_fault);
+		use_read_lock = (fault_status == FSC_PERM && write_fault &&
+				 fault_granule == PAGE_SIZE);
 	} else {
 		vma_shift = get_vma_page_shift(vma, hva);
 	}
@@ -1218,7 +1219,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * logging dirty logging, only acquire read lock for permission
 	 * relaxation.
 	 */
-	if (logging_perm_fault)
+	if (use_read_lock)
 		read_lock(&kvm->mmu_lock);
 	else
 		write_lock(&kvm->mmu_lock);
@@ -1268,6 +1269,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (fault_status == FSC_PERM && vma_pagesize == fault_granule) {
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
 	} else {
+		WARN_ONCE(use_read_lock, "Attempted stage-2 map outside of write lock\n");
+
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
 					     __pfn_to_phys(pfn), prot,
 					     memcache);
@@ -1280,7 +1283,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 out_unlock:
-	if (logging_perm_fault)
+	if (use_read_lock)
 		read_unlock(&kvm->mmu_lock);
 	else
 		write_unlock(&kvm->mmu_lock);
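
To see why the added granule check keeps the new WARN_ONCE from firing, the write-permission-fault path with dirty logging enabled can be modelled in a few lines of userspace C. This is a deliberately simplified sketch: the `sketch_` names, the outcome enum, the `check_granule` switch between the old and fixed predicates, and the 4K page size are all assumptions for illustration, and no real locking or page-table walk is performed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's values; 4K pages are assumed. */
#define SKETCH_PAGE_SIZE	((size_t)4096)
#define SKETCH_BLOCK_SIZE	((size_t)2 * 1024 * 1024)

enum sketch_outcome {
	SKETCH_RELAXED,		/* permission-only update, fine under the read lock */
	SKETCH_MAPPED,		/* block split and remapped under the write lock */
	SKETCH_MAP_UNDER_READ,	/* the bug: map attempted under the read lock */
};

/*
 * Models a write permission fault on a memslot with dirty logging enabled.
 * fault_granule is the size of the existing stage-2 mapping that faulted.
 * check_granule selects the fixed predicate (true) or the old one (false).
 */
static enum sketch_outcome sketch_logging_write_fault(size_t fault_granule,
						      bool check_granule)
{
	/* With logging active the target mapping size is forced to a page. */
	size_t vma_pagesize = SKETCH_PAGE_SIZE;
	bool use_read_lock = !check_granule ||
			     fault_granule == SKETCH_PAGE_SIZE;

	if (vma_pagesize == fault_granule)
		return SKETCH_RELAXED;

	/* Sizes differ, so the existing block must be split and remapped. */
	return use_read_lock ? SKETCH_MAP_UNDER_READ : SKETCH_MAPPED;
}
```

In this model, a fault on a 2M block with the old predicate yields SKETCH_MAP_UNDER_READ, which corresponds to the condition the WARN_ONCE is meant to catch; with the granule check the same fault takes the write lock and yields SKETCH_MAPPED.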
