
Commit 616007c

sean-jc authored and bonzini committed
KVM: x86: Enhance comments for MMU roles and nested transition trickiness
Expand the comments for the MMU roles. The interactions with gfn_track and PGD reuse in particular are hairy.

Regarding PGD reuse, add comments in the nested virtualization flows to call out why kvm_init_mmu() is unconditionally called even when nested TDP is used.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-50-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
1 parent 3b77daa commit 616007c

3 files changed: 49 additions & 10 deletions
arch/x86/include/asm/kvm_host.h

Lines changed: 47 additions & 10 deletions
@@ -269,12 +269,36 @@ enum x86_intercept_stage;
 struct kvm_kernel_irq_routing_entry;
 
 /*
- * the pages used as guest page table on soft mmu are tracked by
- * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used
- * by indirect shadow page can not be more than 15 bits.
+ * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
+ * also includes TDP pages) to determine whether or not a page can be used in
+ * the given MMU context.  This is a subset of the overall kvm_mmu_role to
+ * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
+ * 2 bytes per gfn instead of 4 bytes per gfn.
  *
- * Currently, we used 14 bits that are @level, @gpte_is_8_bytes, @quadrant, @access,
- * @efer_nx, @cr0_wp, @smep_andnot_wp and @smap_andnot_wp.
+ * Indirect upper-level shadow pages are tracked for write-protection via
+ * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
+ * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
+ * gfn_track will overflow and explosions will ensue.
+ *
+ * A unique shadow page (SP) for a gfn is created if and only if an existing SP
+ * cannot be reused.  The ability to reuse a SP is tracked by its role, which
+ * incorporates various mode bits and properties of the SP.  Roughly speaking,
+ * the number of unique SPs that can theoretically be created is 2^n, where n
+ * is the number of bits that are used to compute the role.
+ *
+ * But, even though there are 18 bits in the mask below, not all combinations
+ * of modes and flags are possible.  The maximum number of possible upper-level
+ * shadow pages for a single gfn is in the neighborhood of 2^13.
+ *
+ *  - invalid shadow pages are not accounted.
+ *  - level is effectively limited to four combinations, not 16 as the number
+ *    of bits would imply, as 4k SPs are not tracked (allowed to go unsync).
+ *  - level is effectively unused for non-PAE paging because there is exactly
+ *    one upper level (see 4k SP exception above).
+ *  - quadrant is used only for non-PAE paging and is exclusive with
+ *    gpte_is_8_bytes.
+ *  - execonly and ad_disabled are used only for nested EPT, which makes it
+ *    exclusive with quadrant.
  */
 union kvm_mmu_page_role {
 	u32 word;
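
The reuse rule the new comment describes boils down to an integer comparison: because kvm_mmu_page_role packs every relevant property into a u32 word, two shadow pages are interchangeable exactly when their role words (and target gfns) match. Below is a minimal standalone sketch of that check; sp_sketch and sp_can_reuse are illustrative names, not KVM's.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for a shadow page's gfn + packed role. */
struct sp_sketch {
	uint64_t gfn;        /* guest frame number the SP shadows */
	uint32_t role_word;  /* kvm_mmu_page_role.word in real KVM */
};

/* An existing SP can be reused iff the gfn and every role bit match. */
static bool sp_can_reuse(const struct sp_sketch *sp, uint64_t gfn,
			 uint32_t role_word)
{
	return sp->gfn == gfn && sp->role_word == role_word;
}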
@@ -303,13 +327,26 @@ union kvm_mmu_page_role {
 	};
 };
 
-union kvm_mmu_extended_role {
 /*
- * This structure complements kvm_mmu_page_role caching everything needed for
- * MMU configuration. If nothing in both these structures changed, MMU
- * re-configuration can be skipped. @valid bit is set on first usage so we don't
- * treat all-zero structure as valid data.
+ * kvm_mmu_extended_role complements kvm_mmu_page_role, tracking properties
+ * relevant to the current MMU configuration.  When loading CR0, CR4, or EFER,
+ * including on nested transitions, if nothing in the full role changes then
+ * MMU re-configuration can be skipped.  @valid bit is set on first usage so we
+ * don't treat all-zero structure as valid data.
+ *
+ * The properties that are tracked in the extended role but not the page role
+ * are for things that either (a) do not affect the validity of the shadow page
+ * or (b) are indirectly reflected in the shadow page's role.  For example,
+ * CR4.PKE only affects permission checks for software walks of the guest page
+ * tables (because KVM doesn't support Protection Keys with shadow paging), and
+ * CR0.PG, CR4.PAE, and CR4.PSE are indirectly reflected in role.level.
+ *
+ * Note, SMEP and SMAP are not redundant with sm*p_andnot_wp in the page role.
+ * If CR0.WP=1, KVM can reuse shadow pages for the guest regardless of SMEP and
+ * SMAP, but the MMU's permission checks for software walks need to be SMEP and
+ * SMAP aware regardless of CR0.WP.
  */
+union kvm_mmu_extended_role {
 	u32 word;
 	struct {
 		unsigned int valid:1;
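
The "skip MMU re-configuration" fast path the new comment alludes to rests on the same trick one level up: the page role and the extended role together form the full role, which fits in 64 bits, so detecting whether a CR0/CR4/EFER load changed anything relevant costs a single compare. Here is a hedged, self-contained sketch of that idea; full_role_sketch and mmu_reconfig_needed are illustrative names modeled loosely on KVM's union kvm_mmu_role.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative packing of page role + extended role into one value. */
union full_role_sketch {
	uint64_t as_u64;
	struct {
		uint32_t base; /* kvm_mmu_page_role.word */
		uint32_t ext;  /* kvm_mmu_extended_role.word */
	};
};

/* Re-configuration is needed only if some tracked bit actually changed. */
static bool mmu_reconfig_needed(union full_role_sketch old_role,
				union full_role_sketch new_role)
{
	return old_role.as_u64 != new_role.as_u64;
}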

arch/x86/kvm/svm/nested.c

Lines changed: 1 addition & 0 deletions
@@ -424,6 +424,7 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
+	/* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. */
 	kvm_init_mmu(vcpu);
 
 	return 0;

arch/x86/kvm/vmx/nested.c

Lines changed: 1 addition & 0 deletions
@@ -1098,6 +1098,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	vcpu->arch.cr3 = cr3;
 	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
 
+	/* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. */
 	kvm_init_mmu(vcpu);
 
 	return 0;
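
Both hunks above add the same one-line comment, and the rationale is identical on SVM and VMX: even when nested TDP lets KVM reuse the PGD itself, the nested transition may have changed guest CR0/CR4/EFER, so kvm_init_mmu() must run unconditionally to recompute the role; the expensive rebuild happens only if the recomputed role actually differs. A rough sketch of that control flow, with hypothetical names (vcpu_sketch, nested_load_cr3_sketch) standing in for KVM's internals:

#include <stdint.h>

/* Hypothetical, pared-down vCPU state for illustration only. */
struct vcpu_sketch {
	unsigned long cr3;
	uint64_t mmu_role; /* stands in for the full kvm_mmu_role */
};

static void nested_load_cr3_sketch(struct vcpu_sketch *vcpu,
				   unsigned long cr3, uint64_t new_role)
{
	vcpu->cr3 = cr3;

	/* Always recompute the role (cheap), as kvm_init_mmu() does... */
	if (vcpu->mmu_role != new_role) {
		/* ...but rebuild MMU state only when the role changed. */
		vcpu->mmu_role = new_role;
	}
}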
