Skip to content

Commit 447000e

Browse files
committed
Disable MADV_DONTNEED zeroing optimisation on mshv
In 1f94fb3, some odd behaviour with MSHV and MAP_PRIVATE mappings was noticed. It turns out that this was actually due to the hypervisor pinning the old page structures, which meant that `madvise(MADV_DONTNEED)` resulted in the userspace view of the memory in question becoming completely divorced from the hypervisor view. Switching (back) to MAP_SHARED "fixed" this only because it meant that zeroing was actually ineffective for both the host and the guest (so at least host writes were reflected in the guest): `madvise(2)` notes that shared anonymous mappings will have their contents repopulated on access after an `MADV_DONTNEED`. This commit switches back to `MAP_PRIVATE` (so that the optimisation is correct on KVM, where it works) and disables the optimisation on MSHV, where the scratch region will always be zeroed by writing zeroes to it. The original intent of lazily zeroing/populating the memory will likely only be possible on MSHV with kernel/hypervisor changes for support. Signed-off-by: Lucy Menon <168595099+syntactically@users.noreply.github.com>
1 parent b54dd6d commit 447000e

1 file changed

Lines changed: 26 additions & 6 deletions

File tree

src/hyperlight_host/src/mem/shared_mem.rs

Lines changed: 26 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -169,6 +169,8 @@ unsafe impl Send for GuestSharedMemory {}
169169
/// communication buffers, allowing it to be used concurrently with a
170170
/// GuestSharedMemory.
171171
///
172+
/// # Concurrency model
173+
///
172174
/// Given future requirements for asynchronous I/O with a minimum
173175
/// amount of copying (e.g. WASIp3 streams), we would like it to be
174176
/// possible to safely access these buffers concurrently with the
@@ -299,6 +301,24 @@ unsafe impl Send for GuestSharedMemory {}
299301
/// \[4\] P1152R0: Deprecating `volatile`. JF Bastien. <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1152r0.html>
300302
/// \[5\] P1382R1: `volatile_load<T>` and `volatile_store<T>`. JF Bastien, Paul McKenney, Jeffrey Yasskin, and the indefatigable TBD. <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1382r1.pdf>
301303
/// \[6\] Documentation for std::sync::atomic::fence. <https://doc.rust-lang.org/std/sync/atomic/fn.fence.html>
304+
///
305+
/// # Note \[Keeping mappings in sync between userspace and the guest\]
306+
///
307+
/// When using this structure with mshv on Linux, it is necessary to
308+
/// be a little bit careful: since the hypervisor is not directly
309+
/// integrated with the host kernel virtual memory subsystem, it is
310+
/// easy for the memory region in userspace to get out of sync with
311+
/// the memory region mapped into the guest. Generally speaking, when
312+
/// the [`SharedMemory`] is mapped into a partition, the MSHV kernel
313+
/// module will call `pin_user_pages(FOLL_PIN|FOLL_WRITE)` on it,
314+
/// which will eagerly do any CoW, etc needing to obtain backing pages
315+
/// pinned in memory, and then map precisely those backing pages into
316+
/// the virtual machine. After that, the backing pages mapped into the
317+
/// VM will not change until the region is unmapped or remapped. This
318+
/// means that code in this module needs to be very careful to avoid
319+
/// changing the backing pages of the region in the host userspace,
320+
/// since that would result in hyperlight-host's view of the memory
321+
/// becoming completely divorced from the view of the VM.
302322
#[derive(Clone, Debug)]
303323
pub struct HostSharedMemory {
304324
region: Arc<HostMapping>,
@@ -314,11 +334,9 @@ impl ExclusiveSharedMemory {
314334
#[cfg(target_os = "linux")]
315335
#[instrument(skip_all, parent = Span::current(), level= "Trace")]
316336
pub fn new(min_size_bytes: usize) -> Result<Self> {
317-
#[cfg(miri)]
318-
use libc::MAP_PRIVATE;
319-
use libc::{MAP_ANONYMOUS, MAP_FAILED, PROT_READ, PROT_WRITE, c_int, mmap, off_t, size_t};
337+
use libc::{MAP_ANONYMOUS, MAP_PRIVATE, MAP_FAILED, PROT_READ, PROT_WRITE, c_int, mmap, off_t, size_t};
320338
#[cfg(not(miri))]
321-
use libc::{MAP_NORESERVE, MAP_SHARED, PROT_NONE, mprotect};
339+
use libc::{MAP_NORESERVE, PROT_NONE, mprotect};
322340

323341
if min_size_bytes == 0 {
324342
return Err(new_error!("Cannot create shared memory with size 0"));
@@ -346,7 +364,7 @@ impl ExclusiveSharedMemory {
346364

347365
// allocate the memory
348366
#[cfg(not(miri))]
349-
let flags = MAP_ANONYMOUS | MAP_SHARED | MAP_NORESERVE;
367+
let flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE;
350368
#[cfg(miri)]
351369
let flags = MAP_ANONYMOUS | MAP_PRIVATE;
352370

@@ -764,7 +782,9 @@ pub trait SharedMemory {
764782
#[allow(unused_mut)] // unused on some platforms, although not others
765783
let mut do_copy = true;
766784
// TODO: Compare & add heuristic thresholds: mmap, MADV_DONTNEED, MADV_REMOVE, MADV_FREE (?)
767-
#[cfg(target_os = "linux")]
785+
// TODO: Find a similar lazy zeroing approach that works on MSHV.
786+
// (See Note [Keeping mappings in sync between userspace and the guest])
787+
#[cfg(all(target_os = "linux", feature = "kvm", not(any(feature = "mshv3"))))]
768788
unsafe {
769789
let ret = libc::madvise(
770790
e.region.ptr as *mut libc::c_void,

0 commit comments

Comments
 (0)