
Commit e914d8f

minchan authored and torvalds committed
mm: fix unexpected zeroed page mapping with zram swap
With two processes cloned under CLONE_VM, a user process can be corrupted by
unexpectedly seeing a zeroed page.

CPU A                               CPU B

do_swap_page                        do_swap_page
SWP_SYNCHRONOUS_IO path             SWP_SYNCHRONOUS_IO path
swap_readpage valid data
  swap_slot_free_notify
    delete zram entry
                                    swap_readpage zeroed (invalid) data
                                    pte_lock
                                    map the *zero data* to userspace
                                    pte_unlock
pte_lock
if (!pte_same)
  goto out_nomap;
pte_unlock
return, and the next refault will
read zeroed data

swap_slot_free_notify is bogus for the CLONE_VM case, since copy_mm does not
increase the refcount of the swap slot, so it cannot tell whether it is safe
to discard the data from the backing device.  In that case, the only lock it
could rely on to synchronize swap slot freeing is the page table lock.  Thus,
this patch gets rid of the swap_slot_free_notify function.  With this patch,
CPU A will see correct data.

CPU A                               CPU B

do_swap_page                        do_swap_page
SWP_SYNCHRONOUS_IO path             SWP_SYNCHRONOUS_IO path
swap_readpage original data
pte_lock
map the original data
swap_free
  swap_range_free
    bd_disk->fops->swap_slot_free_notify
                                    swap_readpage read zeroed data
pte_unlock
                                    pte_lock
                                    if (!pte_same)
                                      goto out_nomap;
                                    pte_unlock
                                    return; the next refault will see
                                    the data mapped by CPU B

The concern with this patch is increased memory consumption, since it can keep
wasted memory in compressed form in zram as well as in uncompressed form in
the address space.  However, most zram setups use no readahead, and
do_swap_page is followed by swap_free, so the compressed copy in zram is
freed quickly.
Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com
Fixes: 0bcac06 ("mm, swap: skip swapcache for swapin of synchronous device")
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Tested-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent e553f62 · commit e914d8f

1 file changed: mm/page_io.c (0 additions, 54 deletions)
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -51,54 +51,6 @@ void end_swap_bio_write(struct bio *bio)
 	bio_put(bio);
 }
 
-static void swap_slot_free_notify(struct page *page)
-{
-	struct swap_info_struct *sis;
-	struct gendisk *disk;
-	swp_entry_t entry;
-
-	/*
-	 * There is no guarantee that the page is in swap cache - the software
-	 * suspend code (at least) uses end_swap_bio_read() against a non-
-	 * swapcache page.  So we must check PG_swapcache before proceeding with
-	 * this optimization.
-	 */
-	if (unlikely(!PageSwapCache(page)))
-		return;
-
-	sis = page_swap_info(page);
-	if (data_race(!(sis->flags & SWP_BLKDEV)))
-		return;
-
-	/*
-	 * The swap subsystem performs lazy swap slot freeing,
-	 * expecting that the page will be swapped out again.
-	 * So we can avoid an unnecessary write if the page
-	 * isn't redirtied.
-	 * This is good for real swap storage because we can
-	 * reduce unnecessary I/O and enhance wear-leveling
-	 * if an SSD is used as the swap device.
-	 * But if in-memory swap device (eg zram) is used,
-	 * this causes a duplicated copy between uncompressed
-	 * data in VM-owned memory and compressed data in
-	 * zram-owned memory.  So let's free zram-owned memory
-	 * and make the VM-owned decompressed page *dirty*,
-	 * so the page should be swapped out somewhere again if
-	 * we again wish to reclaim it.
-	 */
-	disk = sis->bdev->bd_disk;
-	entry.val = page_private(page);
-	if (disk->fops->swap_slot_free_notify && __swap_count(entry) == 1) {
-		unsigned long offset;
-
-		offset = swp_offset(entry);
-
-		SetPageDirty(page);
-		disk->fops->swap_slot_free_notify(sis->bdev,
-				offset);
-	}
-}
-
 static void end_swap_bio_read(struct bio *bio)
 {
 	struct page *page = bio_first_page_all(bio);
@@ -114,7 +66,6 @@ static void end_swap_bio_read(struct bio *bio)
 	}
 
 	SetPageUptodate(page);
-	swap_slot_free_notify(page);
 out:
 	unlock_page(page);
 	WRITE_ONCE(bio->bi_private, NULL);
@@ -394,11 +345,6 @@ int swap_readpage(struct page *page, bool synchronous)
 	if (sis->flags & SWP_SYNCHRONOUS_IO) {
 		ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
 		if (!ret) {
-			if (trylock_page(page)) {
-				swap_slot_free_notify(page);
-				unlock_page(page);
-			}
-
 			count_vm_event(PSWPIN);
 			goto out;
 		}
