Commit 6268f0a
mm: compaction: use the proper flag to determine watermarks
There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of
memory. I have configured 16GB of CMA memory on each NUMA node, and
starting a 32GB virtual machine with device passthrough is extremely slow,
taking almost an hour.
Long-term GUP cannot allocate memory from the CMA area, so at most 16GB
of non-CMA memory on a NUMA node can be used as virtual machine memory.
There is 16GB of free CMA memory on a NUMA node, which is sufficient to
pass the order-0 watermark check, causing the __compaction_suitable()
function to consistently return true.
For costly allocations, if the __compaction_suitable() function always
returns true, it causes the __alloc_pages_slowpath() function to fail to
exit at the appropriate point. This prevents timely fallback to
allocating memory on other nodes, ultimately resulting in excessively long
virtual machine startup times.
Call trace:
__alloc_pages_slowpath
    if (compact_result == COMPACT_SKIPPED ||
        compact_result == COMPACT_DEFERRED)
        goto nopage; // should exit __alloc_pages_slowpath() from here
We could use the real unmovable allocation context to have
__zone_watermark_unusable_free() subtract CMA pages, and thus we won't
pass the order-0 check anymore once the non-CMA part is exhausted. There
is some risk that in some different scenario the compaction could in fact
migrate pages from the exhausted non-CMA part of the zone to the CMA part
and succeed, and we'll skip it instead. But only __GFP_NORETRY
allocations should be affected in the immediate "goto nopage" when
compaction is skipped, others will attempt with DEF_COMPACT_PRIORITY
anyway and won't fail without trying to compact-migrate the non-CMA
pageblocks into CMA pageblocks first, so it should be fine.
After this fix, it only takes a few tens of seconds to start a 32GB
virtual machine with device passthrough functionality.
Link: https://lore.kernel.org/lkml/1736335854-548-1-git-send-email-yangge1116@126.com/
Link: https://lkml.kernel.org/r/1737788037-8439-1-git-send-email-yangge1116@126.com
Signed-off-by: yangge <yangge1116@126.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
1 parent 64c37e1 commit 6268f0a
1 file changed
Lines changed: 25 additions & 4 deletions
| Hunk | Original file lines | Diff lines | Lines added | Lines removed |
|---|---|---|---|---|
| 1 | 2491-2497 | 2491-2498 | 2 | 1 |
| 2 | 2500-2505 | 2501-2523 | 17 | 0 |
| 3 | 2535-2541 | 2553-2560 | 2 | 1 |
| 4 | 3038-3044 | 3057-3064 | 2 | 1 |
| 5 | 3079-3085 | 3099-3106 | 2 | 1 |