Commit 7203ca4

Merge tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
   Rework the vmalloc() code to support non-blocking allocations (GFP_ATOMIC, GFP_NOWAIT)

 - "ksm: fix exec/fork inheritance" (xu xin)
   Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec

 - "mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
   Some light maintenance work on the zswap code

 - "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
   Enhance the /sys/kernel/debug/page_owner debug feature by adding unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time

 - "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
   Minor alterations to the page allocator's per-cpu-pages feature

 - "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
   Address a scalability issue in userfaultfd's UFFDIO_MOVE operation

 - "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)

 - "drivers/base/node: fold node register and unregister functions" (Donet Tom)
   Clean up the NUMA node handling code a little

 - "mm: some optimizations for prot numa" (Kefeng Wang)
   Cleanups and small optimizations to the NUMA allocation hinting code

 - "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
   Address long lock hold times at boot on large machines. These were causing (harmless) softlockup warnings

 - "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
   Remove some now-unnecessary work from page reclaim

 - "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
   Enhance the DAMOS auto-tuning feature

 - "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
   Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration

 - "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
   Enhance the new(ish) file_operations.mmap_prepare() method and port additional callsites from the old ->mmap() over to ->mmap_prepare()

 - "Fix stale IOTLB entries for kernel address space" (Lu Baolu)
   Fix a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry

 - "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
   Clean up and optimize the folio splitting code

 - "mm, swap: misc cleanup and bugfix" (Kairui Song)
   Some cleanups and a minor fix in the swap discard code

 - "mm/damon: misc documentation fixups" (SeongJae Park)

 - "mm/damon: support pin-point targets removal" (SeongJae Park)
   Permit userspace to remove a specific monitoring target in the middle of the current targets list

 - "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
   A couple of cleanups related to mm header file inclusion

 - "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
   Improve the selection of swap devices for NUMA machines

 - "mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
   Change the memory block labels from macros to enums so they will appear in kernel debug info

 - "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
   Address an inefficiency when KSM unmerges an address range

 - "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
   Fix leaks and unhandled malloc() failures in DAMON userspace unit tests

 - "some cleanups for pageout()" (Baolin Wang)
   Clean up a couple of minor things in the page scanner's writeback-for-eviction code

 - "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
   Move hugetlb's sysfs/sysctl handling code into a new file

 - "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
   Make the VMA guard regions available in /proc/pid/smaps and improve the mergeability of guarded VMAs

 - "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
   Reduce mmap lock contention for callers performing VMA guard region operations

 - "vma_start_write_killable" (Matthew Wilcox)
   Start work on permitting applications to be killed when they are waiting on a read_lock on the VMA lock

 - "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
   Add additional userspace testing of DAMON's "commit" feature

 - "mm/damon: misc cleanups" (SeongJae Park)

 - "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
   Address the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another

 - "mm: support device-private THP" (Balbir Singh)
   Introduce support for Transparent Huge Page (THP) migration in zone device-private memory

 - "Optimize folio split in memory failure" (Zi Yan)

 - "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
   Some more cleanups in the folio splitting code

 - "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
   Clean up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t

 - "reparent the THP split queue" (Muchun Song)
   Reparent the THP split queue to its parent memcg. This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcgs linger for too long, consuming memory resources

 - "unify PMD scan results and remove redundant cleanup" (Wei Yang)
   A little cleanup in the hugepage collapse code

 - "zram: introduce writeback bio batching" (Sergey Senozhatsky)
   Improve zram writeback efficiency by introducing batched bio writeback support

 - "memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
   Clean up our handling of the interrupt safety of some memcg stats

 - "make vmalloc gfp flags usage more apparent" (Vishal Moola)
   Clean up vmalloc's handling of incoming GFP flags

 - "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
   Teach soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension

 - "mm: swap: small fixes and comment cleanups" (Youngjun Park)
   Fix a small bug and clean up some of the swap code

 - "initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
   Start work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit

 - "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
   Address a possible bug in the swap discard code and clean things up a little

[ This merge also reverts commit ebb9aeb ("vfio/nvgrace-gpu: register device memory for poison handling") because it looks broken to me, I've asked for clarification - Linus ]

* tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
  mm: fix vma_start_write_killable() signal handling
  mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
  mm/swapfile: fix list iteration when next node is removed during discard
  fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
  mm/kfence: add reboot notifier to disable KFENCE on shutdown
  memcg: remove inc/dec_lruvec_kmem_state helpers
  selftests/mm/uffd: initialize char variable to Null
  mm: fix DEBUG_RODATA_TEST indentation in Kconfig
  mm: introduce VMA flags bitmap type
  tools/testing/vma: eliminate dependency on vma->__vm_flags
  mm: simplify and rename mm flags function for clarity
  mm: declare VMA flags by bit
  zram: fix a spelling mistake
  mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
  mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
  pagemap: update BUDDY flag documentation
  mm: swap: remove scan_swap_map_slots() references from comments
  mm: swap: change swap_alloc_slow() to void
  mm, swap: remove redundant comment for read_swap_cache_async
  mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
  ...
2 parents ac20755 + faf3c92 commit 7203ca4

228 files changed

Lines changed: 11614 additions & 5113 deletions

.clang-format

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -140,8 +140,8 @@ ForEachMacros:
140140
- 'damon_for_each_scheme_safe'
141141
- 'damon_for_each_target'
142142
- 'damon_for_each_target_safe'
143-
- 'damos_for_each_filter'
144-
- 'damos_for_each_filter_safe'
143+
- 'damos_for_each_core_filter'
144+
- 'damos_for_each_core_filter_safe'
145145
- 'damos_for_each_ops_filter'
146146
- 'damos_for_each_ops_filter_safe'
147147
- 'damos_for_each_quota_goal'

Documentation/ABI/testing/sysfs-kernel-mm-damon

Lines changed: 13 additions & 0 deletions
@@ -164,6 +164,13 @@ Description: Writing to and reading from this file sets and gets the pid of
164164
the target process if the context is for virtual address spaces
165165
monitoring, respectively.
166166

167+
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/obsolete_target
168+
Date: Oct 2025
169+
Contact: SeongJae Park <sj@kernel.org>
170+
Description: Writing to and reading from this file sets and gets the
171+
obsoleteness of the matching parameters commit destination
172+
target.
173+
167174
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/regions/nr_regions
168175
Date: Mar 2022
169176
Contact: SeongJae Park <sj@kernel.org>
@@ -303,6 +310,12 @@ Contact: SeongJae Park <sj@kernel.org>
303310
Description: Writing to and reading from this file sets and gets the nid
304311
parameter of the goal.
305312

313+
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/path
314+
Date: Oct 2025
315+
Contact: SeongJae Park <sj@kernel.org>
316+
Description: Writing to and reading from this file sets and gets the path
317+
parameter of the goal.
318+
306319
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/sz_permil
307320
Date: Mar 2022
308321
Contact: SeongJae Park <sj@kernel.org>

Documentation/admin-guide/cgroup-v2.rst

Lines changed: 4 additions & 0 deletions
@@ -1513,6 +1513,10 @@ The following nested keys are defined.
15131513
oom_group_kill
15141514
The number of times a group OOM has occurred.
15151515

1516+
sock_throttled
1517+
The number of times network sockets associated with
1518+
this cgroup are throttled.
1519+
15161520
memory.events.local
15171521
Similar to memory.events but the fields in the file are local
15181522
to the cgroup i.e. not hierarchical. The file modified event
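
As a rough illustration of how a monitoring tool might pick up the new ``sock_throttled`` counter, the sketch below parses the flat "key value" layout of ``memory.events``. The sample text and its numbers are invented for illustration; on a live system the text would be read from the cgroup's ``memory.events`` file, whose location depends on where cgroup v2 is mounted.

```python
def parse_memory_events(text):
    """Parse the flat-keyed "<key> <value>" lines of memory.events into a dict."""
    events = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        key, value = fields
        events[key] = int(value)
    return events


# Hypothetical sample contents; the counter values are made up.
SAMPLE = """\
low 0
high 12
max 3
oom 0
oom_kill 0
oom_group_kill 0
sock_throttled 7
"""

events = parse_memory_events(SAMPLE)
# A kernel without this change would not report the key, so default to 0.
sock_throttled = events.get("sock_throttled", 0)
print("sock_throttled:", sock_throttled)
```

Using ``dict.get`` with a default keeps the same tool usable on kernels that predate the new key.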

Documentation/admin-guide/mm/damon/lru_sort.rst

Lines changed: 22 additions & 0 deletions
@@ -211,6 +211,28 @@ End of target memory region in physical address.
211211
The end physical address of memory region that DAMON_LRU_SORT will do work
212212
against. By default, biggest System RAM is used as the region.
213213

214+
addr_unit
215+
---------
216+
217+
A scale factor for memory addresses and bytes.
218+
219+
This parameter is for setting and getting the :ref:`address unit
220+
<damon_design_addr_unit>` parameter of the DAMON instance for DAMON_LRU_SORT.
221+
222+
``monitor_region_start`` and ``monitor_region_end`` should be provided in this
223+
unit. For example, let's suppose ``addr_unit``, ``monitor_region_start`` and
224+
``monitor_region_end`` are set as ``1024``, ``0`` and ``10``, respectively.
225+
Then DAMON_LRU_SORT will work for 10 KiB length of physical address range that
226+
starts from address zero (``[0 * 1024, 10 * 1024)`` in bytes).
227+
228+
Stat parameters having ``bytes_`` prefix are also in this unit. For example,
229+
let's suppose values of ``addr_unit``, ``bytes_lru_sort_tried_hot_regions`` and
230+
``bytes_lru_sorted_hot_regions`` are ``1024``, ``42``, and ``32``,
231+
respectively. Then it means DAMON_LRU_SORT tried to LRU-sort 42 KiB of hot
232+
memory and successfully LRU-sorted 32 KiB of the memory in total.
233+
234+
If unsure, use only the default value (``1``) and forget about this.
235+
214236
kdamond_pid
215237
-----------
216238

Documentation/admin-guide/mm/damon/reclaim.rst

Lines changed: 22 additions & 0 deletions
@@ -232,6 +232,28 @@ The end physical address of memory region that DAMON_RECLAIM will do work
232232
against. That is, DAMON_RECLAIM will find cold memory regions in this region
233233
and reclaims. By default, biggest System RAM is used as the region.
234234

235+
addr_unit
236+
---------
237+
238+
A scale factor for memory addresses and bytes.
239+
240+
This parameter is for setting and getting the :ref:`address unit
241+
<damon_design_addr_unit>` parameter of the DAMON instance for DAMON_RECLAIM.
242+
243+
``monitor_region_start`` and ``monitor_region_end`` should be provided in this
244+
unit. For example, let's suppose ``addr_unit``, ``monitor_region_start`` and
245+
``monitor_region_end`` are set as ``1024``, ``0`` and ``10``, respectively.
246+
Then DAMON_RECLAIM will work for 10 KiB length of physical address range that
247+
starts from address zero (``[0 * 1024, 10 * 1024)`` in bytes).
248+
249+
``bytes_reclaim_tried_regions`` and ``bytes_reclaimed_regions`` are also in
250+
this unit. For example, let's suppose values of ``addr_unit``,
251+
``bytes_reclaim_tried_regions`` and ``bytes_reclaimed_regions`` are ``1024``,
252+
``42``, and ``32``, respectively. Then it means DAMON_RECLAIM tried to reclaim
253+
42 KiB memory and successfully reclaimed 32 KiB memory in total.
254+
255+
If unsure, use only the default value (``1``) and forget about this.
256+
235257
skip_anon
236258
---------
237259
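
The ``addr_unit`` arithmetic described for DAMON_LRU_SORT and DAMON_RECLAIM above amounts to a plain multiplication. The sketch below reproduces the two worked examples from the text; the helper names ``to_bytes`` and ``region_in_bytes`` are hypothetical and exist only for this illustration, they are not kernel or DAMON APIs.

```python
def to_bytes(value, addr_unit=1):
    """Scale a value expressed in addr_unit units to bytes."""
    if addr_unit < 1:
        raise ValueError("addr_unit is assumed to be a positive integer")
    return value * addr_unit


def region_in_bytes(monitor_region_start, monitor_region_end, addr_unit=1):
    """Return the half-open byte range [start, end) covered by the region."""
    return (to_bytes(monitor_region_start, addr_unit),
            to_bytes(monitor_region_end, addr_unit))


# addr_unit=1024, monitor_region_start=0, monitor_region_end=10
start_b, end_b = region_in_bytes(0, 10, addr_unit=1024)
print(start_b, end_b)

# A bytes_* stat of 42 with addr_unit=1024 stands for 42 KiB.
tried_bytes = to_bytes(42, addr_unit=1024)
print(tried_bytes // 1024, "KiB")
```

With the default ``addr_unit`` of ``1`` both helpers return their inputs unchanged, which matches the "if unsure" advice in the text.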

Documentation/admin-guide/mm/damon/stat.rst

Lines changed: 26 additions & 9 deletions
@@ -10,16 +10,20 @@ on the system's entire physical memory using DAMON, and provides simplified
1010
access monitoring results statistics, namely idle time percentiles and
1111
estimated memory bandwidth.
1212

13+
.. _damon_stat_monitoring_accuracy_overhead:
14+
1315
Monitoring Accuracy and Overhead
1416
================================
1517

1618
DAMON_STAT uses monitoring intervals :ref:`auto-tuning
1719
<damon_design_monitoring_intervals_autotuning>` to make its accuracy high and
1820
overhead minimum. It auto-tunes the intervals aiming 4 % of observable access
1921
events to be captured in each snapshot, while limiting the resulting sampling
20-
events to be 5 milliseconds in minimum and 10 seconds in maximum. On a few
22+
interval to be 5 milliseconds in minimum and 10 seconds in maximum. On a few
2123
production server systems, it resulted in consuming only 0.x % single CPU time,
22-
while capturing reasonable quality of access patterns.
24+
while capturing reasonable quality of access patterns. The tuning-resulting
25+
intervals can be retrieved via ``aggr_interval_us`` :ref:`parameter
26+
<damon_stat_aggr_interval_us>`.
2327

2428
Interface: Module Parameters
2529
============================
@@ -41,6 +45,18 @@ You can enable DAMON_STAT by setting the value of this parameter as ``Y``.
4145
Setting it as ``N`` disables DAMON_STAT. The default value is set by
4246
``CONFIG_DAMON_STAT_ENABLED_DEFAULT`` build config option.
4347

48+
.. _damon_stat_aggr_interval_us:
49+
50+
aggr_interval_us
51+
----------------
52+
53+
Auto-tuned aggregation time interval in microseconds.
54+
55+
Users can read the aggregation interval of DAMON that is being used by the
56+
DAMON instance for DAMON_STAT. It is :ref:`auto-tuned
57+
<damon_stat_monitoring_accuracy_overhead>` and therefore the value is
58+
dynamically changed.
59+
4460
estimated_memory_bandwidth
4561
--------------------------
4662

@@ -58,12 +74,13 @@ memory_idle_ms_percentiles
5874
Per-byte idle time (milliseconds) percentiles of the system.
5975

6076
DAMON_STAT calculates how long each byte of the memory was not accessed until
61-
now (idle time), based on the current DAMON results snapshot. If DAMON found a
62-
region of access frequency (nr_accesses) larger than zero, every byte of the
63-
region gets zero idle time. If a region has zero access frequency
64-
(nr_accesses), how long the region was keeping the zero access frequency (age)
65-
becomes the idle time of every byte of the region. Then, DAMON_STAT exposes
66-
the percentiles of the idle time values via this read-only parameter. Reading
67-
the parameter returns 101 idle time values in milliseconds, separated by comma.
77+
now (idle time), based on the current DAMON results snapshot. For regions
78+
having access frequency (nr_accesses) larger than zero, how long the current
79+
access frequency level was kept multiplied by ``-1`` becomes the idle time of
80+
every byte of the region. If a region has zero access frequency (nr_accesses),
81+
how long the region was keeping the zero access frequency (age) becomes the
82+
idle time of every byte of the region. Then, DAMON_STAT exposes the
83+
percentiles of the idle time values via this read-only parameter. Reading the
84+
parameter returns 101 idle time values in milliseconds, separated by comma.
6885
Each value represents 0-th, 1st, 2nd, 3rd, ..., 99th and 100th percentile idle
6986
times.
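
A minimal sketch of the idle time rule described above, together with a parser for the 101 comma-separated values. This is not the kernel implementation: the function names are hypothetical, ``age_ms`` is assumed to be already converted to milliseconds, and the sample string is synthetic rather than real DAMON_STAT output.

```python
def idle_time_ms(nr_accesses, age_ms):
    """Idle time of every byte in a region, following the updated text.

    Regions with a non-zero access frequency get the negated time that the
    current frequency level has been kept; regions with zero access frequency
    get the time they have stayed at zero.
    """
    if nr_accesses > 0:
        return -age_ms
    return age_ms


def parse_idle_percentiles(text):
    """Split the parameter text into 101 integers (0th to 100th percentile)."""
    values = [int(v) for v in text.strip().split(",")]
    if len(values) != 101:
        raise ValueError("expected 101 values, got %d" % len(values))
    return values


# Synthetic input: 101 values, negative for recently accessed memory.
sample = ",".join(str(v) for v in range(-50, 51))
percentiles = parse_idle_percentiles(sample)
median_idle_ms = percentiles[50]
print(idle_time_ms(3, 200), idle_time_ms(0, 500), median_idle_ms)
```

Because accessed regions now contribute negative values, a consumer that assumed non-negative idle times would need to handle the sign explicitly.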

Documentation/admin-guide/mm/damon/usage.rst

Lines changed: 21 additions & 8 deletions
@@ -67,7 +67,7 @@ comma (",").
6767
│ │ │ │ │ │ │ intervals_goal/access_bp,aggrs,min_sample_us,max_sample_us
6868
│ │ │ │ │ │ nr_regions/min,max
6969
│ │ │ │ │ :ref:`targets <sysfs_targets>`/nr_targets
70-
│ │ │ │ │ │ :ref:`0 <sysfs_target>`/pid_target
70+
│ │ │ │ │ │ :ref:`0 <sysfs_target>`/pid_target,obsolete_target
7171
│ │ │ │ │ │ │ :ref:`regions <sysfs_regions>`/nr_regions
7272
│ │ │ │ │ │ │ │ :ref:`0 <sysfs_region>`/start,end
7373
│ │ │ │ │ │ │ │ ...
@@ -81,7 +81,7 @@ comma (",").
8181
│ │ │ │ │ │ │ :ref:`quotas <sysfs_quotas>`/ms,bytes,reset_interval_ms,effective_bytes
8282
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
8383
│ │ │ │ │ │ │ │ :ref:`goals <sysfs_schemes_quota_goals>`/nr_goals
84-
│ │ │ │ │ │ │ │ │ 0/target_metric,target_value,current_value,nid
84+
│ │ │ │ │ │ │ │ │ 0/target_metric,target_value,current_value,nid,path
8585
│ │ │ │ │ │ │ :ref:`watermarks <sysfs_watermarks>`/metric,interval_us,high,mid,low
8686
│ │ │ │ │ │ │ :ref:`{core_,ops_,}filters <sysfs_filters>`/nr_filters
8787
│ │ │ │ │ │ │ │ 0/type,matching,allow,memcg_path,addr_start,addr_end,target_idx,min,max
@@ -134,7 +134,8 @@ Users can write below commands for the kdamond to the ``state`` file.
134134
- ``on``: Start running.
135135
- ``off``: Stop running.
136136
- ``commit``: Read the user inputs in the sysfs files except ``state`` file
137-
again.
137+
again. Monitoring :ref:`target region <sysfs_regions>` inputs are also
138+
ignored if no target region is specified.
138139
- ``update_tuned_intervals``: Update the contents of ``sample_us`` and
139140
``aggr_us`` files of the kdamond with the auto-tuning applied ``sampling
140141
interval`` and ``aggregation interval`` for the files. Please refer to
@@ -264,13 +265,20 @@ to ``N-1``. Each directory represents each monitoring target.
264265
targets/<N>/
265266
------------
266267

267-
In each target directory, one file (``pid_target``) and one directory
268-
(``regions``) exist.
268+
In each target directory, two files (``pid_target`` and ``obsolete_target``)
269+
and one directory (``regions``) exist.
269270

270271
If you wrote ``vaddr`` to the ``contexts/<N>/operations``, each target should
271272
be a process. You can specify the process to DAMON by writing the pid of the
272273
process to the ``pid_target`` file.
273274

275+
Users can selectively remove targets in the middle of the targets array by
276+
writing non-zero value to ``obsolete_target`` file and committing it (writing
277+
``commit`` to ``state`` file). DAMON will remove the matching targets from its
278+
internal targets array. Users are responsible to construct target directories
279+
again, so that those correctly represent the changed internal targets array.
280+
281+
274282
.. _sysfs_regions:
275283

276284
targets/<N>/regions
@@ -289,6 +297,11 @@ In the beginning, this directory has only one file, ``nr_regions``. Writing a
289297
number (``N``) to the file creates the number of child directories named ``0``
290298
to ``N-1``. Each directory represents each initial monitoring target region.
291299

300+
If ``nr_regions`` is zero when committing new DAMON parameters online (writing
301+
``commit`` to ``state`` file of :ref:`kdamond <sysfs_kdamond>`), the commit
302+
logic ignores the target regions. In other words, the current monitoring
303+
results for the target are preserved.
304+
292305
.. _sysfs_region:
293306

294307
regions/<N>/
@@ -402,9 +415,9 @@ number (``N``) to the file creates the number of child directories named ``0``
402415
to ``N-1``. Each directory represents each goal and current achievement.
403416
Among the multiple feedback, the best one is used.
404417

405-
Each goal directory contains four files, namely ``target_metric``,
406-
``target_value``, ``current_value`` and ``nid``. Users can set and get the
407-
four parameters for the quota auto-tuning goals that specified on the
418+
Each goal directory contains five files, namely ``target_metric``,
419+
``target_value``, ``current_value``, ``nid`` and ``path``. Users can set and
420+
get the five parameters for the quota auto-tuning goals that are specified in the
408421
:ref:`design doc <damon_design_damos_quotas_auto_tuning>` by writing to and
409422
reading from each of the files. Note that users should further write
410423
``commit_schemes_quota_goals`` to the ``state`` file of the :ref:`kdamond
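
The ``obsolete_target`` workflow described for ``targets/<N>/`` above boils down to two sysfs writes. The sketch below only builds the list of writes, so it can be inspected without a DAMON-enabled kernel. The helper names are hypothetical, the value ``1`` is just one example of a non-zero value, and the ``state`` file is assumed to sit directly under the kdamond directory.

```python
import posixpath

DAMON_ADMIN = "/sys/kernel/mm/damon/admin"


def obsolete_target_writes(kdamond, context, target):
    """Return the (path, value) writes that mark one target obsolete and commit."""
    kdamond_dir = posixpath.join(DAMON_ADMIN, "kdamonds", str(kdamond))
    target_dir = posixpath.join(kdamond_dir, "contexts", str(context),
                                "targets", str(target))
    return [
        # Mark the target as obsolete (any non-zero value, per the text).
        (posixpath.join(target_dir, "obsolete_target"), "1"),
        # Ask the running kdamond to re-read the sysfs inputs.
        (posixpath.join(kdamond_dir, "state"), "commit"),
    ]


def apply_writes(writes):
    """Perform the writes; needs root and a kernel exposing these files."""
    for path, value in writes:
        with open(path, "w") as f:
            f.write(value)


plan = obsolete_target_writes(kdamond=0, context=0, target=1)
for path, value in plan:
    print(value, ">", path)
```

After the commit, the target directories still need to be rebuilt by the user so that they match DAMON's internal targets array, as the text notes; this sketch does not attempt that step.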

Documentation/admin-guide/mm/index.rst

Lines changed: 0 additions & 1 deletion
@@ -39,7 +39,6 @@ the Linux memory management.
3939
shrinker_debugfs
4040
slab
4141
soft-dirty
42-
swap_numa
4342
transhuge
4443
userfaultfd
4544
zswap

Documentation/admin-guide/mm/pagemap.rst

Lines changed: 2 additions & 1 deletion
@@ -115,7 +115,8 @@ Short descriptions to the page flags
115115
A free memory block managed by the buddy system allocator.
116116
The buddy system organizes free memory in blocks of various orders.
117117
An order N block has 2^N physically contiguous pages, with the BUDDY flag
118-
set for and _only_ for the first page.
118+
set for all pages.
119+
Before 4.6 only the first page of the block had the flag set.
119120
15 - COMPOUND_HEAD
120121
A compound page with order N consists of 2^N physically contiguous pages.
121122
A compound page with order 2 takes the form of "HTTT", where H denotes its
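
To make the BUDDY flag change concrete, the sketch below decodes 64-bit flag words of the kind ``/proc/kpageflags`` provides per page frame and reports which pages carry the BUDDY bit. The bit number ``10`` is taken from the flag list in the kernel's pagemap documentation, which this excerpt does not show, so treat it as an assumption. The input bytes are synthetic, not read from a real system.

```python
import struct

KPF_BUDDY = 10  # assumed bit number for BUDDY; not shown in the excerpt above


def buddy_flags(raw):
    """Return one bool per 64-bit flag word: True if the BUDDY bit is set."""
    count = len(raw) // 8
    words = struct.unpack("=%dQ" % count, raw[:count * 8])
    return [bool((word >> KPF_BUDDY) & 1) for word in words]


buddy = 1 << KPF_BUDDY

# Synthetic order-2 free block (2^2 = 4 pages).
# Updated wording: the flag is set for all pages of the block.
all_pages = struct.pack("=4Q", buddy, buddy, buddy, buddy)
# Behavior described for kernels before 4.6: only the first page is flagged.
first_only = struct.pack("=4Q", buddy, 0, 0, 0)

print(buddy_flags(all_pages))
print(buddy_flags(first_only))
```

A tool that counted free blocks by counting BUDDY pages would therefore over-count under the documented current behavior and would need to account for the block order instead.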

Documentation/admin-guide/mm/swap_numa.rst

Lines changed: 0 additions & 78 deletions
This file was deleted.
