
Commit 9ea35a2

ljskernelakpm00
authored and committed
mm: introduce VMA flags bitmap type
It is useful to transition to using a bitmap for VMA flags so we can avoid running out of flags, especially for 32-bit kernels, which are constrained to 32 flags, necessitating some features to be limited to 64-bit kernels only.

By doing so, we remove any constraint on the number of VMA flags moving forwards, no matter the platform, and can decide in future to extend beyond 64 if required.

We start by declaring an opaque type, vma_flags_t (which resembles mm_struct flags of type mm_flags_t), setting it to precisely the same size as vm_flags_t, and placing it in union with vm_flags in the VMA declaration.

We additionally update struct vm_area_desc equivalently, placing the new opaque type in union with vm_flags.

This change therefore does not impact the size of struct vm_area_struct or struct vm_area_desc.

In order for the change to be iterative and to avoid impacting performance, we designate VM_xxx declared bitmap flag values as those which must exist in the first system word of the VMA flags bitmap.

We therefore declare vma_flags_clear_all(), vma_flags_overwrite_word(), vma_flags_overwrite_word_once(), vma_flags_set_word() and vma_flags_clear_word() in order to allow us to update the existing vm_flags_*() functions to utilise these helpers. This is a stepping stone towards converting users to the VMA flags bitmap and behaves precisely as before.

By doing this, we can eliminate the existing private vma->__vm_flags field in the vma->vm_flags union and replace it with a field of the newly introduced opaque type vma_flags_t named flags, so the new bitmap field is referred to as vma->flags.

We update vma_flag_[test, set]_atomic() to account for this change also.

We adapt vm_flags_reset_once() to clear only those bits above the first system word, providing write-once semantics to the first system word (which it is presumed the caller requires, as is the case in all current users).

As we currently only specify that the VMA flags bitmap size is equal to BITS_PER_LONG bits, this is a no-op, but it is defensive in preparation for a future change that increases this.

We additionally update the VMA userland test declarations to implement the same changes there.

Finally, we update the Rust code to reference vma->vm_flags on update rather than vma->__vm_flags, which has been removed. This is safe for now, albeit it is implicitly performing a const cast. Once we introduce flag helpers we can improve this further.

No functional change intended.

Link: https://lkml.kernel.org/r/bab179d7b153ac12f221b7d65caac2759282cfe9.1764064557.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com> [rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
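Here is a small userspace sketch (not kernel code) that makes the motivation above concrete. A flag number is split into a word index and a bit within that word. The number of representable flags is then limited only by the array length, not by the width of a single `unsigned long`. Every `DEMO_*`/`demo_*` name is hypothetical, and the arithmetic is assumed to mirror the kernel's `BIT_WORD()` and `BIT_MASK()` helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Userspace stand-ins for BITS_PER_LONG, BIT_WORD() and BIT_MASK(). */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BIT_WORD(nr)  ((nr) / DEMO_BITS_PER_LONG)
#define DEMO_BIT_MASK(nr)  (1UL << ((nr) % DEMO_BITS_PER_LONG))

/* Hypothetical capacity of 128 flags: more than one word on any platform. */
#define DEMO_NUM_FLAG_BITS 128
#define DEMO_FLAG_WORDS \
	((DEMO_NUM_FLAG_BITS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

typedef struct {
	unsigned long bits[DEMO_FLAG_WORDS];
} demo_flags_t;

/* Set flag number nr, wherever in the array it happens to live. */
static void demo_flag_set(demo_flags_t *flags, unsigned int nr)
{
	assert(nr < DEMO_NUM_FLAG_BITS);
	flags->bits[DEMO_BIT_WORD(nr)] |= DEMO_BIT_MASK(nr);
}

/* Test flag number nr. */
static bool demo_flag_test(const demo_flags_t *flags, unsigned int nr)
{
	assert(nr < DEMO_NUM_FLAG_BITS);
	return flags->bits[DEMO_BIT_WORD(nr)] & DEMO_BIT_MASK(nr);
}
```

With this layout, flag 40 lives in word 0 on a 64-bit build and in word 1 on a 32-bit build. Callers of `demo_flag_set()` and `demo_flag_test()` never need to know which.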
1 parent 4c613f5 commit 9ea35a2

4 files changed

Lines changed: 202 additions & 38 deletions


include/linux/mm.h

Lines changed: 19 additions & 5 deletions
@@ -911,7 +911,8 @@ static inline void vm_flags_init(struct vm_area_struct *vma,
 				 vm_flags_t flags)
 {
 	VM_WARN_ON_ONCE(!pgtable_supports_soft_dirty() && (flags & VM_SOFTDIRTY));
-	ACCESS_PRIVATE(vma, __vm_flags) = flags;
+	vma_flags_clear_all(&vma->flags);
+	vma_flags_overwrite_word(&vma->flags, flags);
 }
 
 /*
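The new `vm_flags_init()` body clears the whole bitmap before storing the legacy word, because `vma_flags_overwrite_word()` only touches word 0. The sketch below models this in userspace with a hypothetical two-word bitmap. The patch itself still uses a single word, so today the clear is redundant but harmless. The `demo_*` names are illustrative only.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Hypothetical: twice the size the patch uses, so the upper word is visible. */
#define DEMO_NUM_VMA_FLAG_BITS (2 * DEMO_BITS_PER_LONG)
#define DEMO_WORDS (DEMO_NUM_VMA_FLAG_BITS / DEMO_BITS_PER_LONG)

typedef struct {
	unsigned long words[DEMO_WORDS];
} demo_vma_flags_t;

/* Mirrors vma_flags_clear_all(): zero every word of the bitmap. */
static void demo_clear_all(demo_vma_flags_t *flags)
{
	memset(flags->words, 0, sizeof(flags->words));
}

/* Mirrors vma_flags_overwrite_word(): only word 0 is written. */
static void demo_overwrite_word(demo_vma_flags_t *flags, unsigned long value)
{
	flags->words[0] = value;
}

/* Mirrors the new vm_flags_init(): clear everything, then store the legacy word. */
static void demo_flags_init(demo_vma_flags_t *flags, unsigned long vm_flags)
{
	demo_clear_all(flags);
	demo_overwrite_word(flags, vm_flags);
}
```

Calling `demo_overwrite_word()` on its own would leave stale bits in word 1. After `demo_flags_init()`, only the value just written remains.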
@@ -931,22 +932,33 @@ static inline void vm_flags_reset_once(struct vm_area_struct *vma,
 				       vm_flags_t flags)
 {
 	vma_assert_write_locked(vma);
-	WRITE_ONCE(ACCESS_PRIVATE(vma, __vm_flags), flags);
+	/*
+	 * If VMA flags exist beyond the first system word, also clear these. It
+	 * is assumed the write once behaviour is required only for the first
+	 * system word.
+	 */
+	if (NUM_VMA_FLAG_BITS > BITS_PER_LONG) {
+		unsigned long *bitmap = ACCESS_PRIVATE(&vma->flags, __vma_flags);
+
+		bitmap_zero(&bitmap[1], NUM_VMA_FLAG_BITS - BITS_PER_LONG);
+	}
+
+	vma_flags_overwrite_word_once(&vma->flags, flags);
 }
 
 static inline void vm_flags_set(struct vm_area_struct *vma,
 				vm_flags_t flags)
 {
 	vma_start_write(vma);
-	ACCESS_PRIVATE(vma, __vm_flags) |= flags;
+	vma_flags_set_word(&vma->flags, flags);
 }
 
 static inline void vm_flags_clear(struct vm_area_struct *vma,
 				  vm_flags_t flags)
 {
 	VM_WARN_ON_ONCE(!pgtable_supports_soft_dirty() && (flags & VM_SOFTDIRTY));
 	vma_start_write(vma);
-	ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
+	vma_flags_clear_word(&vma->flags, flags);
 }
 
 /*
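In `vm_flags_reset_once()`, the words above the first are cleared with ordinary stores, and only word 0 receives the `WRITE_ONCE()` store. One plausible reading is that word 0 is deliberately never zeroed first, so a lockless reader of the legacy word sees either the old value or the new one. Below is a userspace sketch under stated assumptions: the three-word size is hypothetical, and a `volatile` store stands in for the kernel's `WRITE_ONCE()`, which it only approximates.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Hypothetical: three words, so bits above the first word exist. */
#define DEMO_NUM_VMA_FLAG_BITS (3 * DEMO_BITS_PER_LONG)
#define DEMO_WORDS (DEMO_NUM_VMA_FLAG_BITS / DEMO_BITS_PER_LONG)

typedef struct {
	unsigned long words[DEMO_WORDS];
} demo_vma_flags_t;

/* Rough userspace analogue of WRITE_ONCE(): a single volatile store. */
static void demo_write_once_ulong(unsigned long *p, unsigned long v)
{
	*(volatile unsigned long *)p = v;
}

static void demo_flags_reset_once(demo_vma_flags_t *flags, unsigned long vm_flags)
{
	/* Plain (non-once) clearing of everything above the first word... */
	if (DEMO_NUM_VMA_FLAG_BITS > DEMO_BITS_PER_LONG)
		memset(&flags->words[1], 0,
		       (DEMO_WORDS - 1) * sizeof(unsigned long));
	/* ...then one store to word 0, which is never transiently zeroed. */
	demo_write_once_ulong(&flags->words[0], vm_flags);
}
```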
@@ -989,12 +1001,14 @@ static inline bool __vma_flag_atomic_valid(struct vm_area_struct *vma,
 static inline void vma_flag_set_atomic(struct vm_area_struct *vma,
 				       vma_flag_t bit)
 {
+	unsigned long *bitmap = ACCESS_PRIVATE(&vma->flags, __vma_flags);
+
 	/* mmap read lock/VMA read lock must be held. */
 	if (!rwsem_is_locked(&vma->vm_mm->mmap_lock))
 		vma_assert_locked(vma);
 
 	if (__vma_flag_atomic_valid(vma, bit))
-		set_bit((__force int)bit, &ACCESS_PRIVATE(vma, __vm_flags));
+		set_bit((__force int)bit, bitmap);
 }
 
 /*
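`vma_flag_set_atomic()` now passes the bitmap pointer straight to `set_bit()`, which atomically sets one bit in the word that holds it. Below is a rough userspace equivalent. It assumes a GCC or Clang toolchain for the `__atomic_fetch_or` builtin. Kernel `set_bit()` is architecture-specific and need not be implemented this way, and `demo_set_bit()` is a hypothetical name. Relaxed ordering is used because, per the kernel's atomic bitops documentation, RMW bitops without a return value such as `set_bit()` are unordered.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Userspace approximation of set_bit(): atomically OR the bit's mask into the
 * word that holds it, with no ordering guarantee beyond atomicity.
 */
static void demo_set_bit(unsigned long nr, unsigned long *bitmap)
{
	unsigned long *word = &bitmap[nr / DEMO_BITS_PER_LONG];
	unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);

	(void)__atomic_fetch_or(word, mask, __ATOMIC_RELAXED);
}
```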

include/linux/mm_types.h

Lines changed: 62 additions & 2 deletions
@@ -848,6 +848,15 @@ struct mmap_action {
 	bool hide_from_rmap_until_complete :1;
 };
 
+/*
+ * Opaque type representing current VMA (vm_area_struct) flag state. Must be
+ * accessed via vma_flags_xxx() helper functions.
+ */
+#define NUM_VMA_FLAG_BITS BITS_PER_LONG
+typedef struct {
+	DECLARE_BITMAP(__vma_flags, NUM_VMA_FLAG_BITS);
+} __private vma_flags_t;
+
 /*
  * Describes a VMA that is about to be mmap()'ed. Drivers may choose to
  * manipulate mutable fields which will cause those fields to be updated in the
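`DECLARE_BITMAP(name, bits)` declares an array of `unsigned long` just large enough to hold `bits` bits. The userspace sketch below reproduces that arithmetic. The `DEMO_*` macros are hypothetical stand-ins for `BITS_TO_LONGS()` and `DECLARE_BITMAP()`, and sparse's `__private` annotation is dropped because only the checker uses it. Assuming `vm_flags_t` is an `unsigned long`, it shows why `vma_flags_t` is exactly one word for as long as `NUM_VMA_FLAG_BITS` equals `BITS_PER_LONG`.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of longs needed to hold nr bits, rounding up. */
#define DEMO_BITS_TO_LONGS(nr) (((nr) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)
#define DEMO_DECLARE_BITMAP(name, bits) unsigned long name[DEMO_BITS_TO_LONGS(bits)]

#define DEMO_NUM_VMA_FLAG_BITS DEMO_BITS_PER_LONG

typedef unsigned long demo_vm_flags_t;

typedef struct {
	DEMO_DECLARE_BITMAP(words, DEMO_NUM_VMA_FLAG_BITS);
} demo_vma_flags_t;

/* One word today, so swapping the type in costs no space. */
_Static_assert(sizeof(demo_vma_flags_t) == sizeof(demo_vm_flags_t),
	       "bitmap wrapper must match the legacy flags word");
```

Raising the bit count by even one would make `DEMO_BITS_TO_LONGS()` round up to two words, and the static assertion would then fail.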
@@ -865,7 +874,10 @@ struct vm_area_desc {
 	/* Mutable fields. Populated with initial state. */
 	pgoff_t pgoff;
 	struct file *vm_file;
-	vm_flags_t vm_flags;
+	union {
+		vm_flags_t vm_flags;
+		vma_flags_t vma_flags;
+	};
 	pgprot_t page_prot;
 
 	/* Write-only fields. */
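Placing `vma_flags_t` in a union with `vm_flags` is what keeps `struct vm_area_desc` the same size. The cut-down descriptors below are hypothetical: only the fields around the change are kept, and their types are simplified. They are enough for a C11 `_Static_assert` to check the size claim and to show that both union members alias the same storage.

```c
#include <assert.h>

typedef unsigned long demo_vm_flags_t;

/* One-word bitmap, as when NUM_VMA_FLAG_BITS == BITS_PER_LONG. */
typedef struct {
	unsigned long words[1];
} demo_vma_flags_t;

/* Hypothetical descriptor layout before the patch. */
struct demo_desc_before {
	unsigned long pgoff;
	void *vm_file;
	demo_vm_flags_t vm_flags;
	unsigned long page_prot;
};

/* Hypothetical descriptor layout after the patch: legacy word and bitmap overlap. */
struct demo_desc_after {
	unsigned long pgoff;
	void *vm_file;
	union {
		demo_vm_flags_t vm_flags;
		demo_vma_flags_t vma_flags;
	};
	unsigned long page_prot;
};

_Static_assert(sizeof(struct demo_desc_before) == sizeof(struct demo_desc_after),
	       "adding the union must not grow the descriptor");
```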
@@ -910,10 +922,12 @@ struct vm_area_struct {
 	/*
 	 * Flags, see mm.h.
 	 * To modify use vm_flags_{init|reset|set|clear|mod} functions.
+	 * Preferably, use vma_flags_xxx() functions.
 	 */
 	union {
+		/* Temporary while VMA flags are being converted. */
 		const vm_flags_t vm_flags;
-		vm_flags_t __private __vm_flags;
+		vma_flags_t flags;
 	};
 
 #ifdef CONFIG_PER_VMA_LOCK
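In `struct vm_area_struct` the legacy member stays `const`. Existing readers of `vma->vm_flags` therefore compile unchanged, while any write has to go through the new `flags` member and its helpers. Below is a minimal sketch of that split, using hypothetical `demo_*` types that keep only the union.

```c
#include <assert.h>

typedef unsigned long demo_vm_flags_t;

typedef struct {
	unsigned long words[1];
} demo_vma_flags_t;

/* Hypothetical cut-down VMA: a read-only legacy view aliased with the bitmap. */
struct demo_vma {
	union {
		const demo_vm_flags_t vm_flags; /* readers keep using this */
		demo_vma_flags_t flags;         /* writers go through helpers */
	};
};

/* Writer side, shaped like vma_flags_set_word(). */
static void demo_flags_set_word(demo_vma_flags_t *flags, unsigned long value)
{
	flags->words[0] |= value;
}

/* Reader side: code like "vma->vm_flags & VM_WRITE" needs no change. */
static demo_vm_flags_t demo_read_legacy(const struct demo_vma *vma)
{
	return vma->vm_flags;
}
```

A direct assignment such as `v.vm_flags = 0` is rejected at compile time because the member is `const`. This is how the layout steers writers toward the helpers.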
@@ -994,6 +1008,52 @@ struct vm_area_struct {
 #endif
 } __randomize_layout;
 
+/* Clears all bits in the VMA flags bitmap, non-atomically. */
+static inline void vma_flags_clear_all(vma_flags_t *flags)
+{
+	bitmap_zero(ACCESS_PRIVATE(flags, __vma_flags), NUM_VMA_FLAG_BITS);
+}
+
+/*
+ * Copy value to the first system word of VMA flags, non-atomically.
+ *
+ * IMPORTANT: This does not overwrite bytes past the first system word. The
+ * caller must account for this.
+ */
+static inline void vma_flags_overwrite_word(vma_flags_t *flags, unsigned long value)
+{
+	*ACCESS_PRIVATE(flags, __vma_flags) = value;
+}
+
+/*
+ * Copy value to the first system word of VMA flags ONCE, non-atomically.
+ *
+ * IMPORTANT: This does not overwrite bytes past the first system word. The
+ * caller must account for this.
+ */
+static inline void vma_flags_overwrite_word_once(vma_flags_t *flags, unsigned long value)
+{
+	unsigned long *bitmap = ACCESS_PRIVATE(flags, __vma_flags);
+
+	WRITE_ONCE(*bitmap, value);
+}
+
+/* Update the first system word of VMA flags setting bits, non-atomically. */
+static inline void vma_flags_set_word(vma_flags_t *flags, unsigned long value)
+{
+	unsigned long *bitmap = ACCESS_PRIVATE(flags, __vma_flags);
+
+	*bitmap |= value;
+}
+
+/* Update the first system word of VMA flags clearing bits, non-atomically. */
+static inline void vma_flags_clear_word(vma_flags_t *flags, unsigned long value)
+{
+	unsigned long *bitmap = ACCESS_PRIVATE(flags, __vma_flags);
+
+	*bitmap &= ~value;
+}
+
 #ifdef CONFIG_NUMA
 #define vma_policy(vma) ((vma)->vm_policy)
 #else
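As their comments say, `vma_flags_set_word()` and `vma_flags_clear_word()` are plain read-modify-write operations on word 0 only. This is why their callers `vm_flags_set()` and `vm_flags_clear()` call `vma_start_write()` first. The sketch below uses a hypothetical two-word bitmap to show that bits in later words are left untouched. The `demo_*` names are illustrative.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Hypothetical two-word bitmap; the patch itself still uses exactly one word. */
#define DEMO_NUM_VMA_FLAG_BITS (2 * DEMO_BITS_PER_LONG)
#define DEMO_WORDS (DEMO_NUM_VMA_FLAG_BITS / DEMO_BITS_PER_LONG)

typedef struct {
	unsigned long words[DEMO_WORDS];
} demo_vma_flags_t;

/* Like vma_flags_set_word(): non-atomic OR into word 0 only. */
static void demo_set_word(demo_vma_flags_t *flags, unsigned long value)
{
	unsigned long *bitmap = flags->words;

	*bitmap |= value;
}

/* Like vma_flags_clear_word(): non-atomic AND-NOT on word 0 only. */
static void demo_clear_word(demo_vma_flags_t *flags, unsigned long value)
{
	unsigned long *bitmap = flags->words;

	*bitmap &= ~value;
}
```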

rust/kernel/mm/virt.rs

Lines changed: 1 addition & 1 deletion
@@ -250,7 +250,7 @@ impl VmaNew {
         // SAFETY: This is not a data race: the vma is undergoing initial setup, so it's not yet
         // shared. Additionally, `VmaNew` is `!Sync`, so it cannot be used to write in parallel.
         // The caller promises that this does not set the flags to an invalid value.
-        unsafe { (*self.as_ptr()).__bindgen_anon_2.__vm_flags = flags };
+        unsafe { (*self.as_ptr()).__bindgen_anon_2.vm_flags = flags };
     }
 
     /// Set the `VM_MIXEDMAP` flag on this vma.

tools/testing/vma/vma_internal.h

Lines changed: 120 additions & 30 deletions
@@ -524,6 +524,15 @@ typedef struct {
 	__private DECLARE_BITMAP(__mm_flags, NUM_MM_FLAG_BITS);
 } mm_flags_t;
 
+/*
+ * Opaque type representing current VMA (vm_area_struct) flag state. Must be
+ * accessed via vma_flags_xxx() helper functions.
+ */
+#define NUM_VMA_FLAG_BITS BITS_PER_LONG
+typedef struct {
+	DECLARE_BITMAP(__vma_flags, NUM_VMA_FLAG_BITS);
+} __private vma_flags_t;
+
 struct mm_struct {
 	struct maple_tree mm_mt;
 	int map_count;			/* number of VMAs */
@@ -608,7 +617,10 @@ struct vm_area_desc {
 	/* Mutable fields. Populated with initial state. */
 	pgoff_t pgoff;
 	struct file *vm_file;
-	vm_flags_t vm_flags;
+	union {
+		vm_flags_t vm_flags;
+		vma_flags_t vma_flags;
+	};
 	pgprot_t page_prot;
 
 	/* Write-only fields. */
@@ -654,7 +666,7 @@ struct vm_area_struct {
 	 */
 	union {
 		const vm_flags_t vm_flags;
-		vm_flags_t __private __vm_flags;
+		vma_flags_t flags;
 	};
 
 #ifdef CONFIG_PER_VMA_LOCK
@@ -1368,26 +1380,6 @@ static inline bool may_expand_vm(struct mm_struct *mm, vm_flags_t flags,
 	return true;
 }
 
-static inline void vm_flags_init(struct vm_area_struct *vma,
-				 vm_flags_t flags)
-{
-	vma->__vm_flags = flags;
-}
-
-static inline void vm_flags_set(struct vm_area_struct *vma,
-				vm_flags_t flags)
-{
-	vma_start_write(vma);
-	vma->__vm_flags |= flags;
-}
-
-static inline void vm_flags_clear(struct vm_area_struct *vma,
-				  vm_flags_t flags)
-{
-	vma_start_write(vma);
-	vma->__vm_flags &= ~flags;
-}
-
 static inline int shmem_zero_setup(struct vm_area_struct *vma)
 {
 	return 0;
@@ -1544,13 +1536,118 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm,
 {
 }
 
-# define ACCESS_PRIVATE(p, member) ((p)->member)
+#define ACCESS_PRIVATE(p, member) ((p)->member)
+
+#define bitmap_size(nbits) (ALIGN(nbits, BITS_PER_LONG) / BITS_PER_BYTE)
+
+static __always_inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
+{
+	unsigned int len = bitmap_size(nbits);
+
+	if (small_const_nbits(nbits))
+		*dst = 0;
+	else
+		memset(dst, 0, len);
+}
 
 static inline bool mm_flags_test(int flag, const struct mm_struct *mm)
 {
 	return test_bit(flag, ACCESS_PRIVATE(&mm->flags, __mm_flags));
 }
 
+/* Clears all bits in the VMA flags bitmap, non-atomically. */
+static inline void vma_flags_clear_all(vma_flags_t *flags)
+{
+	bitmap_zero(ACCESS_PRIVATE(flags, __vma_flags), NUM_VMA_FLAG_BITS);
+}
+
+/*
+ * Copy value to the first system word of VMA flags, non-atomically.
+ *
+ * IMPORTANT: This does not overwrite bytes past the first system word. The
+ * caller must account for this.
+ */
+static inline void vma_flags_overwrite_word(vma_flags_t *flags, unsigned long value)
+{
+	*ACCESS_PRIVATE(flags, __vma_flags) = value;
+}
+
+/*
+ * Copy value to the first system word of VMA flags ONCE, non-atomically.
+ *
+ * IMPORTANT: This does not overwrite bytes past the first system word. The
+ * caller must account for this.
+ */
+static inline void vma_flags_overwrite_word_once(vma_flags_t *flags, unsigned long value)
+{
+	unsigned long *bitmap = ACCESS_PRIVATE(flags, __vma_flags);
+
+	WRITE_ONCE(*bitmap, value);
+}
+
+/* Update the first system word of VMA flags setting bits, non-atomically. */
+static inline void vma_flags_set_word(vma_flags_t *flags, unsigned long value)
+{
+	unsigned long *bitmap = ACCESS_PRIVATE(flags, __vma_flags);
+
+	*bitmap |= value;
+}
+
+/* Update the first system word of VMA flags clearing bits, non-atomically. */
+static inline void vma_flags_clear_word(vma_flags_t *flags, unsigned long value)
+{
+	unsigned long *bitmap = ACCESS_PRIVATE(flags, __vma_flags);
+
+	*bitmap &= ~value;
+}
+
+
+/* Use when VMA is not part of the VMA tree and needs no locking */
+static inline void vm_flags_init(struct vm_area_struct *vma,
+				 vm_flags_t flags)
+{
+	vma_flags_clear_all(&vma->flags);
+	vma_flags_overwrite_word(&vma->flags, flags);
+}
+
+/*
+ * Use when VMA is part of the VMA tree and modifications need coordination
+ * Note: vm_flags_reset and vm_flags_reset_once do not lock the vma and
+ * it should be locked explicitly beforehand.
+ */
+static inline void vm_flags_reset(struct vm_area_struct *vma,
+				  vm_flags_t flags)
+{
+	vma_assert_write_locked(vma);
+	vm_flags_init(vma, flags);
+}
+
+static inline void vm_flags_reset_once(struct vm_area_struct *vma,
+				       vm_flags_t flags)
+{
+	vma_assert_write_locked(vma);
+	/*
+	 * The user should only be interested in avoiding reordering of
+	 * assignment to the first word.
+	 */
+	vma_flags_clear_all(&vma->flags);
+	vma_flags_overwrite_word_once(&vma->flags, flags);
+}
+
+static inline void vm_flags_set(struct vm_area_struct *vma,
+				vm_flags_t flags)
+{
+	vma_start_write(vma);
+	vma_flags_set_word(&vma->flags, flags);
+}
+
+static inline void vm_flags_clear(struct vm_area_struct *vma,
+				  vm_flags_t flags)
+{
+	vma_start_write(vma);
+	vma_flags_clear_word(&vma->flags, flags);
+}
+
 /*
  * Denies creating a writable executable mapping or gaining executable permissions.
  *
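The test harness's `bitmap_zero()` relies on `ALIGN()`, `BITS_PER_BYTE` and `small_const_nbits()` being defined elsewhere in the userland headers. Below is a self-contained sketch that uses hypothetical `DEMO_*` stand-ins. `demo_small_const_nbits()` is assumed to match the shape of the kernel's `small_const_nbits()`, and it needs GCC or Clang for `__builtin_constant_p`. `demo_bitmap_size()` rounds up to whole words, so the `memset()` path always clears complete `unsigned long`s.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_PER_BYTE CHAR_BIT
/* Round x up to a multiple of a (a must be a power of two). */
#define DEMO_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))
/* Bytes occupied by a bitmap of nbits bits, in whole longs. */
#define demo_bitmap_size(nbits) \
	(DEMO_ALIGN(nbits, DEMO_BITS_PER_LONG) / DEMO_BITS_PER_BYTE)
/* True only for a compile-time constant that fits in one word. */
#define demo_small_const_nbits(nbits) \
	(__builtin_constant_p(nbits) && (nbits) <= DEMO_BITS_PER_LONG && (nbits) > 0)

static inline void demo_bitmap_zero(unsigned long *dst, unsigned int nbits)
{
	unsigned int len = demo_bitmap_size(nbits);

	if (demo_small_const_nbits(nbits))
		*dst = 0;		/* single-word fast path */
	else
		memset(dst, 0, len);	/* general path */
}
```

This uses `static inline` where the harness uses `__always_inline`, so an unoptimised build may always take the `memset()` path. Both paths clear the same words.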
@@ -1763,11 +1860,4 @@ static inline int do_munmap(struct mm_struct *, unsigned long, size_t,
 	return 0;
 }
 
-static inline void vm_flags_reset(struct vm_area_struct *vma, vm_flags_t flags)
-{
-	vm_flags_t *dst = (vm_flags_t *)(&vma->vm_flags);
-
-	*dst = flags;
-}
-
 #endif	/* __MM_VMA_INTERNAL_H */

0 commit comments

Comments
 (0)