
Commit b7ce6fa

Merge tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:

 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Add "initramfs_options" parameter to set initramfs mount options. This allows to add specific mount options to the rootfs to e.g., limit the memory size

   - Add RWF_NOSIGNAL flag for pwritev2()

     Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE signal from being raised when writing on disconnected pipes or sockets. The flag is handled directly by the pipe filesystem and converted to the existing MSG_NOSIGNAL flag for sockets

   - Allow to pass pid namespace as procfs mount option

     Ever since the introduction of pid namespaces, procfs has had very implicit behaviour surrounding them (the pidns used by a procfs mount is auto-selected based on the mounting process's active pidns, and the pidns itself is basically hidden once the mount has been constructed)

     This implicit behaviour has historically meant that userspace was required to do some special dances in order to configure the pidns of a procfs mount as desired. Examples include:

       * In order to bypass the mnt_too_revealing() check, Kubernetes creates a procfs mount from an empty pidns so that user namespaced containers can be nested (without this, the nested containers would fail to mount procfs)

         But this requires forking off a helper process because you cannot just one-shot this using mount(2)

       * Container runtimes in general need to fork into a container before configuring its mounts, which can lead to security issues in the case of shared-pidns containers (a privileged process in the pidns can interact with your container runtime process)

         While SUID_DUMP_DISABLE and user namespaces make this less of an issue, the strict need for this due to a minor uAPI wart is kind of unfortunate

     Things would be much easier if there was a way for userspace to just specify the pidns they want.

     So this pull request contains changes to implement a new "pidns" argument which can be set using fsconfig(2):

       fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
       fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);

     or classic mount(2) / mount(8):

       // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
       mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");

  Cleanups:

   - Remove the last references to EXPORT_OP_ASYNC_LOCK
   - Make file_remove_privs_flags() static
   - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used
   - Use try_cmpxchg() in start_dir_add()
   - Use try_cmpxchg() in sb_init_done_wq()
   - Replace offsetof() with struct_size() in ioctl_file_dedupe_range()
   - Remove vfs_ioctl() export
   - Replace rwlock with spinlock in epoll code as rwlock causes priority inversion on preempt rt kernels
   - Make ns_entries in fs/proc/namespaces const
   - Use a switch statement in init_special_inode() just like we do in may_open()
   - Use struct_size() in dir_add() in the initramfs code
   - Use str_plural() in rd_load_image()
   - Replace strcpy() with strscpy() in find_link()
   - Rename generic_delete_inode() to inode_just_drop() and generic_drop_inode() to inode_generic_drop()
   - Remove unused arguments from fcntl_{g,s}et_rw_hint()

  Fixes:

   - Document @name parameter for name_contains_dotdot() helper
   - Fix spelling mistake
   - Always return zero from replace_fd() instead of the file descriptor number
   - Limit the size for copy_file_range() in compat mode to prevent a signed overflow
   - Fix debugfs mount options not being applied
   - Verify the inode mode when loading it from disk in minixfs
   - Verify the inode mode when loading it from disk in cramfs
   - Don't trigger automounts with RESOLVE_NO_XDEV

     If openat2() was called with RESOLVE_NO_XDEV it didn't traverse through automounts, but could still trigger them

   - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in tracepoints
   - Fix unused variable warning in rd_load_image() on s390
   - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD
   - Use ns_capable_noaudit() when determining net sysctl permissions
   - Don't call path_put() under namespace semaphore in listmount() and statmount()"

* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
  fcntl: trim arguments
  listmount: don't call path_put() under namespace semaphore
  statmount: don't call path_put() under namespace semaphore
  pid: use ns_capable_noaudit() when determining net sysctl permissions
  fs: rename generic_delete_inode() and generic_drop_inode()
  init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
  initramfs: Replace strcpy() with strscpy() in find_link()
  initrd: Use str_plural() in rd_load_image()
  initramfs: Use struct_size() helper to improve dir_add()
  initrd: Fix unused variable warning in rd_load_image() on s390
  fs: use the switch statement in init_special_inode()
  fs/proc/namespaces: make ns_entries const
  filelock: add FL_RECLAIM to show_fl_flags() macro
  eventpoll: Replace rwlock with spinlock
  selftests/proc: add tests for new pidns APIs
  procfs: add "pidns" mount option
  pidns: move is-ancestor logic to helper
  openat2: don't trigger automounts with RESOLVE_NO_XDEV
  namei: move cross-device check to __traverse_mounts
  namei: remove LOOKUP_NO_XDEV check from handle_mounts
  ...
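
A minimal userspace sketch of the RWF_NOSIGNAL feature described in the merge message, assuming glibc's pwritev2() wrapper. The helper name write_nosignal() is hypothetical. The fallback value 0x00000100 for RWF_NOSIGNAL reflects our reading of the 6.18 uapi <linux/fs.h>, so prefer the system header's definition when it exists. On kernels that predate the flag, pwritev2() rejects the unknown flag with EOPNOTSUPP instead of writing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Assumed fallback: value taken from the 6.18 uapi <linux/fs.h>. */
#ifndef RWF_NOSIGNAL
#define RWF_NOSIGNAL 0x00000100
#endif

/*
 * Hypothetical helper: write len bytes from buf at the current file
 * position (offset -1) without raising SIGPIPE if the peer has gone away.
 * Returns the number of bytes written, or -1 with errno set: EPIPE for a
 * broken pipe or socket, EOPNOTSUPP on kernels that reject RWF_NOSIGNAL.
 */
static ssize_t write_nosignal(int fd, const void *buf, size_t len)
{
	struct iovec iov = {
		.iov_base = (void *)buf,  /* struct iovec is not const-qualified */
		.iov_len = len,
	};

	return pwritev2(fd, &iov, 1, -1, RWF_NOSIGNAL);
}
```

A caller that gets EOPNOTSUPP back can fall back to the older approach of ignoring or blocking SIGPIPE around the write.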
2 parents: fde0ab4 + 28986dd

64 files changed

Lines changed: 582 additions & 264 deletions


Documentation/admin-guide/kernel-parameters.txt

Lines changed: 3 additions & 0 deletions
@@ -6429,6 +6429,9 @@

 	rootflags=	[KNL] Set root filesystem mount option string

+	initramfs_options= [KNL]
+			Specify mount options for the initramfs mount.
+
 	rootfstype=	[KNL] Set root filesystem type

 	rootwait	[KNL] Wait (indefinitely) for root device to show up.

Documentation/filesystems/porting.rst

Lines changed: 2 additions & 2 deletions
@@ -340,8 +340,8 @@ of those. Caller makes sure async writeback cannot be running for the inode whil

 ->drop_inode() returns int now; it's called on final iput() with
 inode->i_lock held and it returns true if the filesystem wants the inode to be
-dropped. As before, generic_drop_inode() is still the default and it's been
-updated appropriately. generic_delete_inode() is also alive and it consists
+dropped. As before, inode_generic_drop() is still the default and it's been
+updated appropriately. inode_just_drop() is also alive and it consists
 simply of return 1. Note that all actual eviction work is done by caller after
 ->drop_inode() returns.
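
The ->drop_inode() contract above can be modelled in plain userspace C to make the difference between the two renamed helpers concrete. This is a sketch, not kernel code: struct model_inode and the model_* functions are hypothetical stand-ins. The default policy (drop when the inode has no links left or is no longer hashed) is our reading of generic_drop_inode() in fs/inode.c before the rename.

```c
#include <stdbool.h>

/* Userspace stand-in for struct inode: only the fields the policy reads. */
struct model_inode {
	unsigned int i_nlink;   /* remaining hard links */
	bool hashed;            /* still findable via the inode hash */
};

/*
 * Models the default policy (generic_drop_inode(), now inode_generic_drop()):
 * evict on the final iput() if the inode has no links or is unhashed,
 * otherwise keep it in the inode cache.
 */
static int model_inode_generic_drop(const struct model_inode *inode)
{
	return !inode->i_nlink || !inode->hashed;
}

/* Models inode_just_drop() (formerly generic_delete_inode()): always 1. */
static int model_inode_just_drop(const struct model_inode *inode)
{
	(void)inode;    /* the decision does not depend on the inode */
	return 1;
}
```

A filesystem that never wants inodes cached after the last iput() points .drop_inode at inode_just_drop, which is what the bdev, dax, ibmasmfs and USB gadget hunks in this commit do.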

Documentation/filesystems/proc.rst

Lines changed: 8 additions & 0 deletions
@@ -2362,6 +2362,7 @@ The following mount options are supported:
 hidepid=   Set /proc/<pid>/ access mode.
 gid=       Set the group authorized to learn processes information.
 subset=    Show only the specified subset of procfs.
+pidns=     Specify the pid namespace used by this procfs.
 =========  ========================================================

 hidepid=off or hidepid=0 means classic mode - everybody may access all
@@ -2394,6 +2395,13 @@ information about processes information, just add identd to this group.
 subset=pid hides all top level files and directories in the procfs that
 are not related to tasks.

+pidns= specifies a pid namespace (either as a string path to something like
+`/proc/$pid/ns/pid`, or a file descriptor when using `FSCONFIG_SET_FD`) that
+will be used by the procfs instance when translating pids. By default, procfs
+will use the calling process's active pid namespace. Note that the pid
+namespace of an existing procfs instance cannot be modified (attempting to do
+so will give an `-EBUSY` error).
+
 Chapter 5: Filesystem behavior
 ==============================
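
A minimal sketch of creating a procfs mount bound to a chosen pid namespace with the new mount API, following the fsconfig(FSCONFIG_SET_FD, "pidns", ...) form from the merge message. It calls the syscalls through syscall(2) because older C libraries lack wrappers. The helper name proc_mount_fd_in_pidns() is hypothetical. The sketch assumes kernel headers that provide <linux/mount.h> and the __NR_fs* numbers (Linux 5.2 or later), a kernel new enough to accept the "pidns" option, and CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/mount.h>   /* FSCONFIG_*, FSOPEN_CLOEXEC, FSMOUNT_CLOEXEC */
#include <sys/syscall.h>   /* __NR_fsopen, __NR_fsconfig, __NR_fsmount */
#include <unistd.h>

/*
 * Hypothetical helper: build a detached procfs mount that translates pids
 * through the pid namespace referred to by nsfd (for example an fd opened
 * on /proc/<pid>/ns/pid). Returns a mount fd, or -1 with errno set.
 */
static int proc_mount_fd_in_pidns(int nsfd)
{
	int fsfd, mntfd, saved_errno;

	fsfd = (int)syscall(__NR_fsopen, "proc", FSOPEN_CLOEXEC);
	if (fsfd < 0)
		return -1;

	/* Same call shape as in the merge message: key "pidns", value NULL,
	 * namespace fd passed in the aux argument. */
	if (syscall(__NR_fsconfig, fsfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd) < 0)
		goto fail;
	/* Instantiate the superblock with the options set so far. */
	if (syscall(__NR_fsconfig, fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0)
		goto fail;

	mntfd = (int)syscall(__NR_fsmount, fsfd, FSMOUNT_CLOEXEC, 0);
	if (mntfd < 0)
		goto fail;

	close(fsfd);
	return mntfd;

fail:
	saved_errno = errno;   /* close() must not clobber the real error */
	close(fsfd);
	errno = saved_errno;
	return -1;
}
```

The returned fd is a detached mount; attaching it at, say, /tmp/proc would take a move_mount(2) call with MOVE_MOUNT_F_EMPTY_PATH. Passing the namespace as an fd rather than as a path string lets the caller open the namespace file once, up front.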

Documentation/filesystems/vfs.rst

Lines changed: 2 additions & 2 deletions
@@ -327,11 +327,11 @@ or bottom half).
 	inode->i_lock spinlock held.

 	This method should be either NULL (normal UNIX filesystem
-	semantics) or "generic_delete_inode" (for filesystems that do
+	semantics) or "inode_just_drop" (for filesystems that do
 	not want to cache inodes - causing "delete_inode" to always be
 	called regardless of the value of i_nlink)

-	The "generic_delete_inode()" behavior is equivalent to the old
+	The "inode_just_drop()" behavior is equivalent to the old
 	practice of using "force_delete" in the put_inode() case, but
 	does not have the races that the "force_delete()" approach had.

block/bdev.c

Lines changed: 1 addition & 1 deletion
@@ -412,7 +412,7 @@ static const struct super_operations bdev_sops = {
 	.statfs = simple_statfs,
 	.alloc_inode = bdev_alloc_inode,
 	.free_inode = bdev_free_inode,
-	.drop_inode = generic_delete_inode,
+	.drop_inode = inode_just_drop,
 	.evict_inode = bdev_evict_inode,
 };

drivers/dax/super.c

Lines changed: 1 addition & 1 deletion
@@ -388,7 +388,7 @@ static const struct super_operations dax_sops = {
 	.alloc_inode = dax_alloc_inode,
 	.destroy_inode = dax_destroy_inode,
 	.free_inode = dax_free_inode,
-	.drop_inode = generic_delete_inode,
+	.drop_inode = inode_just_drop,
 };

 static int dax_init_fs_context(struct fs_context *fc)

drivers/misc/ibmasm/ibmasmfs.c

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ static int ibmasmfs_init_fs_context(struct fs_context *fc)

 static const struct super_operations ibmasmfs_s_ops = {
 	.statfs = simple_statfs,
-	.drop_inode = generic_delete_inode,
+	.drop_inode = inode_just_drop,
 };

 static const struct file_operations *ibmasmfs_dir_ops = &simple_dir_operations;

drivers/usb/gadget/function/f_fs.c

Lines changed: 1 addition & 1 deletion
@@ -1891,7 +1891,7 @@ static struct dentry *ffs_sb_create_file(struct super_block *sb,
 /* Super block */
 static const struct super_operations ffs_sb_operations = {
 	.statfs = simple_statfs,
-	.drop_inode = generic_delete_inode,
+	.drop_inode = inode_just_drop,
 };

 struct ffs_sb_fill_data {

drivers/usb/gadget/legacy/inode.c

Lines changed: 1 addition & 1 deletion
@@ -2011,7 +2011,7 @@ gadgetfs_create_file (struct super_block *sb, char const *name,

 static const struct super_operations gadget_fs_operations = {
 	.statfs = simple_statfs,
-	.drop_inode = generic_delete_inode,
+	.drop_inode = inode_just_drop,
 };

 static int

fs/9p/vfs_super.c

Lines changed: 1 addition & 1 deletion
@@ -252,7 +252,7 @@ static int v9fs_drop_inode(struct inode *inode)

 	v9ses = v9fs_inode2v9ses(inode);
 	if (v9ses->cache & (CACHE_META|CACHE_LOOSE))
-		return generic_drop_inode(inode);
+		return inode_generic_drop(inode);
 	/*
 	 * in case of non cached mode always drop the
 	 * inode because we want the inode attribute
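
v9fs_drop_inode() shows a filesystem keeping its own ->drop_inode() and deferring to the default policy only in some cache modes. The hunk stops at the comment, so this sketch assumes the uncached branch simply returns 1, as the comment suggests. It is a userspace model: struct model_inode, the MODEL_CACHE_* bits and the model_* functions are hypothetical stand-ins, not the 9p definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the 9p CACHE_META / CACHE_LOOSE bits. */
#define MODEL_CACHE_META  (1u << 0)
#define MODEL_CACHE_LOOSE (1u << 1)

/* Userspace stand-in for the inode plus its session's cache mode. */
struct model_inode {
	unsigned int i_nlink;     /* remaining hard links */
	bool hashed;              /* still findable via the inode hash */
	unsigned int cache_mode;  /* plays the role of v9ses->cache */
};

/* Model of inode_generic_drop(): drop if unlinked or unhashed. */
static int model_inode_generic_drop(const struct model_inode *inode)
{
	return !inode->i_nlink || !inode->hashed;
}

/*
 * Shape of v9fs_drop_inode() after the rename: cached sessions defer to the
 * default policy; uncached sessions always drop the inode (assumed tail of
 * the function), presumably so attributes are refreshed from the server.
 */
static int model_v9fs_drop_inode(const struct model_inode *inode)
{
	if (inode->cache_mode & (MODEL_CACHE_META | MODEL_CACHE_LOOSE))
		return model_inode_generic_drop(inode);
	return 1;
}
```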
