Commit 2110772

fs: mount detached mounts onto detached mounts
Currently, detached mounts can only be mounted onto attached mounts.
This limitation makes it impossible to assemble a new private rootfs and
move it into place. That's an extremely powerful concept for container
and service workloads that we should support.

Right now, a detached tree must be created, attached, then it can gain
additional mounts and then it can either be moved (if it doesn't reside
under a shared mount) or a detached mount created again. Lift this
restriction.

In order to allow mounting detached mounts onto other detached mounts,
the same permission model used for creating detached mounts from
detached mounts can be used:

(1) Check that the caller is privileged over the owning user namespace
    of its current mount namespace.

(2) Check that the caller is located in the mount namespace of the mount
    it wants to create a detached copy of.

The origin mount namespace of the anonymous mount namespace must be the
same as the caller's mount namespace. To establish this, the sequence
number of the caller's mount namespace and the origin sequence number of
the anonymous mount namespace are compared.

The caller is always located in a non-anonymous mount namespace since
anonymous mount namespaces cannot be setns()ed into. The caller's mount
namespace will thus always have a valid sequence number.

The owning namespace of any mount namespace, anonymous or non-anonymous,
can never change. A mount attached to a non-anonymous mount namespace
can never change mount namespace.

If the sequence number of the non-anonymous mount namespace and the
origin sequence number of the anonymous mount namespace match, the
owning namespaces must match as well.

Hence, the capability check on the owning namespace of the caller's
mount namespace ensures that the caller has the ability to attach the
mount tree.

Link: https://lore.kernel.org/r/20250221-brauner-open_tree-v1-9-dbcfcb98c676@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
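The sequence-number comparison behind condition (2) can be illustrated with a small userspace toy model. This is only a sketch and not kernel code: `struct toy_mnt_ns`, `toy_is_anon()` and `toy_may_use_mount()` are hypothetical names, and representing an anonymous namespace by a sequence number of 0 is a modeling assumption. The capability check from condition (1) and the kernel's is_mounted() check are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a mount namespace; NOT the kernel's struct mnt_namespace. */
struct toy_mnt_ns {
	uint64_t seq;        /* assumption: nonzero for a regular namespace, 0 if anonymous */
	uint64_t seq_origin; /* anonymous only: seq of the namespace it was created from */
};

/* Hypothetical helper: an anonymous namespace has no valid sequence number. */
static bool toy_is_anon(const struct toy_mnt_ns *ns)
{
	return ns->seq == 0;
}

/*
 * Hypothetical model of condition (2): the target is usable if it lives in
 * the caller's own namespace, or in an anonymous namespace whose origin
 * sequence number matches the caller's sequence number.
 */
static bool toy_may_use_mount(const struct toy_mnt_ns *caller_ns,
			      const struct toy_mnt_ns *target_ns)
{
	if (target_ns == caller_ns)
		return true;
	if (!toy_is_anon(target_ns))
		return false;
	return target_ns->seq_origin == caller_ns->seq;
}
```

In this model a caller in namespace 7 may use an anonymous tree whose origin is 7, while a caller in namespace 9 may not, which mirrors the argument that matching sequence numbers imply matching owning namespaces.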
1 parent f9fde81 commit 2110772

1 file changed

Lines changed: 84 additions & 2 deletions

fs/namespace.c

@@ -2295,6 +2295,24 @@ void dissolve_on_fput(struct vfsmount *mnt)
 		if (!must_dissolve(ns))
 			return;

+		/*
+		 * After must_dissolve() we know that this is a detached
+		 * mount in an anonymous mount namespace.
+		 *
+		 * Now when mnt_has_parent() reports that this mount
+		 * tree has a parent, we know that this anonymous mount
+		 * tree has been moved to another anonymous mount
+		 * namespace.
+		 *
+		 * So when closing this file we cannot unmount the mount
+		 * tree. This will be done when the file referring to
+		 * the root of the anonymous mount namespace will be
+		 * closed (It could already be closed but it would sync
+		 * on @namespace_sem and wait for us to finish.).
+		 */
+		if (mnt_has_parent(m))
+			return;
+
 		lock_mount_hash();
 		umount_tree(m, UMOUNT_CONNECTED);
 		unlock_mount_hash();
@@ -3437,6 +3455,54 @@ static int can_move_mount_beneath(const struct path *from,
 	return 0;
 }

+/* may_use_mount() - check if a mount tree can be used
+ * @mnt: vfsmount to be used
+ *
+ * This helper checks if the caller may use the mount tree starting
+ * from @path->mnt. The caller may use the mount tree under the
+ * following circumstances:
+ *
+ * (1) The caller is located in the mount namespace of the mount tree.
+ *     This also implies that the mount does not belong to an anonymous
+ *     mount namespace.
+ * (2) The caller is trying to use a mount tree that belongs to an
+ *     anonymous mount namespace.
+ *
+ *     For that to be safe, this helper enforces that the origin mount
+ *     namespace the anonymous mount namespace was created from is the
+ *     same as the caller's mount namespace by comparing the sequence
+ *     numbers.
+ *
+ *     The ownership of a non-anonymous mount namespace such as the
+ *     caller's cannot change.
+ *     => We know that the caller's mount namespace is stable.
+ *
+ *     If the origin sequence number of the anonymous mount namespace is
+ *     the same as the sequence number of the caller's mount namespace.
+ *     => The owning namespaces are the same.
+ *
+ *     ==> The earlier capability check on the owning namespace of the
+ *         caller's mount namespace ensures that the caller has the
+ *         ability to use the mount tree.
+ *
+ * Returns true if the mount tree can be used, false otherwise.
+ */
+static inline bool may_use_mount(struct mount *mnt)
+{
+	if (check_mnt(mnt))
+		return true;
+
+	/*
+	 * Make sure that noone unmounted the target path or somehow
+	 * managed to get their hands on something purely kernel
+	 * internal.
+	 */
+	if (!is_mounted(&mnt->mnt))
+		return false;
+
+	return check_anonymous_mnt(mnt);
+}
+
 static int do_move_mount(struct path *old_path,
 			 struct path *new_path, enum mnt_tree_flags_t flags)
 {
@@ -3462,8 +3528,14 @@ static int do_move_mount(struct path *old_path,
 	ns = old->mnt_ns;

 	err = -EINVAL;
-	/* The mountpoint must be in our namespace. */
-	if (!check_mnt(p))
+	if (!may_use_mount(p))
+		goto out;
+
+	/*
+	 * Don't allow moving an attached mount tree to an anonymous
+	 * mount tree.
+	 */
+	if (!is_anon_ns(ns) && is_anon_ns(p->mnt_ns))
 		goto out;

 	/* The thing moved must be mounted... */
@@ -3474,6 +3546,16 @@ static int do_move_mount(struct path *old_path,
 	if (!(attached ? check_mnt(old) : is_anon_ns(ns)))
 		goto out;

+	/*
+	 * Ending up with two files referring to the root of the same
+	 * anonymous mount namespace would cause an error as this would
+	 * mean trying to move the same mount twice into the mount tree
+	 * which would be rejected later. But be explicit about it right
+	 * here.
+	 */
+	if (is_anon_ns(ns) && is_anon_ns(p->mnt_ns) && ns == p->mnt_ns)
+		goto out;
+
 	if (old->mnt.mnt_flags & MNT_LOCKED)
 		goto out;

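The checks that the do_move_mount() hunks above add can be summarized as a small decision function. The following is a userspace sketch under stated assumptions, not the kernel implementation: `struct toy_move` and `toy_check_move()` are hypothetical names, each boolean stands in for the result of the kernel expression named in its comment, and the pre-existing checks (the `attached ? check_mnt(old) : is_anon_ns(ns)` test, MNT_LOCKED, and everything after it) are deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inputs; each field mirrors one expression from the diff. */
struct toy_move {
	bool target_usable; /* result of may_use_mount(p) */
	bool src_anon;      /* is_anon_ns(ns): source tree is detached */
	bool dst_anon;      /* is_anon_ns(p->mnt_ns): target tree is detached */
	bool same_ns;       /* ns == p->mnt_ns */
};

/* Returns 0 if the move passes the newly added checks, -EINVAL otherwise. */
static int toy_check_move(const struct toy_move *m)
{
	/* The mountpoint must be in our namespace or in a usable anonymous one. */
	if (!m->target_usable)
		return -EINVAL;

	/* An attached mount tree may not be moved onto an anonymous mount tree. */
	if (!m->src_anon && m->dst_anon)
		return -EINVAL;

	/* A detached tree may not be moved into its own anonymous namespace. */
	if (m->src_anon && m->dst_anon && m->same_ns)
		return -EINVAL;

	return 0;
}
```

Under this model, detached onto detached and detached onto attached both pass, while attached onto detached and a detached tree onto itself are rejected with -EINVAL.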