Skip to content

Commit 88d5baf

Browse files
neilbrownbrauner
authored andcommitted
Change inode_operations.mkdir to return struct dentry *
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have complete control of sequencing on the actual filesystem (e.g. on a different server) and may find that the inode created for a mkdir request already exists in the icache and dcache by the time the mkdir request returns. For example, if the filesystem is mounted twice the directory could be visible on the other mount before it is on the original mount, and a pair of name_to_handle_at(), open_by_handle_at() calls could instantiate the directory inode with an IS_ROOT() dentry before the first mkdir returns. This means that the dentry passed to ->mkdir() may not be the one that is associated with the inode after the ->mkdir() completes. Some callers need to interact with the inode after the ->mkdir completes and they currently need to perform a lookup in the (rare) case that the dentry is no longer hashed. This lookup-after-mkdir requires that the directory remains locked to avoid races. Planned future patches to lock the dentry rather than the directory will mean that this lookup cannot be performed atomically with the mkdir. To remove this barrier, this patch changes ->mkdir to return the resulting dentry if it is different from the one passed in. Possible returns are: NULL - the directory was created and no other dentry was used ERR_PTR() - an error occurred non-NULL - this other dentry was spliced in This patch only changes file-systems to return "ERR_PTR(err)" instead of "err" or equivalent transformations. Subsequent patches will make further changes to some file-systems to return a correct dentry. Not all filesystems reliably result in a positive hashed dentry: - NFS, cifs, hostfs will sometimes need to perform a lookup of the name to get inode information. Races could result in this returning something different. Note that this lookup is non-atomic which is what we are trying to avoid. Placing the lookup in filesystem code means it only happens when the filesystem has no other option. - kernfs and tracefs leave the dentry negative and the ->revalidate operation ensures that lookup will be called to correctly populate the dentry. This could be fixed but I don't think it is important to any of the users of vfs_mkdir() which look at the dentry. The recommendation to use d_drop();d_splice_alias() is ugly but fits with current practice. A planned future patch will change this. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
1 parent 7162858 commit 88d5baf

57 files changed

Lines changed: 274 additions & 225 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

Documentation/filesystems/locking.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -66,7 +66,7 @@ prototypes::
6666
int (*link) (struct dentry *,struct inode *,struct dentry *);
6767
int (*unlink) (struct inode *,struct dentry *);
6868
int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,const char *);
69-
int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
69+
struct dentry *(*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
7070
int (*rmdir) (struct inode *,struct dentry *);
7171
int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t,dev_t);
7272
int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *,

Documentation/filesystems/porting.rst

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1178,3 +1178,22 @@ these conditions don't require explicit checks:
11781178

11791179
LOOKUP_EXCL now means "target must not exist". It can be combined with
11801180
LOOK_CREATE or LOOKUP_RENAME_TARGET.
1181+
1182+
---
1183+
1184+
** mandatory**
1185+
1186+
->mkdir() now returns a 'struct dentry *'. If the created inode is
1187+
found to already be in cache and have a dentry (often IS_ROOT()), it will
1188+
need to be spliced into the given name in place of the given dentry.
1189+
That dentry now needs to be returned. If the original dentry is used,
1190+
NULL should be returned. Any error should be returned with
1191+
ERR_PTR().
1192+
1193+
In general, filesystems which use d_instantiate_new() to install the new
1194+
inode can safely return NULL. Filesystems which may not have an I_NEW inode
1195+
should use d_drop();d_splice_alias() and return the result of the latter.
1196+
1197+
If a positive dentry cannot be returned for some reason, in-kernel
1198+
clients such as cachefiles, nfsd, smb/server may not perform ideally but
1199+
will fail-safe.

Documentation/filesystems/vfs.rst

Lines changed: 21 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -495,7 +495,7 @@ As of kernel 2.6.22, the following members are defined:
495495
int (*link) (struct dentry *,struct inode *,struct dentry *);
496496
int (*unlink) (struct inode *,struct dentry *);
497497
int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,const char *);
498-
int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
498+
struct dentry *(*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
499499
int (*rmdir) (struct inode *,struct dentry *);
500500
int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t,dev_t);
501501
int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *,
@@ -562,7 +562,26 @@ otherwise noted.
562562
``mkdir``
563563
called by the mkdir(2) system call. Only required if you want
564564
to support creating subdirectories. You will probably need to
565-
call d_instantiate() just as you would in the create() method
565+
call d_instantiate_new() just as you would in the create() method.
566+
567+
If d_instantiate_new() is not used and if the fh_to_dentry()
568+
export operation is provided, or if the storage might be
569+
accessible by another path (e.g. with a network filesystem)
570+
then more care may be needed. Importantly d_instantate()
571+
should not be used with an inode that is no longer I_NEW if there
572+
any chance that the inode could already be attached to a dentry.
573+
This is because of a hard rule in the VFS that a directory must
574+
only ever have one dentry.
575+
576+
For example, if an NFS filesystem is mounted twice the new directory
577+
could be visible on the other mount before it is on the original
578+
mount, and a pair of name_to_handle_at(), open_by_handle_at()
579+
calls could instantiate the directory inode with an IS_ROOT()
580+
dentry before the first mkdir returns.
581+
582+
If there is any chance this could happen, then the new inode
583+
should be d_drop()ed and attached with d_splice_alias(). The
584+
returned dentry (if any) should be returned by ->mkdir().
566585

567586
``rmdir``
568587
called by the rmdir(2) system call. Only required if you want

fs/9p/vfs_inode.c

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -669,8 +669,8 @@ v9fs_vfs_create(struct mnt_idmap *idmap, struct inode *dir,
669669
*
670670
*/
671671

672-
static int v9fs_vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
673-
struct dentry *dentry, umode_t mode)
672+
static struct dentry *v9fs_vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
673+
struct dentry *dentry, umode_t mode)
674674
{
675675
int err;
676676
u32 perm;
@@ -692,8 +692,7 @@ static int v9fs_vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
692692

693693
if (fid)
694694
p9_fid_put(fid);
695-
696-
return err;
695+
return ERR_PTR(err);
697696
}
698697

699698
/**

fs/9p/vfs_inode_dotl.c

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -350,9 +350,9 @@ v9fs_vfs_atomic_open_dotl(struct inode *dir, struct dentry *dentry,
350350
*
351351
*/
352352

353-
static int v9fs_vfs_mkdir_dotl(struct mnt_idmap *idmap,
354-
struct inode *dir, struct dentry *dentry,
355-
umode_t omode)
353+
static struct dentry *v9fs_vfs_mkdir_dotl(struct mnt_idmap *idmap,
354+
struct inode *dir, struct dentry *dentry,
355+
umode_t omode)
356356
{
357357
int err;
358358
struct v9fs_session_info *v9ses;
@@ -417,7 +417,7 @@ static int v9fs_vfs_mkdir_dotl(struct mnt_idmap *idmap,
417417
p9_fid_put(fid);
418418
v9fs_put_acl(dacl, pacl);
419419
p9_fid_put(dfid);
420-
return err;
420+
return ERR_PTR(err);
421421
}
422422

423423
static int

fs/affs/affs.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -168,7 +168,7 @@ extern struct dentry *affs_lookup(struct inode *dir, struct dentry *dentry, unsi
168168
extern int affs_unlink(struct inode *dir, struct dentry *dentry);
169169
extern int affs_create(struct mnt_idmap *idmap, struct inode *dir,
170170
struct dentry *dentry, umode_t mode, bool);
171-
extern int affs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
171+
extern struct dentry *affs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
172172
struct dentry *dentry, umode_t mode);
173173
extern int affs_rmdir(struct inode *dir, struct dentry *dentry);
174174
extern int affs_link(struct dentry *olddentry, struct inode *dir,

fs/affs/namei.c

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -273,7 +273,7 @@ affs_create(struct mnt_idmap *idmap, struct inode *dir,
273273
return 0;
274274
}
275275

276-
int
276+
struct dentry *
277277
affs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
278278
struct dentry *dentry, umode_t mode)
279279
{
@@ -285,7 +285,7 @@ affs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
285285

286286
inode = affs_new_inode(dir);
287287
if (!inode)
288-
return -ENOSPC;
288+
return ERR_PTR(-ENOSPC);
289289

290290
inode->i_mode = S_IFDIR | mode;
291291
affs_mode_to_prot(inode);
@@ -298,9 +298,9 @@ affs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
298298
clear_nlink(inode);
299299
mark_inode_dirty(inode);
300300
iput(inode);
301-
return error;
301+
return ERR_PTR(error);
302302
}
303-
return 0;
303+
return NULL;
304304
}
305305

306306
int

fs/afs/dir.c

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -33,8 +33,8 @@ static bool afs_lookup_filldir(struct dir_context *ctx, const char *name, int nl
3333
loff_t fpos, u64 ino, unsigned dtype);
3434
static int afs_create(struct mnt_idmap *idmap, struct inode *dir,
3535
struct dentry *dentry, umode_t mode, bool excl);
36-
static int afs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
37-
struct dentry *dentry, umode_t mode);
36+
static struct dentry *afs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
37+
struct dentry *dentry, umode_t mode);
3838
static int afs_rmdir(struct inode *dir, struct dentry *dentry);
3939
static int afs_unlink(struct inode *dir, struct dentry *dentry);
4040
static int afs_link(struct dentry *from, struct inode *dir,
@@ -1315,8 +1315,8 @@ static const struct afs_operation_ops afs_mkdir_operation = {
13151315
/*
13161316
* create a directory on an AFS filesystem
13171317
*/
1318-
static int afs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
1319-
struct dentry *dentry, umode_t mode)
1318+
static struct dentry *afs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
1319+
struct dentry *dentry, umode_t mode)
13201320
{
13211321
struct afs_operation *op;
13221322
struct afs_vnode *dvnode = AFS_FS_I(dir);
@@ -1328,7 +1328,7 @@ static int afs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
13281328
op = afs_alloc_operation(NULL, dvnode->volume);
13291329
if (IS_ERR(op)) {
13301330
d_drop(dentry);
1331-
return PTR_ERR(op);
1331+
return ERR_CAST(op);
13321332
}
13331333

13341334
fscache_use_cookie(afs_vnode_cache(dvnode), true);
@@ -1344,7 +1344,7 @@ static int afs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
13441344
op->ops = &afs_mkdir_operation;
13451345
ret = afs_do_sync_operation(op);
13461346
afs_dir_unuse_cookie(dvnode, ret);
1347-
return ret;
1347+
return ERR_PTR(ret);
13481348
}
13491349

13501350
/*

fs/autofs/root.c

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -15,8 +15,8 @@ static int autofs_dir_symlink(struct mnt_idmap *, struct inode *,
1515
struct dentry *, const char *);
1616
static int autofs_dir_unlink(struct inode *, struct dentry *);
1717
static int autofs_dir_rmdir(struct inode *, struct dentry *);
18-
static int autofs_dir_mkdir(struct mnt_idmap *, struct inode *,
19-
struct dentry *, umode_t);
18+
static struct dentry *autofs_dir_mkdir(struct mnt_idmap *, struct inode *,
19+
struct dentry *, umode_t);
2020
static long autofs_root_ioctl(struct file *, unsigned int, unsigned long);
2121
#ifdef CONFIG_COMPAT
2222
static long autofs_root_compat_ioctl(struct file *,
@@ -720,9 +720,9 @@ static int autofs_dir_rmdir(struct inode *dir, struct dentry *dentry)
720720
return 0;
721721
}
722722

723-
static int autofs_dir_mkdir(struct mnt_idmap *idmap,
724-
struct inode *dir, struct dentry *dentry,
725-
umode_t mode)
723+
static struct dentry *autofs_dir_mkdir(struct mnt_idmap *idmap,
724+
struct inode *dir, struct dentry *dentry,
725+
umode_t mode)
726726
{
727727
struct autofs_sb_info *sbi = autofs_sbi(dir->i_sb);
728728
struct autofs_info *ino = autofs_dentry_ino(dentry);
@@ -739,7 +739,7 @@ static int autofs_dir_mkdir(struct mnt_idmap *idmap,
739739

740740
inode = autofs_get_inode(dir->i_sb, S_IFDIR | mode);
741741
if (!inode)
742-
return -ENOMEM;
742+
return ERR_PTR(-ENOMEM);
743743
d_add(dentry, inode);
744744

745745
if (sbi->version < 5)
@@ -751,7 +751,7 @@ static int autofs_dir_mkdir(struct mnt_idmap *idmap,
751751
inc_nlink(dir);
752752
inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
753753

754-
return 0;
754+
return NULL;
755755
}
756756

757757
/* Get/set timeout ioctl() operation */

fs/bad_inode.c

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -58,10 +58,10 @@ static int bad_inode_symlink(struct mnt_idmap *idmap,
5858
return -EIO;
5959
}
6060

61-
static int bad_inode_mkdir(struct mnt_idmap *idmap, struct inode *dir,
62-
struct dentry *dentry, umode_t mode)
61+
static struct dentry *bad_inode_mkdir(struct mnt_idmap *idmap, struct inode *dir,
62+
struct dentry *dentry, umode_t mode)
6363
{
64-
return -EIO;
64+
return ERR_PTR(-EIO);
6565
}
6666

6767
static int bad_inode_rmdir (struct inode *dir, struct dentry *dentry)

0 commit comments

Comments
 (0)