Commit 84422ae

Merge tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
 "This contains the usual miscellaneous fixes and cleanups for vfs and
  individual fses:

  Fixes:
   - Revert ki_pos on error from buffered writes for direct io fallback
   - Add missing documentation for block device and superblock handling
     for changes merged this cycle
   - Fix reiserfs flexible array usage
   - Ensure that overlayfs sets ctime when setting mtime and atime
   - Disable deferred caller completions with overlayfs writes until
     proper support exists

  Cleanups:
   - Remove duplicate initialization in pipe code
   - Annotate aio kioctx_table with __counted_by"

* tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  overlayfs: set ctime when setting mtime and atime
  ntfs3: put resources during ntfs_fill_super()
  ovl: disable IOCB_DIO_CALLER_COMP
  porting: document superblock as block device holder
  porting: document new block device opening order
  fs/pipe: remove duplicate "offset" initializer
  fs-writeback: do not requeue a clean inode having skipped pages
  aio: Annotate struct kioctx_table with __counted_by
  direct_write_fallback(): on error revert the ->ki_pos update from buffered write
  reiserfs: Replace 1-element array with C99 style flex-array
2 parents 5c519bc + 03dbab3 commit 84422ae

9 files changed

Lines changed: 117 additions & 9 deletions

Documentation/filesystems/porting.rst

Lines changed: 96 additions & 0 deletions
@@ -949,3 +949,99 @@ mmap_lock held. All in-tree users have been audited and do not seem to
 depend on the mmap_lock being held, but out of tree users should verify
 for themselves. If they do need it, they can return VM_FAULT_RETRY to
 be called with the mmap_lock held.
+
+---
+
+**mandatory**
+
+The order of opening block devices and matching or creating superblocks has
+changed.
+
+The old logic opened block devices first and then tried to find a
+suitable superblock to reuse based on the block device pointer.
+
+The new logic tries to find a suitable superblock first based on the device
+number, and opening the block device afterwards.
+
+Since opening block devices cannot happen under s_umount because of lock
+ordering requirements s_umount is now dropped while opening block devices and
+reacquired before calling fill_super().
+
+In the old logic concurrent mounters would find the superblock on the list of
+superblocks for the filesystem type. Since the first opener of the block device
+would hold s_umount they would wait until the superblock became either born or
+was discarded due to initialization failure.
+
+Since the new logic drops s_umount concurrent mounters could grab s_umount and
+would spin. Instead they are now made to wait using an explicit wait-wake
+mechanism without having to hold s_umount.
+
+---
+
+**mandatory**
+
+The holder of a block device is now the superblock.
+
+The holder of a block device used to be the file_system_type which wasn't
+particularly useful. It wasn't possible to go from block device to owning
+superblock without matching on the device pointer stored in the superblock.
+This mechanism would only work for a single device so the block layer couldn't
+find the owning superblock of any additional devices.
+
+In the old mechanism reusing or creating a superblock for a racing mount(2) and
+umount(2) relied on the file_system_type as the holder. This was severely
+underdocumented however:
+
+(1) Any concurrent mounter that managed to grab an active reference on an
+    existing superblock was made to wait until the superblock either became
+    ready or until the superblock was removed from the list of superblocks of
+    the filesystem type. If the superblock is ready the caller would simply
+    reuse it.
+
+(2) If the mounter came after deactivate_locked_super() but before
+    the superblock had been removed from the list of superblocks of the
+    filesystem type the mounter would wait until the superblock was shut down,
+    reuse the block device and allocate a new superblock.
+
+(3) If the mounter came after deactivate_locked_super() and after
+    the superblock had been removed from the list of superblocks of the
+    filesystem type the mounter would reuse the block device and allocate a new
+    superblock (the bd_holder pointer may still be set to the filesystem type).
+
+Because the holder of the block device was the file_system_type any concurrent
+mounter could open the block devices of any superblock of the same
+file_system_type without risking seeing EBUSY because the block device was
+still in use by another superblock.
+
+Making the superblock the owner of the block device changes this as the holder
+is now a unique superblock and thus block devices associated with it cannot be
+reused by concurrent mounters. So a concurrent mounter in (2) could suddenly
+see EBUSY when trying to open a block device whose holder was a different
+superblock.
+
+The new logic thus waits until the superblock and the devices are shut down in
+->kill_sb(). Removal of the superblock from the list of superblocks of the
+filesystem type is now moved to a later point when the devices are closed:
+
+(1) Any concurrent mounter managing to grab an active reference on an existing
+    superblock is made to wait until the superblock is either ready or until
+    the superblock and all devices are shut down in ->kill_sb(). If the
+    superblock is ready the caller will simply reuse it.
+
+(2) If the mounter comes after deactivate_locked_super() but before
+    the superblock has been removed from the list of superblocks of the
+    filesystem type the mounter is made to wait until the superblock and the
+    devices are shut down in ->kill_sb() and the superblock is removed from the
+    list of superblocks of the filesystem type. The mounter will allocate a new
+    superblock and grab ownership of the block device (the bd_holder pointer of
+    the block device will be set to the newly allocated superblock).
+
+(3) This case is now collapsed into (2) as the superblock is left on the list
+    of superblocks of the filesystem type until all devices are shut down in
+    ->kill_sb(). In other words, if the superblock isn't on the list of
+    superblocks of the filesystem type anymore then it has given up ownership of
+    all associated block devices (the bd_holder pointer is NULL).
+
+As this is a VFS level change it has no practical consequences for filesystems
+other than that all of them must use one of the provided kill_litter_super(),
+kill_anon_super(), or kill_block_super() helpers.
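
The holder change documented above can be illustrated with a small user-space model. This is only a sketch: model_bdev, model_open(), model_release() and model_second_mounter() are hypothetical names that do not exist in the kernel, and the real claiming logic lives in the block layer. It shows why a second mounter could previously share a device (same file_system_type holder) but can now see EBUSY when a different superblock still holds it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_bdev {
	const void *holder;	/* NULL when nobody owns the device */
	int holders;		/* number of opens by the current holder */
};

/* Claim the device for @holder; fails if a different holder owns it. */
static int model_open(struct model_bdev *bdev, const void *holder)
{
	assert(holder != NULL);
	if (bdev->holder && bdev->holder != holder)
		return -EBUSY;
	bdev->holder = holder;
	bdev->holders++;
	return 0;
}

/* Drop one claim; the holder pointer is cleared with the last one. */
static void model_release(struct model_bdev *bdev, const void *holder)
{
	assert(bdev->holder == holder && bdev->holders > 0);
	if (--bdev->holders == 0)
		bdev->holder = NULL;
}

/*
 * Open a fresh device as @first, then try to open it as @second.
 * Returns the result of the second open.
 */
static int model_second_mounter(const void *first, const void *second)
{
	struct model_bdev bdev = { NULL, 0 };
	int ret;

	ret = model_open(&bdev, first);
	if (ret)
		return ret;
	return model_open(&bdev, second);
}
```

Passing the same pointer for both mounters models the old file_system_type holder and yields 0; passing two distinct superblock pointers models the new behaviour and yields -EBUSY until model_release() has cleared the holder.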

fs/aio.c

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ struct aio_ring {
 struct kioctx_table {
 	struct rcu_head		rcu;
 	unsigned		nr;
-	struct kioctx __rcu	*table[];
+	struct kioctx __rcu	*table[] __counted_by(nr);
 };
 
 struct kioctx_cpu {
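
The __counted_by(nr) annotation records that nr holds the number of elements in the flexible array table, so that bounds-checking instrumentation can use it. A minimal user-space sketch of the same pattern follows. DEMO_COUNTED_BY, demo_table and its helpers are hypothetical names, and the macro is assumed to expand to nothing on compilers that lack the counted_by attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Expand to the attribute only where the compiler provides it. */
#if defined(__has_attribute)
# if __has_attribute(__counted_by__)
#  define DEMO_COUNTED_BY(member) __attribute__((__counted_by__(member)))
# endif
#endif
#ifndef DEMO_COUNTED_BY
# define DEMO_COUNTED_BY(member)
#endif

struct demo_table {
	unsigned int nr;			/* number of slots in table[] */
	void *table[] DEMO_COUNTED_BY(nr);	/* flexible array sized by nr */
};

/* Allocate a zeroed table with room for @nr slots. */
static struct demo_table *demo_table_alloc(unsigned int nr)
{
	struct demo_table *t;

	t = calloc(1, sizeof(*t) + (size_t)nr * sizeof(void *));
	if (!t)
		return NULL;
	t->nr = nr;	/* set the count before touching table[] */
	return t;
}

/* Bounds-checked store; returns 0 on success, -1 if idx is out of range. */
static int demo_table_set(struct demo_table *t, unsigned int idx, void *p)
{
	assert(t != NULL);
	if (idx >= t->nr)
		return -1;
	t->table[idx] = p;
	return 0;
}
```

The count member is assigned immediately after allocation because the annotation is only meaningful once nr reflects the real size of the array.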

fs/fs-writeback.c

Lines changed: 8 additions & 3 deletions
@@ -1535,10 +1535,15 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
 
 	if (wbc->pages_skipped) {
 		/*
-		 * writeback is not making progress due to locked
-		 * buffers. Skip this inode for now.
+		 * Writeback is not making progress due to locked buffers.
+		 * Skip this inode for now. Although having skipped pages
+		 * is odd for clean inodes, it can happen for some
+		 * filesystems so handle that gracefully.
 		 */
-		redirty_tail_locked(inode, wb);
+		if (inode->i_state & I_DIRTY_ALL)
+			redirty_tail_locked(inode, wb);
+		else
+			inode_cgwb_move_to_attached(inode, wb);
 		return;
 	}
 
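
The new branch in this hunk can be summarised as a small decision function. The sketch below is a user-space model, not kernel code: the enum and demo_requeue_decision() are hypothetical, and the dirty flag stands in for the inode->i_state & I_DIRTY_ALL test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the decision; not kernel code. */
enum demo_requeue {
	DEMO_REDIRTY_TAIL,	/* stands in for redirty_tail_locked() */
	DEMO_MOVE_TO_ATTACHED,	/* stands in for inode_cgwb_move_to_attached() */
	DEMO_OTHER		/* rest of requeue_inode(), not modelled here */
};

/* Decide what to do with an inode after a writeback pass. */
static enum demo_requeue demo_requeue_decision(long pages_skipped, bool dirty)
{
	assert(pages_skipped >= 0);
	if (pages_skipped)
		return dirty ? DEMO_REDIRTY_TAIL : DEMO_MOVE_TO_ATTACHED;
	return DEMO_OTHER;
}
```

Before the change, the skipped-pages case always took the DEMO_REDIRTY_TAIL path, which requeued clean inodes as well.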

fs/libfs.c

Lines changed: 1 addition & 0 deletions
@@ -1903,6 +1903,7 @@ ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
 		 * We don't know how much we wrote, so just return the number of
 		 * bytes which were direct-written
 		 */
+		iocb->ki_pos -= buffered_written;
 		if (direct_written)
 			return direct_written;
 		return err;
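
The effect of the added line is easiest to see with concrete numbers. The following is a simplified user-space model of the error path, under stated assumptions: demo_kiocb and demo_write_fallback() are hypothetical, flush_err models the err value tested in the hunk, and the success return value (the sum of both byte counts) is an assumption about code outside the hunk.

```c
#include <assert.h>

/* Hypothetical stand-in for struct kiocb; only the file position matters. */
struct demo_kiocb {
	long long ki_pos;
};

/*
 * Simplified model of the error path shown in the hunk. The buffered
 * write has already advanced ki_pos by buffered_written; flush_err < 0
 * models the error that makes the buffered part unreliable.
 */
static long demo_write_fallback(struct demo_kiocb *iocb, long direct_written,
				long buffered_written, int flush_err)
{
	assert(direct_written >= 0 && buffered_written >= 0);
	if (flush_err < 0) {
		/* Undo the position update made by the buffered write. */
		iocb->ki_pos -= buffered_written;
		return direct_written ? direct_written : flush_err;
	}
	return direct_written + buffered_written;
}
```

For a write that started at offset 0, wrote 4096 bytes directly and 512 bytes through the page cache, an error now leaves ki_pos at 4096, matching the 4096 bytes reported to the caller, instead of 4608.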

fs/ntfs3/super.c

Lines changed: 1 addition & 0 deletions
@@ -1562,6 +1562,7 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc)
 put_inode_out:
 	iput(inode);
 out:
+	ntfs3_put_sbi(sbi);
 	kfree(boot2);
 	return err;
 }

fs/overlayfs/copy_up.c

Lines changed: 1 addition & 1 deletion
@@ -337,7 +337,7 @@ static int ovl_set_timestamps(struct ovl_fs *ofs, struct dentry *upperdentry,
 {
 	struct iattr attr = {
 		.ia_valid =
-		     ATTR_ATIME | ATTR_MTIME | ATTR_ATIME_SET | ATTR_MTIME_SET,
+		     ATTR_ATIME | ATTR_MTIME | ATTR_ATIME_SET | ATTR_MTIME_SET | ATTR_CTIME,
 		.ia_atime = stat->atime,
 		.ia_mtime = stat->mtime,
 	};

fs/overlayfs/file.c

Lines changed: 6 additions & 0 deletions
@@ -391,6 +391,12 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (!ovl_should_sync(OVL_FS(inode->i_sb)))
 		ifl &= ~(IOCB_DSYNC | IOCB_SYNC);
 
+	/*
+	 * Overlayfs doesn't support deferred completions, don't copy
+	 * this property in case it is set by the issuer.
+	 */
+	ifl &= ~IOCB_DIO_CALLER_COMP;
+
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	if (is_sync_kiocb(iocb)) {
 		file_start_write(real.file);

fs/pipe.c

Lines changed: 0 additions & 1 deletion
@@ -537,7 +537,6 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
 				break;
 			}
 			ret += copied;
-			buf->offset = 0;
 			buf->len = copied;
 
 			if (!iov_iter_count(from))

fs/reiserfs/reiserfs.h

Lines changed: 3 additions & 3 deletions
@@ -2699,7 +2699,7 @@ struct reiserfs_iget_args {
 #define get_journal_desc_magic(bh) (bh->b_data + bh->b_size - 12)
 
 #define journal_trans_half(blocksize) \
-	((blocksize - sizeof (struct reiserfs_journal_desc) + sizeof (__u32) - 12) / sizeof (__u32))
+	((blocksize - sizeof(struct reiserfs_journal_desc) - 12) / sizeof(__u32))
 
 /* journal.c see journal.c for all the comments here */
 
@@ -2711,7 +2711,7 @@ struct reiserfs_journal_desc {
 	__le32 j_len;
 
 	__le32 j_mount_id;	/* mount id of this trans */
-	__le32 j_realblock[1];	/* real locations for each block */
+	__le32 j_realblock[];	/* real locations for each block */
 };
 
 #define get_desc_trans_id(d)   le32_to_cpu((d)->j_trans_id)
@@ -2726,7 +2726,7 @@ struct reiserfs_journal_desc {
 struct reiserfs_journal_commit {
 	__le32 j_trans_id;	/* must match j_trans_id from the desc block */
 	__le32 j_len;		/* ditto */
-	__le32 j_realblock[1];	/* real locations for each block */
+	__le32 j_realblock[];	/* real locations for each block */
 };
 
 #define get_commit_trans_id(c) le32_to_cpu((c)->j_trans_id)
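
The journal_trans_half() macro has to change together with the struct because a one-element array counts towards sizeof while a flexible array member does not. The old macro added sizeof(__u32) back to compensate, and the new one no longer needs to. A user-space sketch of that arithmetic, assuming j_trans_id is the only member preceding j_len (the hunk does not show the start of the struct) and using uint32_t in place of __le32; all demo_ names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space copies of the two layouts; field names follow the diff. */
struct demo_desc_old {
	uint32_t j_trans_id;
	uint32_t j_len;
	uint32_t j_mount_id;
	uint32_t j_realblock[1];	/* 1-element array counts towards sizeof */
};

struct demo_desc_new {
	uint32_t j_trans_id;
	uint32_t j_len;
	uint32_t j_mount_id;
	uint32_t j_realblock[];		/* flexible array adds nothing to sizeof */
};

/* Old macro: adds one element back because sizeof already includes it. */
static size_t demo_trans_half_old(size_t blocksize)
{
	assert(blocksize >= sizeof(struct demo_desc_old) + 12);
	return (blocksize - sizeof(struct demo_desc_old) + sizeof(uint32_t) - 12) /
	       sizeof(uint32_t);
}

/* New macro: the header size no longer includes any array element. */
static size_t demo_trans_half_new(size_t blocksize)
{
	assert(blocksize >= sizeof(struct demo_desc_new) + 12);
	return (blocksize - sizeof(struct demo_desc_new) - 12) / sizeof(uint32_t);
}
```

Both functions return the same value for any given block size, for example 1018 for a 4096-byte block, so the number of block locations that fit in a descriptor block does not change.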
