Commit 1885cdb

Merge tag 'vfs-6.19-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull iomap updates from Christian Brauner:

 "FUSE iomap Support for Buffered Reads:

  This adds iomap support for FUSE buffered reads and readahead. This
  enables granular uptodate tracking with large folios so only
  non-uptodate portions need to be read. Also fixes a race condition
  with large folios + writeback cache that could cause data corruption
  on partial writes followed by reads.

   - Refactored iomap read/readahead bio logic into helpers
   - Added caller-provided callbacks for read operations
   - Moved buffered IO bio logic into new file
   - FUSE now uses iomap for read_folio and readahead

  Zero Range Folio Batch Support:

  Add folio batch support for iomap_zero_range() to handle dirty folios
  over unwritten mappings. Fix raciness issues where dirty data could be
  lost during zero range operations.

   - filemap_get_folios_tag_range() helper for dirty folio lookup
   - Optional zero range dirty folio processing
   - XFS fills dirty folios on zero range of unwritten mappings
   - Removed old partial EOF zeroing optimization

  DIO Write Completions from Interrupt Context:

  Restore pre-iomap behavior where pure overwrite completions run inline
  rather than being deferred to workqueue. Reduces context switches for
  high-performance workloads like ScyllaDB.

   - Removed unused IOCB_DIO_CALLER_COMP code
   - Error completions always run in user context (fixes zonefs)
   - Reworked REQ_FUA selection logic
   - Inverted IOMAP_DIO_INLINE_COMP to IOMAP_DIO_OFFLOAD_COMP

  Buffered IO Cleanups:

  Some performance and code clarity improvements:

   - Replace manual bitmap scanning with find_next_bit()
   - Simplify read skip logic for writes
   - Optimize pending async writeback accounting
   - Better variable naming
   - Documentation for iomap_finish_folio_write() requirements

  Misaligned Vectors for Zoned XFS:

  Enables sub-block aligned vectors in XFS always-COW mode for zoned
  devices via new IOMAP_DIO_FSBLOCK_ALIGNED flag.

  Bug Fixes:

   - Allocate s_dio_done_wq for async reads (fixes syzbot report after
     error completion changes)
   - Fix iomap_read_end() for already uptodate folios (regression fix)"

* tag 'vfs-6.19-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (40 commits)
  iomap: allocate s_dio_done_wq for async reads as well
  iomap: fix iomap_read_end() for already uptodate folios
  iomap: invert the polarity of IOMAP_DIO_INLINE_COMP
  iomap: support write completions from interrupt context
  iomap: rework REQ_FUA selection
  iomap: always run error completions in user context
  fs, iomap: remove IOCB_DIO_CALLER_COMP
  iomap: use find_next_bit() for uptodate bitmap scanning
  iomap: use find_next_bit() for dirty bitmap scanning
  iomap: simplify when reads can be skipped for writes
  iomap: simplify ->read_folio_range() error handling for reads
  iomap: optimize pending async writeback accounting
  docs: document iomap writeback's iomap_finish_folio_write() requirement
  iomap: account for unaligned end offsets when truncating read range
  iomap: rename bytes_pending/bytes_accounted to bytes_submitted/bytes_not_submitted
  xfs: support sub-block aligned vectors in always COW mode
  iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
  xfs: error tag to force zeroing on debug kernels
  iomap: remove old partial eof zeroing optimization
  xfs: fill dirty folios on zero range of unwritten mappings
  ...
2 parents 7d0a66e + 7fd8720 commit 1885cdb

29 files changed

Lines changed: 1093 additions & 633 deletions

Documentation/filesystems/iomap/operations.rst

Lines changed: 46 additions & 4 deletions
@@ -135,6 +135,27 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap:
 
  * ``IOCB_DONTCACHE``: Turns on ``IOMAP_DONTCACHE``.
 
+``struct iomap_read_ops``
+--------------------------
+
+.. code-block:: c
+
+ struct iomap_read_ops {
+     int (*read_folio_range)(const struct iomap_iter *iter,
+                             struct iomap_read_folio_ctx *ctx, size_t len);
+     void (*submit_read)(struct iomap_read_folio_ctx *ctx);
+ };
+
+iomap calls these functions:
+
+  - ``read_folio_range``: Called to read in the range. This must be provided
+    by the caller. If this succeeds, iomap_finish_folio_read() must be called
+    after the range is read in, regardless of whether the read succeeded or
+    failed.
+
+  - ``submit_read``: Submit any pending read requests. This function is
+    optional.
+
 Internal per-Folio State
 ------------------------
 
@@ -182,6 +203,28 @@ The ``flags`` argument to ``->iomap_begin`` will be set to zero.
 The pagecache takes whatever locks it needs before calling the
 filesystem.
 
+Both ``iomap_readahead`` and ``iomap_read_folio`` pass in a ``struct
+iomap_read_folio_ctx``:
+
+.. code-block:: c
+
+ struct iomap_read_folio_ctx {
+    const struct iomap_read_ops *ops;
+    struct folio *cur_folio;
+    struct readahead_control *rac;
+    void *read_ctx;
+ };
+
+``iomap_readahead`` must set:
+ * ``ops->read_folio_range()`` and ``rac``
+
+``iomap_read_folio`` must set:
+ * ``ops->read_folio_range()`` and ``cur_folio``
+
+``ops->submit_read()`` and ``read_ctx`` are optional. ``read_ctx`` is used to
+pass in any custom data the caller needs accessible in the ops callbacks for
+fulfilling reads.
+
 Buffered Writes
 ---------------
 
@@ -317,6 +360,9 @@ The fields are as follows:
     delalloc reservations to avoid having delalloc reservations for
     clean pagecache.
     This function must be supplied by the filesystem.
+    If this succeeds, iomap_finish_folio_write() must be called once writeback
+    completes for the range, regardless of whether the writeback succeeded or
+    failed.
 
   - ``writeback_submit``: Submit the previous built writeback context.
     Block based file systems should use the iomap_ioend_writeback_submit
@@ -444,10 +490,6 @@ These ``struct kiocb`` flags are significant for direct I/O with iomap:
    Only meaningful for asynchronous I/O, and only if the entire I/O can
    be issued as a single ``struct bio``.
 
- * ``IOCB_DIO_CALLER_COMP``: Try to run I/O completion from the caller's
-   process context.
-   See ``linux/fs.h`` for more details.
-
 Filesystems should call ``iomap_dio_rw`` from ``->read_iter`` and
 ``->write_iter``, and set ``FMODE_CAN_ODIRECT`` in the ``->open``
 function for the file.
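
The callback contract documented in the hunks above (a mandatory ``read_folio_range``, an optional ``submit_read``, and a completion call once a range has been accepted) can be illustrated with a small userspace model. This is a hedged sketch, not kernel code: every `mock_*` and `demo_*` name is hypothetical, the folio is modelled as a plain byte buffer, and invoking `submit_read` right after `read_folio_range` is an assumption about ordering that this diff does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins only; these are NOT the kernel definitions. */
struct mock_read_ctx;

struct mock_read_ops {
	/* Required: read len bytes at offset off into the current folio. */
	int (*read_folio_range)(struct mock_read_ctx *ctx, size_t off, size_t len);
	/* Optional: flush any requests the caller has batched up. */
	void (*submit_read)(struct mock_read_ctx *ctx);
};

struct mock_read_ctx {
	const struct mock_read_ops *ops;
	char *cur_folio;	/* stand-in for struct folio *cur_folio */
	void *read_ctx;		/* caller-private data for the callbacks */
	int finished;		/* number of completion notifications seen */
};

/* Hypothetical counterpart of the completion call the docs require. */
static void mock_finish_folio_read(struct mock_read_ctx *ctx, int error)
{
	(void)error;
	ctx->finished++;
}

/* Example callback: copies from a backing buffer passed via read_ctx. */
static int demo_read_folio_range(struct mock_read_ctx *ctx, size_t off, size_t len)
{
	const char *backing = ctx->read_ctx;

	memcpy(ctx->cur_folio + off, backing + off, len);
	/* The range was accepted, so completion is reported either way. */
	mock_finish_folio_read(ctx, 0);
	return 0;
}

/* Generic side: checks the required callback, treats submit as optional. */
static int mock_read_folio(struct mock_read_ctx *ctx, size_t off, size_t len)
{
	int ret;

	assert(ctx->ops && ctx->ops->read_folio_range);
	ret = ctx->ops->read_folio_range(ctx, off, len);
	if (ctx->ops->submit_read)
		ctx->ops->submit_read(ctx);
	return ret;
}

static const struct mock_read_ops demo_ops = {
	.read_folio_range = demo_read_folio_range,
	/* .submit_read left NULL on purpose: it is optional. */
};

/* Returns 1 if the read succeeded, completed once, and copied the data. */
static int demo_read_ops(void)
{
	static const char backing[8] = "iomapfs";
	char folio[8] = { 0 };
	struct mock_read_ctx ctx = {
		.ops = &demo_ops,
		.cur_folio = folio,
		.read_ctx = (void *)backing,
	};
	int ret = mock_read_folio(&ctx, 0, sizeof(backing));

	return ret == 0 && ctx.finished == 1 &&
	       memcmp(folio, backing, sizeof(backing)) == 0;
}
```

Calling `demo_read_ops()` exercises the path with `submit_read` unset, which mirrors the documented case where only ``read_folio_range`` and the folio are supplied.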

block/fops.c

Lines changed: 3 additions & 2 deletions
@@ -540,12 +540,13 @@ const struct address_space_operations def_blk_aops = {
 #else /* CONFIG_BUFFER_HEAD */
 static int blkdev_read_folio(struct file *file, struct folio *folio)
 {
-	return iomap_read_folio(folio, &blkdev_iomap_ops);
+	iomap_bio_read_folio(folio, &blkdev_iomap_ops);
+	return 0;
 }
 
 static void blkdev_readahead(struct readahead_control *rac)
 {
-	iomap_readahead(rac, &blkdev_iomap_ops);
+	iomap_bio_readahead(rac, &blkdev_iomap_ops);
 }
 
 static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,

fs/backing-file.c

Lines changed: 0 additions & 6 deletions
@@ -227,12 +227,6 @@ ssize_t backing_file_write_iter(struct file *file, struct iov_iter *iter,
 	    !(file->f_mode & FMODE_CAN_ODIRECT))
 		return -EINVAL;
 
-	/*
-	 * Stacked filesystems don't support deferred completions, don't copy
-	 * this property in case it is set by the issuer.
-	 */
-	flags &= ~IOCB_DIO_CALLER_COMP;
-
 	old_cred = override_creds(ctx->cred);
 	if (is_sync_kiocb(iocb)) {
 		rwf_t rwf = iocb_to_rw_flags(flags);

fs/dax.c

Lines changed: 12 additions & 18 deletions
@@ -1507,7 +1507,7 @@ static int dax_zero_iter(struct iomap_iter *iter, bool *did_zero)
 
 	/* already zeroed? we're done. */
 	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
-		return iomap_iter_advance(iter, &length);
+		return iomap_iter_advance(iter, length);
 
 	/*
 	 * invalidate the pages whose sharing state is to be changed
@@ -1536,10 +1536,10 @@ static int dax_zero_iter(struct iomap_iter *iter, bool *did_zero)
 		if (ret < 0)
 			return ret;
 
-		ret = iomap_iter_advance(iter, &length);
+		ret = iomap_iter_advance(iter, length);
 		if (ret)
 			return ret;
-	} while (length > 0);
+	} while ((length = iomap_length(iter)) > 0);
 
 	if (did_zero)
 		*did_zero = true;
@@ -1597,7 +1597,7 @@ static int dax_iomap_iter(struct iomap_iter *iomi, struct iov_iter *iter)
 
 		if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN) {
 			done = iov_iter_zero(min(length, end - pos), iter);
-			return iomap_iter_advance(iomi, &done);
+			return iomap_iter_advance(iomi, done);
 		}
 	}
 
@@ -1681,12 +1681,12 @@ static int dax_iomap_iter(struct iomap_iter *iomi, struct iov_iter *iter)
 			xfer = dax_copy_to_iter(dax_dev, pgoff, kaddr,
 					map_len, iter);
 
-		length = xfer;
-		ret = iomap_iter_advance(iomi, &length);
+		ret = iomap_iter_advance(iomi, xfer);
 		if (!ret && xfer == 0)
 			ret = -EFAULT;
 		if (xfer < map_len)
 			break;
+		length = iomap_length(iomi);
 	}
 	dax_read_unlock(id);
 
@@ -1919,10 +1919,8 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, unsigned long *pfnp,
 			ret |= VM_FAULT_MAJOR;
 		}
 
-		if (!(ret & VM_FAULT_ERROR)) {
-			u64 length = PAGE_SIZE;
-			iter.status = iomap_iter_advance(&iter, &length);
-		}
+		if (!(ret & VM_FAULT_ERROR))
+			iter.status = iomap_iter_advance(&iter, PAGE_SIZE);
 	}
 
 	if (iomap_errp)
@@ -2034,10 +2032,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, unsigned long *pfnp,
 			continue; /* actually breaks out of the loop */
 
 		ret = dax_fault_iter(vmf, &iter, pfnp, &xas, &entry, true);
-		if (ret != VM_FAULT_FALLBACK) {
-			u64 length = PMD_SIZE;
-			iter.status = iomap_iter_advance(&iter, &length);
-		}
+		if (ret != VM_FAULT_FALLBACK)
+			iter.status = iomap_iter_advance(&iter, PMD_SIZE);
 	}
 
 unlock_entry:
@@ -2163,7 +2159,6 @@ static int dax_range_compare_iter(struct iomap_iter *it_src,
 	const struct iomap *smap = &it_src->iomap;
 	const struct iomap *dmap = &it_dest->iomap;
 	loff_t pos1 = it_src->pos, pos2 = it_dest->pos;
-	u64 dest_len;
 	void *saddr, *daddr;
 	int id, ret;
 
@@ -2196,10 +2191,9 @@ static int dax_range_compare_iter(struct iomap_iter *it_src,
 	dax_read_unlock(id);
 
 advance:
-	dest_len = len;
-	ret = iomap_iter_advance(it_src, &len);
+	ret = iomap_iter_advance(it_src, len);
 	if (!ret)
-		ret = iomap_iter_advance(it_dest, &dest_len);
+		ret = iomap_iter_advance(it_dest, len);
 	return ret;
 
 out_unlock:
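
The fs/dax.c hunks above all follow one pattern: ``iomap_iter_advance()`` now takes the byte count by value, and callers that loop re-query the remaining length with ``iomap_length()`` instead of reading it back through a pointer. A minimal userspace sketch of that calling convention follows. The `mock_*` names are hypothetical, the iterator tracks only a position and a remaining byte count, and the `-1` error return is a placeholder rather than the kernel's actual error code.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for an iterator: position plus bytes remaining. */
struct mock_iter {
	uint64_t pos;
	uint64_t len;
};

/* Hypothetical analogue of iomap_length(): bytes left to process. */
static uint64_t mock_length(const struct mock_iter *it)
{
	return it->len;
}

/* By-value advance: the count is an input only, nothing is written back. */
static int mock_advance(struct mock_iter *it, uint64_t count)
{
	if (count > it->len)
		return -1;	/* placeholder error value, an assumption */
	it->pos += count;
	it->len -= count;
	return 0;
}

/*
 * Process the range in chunks. The loop condition re-reads the remaining
 * length from the iterator, in the same shape as the
 * "while ((length = iomap_length(iter)) > 0)" form in the diff.
 */
static int mock_process(struct mock_iter *it, uint64_t chunk, unsigned *steps)
{
	uint64_t length = mock_length(it);
	int ret;

	do {
		uint64_t n = length < chunk ? length : chunk;

		ret = mock_advance(it, n);
		if (ret)
			return ret;
		(*steps)++;
	} while ((length = mock_length(it)) > 0);

	return 0;
}
```

One consequence visible in the diff is that callers no longer need a scratch variable whose only purpose was to be passed by address, such as the removed ``dest_len`` in ``dax_range_compare_iter()``.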

fs/erofs/data.c

Lines changed: 3 additions & 2 deletions
@@ -371,15 +371,16 @@ static int erofs_read_folio(struct file *file, struct folio *folio)
 {
 	trace_erofs_read_folio(folio, true);
 
-	return iomap_read_folio(folio, &erofs_iomap_ops);
+	iomap_bio_read_folio(folio, &erofs_iomap_ops);
+	return 0;
 }
 
 static void erofs_readahead(struct readahead_control *rac)
 {
 	trace_erofs_readahead(rac->mapping->host, readahead_index(rac),
 					readahead_count(rac), true);
 
-	return iomap_readahead(rac, &erofs_iomap_ops);
+	iomap_bio_readahead(rac, &erofs_iomap_ops);
 }
 
 static sector_t erofs_bmap(struct address_space *mapping, sector_t block)

fs/fuse/dir.c

Lines changed: 1 addition & 1 deletion
@@ -1192,7 +1192,7 @@ static void fuse_fillattr(struct mnt_idmap *idmap, struct inode *inode,
 	if (attr->blksize != 0)
 		blkbits = ilog2(attr->blksize);
 	else
-		blkbits = fc->blkbits;
+		blkbits = inode->i_sb->s_blocksize_bits;
 
 	stat->blksize = 1 << blkbits;
 }
