Commit 1122c0c

Authored by Christoph Hellwig, committed by Jens Axboe (axboe)
block: move cache control settings out of queue->flags
Move the cache control settings into queue_limits so that the flags can be set atomically while the device queue is frozen. Add new features and flags fields: features for driver-set capability flags, and flags for internal (usually sysfs-controlled) state in the block layer. Note that we'll eventually remove enough fields from queue_limits to bring it back to its previous size.

The disable flag is inverted compared to the previous meaning, which means it now survives a rescan, similar to the max_sectors and max_discard_sectors user limits.

The FLUSH and FUA flags are now inherited by blk_stack_limits, which simplified the code in dm a lot, but also causes a slight behavior change: dm-switch and dm-unstripe now advertise a write cache despite setting num_flush_bios to 0. The I/O path handles this gracefully, but as far as I can tell the lack of num_flush_bios, and thus of flush support, is a pre-existing data-integrity bug in those targets that really needs fixing; after that, a non-zero num_flush_bios should be required in dm for targets that map to underlying devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
1 parent 70905f8 commit 1122c0c

29 files changed

Lines changed: 227 additions & 206 deletions


Documentation/block/writeback_cache_control.rst

Lines changed: 38 additions & 29 deletions
@@ -46,41 +46,50 @@ worry if the underlying devices need any explicit cache flushing and how
 the Forced Unit Access is implemented. The REQ_PREFLUSH and REQ_FUA flags
 may both be set on a single bio.
 
+Feature settings for block drivers
+----------------------------------
 
-Implementation details for bio based block drivers
---------------------------------------------------------------
+For devices that do not support volatile write caches there is no driver
+support required, the block layer completes empty REQ_PREFLUSH requests before
+entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
+requests that have a payload.
 
-These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
-directly below the submit_bio interface. For remapping drivers the REQ_FUA
-bits need to be propagated to underlying devices, and a global flush needs
-to be implemented for bios with the REQ_PREFLUSH bit set. For real device
-drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
-on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
-data can be completed successfully without doing any work. Drivers for
-devices with volatile caches need to implement the support for these
-flags themselves without any help from the block layer.
+For devices with volatile write caches the driver needs to tell the block layer
+that it supports flushing caches by setting the
 
+  BLK_FEAT_WRITE_CACHE
 
-Implementation details for request_fn based block drivers
----------------------------------------------------------
+flag in the queue_limits feature field. For devices that also support the FUA
+bit the block layer needs to be told to pass on the REQ_FUA bit by also setting
+the
 
-For devices that do not support volatile write caches there is no driver
-support required, the block layer completes empty REQ_PREFLUSH requests before
-entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
-requests that have a payload. For devices with volatile write caches the
-driver needs to tell the block layer that it supports flushing caches by
-doing::
+  BLK_FEAT_FUA
+
+flag in the features field of the queue_limits structure.
+
+Implementation details for bio based block drivers
+--------------------------------------------------
+
+For bio based drivers the REQ_PREFLUSH and REQ_FUA bits are simply passed on
+to the driver if the driver sets the BLK_FEAT_WRITE_CACHE flag, and the driver
+needs to handle them.
+
+*NOTE*: The REQ_FUA bit also gets passed on when the BLK_FEAT_FUA flag is
+_not_ set. Any bio based driver that sets BLK_FEAT_WRITE_CACHE also needs to
+handle REQ_FUA.
 
-	blk_queue_write_cache(sdkp->disk->queue, true, false);
+For remapping drivers the REQ_FUA bits need to be propagated to underlying
+devices, and a global flush needs to be implemented for bios with the
+REQ_PREFLUSH bit set.
 
-and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn. Note that
-REQ_PREFLUSH requests with a payload are automatically turned into a sequence
-of an empty REQ_OP_FLUSH request followed by the actual write by the block
-layer. For devices that also support the FUA bit the block layer needs
-to be told to pass through the REQ_FUA bit using::
+Implementation details for blk-mq drivers
+-----------------------------------------
 
-	blk_queue_write_cache(sdkp->disk->queue, true, true);
+When the BLK_FEAT_WRITE_CACHE flag is set, REQ_OP_WRITE | REQ_PREFLUSH requests
+with a payload are automatically turned into a sequence of a REQ_OP_FLUSH
+request followed by the actual write by the block layer.
 
-and the driver must handle write requests that have the REQ_FUA bit set
-in prep_fn/request_fn. If the FUA bit is not natively supported the block
-layer turns it into an empty REQ_OP_FLUSH request after the actual write.
+When the BLK_FEAT_FUA flag is set, the REQ_FUA bit is simply passed on for the
+REQ_OP_WRITE request, else a REQ_OP_FLUSH request is sent by the block layer
+after the completion of the write request for bio submissions with the REQ_FUA
+bit set.

arch/um/drivers/ubd_kern.c

Lines changed: 1 addition & 1 deletion
@@ -835,6 +835,7 @@ static int ubd_add(int n, char **error_out)
 	struct queue_limits lim = {
 		.max_segments		= MAX_SG,
 		.seg_boundary_mask	= PAGE_SIZE - 1,
+		.features		= BLK_FEAT_WRITE_CACHE,
 	};
 	struct gendisk *disk;
 	int err = 0;
@@ -882,7 +883,6 @@ static int ubd_add(int n, char **error_out)
 	}
 
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
-	blk_queue_write_cache(disk->queue, true, false);
 	disk->major = UBD_MAJOR;
 	disk->first_minor = n << UBD_SHIFT;
 	disk->minors = 1 << UBD_SHIFT;

block/blk-core.c

Lines changed: 1 addition & 1 deletion
@@ -782,7 +782,7 @@ void submit_bio_noacct(struct bio *bio)
 		if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_WRITE &&
 				 bio_op(bio) != REQ_OP_ZONE_APPEND))
 			goto end_io;
-		if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
+		if (!bdev_write_cache(bdev)) {
 			bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
 			if (!bio_sectors(bio)) {
 				status = BLK_STS_OK;

block/blk-flush.c

Lines changed: 4 additions & 5 deletions
@@ -381,8 +381,8 @@ static void blk_rq_init_flush(struct request *rq)
 bool blk_insert_flush(struct request *rq)
 {
 	struct request_queue *q = rq->q;
-	unsigned long fflags = q->queue_flags;	/* may change, cache */
 	struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
+	bool supports_fua = q->limits.features & BLK_FEAT_FUA;
 	unsigned int policy = 0;
 
 	/* FLUSH/FUA request must never be merged */
@@ -394,11 +394,10 @@ bool blk_insert_flush(struct request *rq)
 	/*
 	 * Check which flushes we need to sequence for this operation.
 	 */
-	if (fflags & (1UL << QUEUE_FLAG_WC)) {
+	if (blk_queue_write_cache(q)) {
 		if (rq->cmd_flags & REQ_PREFLUSH)
 			policy |= REQ_FSEQ_PREFLUSH;
-		if (!(fflags & (1UL << QUEUE_FLAG_FUA)) &&
-		    (rq->cmd_flags & REQ_FUA))
+		if ((rq->cmd_flags & REQ_FUA) && !supports_fua)
 			policy |= REQ_FSEQ_POSTFLUSH;
 	}
 
@@ -407,7 +406,7 @@ bool blk_insert_flush(struct request *rq)
 	 * REQ_PREFLUSH and FUA for the driver.
 	 */
 	rq->cmd_flags &= ~REQ_PREFLUSH;
-	if (!(fflags & (1UL << QUEUE_FLAG_FUA)))
+	if (!supports_fua)
 		rq->cmd_flags &= ~REQ_FUA;
 
 	/*

block/blk-mq-debugfs.c

Lines changed: 0 additions & 2 deletions
@@ -93,8 +93,6 @@ static const char *const blk_queue_flag_name[] = {
 	QUEUE_FLAG_NAME(INIT_DONE),
 	QUEUE_FLAG_NAME(STABLE_WRITES),
 	QUEUE_FLAG_NAME(POLL),
-	QUEUE_FLAG_NAME(WC),
-	QUEUE_FLAG_NAME(FUA),
 	QUEUE_FLAG_NAME(DAX),
 	QUEUE_FLAG_NAME(STATS),
 	QUEUE_FLAG_NAME(REGISTERED),

block/blk-settings.c

Lines changed: 5 additions & 24 deletions
@@ -261,6 +261,9 @@ static int blk_validate_limits(struct queue_limits *lim)
 		lim->misaligned = 0;
 	}
 
+	if (!(lim->features & BLK_FEAT_WRITE_CACHE))
+		lim->features &= ~BLK_FEAT_FUA;
+
 	err = blk_validate_integrity_limits(lim);
 	if (err)
 		return err;
@@ -454,6 +457,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 {
 	unsigned int top, bottom, alignment, ret = 0;
 
+	t->features |= (b->features & BLK_FEAT_INHERIT_MASK);
+
 	t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
 	t->max_user_sectors = min_not_zero(t->max_user_sectors,
 			b->max_user_sectors);
@@ -711,30 +716,6 @@ void blk_set_queue_depth(struct request_queue *q, unsigned int depth)
 }
 EXPORT_SYMBOL(blk_set_queue_depth);
 
-/**
- * blk_queue_write_cache - configure queue's write cache
- * @q:		the request queue for the device
- * @wc:		write back cache on or off
- * @fua:	device supports FUA writes, if true
- *
- * Tell the block layer about the write cache of @q.
- */
-void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
-{
-	if (wc) {
-		blk_queue_flag_set(QUEUE_FLAG_HW_WC, q);
-		blk_queue_flag_set(QUEUE_FLAG_WC, q);
-	} else {
-		blk_queue_flag_clear(QUEUE_FLAG_HW_WC, q);
-		blk_queue_flag_clear(QUEUE_FLAG_WC, q);
-	}
-	if (fua)
-		blk_queue_flag_set(QUEUE_FLAG_FUA, q);
-	else
-		blk_queue_flag_clear(QUEUE_FLAG_FUA, q);
-}
-EXPORT_SYMBOL_GPL(blk_queue_write_cache);
-
 int bdev_alignment_offset(struct block_device *bdev)
 {
 	struct request_queue *q = bdev_get_queue(bdev);

block/blk-sysfs.c

Lines changed: 19 additions & 10 deletions
@@ -423,32 +423,41 @@ static ssize_t queue_io_timeout_store(struct request_queue *q, const char *page,
 
 static ssize_t queue_wc_show(struct request_queue *q, char *page)
 {
-	if (test_bit(QUEUE_FLAG_WC, &q->queue_flags))
-		return sprintf(page, "write back\n");
-
-	return sprintf(page, "write through\n");
+	if (q->limits.flags & BLK_FLAGS_WRITE_CACHE_DISABLED)
+		return sprintf(page, "write through\n");
+	return sprintf(page, "write back\n");
 }
 
 static ssize_t queue_wc_store(struct request_queue *q, const char *page,
 			      size_t count)
 {
+	struct queue_limits lim;
+	bool disable;
+	int err;
+
 	if (!strncmp(page, "write back", 10)) {
-		if (!test_bit(QUEUE_FLAG_HW_WC, &q->queue_flags))
-			return -EINVAL;
-		blk_queue_flag_set(QUEUE_FLAG_WC, q);
+		disable = false;
 	} else if (!strncmp(page, "write through", 13) ||
-		 !strncmp(page, "none", 4)) {
-		blk_queue_flag_clear(QUEUE_FLAG_WC, q);
+		   !strncmp(page, "none", 4)) {
+		disable = true;
 	} else {
 		return -EINVAL;
 	}
 
+	lim = queue_limits_start_update(q);
+	if (disable)
+		lim.flags |= BLK_FLAGS_WRITE_CACHE_DISABLED;
+	else
+		lim.flags &= ~BLK_FLAGS_WRITE_CACHE_DISABLED;
+	err = queue_limits_commit_update(q, &lim);
+	if (err)
+		return err;
 	return count;
 }
 
 static ssize_t queue_fua_show(struct request_queue *q, char *page)
 {
-	return sprintf(page, "%u\n", test_bit(QUEUE_FLAG_FUA, &q->queue_flags));
+	return sprintf(page, "%u\n", !!(q->limits.features & BLK_FEAT_FUA));
 }
 
 static ssize_t queue_dax_show(struct request_queue *q, char *page)

block/blk-wbt.c

Lines changed: 2 additions & 2 deletions
@@ -206,8 +206,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
 	 */
 	if (wb_acct & WBT_DISCARD)
 		limit = rwb->wb_background;
-	else if (test_bit(QUEUE_FLAG_WC, &rwb->rqos.disk->queue->queue_flags) &&
-		 !wb_recent_wait(rwb))
+	else if (blk_queue_write_cache(rwb->rqos.disk->queue) &&
+		 !wb_recent_wait(rwb))
 		limit = 0;
 	else
 		limit = rwb->wb_normal;

drivers/block/drbd/drbd_main.c

Lines changed: 1 addition & 1 deletion
@@ -2697,6 +2697,7 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
 		 * connect.
 		 */
 		.max_hw_sectors = DRBD_MAX_BIO_SIZE_SAFE >> 8,
+		.features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA,
 	};
 
 	device = minor_to_device(minor);
@@ -2736,7 +2737,6 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
 	disk->private_data = device;
 
 	blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
-	blk_queue_write_cache(disk->queue, true, true);
 
 	device->md_io.page = alloc_page(GFP_KERNEL);
 	if (!device->md_io.page)

drivers/block/loop.c

Lines changed: 3 additions & 6 deletions
@@ -985,6 +985,9 @@ static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize)
 	lim.logical_block_size = bsize;
 	lim.physical_block_size = bsize;
 	lim.io_min = bsize;
+	lim.features &= ~BLK_FEAT_WRITE_CACHE;
+	if (file->f_op->fsync && !(lo->lo_flags & LO_FLAGS_READ_ONLY))
+		lim.features |= BLK_FEAT_WRITE_CACHE;
 	if (!backing_bdev || bdev_nonrot(backing_bdev))
 		blk_queue_flag_set(QUEUE_FLAG_NONROT, lo->lo_queue);
 	else
@@ -1078,9 +1081,6 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
 	lo->old_gfp_mask = mapping_gfp_mask(mapping);
 	mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
 
-	if (!(lo->lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
-		blk_queue_write_cache(lo->lo_queue, true, false);
-
 	error = loop_reconfigure_limits(lo, config->block_size);
 	if (WARN_ON_ONCE(error))
 		goto out_unlock;
@@ -1131,9 +1131,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
 	struct file *filp;
 	gfp_t gfp = lo->old_gfp_mask;
 
-	if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
-		blk_queue_write_cache(lo->lo_queue, false, false);
-
 	/*
 	 * Freeze the request queue when unbinding on a live file descriptor and
 	 * thus an open device. When called from ->release we are guaranteed
