
Commit 919edba

Dave Chinner authored and Darrick J. Wong committed
xfs: drop async cache flushes from CIL commits.
Jan Kara reported a performance regression in dbench that he bisected down to commit bad77c3 ("xfs: CIL checkpoint flushes caches unconditionally").

Whilst developing the journal flush/fua optimisations this cache flush was part of, it appeared to make a significant difference to performance. However, now that this patchset has settled and all the correctness issues have been fixed, there does not appear to be any significant performance benefit to asynchronous cache flushes.

In fact, the opposite is true on some storage types and workloads, where the additional cache flushes that can occur from fsync heavy workloads have a measurable and significant impact on overall throughput.

Local dbench testing shows little difference on dbench runs with sync vs async cache flushes on either fast or slow SSD storage, and no difference in streaming concurrent async transaction workloads like fs-mark.

Fast NVME storage:

From `dbench -t 30`, CIL scale:

clients         async                    sync
              BW      Latency         BW      Latency
     1      935.18     0.855        915.64     0.903
     8     2404.51     6.873       2341.77     6.511
    16     3003.42     6.460       2931.57     6.529
    32     3697.23     7.939       3596.28     7.894
   128     7237.43    15.495       7217.74    11.588
   512     5079.24    90.587       5167.08    95.822

fsmark, 32 threads, create w/ 64 byte xattr w/32k logbsize:

        create  chown   unlink
async   1m41s   1m16s   2m03s
sync    1m40s   1m19s   1m54s

Slower SATA SSD storage:

From `dbench -t 30`, CIL scale:

clients         async                    sync
              BW      Latency         BW      Latency
     1       78.59    15.792         83.78    10.729
     8      367.88    92.067        404.63    59.943
    16      564.51    72.524        602.71    76.089
    32      831.66   105.984        870.26   110.482
   128     1659.76   102.969       1624.73    91.356
   512     2135.91   223.054       2603.07   161.160

fsmark, 16 threads, create w/32k logbsize:

        create  unlink
async   5m06s   4m15s
sync    5m00s   4m22s

And on Jan's test machine:

                 5.18-rc8-vanilla       5.18-rc8-patched
Amean  1        71.22 (  0.00%)        64.94 *  8.81%*
Amean  2        93.03 (  0.00%)        84.80 *  8.85%*
Amean  4       150.54 (  0.00%)       137.51 *  8.66%*
Amean  8       252.53 (  0.00%)       242.24 *  4.08%*
Amean 16       454.13 (  0.00%)       439.08 *  3.31%*
Amean 32       835.24 (  0.00%)       829.74 *  0.66%*
Amean 64      1740.59 (  0.00%)      1686.73 *  3.09%*

Performance and cache flush behaviour are restored to pre-regression levels. As such, we can now consider the async cache flush mechanism an unnecessary exercise in premature optimisation, and we can remove it and the infrastructure it requires completely.

Fixes: bad77c3 ("xfs: CIL checkpoint flushes caches unconditionally")
Reported-and-tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
1 parent 5652ef3 commit 919edba

5 files changed: 25 additions & 93 deletions


fs/xfs/xfs_bio_io.c

Lines changed: 0 additions & 35 deletions
@@ -9,41 +9,6 @@ static inline unsigned int bio_max_vecs(unsigned int count)
 	return bio_max_segs(howmany(count, PAGE_SIZE));
 }
 
-static void
-xfs_flush_bdev_async_endio(
-	struct bio		*bio)
-{
-	complete(bio->bi_private);
-}
-
-/*
- * Submit a request for an async cache flush to run. If the request queue does
- * not require flush operations, just skip it altogether. If the caller needs
- * to wait for the flush completion at a later point in time, they must supply a
- * valid completion. This will be signalled when the flush completes. The
- * caller never sees the bio that is issued here.
- */
-void
-xfs_flush_bdev_async(
-	struct bio		*bio,
-	struct block_device	*bdev,
-	struct completion	*done)
-{
-	struct request_queue	*q = bdev->bd_disk->queue;
-
-	if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
-		complete(done);
-		return;
-	}
-
-	bio_init(bio, NULL, 0);
-	bio_set_dev(bio, bdev);
-	bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
-	bio->bi_private = done;
-	bio->bi_end_io = xfs_flush_bdev_async_endio;
-
-	submit_bio(bio);
-}
 int
 xfs_rw_bdev(
 	struct block_device	*bdev,

fs/xfs/xfs_linux.h

Lines changed: 0 additions & 2 deletions
@@ -197,8 +197,6 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y)
 
 int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count,
 		char *data, unsigned int op);
-void xfs_flush_bdev_async(struct bio *bio, struct block_device *bdev,
-		struct completion *done);
 
 #define ASSERT_ALWAYS(expr)	\
 	(likely(expr) ? (void)0 : assfail(NULL, #expr, __FILE__, __LINE__))

fs/xfs/xfs_log.c

Lines changed: 11 additions & 25 deletions
@@ -527,12 +527,6 @@ xlog_state_shutdown_callbacks(
  * Flush iclog to disk if this is the last reference to the given iclog and the
  * it is in the WANT_SYNC state.
  *
- * If the caller passes in a non-zero @old_tail_lsn and the current log tail
- * does not match, there may be metadata on disk that must be persisted before
- * this iclog is written. To satisfy that requirement, set the
- * XLOG_ICL_NEED_FLUSH flag as a condition for writing this iclog with the new
- * log tail value.
- *
  * If XLOG_ICL_NEED_FUA is already set on the iclog, we need to ensure that the
  * log tail is updated correctly. NEED_FUA indicates that the iclog will be
  * written to stable storage, and implies that a commit record is contained
@@ -549,12 +543,10 @@ xlog_state_shutdown_callbacks(
  * always capture the tail lsn on the iclog on the first NEED_FUA release
  * regardless of the number of active reference counts on this iclog.
  */
-
 int
 xlog_state_release_iclog(
 	struct xlog		*log,
-	struct xlog_in_core	*iclog,
-	xfs_lsn_t		old_tail_lsn)
+	struct xlog_in_core	*iclog)
 {
 	xfs_lsn_t		tail_lsn;
 	bool			last_ref;
@@ -565,18 +557,14 @@
 	/*
 	 * Grabbing the current log tail needs to be atomic w.r.t. the writing
 	 * of the tail LSN into the iclog so we guarantee that the log tail does
-	 * not move between deciding if a cache flush is required and writing
-	 * the LSN into the iclog below.
+	 * not move between the first time we know that the iclog needs to be
+	 * made stable and when we eventually submit it.
 	 */
-	if (old_tail_lsn || iclog->ic_state == XLOG_STATE_WANT_SYNC) {
+	if ((iclog->ic_state == XLOG_STATE_WANT_SYNC ||
+	     (iclog->ic_flags & XLOG_ICL_NEED_FUA)) &&
+	    !iclog->ic_header.h_tail_lsn) {
 		tail_lsn = xlog_assign_tail_lsn(log->l_mp);
-
-		if (old_tail_lsn && tail_lsn != old_tail_lsn)
-			iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
-
-		if ((iclog->ic_flags & XLOG_ICL_NEED_FUA) &&
-		    !iclog->ic_header.h_tail_lsn)
-			iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn);
+		iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn);
 	}
 
 	last_ref = atomic_dec_and_test(&iclog->ic_refcnt);
@@ -601,8 +589,6 @@
 	}
 
 	iclog->ic_state = XLOG_STATE_SYNCING;
-	if (!iclog->ic_header.h_tail_lsn)
-		iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn);
 	xlog_verify_tail_lsn(log, iclog);
 	trace_xlog_iclog_syncing(iclog, _RET_IP_);
 
@@ -874,7 +860,7 @@ xlog_force_iclog(
 	iclog->ic_flags |= XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA;
 	if (iclog->ic_state == XLOG_STATE_ACTIVE)
 		xlog_state_switch_iclogs(iclog->ic_log, iclog, 0);
-	return xlog_state_release_iclog(iclog->ic_log, iclog, 0);
+	return xlog_state_release_iclog(iclog->ic_log, iclog);
 }
 
 /*
@@ -2412,7 +2398,7 @@ xlog_write_copy_finish(
 	ASSERT(iclog->ic_state == XLOG_STATE_WANT_SYNC ||
 		xlog_is_shutdown(log));
 release_iclog:
-	error = xlog_state_release_iclog(log, iclog, 0);
+	error = xlog_state_release_iclog(log, iclog);
 	spin_unlock(&log->l_icloglock);
 	return error;
 }
@@ -2629,7 +2615,7 @@ xlog_write(
 
 	spin_lock(&log->l_icloglock);
 	xlog_state_finish_copy(log, iclog, record_cnt, data_cnt);
-	error = xlog_state_release_iclog(log, iclog, 0);
+	error = xlog_state_release_iclog(log, iclog);
 	spin_unlock(&log->l_icloglock);
 
 	return error;
@@ -3053,7 +3039,7 @@ xlog_state_get_iclog_space(
 	 * reference to the iclog.
 	 */
 	if (!atomic_add_unless(&iclog->ic_refcnt, -1, 1))
-		error = xlog_state_release_iclog(log, iclog, 0);
+		error = xlog_state_release_iclog(log, iclog);
 	spin_unlock(&log->l_icloglock);
 	if (error)
 		return error;

fs/xfs/xfs_log_cil.c

Lines changed: 13 additions & 29 deletions
@@ -705,11 +705,21 @@ xlog_cil_set_ctx_write_state(
 		 * The LSN we need to pass to the log items on transaction
 		 * commit is the LSN reported by the first log vector write, not
 		 * the commit lsn. If we use the commit record lsn then we can
-		 * move the tail beyond the grant write head.
+		 * move the grant write head beyond the tail LSN and overwrite
+		 * it.
 		 */
 		ctx->start_lsn = lsn;
 		wake_up_all(&cil->xc_start_wait);
 		spin_unlock(&cil->xc_push_lock);
+
+		/*
+		 * Make sure the metadata we are about to overwrite in the log
+		 * has been flushed to stable storage before this iclog is
+		 * issued.
+		 */
+		spin_lock(&cil->xc_log->l_icloglock);
+		iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
+		spin_unlock(&cil->xc_log->l_icloglock);
 		return;
 	}
 
@@ -888,10 +898,7 @@ xlog_cil_push_work(
 	struct xfs_trans_header thdr;
 	struct xfs_log_iovec	lhdr;
 	struct xfs_log_vec	lvhdr = { NULL };
-	xfs_lsn_t		preflush_tail_lsn;
 	xfs_csn_t		push_seq;
-	struct bio		bio;
-	DECLARE_COMPLETION_ONSTACK(bdev_flush);
 	bool			push_commit_stable;
 
 	new_ctx = xlog_cil_ctx_alloc();
@@ -961,23 +968,6 @@
 	list_add(&ctx->committing, &cil->xc_committing);
 	spin_unlock(&cil->xc_push_lock);
 
-	/*
-	 * The CIL is stable at this point - nothing new will be added to it
-	 * because we hold the flush lock exclusively. Hence we can now issue
-	 * a cache flush to ensure all the completed metadata in the journal we
-	 * are about to overwrite is on stable storage.
-	 *
-	 * Because we are issuing this cache flush before we've written the
-	 * tail lsn to the iclog, we can have metadata IO completions move the
-	 * tail forwards between the completion of this flush and the iclog
-	 * being written. In this case, we need to re-issue the cache flush
-	 * before the iclog write. To detect whether the log tail moves, sample
-	 * the tail LSN *before* we issue the flush.
-	 */
-	preflush_tail_lsn = atomic64_read(&log->l_tail_lsn);
-	xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
-				&bdev_flush);
-
 	/*
 	 * Pull all the log vectors off the items in the CIL, and remove the
 	 * items from the CIL. We don't need the CIL lock here because it's only
@@ -1054,12 +1044,6 @@
 	lvhdr.lv_iovecp = &lhdr;
 	lvhdr.lv_next = ctx->lv_chain;
 
-	/*
-	 * Before we format and submit the first iclog, we have to ensure that
-	 * the metadata writeback ordering cache flush is complete.
-	 */
-	wait_for_completion(&bdev_flush);
-
 	error = xlog_cil_write_chain(ctx, &lvhdr);
 	if (error)
 		goto out_abort_free_ticket;
@@ -1118,7 +1102,7 @@
 	if (push_commit_stable &&
 	    ctx->commit_iclog->ic_state == XLOG_STATE_ACTIVE)
 		xlog_state_switch_iclogs(log, ctx->commit_iclog, 0);
-	xlog_state_release_iclog(log, ctx->commit_iclog, preflush_tail_lsn);
+	xlog_state_release_iclog(log, ctx->commit_iclog);
 
 	/* Not safe to reference ctx now! */
 
@@ -1139,7 +1123,7 @@
 		return;
 	}
 	spin_lock(&log->l_icloglock);
-	xlog_state_release_iclog(log, ctx->commit_iclog, 0);
+	xlog_state_release_iclog(log, ctx->commit_iclog);
 	/* Not safe to reference ctx now! */
 	spin_unlock(&log->l_icloglock);
 }

fs/xfs/xfs_log_priv.h

Lines changed: 1 addition & 2 deletions
@@ -535,8 +535,7 @@ void xfs_log_ticket_regrant(struct xlog *log, struct xlog_ticket *ticket);
 
 void xlog_state_switch_iclogs(struct xlog *log, struct xlog_in_core *iclog,
 		int eventual_size);
-int xlog_state_release_iclog(struct xlog *log, struct xlog_in_core *iclog,
-		xfs_lsn_t log_tail_lsn);
+int xlog_state_release_iclog(struct xlog *log, struct xlog_in_core *iclog);
 
 /*
  * When we crack an atomic LSN, we sample it first so that the value will not
