Skip to content

Commit efc6134

Browse files
Eric Whitneytytso
authored andcommitted
ext4: shrink race window in ext4_should_retry_alloc()
When generic/371 is run on kvm-xfstests using 5.10 and 5.11 kernels, it fails at significant rates on the two test scenarios that disable delayed allocation (ext3conv and data_journal) and force actual block allocation for the fallocate and pwrite functions in the test. The failure rate on 5.10 for both ext3conv and data_journal on one test system typically runs about 85%. On 5.11, the failure rate on ext3conv sometimes drops to as low as 1% while the rate on data_journal increases to nearly 100%. The observed failures are largely due to ext4_should_retry_alloc() cutting off block allocation retries when s_mb_free_pending (used to indicate that a transaction in progress will free blocks) is 0. However, free space is usually available when this occurs during runs of generic/371. It appears that a thread attempting to allocate blocks is just missing transaction commits in other threads that increase the free cluster count and reset s_mb_free_pending while the allocating thread isn't running. Explicitly testing for free space availability avoids this race. The current code uses a post-increment operator in the conditional expression that determines whether the retry limit has been exceeded. This means that the conditional expression uses the value of the retry counter before it's increased, resulting in an extra retry cycle. The current code actually retries twice before hitting its retry limit rather than once. Increasing the retry limit to 3 from the current actual maximum retry count of 2 in combination with the change described above reduces the observed failure rate to less that 0.1% on both ext3conv and data_journal with what should be limited impact on users sensitive to the overhead caused by retries. A per filesystem percpu counter exported via sysfs is added to allow users or developers to track the number of times the retry limit is exceeded without resorting to debugging methods. This should provide some insight into worst case retry behavior. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20210218151132.19678-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
1 parent a38fd87 commit efc6134

4 files changed

Lines changed: 39 additions & 12 deletions

File tree

fs/ext4/balloc.c

Lines changed: 26 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -626,27 +626,41 @@ int ext4_claim_free_clusters(struct ext4_sb_info *sbi,
626626

627627
/**
628628
* ext4_should_retry_alloc() - check if a block allocation should be retried
629-
* @sb: super block
630-
* @retries: number of attemps has been made
629+
* @sb: superblock
630+
* @retries: number of retry attempts made so far
631631
*
632-
* ext4_should_retry_alloc() is called when ENOSPC is returned, and if
633-
* it is profitable to retry the operation, this function will wait
634-
* for the current or committing transaction to complete, and then
635-
* return TRUE. We will only retry once.
632+
* ext4_should_retry_alloc() is called when ENOSPC is returned while
633+
* attempting to allocate blocks. If there's an indication that a pending
634+
* journal transaction might free some space and allow another attempt to
635+
* succeed, this function will wait for the current or committing transaction
636+
* to complete and then return TRUE.
636637
*/
637638
int ext4_should_retry_alloc(struct super_block *sb, int *retries)
638639
{
639-
if (!ext4_has_free_clusters(EXT4_SB(sb), 1, 0) ||
640-
(*retries)++ > 1 ||
641-
!EXT4_SB(sb)->s_journal)
640+
struct ext4_sb_info *sbi = EXT4_SB(sb);
641+
642+
if (!sbi->s_journal)
642643
return 0;
643644

644-
smp_mb();
645-
if (EXT4_SB(sb)->s_mb_free_pending == 0)
645+
if (++(*retries) > 3) {
646+
percpu_counter_inc(&sbi->s_sra_exceeded_retry_limit);
646647
return 0;
648+
}
647649

650+
/*
651+
* if there's no indication that blocks are about to be freed it's
652+
* possible we just missed a transaction commit that did so
653+
*/
654+
smp_mb();
655+
if (sbi->s_mb_free_pending == 0)
656+
return ext4_has_free_clusters(sbi, 1, 0);
657+
658+
/*
659+
* it's possible we've just missed a transaction commit here,
660+
* so ignore the returned status
661+
*/
648662
jbd_debug(1, "%s: retrying operation after ENOSPC\n", sb->s_id);
649-
jbd2_journal_force_commit_nested(EXT4_SB(sb)->s_journal);
663+
(void) jbd2_journal_force_commit_nested(sbi->s_journal);
650664
return 1;
651665
}
652666

fs/ext4/ext4.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1484,6 +1484,7 @@ struct ext4_sb_info {
14841484
struct percpu_counter s_freeinodes_counter;
14851485
struct percpu_counter s_dirs_counter;
14861486
struct percpu_counter s_dirtyclusters_counter;
1487+
struct percpu_counter s_sra_exceeded_retry_limit;
14871488
struct blockgroup_lock *s_blockgroup_lock;
14881489
struct proc_dir_entry *s_proc;
14891490
struct kobject s_kobj;

fs/ext4/super.c

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1210,6 +1210,7 @@ static void ext4_put_super(struct super_block *sb)
12101210
percpu_counter_destroy(&sbi->s_freeinodes_counter);
12111211
percpu_counter_destroy(&sbi->s_dirs_counter);
12121212
percpu_counter_destroy(&sbi->s_dirtyclusters_counter);
1213+
percpu_counter_destroy(&sbi->s_sra_exceeded_retry_limit);
12131214
percpu_free_rwsem(&sbi->s_writepages_rwsem);
12141215
#ifdef CONFIG_QUOTA
12151216
for (i = 0; i < EXT4_MAXQUOTAS; i++)
@@ -5011,6 +5012,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
50115012
if (!err)
50125013
err = percpu_counter_init(&sbi->s_dirtyclusters_counter, 0,
50135014
GFP_KERNEL);
5015+
if (!err)
5016+
err = percpu_counter_init(&sbi->s_sra_exceeded_retry_limit, 0,
5017+
GFP_KERNEL);
50145018
if (!err)
50155019
err = percpu_init_rwsem(&sbi->s_writepages_rwsem);
50165020

@@ -5124,6 +5128,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
51245128
percpu_counter_destroy(&sbi->s_freeinodes_counter);
51255129
percpu_counter_destroy(&sbi->s_dirs_counter);
51265130
percpu_counter_destroy(&sbi->s_dirtyclusters_counter);
5131+
percpu_counter_destroy(&sbi->s_sra_exceeded_retry_limit);
51275132
percpu_free_rwsem(&sbi->s_writepages_rwsem);
51285133
failed_mount5:
51295134
ext4_ext_release(sb);

fs/ext4/sysfs.c

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@ typedef enum {
2424
attr_session_write_kbytes,
2525
attr_lifetime_write_kbytes,
2626
attr_reserved_clusters,
27+
attr_sra_exceeded_retry_limit,
2728
attr_inode_readahead,
2829
attr_trigger_test_error,
2930
attr_first_error_time,
@@ -202,6 +203,7 @@ EXT4_ATTR_FUNC(delayed_allocation_blocks, 0444);
202203
EXT4_ATTR_FUNC(session_write_kbytes, 0444);
203204
EXT4_ATTR_FUNC(lifetime_write_kbytes, 0444);
204205
EXT4_ATTR_FUNC(reserved_clusters, 0644);
206+
EXT4_ATTR_FUNC(sra_exceeded_retry_limit, 0444);
205207

206208
EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, inode_readahead,
207209
ext4_sb_info, s_inode_readahead_blks);
@@ -251,6 +253,7 @@ static struct attribute *ext4_attrs[] = {
251253
ATTR_LIST(session_write_kbytes),
252254
ATTR_LIST(lifetime_write_kbytes),
253255
ATTR_LIST(reserved_clusters),
256+
ATTR_LIST(sra_exceeded_retry_limit),
254257
ATTR_LIST(inode_readahead_blks),
255258
ATTR_LIST(inode_goal),
256259
ATTR_LIST(mb_stats),
@@ -374,6 +377,10 @@ static ssize_t ext4_attr_show(struct kobject *kobj,
374377
return snprintf(buf, PAGE_SIZE, "%llu\n",
375378
(unsigned long long)
376379
atomic64_read(&sbi->s_resv_clusters));
380+
case attr_sra_exceeded_retry_limit:
381+
return snprintf(buf, PAGE_SIZE, "%llu\n",
382+
(unsigned long long)
383+
percpu_counter_sum(&sbi->s_sra_exceeded_retry_limit));
377384
case attr_inode_readahead:
378385
case attr_pointer_ui:
379386
if (!ptr)

0 commit comments

Comments
 (0)