
Commit 6654408

lostjeffle authored and brauner committed
writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs
The cgwb cleanup routine tries to release a dying cgwb by switching its attached inodes to another wb. It fetches those inodes from the wb->b_attached list, overlooking the fact that inodes with only dirty timestamps reside on the wb->b_dirty_time list, which is the case when lazytime is enabled. As a result, an enormous number of zombie memory cgroups accumulate when lazytime is enabled, because inodes with dirty timestamps cannot be switched to a live cgwb for a long time.

It is reasonable not to switch cgwbs for inodes with dirty data, as doing so may break the bandwidth restrictions. However, since writeback of inode metadata is not accounted for, let's also switch inodes with dirty timestamps to avoid zombie memory and block cgroups when lazytime is enabled.

Fixes: c22d70a ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231014125511.102978-1-jefflexu@linux.alibaba.com
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
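The problem described above can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: the enums, `model_list_for()`, and `model_pinned()` are invented names, and the mapping from dirty state to list is a simplification (clean inodes on b_attached, timestamp-only dirty inodes on b_dirty_time, inodes with dirty data on b_dirty). It counts how many inodes a single cleanup pass leaves behind, pinning the dying wb, with and without scanning b_dirty_time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified dirty states; the kernel tracks these via i_state flags. */
enum model_dirty { MODEL_CLEAN, MODEL_DIRTY_TIME_ONLY, MODEL_DIRTY_DATA };

/* Hypothetical stand-ins for wb->b_attached, wb->b_dirty_time and wb->b_dirty. */
enum model_list { LIST_ATTACHED, LIST_DIRTY_TIME, LIST_DIRTY };

/* Simplified assumption about which per-wb list an inode sits on. */
static enum model_list model_list_for(enum model_dirty d)
{
	switch (d) {
	case MODEL_CLEAN:
		return LIST_ATTACHED;
	case MODEL_DIRTY_TIME_ONLY:
		return LIST_DIRTY_TIME;
	default:
		return LIST_DIRTY;
	}
}

/* Does the dying-cgwb cleanup look at this inode at all? */
static bool model_scanned(enum model_dirty d, bool with_fix)
{
	enum model_list l = model_list_for(d);

	if (l == LIST_ATTACHED)
		return true;
	/* The fix adds b_dirty_time; b_dirty is still skipped to keep bandwidth limits intact. */
	return with_fix && l == LIST_DIRTY_TIME;
}

/* Count inodes that one cleanup pass would leave attached to the dying wb. */
static size_t model_pinned(const enum model_dirty *inodes, size_t n, bool with_fix)
{
	size_t pinned = 0;

	for (size_t i = 0; i < n; i++) {
		assert(inodes[i] <= MODEL_DIRTY_DATA);
		if (!model_scanned(inodes[i], with_fix))
			pinned++;
	}
	return pinned;
}
```

For a wb holding one clean inode, two timestamp-only dirty inodes, and one inode with dirty data, `model_pinned()` reports 3 pinned inodes without the fix and 1 with it. The remaining inode has dirty data and becomes switchable once regular writeback cleans it, whereas under lazytime the timestamp-only inodes may stay dirty much longer.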
1 parent e311ba2 commit 6654408

1 file changed

Lines changed: 29 additions & 12 deletions


fs/fs-writeback.c

@@ -613,6 +613,24 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 	kfree(isw);
 }
 
+static bool isw_prepare_wbs_switch(struct inode_switch_wbs_context *isw,
+				   struct list_head *list, int *nr)
+{
+	struct inode *inode;
+
+	list_for_each_entry(inode, list, i_io_list) {
+		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
+			continue;
+
+		isw->inodes[*nr] = inode;
+		(*nr)++;
+
+		if (*nr >= WB_MAX_INODES_PER_ISW - 1)
+			return true;
+	}
+	return false;
+}
+
 /**
  * cleanup_offline_cgwb - detach associated inodes
  * @wb: target wb
@@ -625,7 +643,6 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
 {
 	struct cgroup_subsys_state *memcg_css;
 	struct inode_switch_wbs_context *isw;
-	struct inode *inode;
 	int nr;
 	bool restart = false;
 
@@ -647,17 +664,17 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
 
 	nr = 0;
 	spin_lock(&wb->list_lock);
-	list_for_each_entry(inode, &wb->b_attached, i_io_list) {
-		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
-			continue;
-
-		isw->inodes[nr++] = inode;
-
-		if (nr >= WB_MAX_INODES_PER_ISW - 1) {
-			restart = true;
-			break;
-		}
-	}
+	/*
+	 * In addition to the inodes that have completed writeback, also switch
+	 * cgwbs for those inodes only with dirty timestamps. Otherwise, those
+	 * inodes won't be written back for a long time when lazytime is
+	 * enabled, and thus pinning the dying cgwbs. It won't break the
+	 * bandwidth restrictions, as writeback of inode metadata is not
+	 * accounted for.
+	 */
+	restart = isw_prepare_wbs_switch(isw, &wb->b_attached, &nr);
+	if (!restart)
+		restart = isw_prepare_wbs_switch(isw, &wb->b_dirty_time, &nr);
 	spin_unlock(&wb->list_lock);
 
 	/* no attached inodes? bail out */
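The contract of the new `isw_prepare_wbs_switch()` helper, and the order in which `cleanup_offline_cgwb()` now calls it, can be modelled outside the kernel. This is a hypothetical sketch under several assumptions: plain arrays replace `list_head` lists, there is no locking (the kernel holds `wb->list_lock`), a `busy` flag stands in for `inode_prepare_wbs_switch()` refusing an inode, and `MODEL_MAX_INODES_PER_ISW` is an arbitrary small value rather than the kernel's `WB_MAX_INODES_PER_ISW`. The `- 1` bound appears to reserve a terminating slot in the batch array; the model simply copies it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap; the kernel computes WB_MAX_INODES_PER_ISW differently. */
#define MODEL_MAX_INODES_PER_ISW 4

struct model_inode {
	int id;
	bool busy; /* models inode_prepare_wbs_switch() returning false */
};

struct model_inode_list {
	struct model_inode *items;
	size_t len;
};

/*
 * Same shape as isw_prepare_wbs_switch(): append eligible inodes to batch[]
 * starting at *nr, and return true once the batch is full so that the
 * caller knows another cleanup pass is needed.
 */
static bool model_prepare_switch(int *batch, const struct model_inode_list *list, int *nr)
{
	for (size_t i = 0; i < list->len; i++) {
		if (list->items[i].busy)
			continue;

		assert(*nr < MODEL_MAX_INODES_PER_ISW - 1); /* room left in the batch */
		batch[*nr] = list->items[i].id;
		(*nr)++;

		if (*nr >= MODEL_MAX_INODES_PER_ISW - 1)
			return true;
	}
	return false;
}

/* One pass in the patch's order: b_attached first, then b_dirty_time if there is room. */
static bool model_cleanup_pass(int *batch, const struct model_inode_list *attached,
			       const struct model_inode_list *dirty_time, int *nr)
{
	bool restart;

	*nr = 0;
	restart = model_prepare_switch(batch, attached, nr);
	if (!restart)
		restart = model_prepare_switch(batch, dirty_time, nr);
	return restart;
}
```

With b_attached holding inodes 1 and 2 (2 busy) and b_dirty_time holding 3, 4 and 5, one pass collects 1, 3 and 4, fills the three usable slots, and returns true. Threading `nr` through both calls, rather than resetting it per list, is what lets the two lists share one batch and one `restart` flag.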
