Skip to content

Commit 952c3fc

Browse files
jmberg-intelgregkh
authored andcommitted
debugfs: fix wait/cancellation handling during remove
Ben Greear further reports deadlocks during concurrent debugfs remove while files are being accessed, even though the code in question now uses debugfs cancellations. Turns out that despite all the review on the locking, we missed completely that the logic is wrong: if the refcount hits zero we can finish (and need not wait for the completion), but if it doesn't we have to trigger all the cancellations. As written, we can _never_ get into the loop triggering the cancellations. Fix this, and explain it better while at it. Cc: stable@vger.kernel.org Fixes: 8c88a47 ("debugfs: add API to allow debugfs operations cancellation") Reported-by: Ben Greear <greearb@candelatech.com> Closes: https://lore.kernel.org/r/1c9fa9e5-09f1-0522-fdbc-dbcef4d255ca@candelatech.com Tested-by: Madhan Sai <madhan.singaraju@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20240229153635.6bfab7eb34d3.I6c7aeff8c9d6628a8bc1ddcf332205a49d801f17@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1 parent 4dc3d61 commit 952c3fc

1 file changed

Lines changed: 20 additions & 5 deletions

File tree

fs/debugfs/inode.c

Lines changed: 20 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -751,13 +751,28 @@ static void __debugfs_file_removed(struct dentry *dentry)
751751
if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)
752752
return;
753753

754-
/* if we hit zero, just wait for all to finish */
755-
if (!refcount_dec_and_test(&fsd->active_users)) {
756-
wait_for_completion(&fsd->active_users_drained);
754+
/* if this was the last reference, we're done */
755+
if (refcount_dec_and_test(&fsd->active_users))
757756
return;
758-
}
759757

760-
/* if we didn't hit zero, try to cancel any we can */
758+
/*
759+
* If there's still a reference, the code that obtained it can
760+
* be in different states:
761+
* - The common case of not using cancellations, or already
762+
* after debugfs_leave_cancellation(), where we just need
763+
* to wait for debugfs_file_put() which signals the completion;
764+
* - inside a cancellation section, i.e. between
765+
* debugfs_enter_cancellation() and debugfs_leave_cancellation(),
766+
* in which case we need to trigger the ->cancel() function,
767+
* and then wait for debugfs_file_put() just like in the
768+
* previous case;
769+
* - before debugfs_enter_cancellation() (but obviously after
770+
* debugfs_file_get()), in which case we may not see the
771+
* cancellation in the list on the first round of the loop,
772+
* but debugfs_enter_cancellation() signals the completion
773+
* after adding it, so this code gets woken up to call the
774+
* ->cancel() function.
775+
*/
761776
while (refcount_read(&fsd->active_users)) {
762777
struct debugfs_cancellation *c;
763778

0 commit comments

Comments
 (0)