Commit 1416bd5

Alexander Aring authored and teigland committed
dlm: fix recovery pending middle conversion
During a workload involving conversions between lock modes PR and CW, lock recovery can create a "conversion deadlock" state between locks that have been recovered. When this occurs, kernel warning messages are logged, e.g.

  "dlm: WARN: pending deadlock 1e node 0 2 1bf21"
  "dlm: receive_rcom_lock_args 2e middle convert gr 3 rq 2 remote 2 1e"

After this occurs, the deadlocked conversions both appear on the convert queue of the resource being locked, and the conversion requests do not complete.

Outside of recovery, conversions that would produce a deadlock are resolved immediately, and return -EDEADLK. The locks are not placed on the convert queue in the deadlocked state.

To fix this problem, an lkb under conversion between PR/CW is rebuilt during recovery on a new master's granted queue, with the currently granted mode, rather than being rebuilt on the new master's convert queue, with the currently granted mode and the newly requested mode. The in-progress convert is then resent to the new master after recovery, so the conversion deadlock will be processed outside of the recovery context and handled as described above.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
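
The "conversion deadlock" state described in the message can be illustrated with a small standalone sketch. It is not code from fs/dlm/lock.c: the `sketch_` names, the struct, and the pairwise check are hypothetical and greatly simplified, and the table is the classic six-mode DLM compatibility table rather than the kernel's own array. The mode values are assumed to follow the kernel's lock mode constants, which agrees with the "gr 3 rq 2" (PR to CW) in the warning above.

```c
#include <stdbool.h>

/* Lock modes, assumed to match the kernel's DLM_LOCK_* values. */
enum { DLM_LOCK_NL = 0, DLM_LOCK_CR = 1, DLM_LOCK_CW = 2,
       DLM_LOCK_PR = 3, DLM_LOCK_PW = 4, DLM_LOCK_EX = 5, SKETCH_NUM_MODES = 6 };

/* Classic DLM compatibility table: [granted mode][requested mode]. */
static const bool sketch_compat[SKETCH_NUM_MODES][SKETCH_NUM_MODES] = {
	/*         NL CR CW PR PW EX */
	/* NL */ { 1, 1, 1, 1, 1, 1 },
	/* CR */ { 1, 1, 1, 1, 1, 0 },
	/* CW */ { 1, 1, 1, 0, 0, 0 },
	/* PR */ { 1, 1, 0, 1, 0, 0 },
	/* PW */ { 1, 1, 0, 0, 0, 0 },
	/* EX */ { 1, 0, 0, 0, 0, 0 },
};

/* Hypothetical, minimal view of a lock that is being converted. */
struct sketch_lock {
	int grmode;	/* currently granted mode */
	int rqmode;	/* mode requested by the conversion */
};

/* True if waiter's requested mode conflicts with holder's granted mode. */
static bool sketch_blocked_by(const struct sketch_lock *waiter,
			      const struct sketch_lock *holder)
{
	return !sketch_compat[holder->grmode][waiter->rqmode];
}

/* Two converting locks deadlock when each blocks the other's request. */
static bool sketch_conversion_deadlock(const struct sketch_lock *a,
				       const struct sketch_lock *b)
{
	return sketch_blocked_by(a, b) && sketch_blocked_by(b, a);
}
```

With two locks that both hold PR and both request CW, `sketch_conversion_deadlock()` returns true: PR is compatible with PR, so both grants coexist, but neither CW request can be granted while the other lock still holds PR. The same holds with the roles of PR and CW swapped.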
1 parent 24d479d commit 1416bd5

1 file changed

Lines changed: 1 addition & 18 deletions

fs/dlm/lock.c

@@ -5014,25 +5014,8 @@ void dlm_receive_buffer(const union dlm_packet *p, int nodeid)
 static void recover_convert_waiter(struct dlm_ls *ls, struct dlm_lkb *lkb,
 				   struct dlm_message *ms_local)
 {
-	if (middle_conversion(lkb)) {
-		log_rinfo(ls, "%s %x middle convert in progress", __func__,
-			  lkb->lkb_id);
-
-		/* We sent this lock to the new master. The new master will
-		 * tell us when it's granted.  We no longer need a reply, so
-		 * use a fake reply to put the lkb into the right state.
-		 */
-		hold_lkb(lkb);
-		memset(ms_local, 0, sizeof(struct dlm_message));
-		ms_local->m_type = cpu_to_le32(DLM_MSG_CONVERT_REPLY);
-		ms_local->m_result = cpu_to_le32(to_dlm_errno(-EINPROGRESS));
-		ms_local->m_header.h_nodeid = cpu_to_le32(lkb->lkb_nodeid);
-		_receive_convert_reply(lkb, ms_local, true);
-		unhold_lkb(lkb);
-
-	} else if (lkb->lkb_rqmode >= lkb->lkb_grmode) {
+	if (middle_conversion(lkb) || lkb->lkb_rqmode >= lkb->lkb_grmode)
 		set_bit(DLM_IFL_RESEND_BIT, &lkb->lkb_iflags);
-	}
 
 	/* lkb->lkb_rqmode < lkb->lkb_grmode shouldn't happen since down
 	   conversions are async; there's no reply from the remote master */
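
The new one-line condition can be exercised in isolation with the sketch below. It is a standalone model, not kernel code: `struct sketch_lkb` and the `sketch_` functions are hypothetical stand-ins, the mode values are assumed to match the kernel's DLM_LOCK_* constants, and the body of `sketch_middle_conversion()` is an assumption based on the commit message's description of a conversion between PR and CW in either direction.

```c
#include <stdbool.h>

/* Lock modes, assumed to match the kernel's DLM_LOCK_* values. */
enum { DLM_LOCK_NL = 0, DLM_LOCK_CR = 1, DLM_LOCK_CW = 2,
       DLM_LOCK_PR = 3, DLM_LOCK_PW = 4, DLM_LOCK_EX = 5 };

/* Hypothetical stand-in for the two lkb fields the condition reads. */
struct sketch_lkb {
	int grmode;	/* models lkb->lkb_grmode */
	int rqmode;	/* models lkb->lkb_rqmode */
};

/* Assumed meaning of middle_conversion(): PR <-> CW, either direction. */
static bool sketch_middle_conversion(const struct sketch_lkb *lkb)
{
	return (lkb->grmode == DLM_LOCK_PR && lkb->rqmode == DLM_LOCK_CW) ||
	       (lkb->grmode == DLM_LOCK_CW && lkb->rqmode == DLM_LOCK_PR);
}

/* Mirrors the post-patch test that decides whether the resend flag is set. */
static bool sketch_needs_resend(const struct sketch_lkb *lkb)
{
	return sketch_middle_conversion(lkb) || lkb->rqmode >= lkb->grmode;
}
```

The explicit `middle_conversion()` term matters for the PR to CW direction: with PR = 3 and CW = 2, `rqmode >= grmode` is false, so the comparison alone would not mark that conversion for resend. CW to PR already satisfies the comparison, and a plain down-conversion such as EX to NL satisfies neither term.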