genirq: Prevent migration live lock in handle_edge_irq()

KAGA-KOKO · KAGA-KOKO · commit 8d39d6ec4db5 · 2025-07-22T14:30:42.000+02:00
Yicon reported and Liangyan debugged a live lock in handle_edge_irq() related to interrupt migration. If the interrupt affinity is moved to a new target CPU and the interrupt is currently handled on the previous target CPU for edge type interrupts the handler might get stuck on the previous target: CPU 0 (previous target) CPU 1 (new target) handle_edge_irq() repeat: handle_event() handle_edge_irq() if (INPROGESS) { set(PENDING); mask(); return; } if (PENDING) { clear(PENDING); unmask(); goto repeat; } The migration in software never completes and CPU0 continues to handle the pending events forever. This happens when the device raises interrupts with a high rate and always before handle_event() completes and before the CPU0 handler can clear INPROGRESS so that CPU1 sets the PENDING flag over and over. This has been observed in virtual machines. Prevent this by checking whether the CPU which observes the INPROGRESS flag is the new affinity target. If that's the case, do not set the PENDING flag and wait for the INPROGRESS flag to be cleared instead, so that the new interrupt is handled on the new target CPU and the previous CPU is released from the action. This is restricted to the edge type handler and only utilized on systems, which use single CPU targets for interrupt affinity. Reported-by: Yicong Shen <shenyicong.1023@bytedance.com> Reported-by: Liangyan <liangyan.peng@bytedance.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Liangyan <liangyan.peng@bytedance.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/all/20250701163558.2588435-1-liangyan.peng@bytedance.com Link: https://lore.kernel.org/all/20250718185312.076515034@linutronix.de
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
@@ -476,11 +476,14 @@ static bool irq_wait_on_inprogress(struct irq_desc *desc)
 
 static bool irq_can_handle_pm(struct irq_desc *desc)
 {
+	struct irq_data *irqd = &desc->irq_data;
+	const struct cpumask *aff;
+
 	/*
 	 * If the interrupt is not in progress and is not an armed
 	 * wakeup interrupt, proceed.
 	 */
-	if (!irqd_has_set(&desc->irq_data, IRQD_IRQ_INPROGRESS | IRQD_WAKEUP_ARMED))
+	if (!irqd_has_set(irqd, IRQD_IRQ_INPROGRESS | IRQD_WAKEUP_ARMED))
 		return true;
 
 	/*
@@ -501,7 +504,41 @@ static bool irq_can_handle_pm(struct irq_desc *desc)
 			return false;
 		return irq_wait_on_inprogress(desc);
 	}
-	return false;
+
+	/* The below works only for single target interrupts */
+	if (!IS_ENABLED(CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK) ||
+	    !irqd_is_single_target(irqd) || desc->handle_irq != handle_edge_irq)
+		return false;
+
+	/*
+	 * If the interrupt affinity was moved to this CPU and the
+	 * interrupt is currently handled on the previous target CPU, then
+	 * busy wait for INPROGRESS to be cleared. Otherwise for edge type
+	 * interrupts the handler might get stuck on the previous target:
+	 *
+	 * CPU 0			CPU 1 (new target)
+	 * handle_edge_irq()
+	 * repeat:
+	 *	handle_event()		handle_edge_irq()
+	 *			        if (INPROGESS) {
+	 *				  set(PENDING);
+	 *				  mask();
+	 *				  return;
+	 *				}
+	 *	if (PENDING) {
+	 *	  clear(PENDING);
+	 *	  unmask();
+	 *	  goto repeat;
+	 *	}
+	 *
+	 * This happens when the device raises interrupts with a high rate
+	 * and always before handle_event() completes and the CPU0 handler
+	 * can clear INPROGRESS. This has been observed in virtual machines.
+	 */
+	aff = irq_data_get_effective_affinity_mask(irqd);
+	if (cpumask_first(aff) != smp_processor_id())
+		return false;
+	return irq_wait_on_inprogress(desc);
 }
 
 static inline bool irq_can_handle_actions(struct irq_desc *desc)