Commit 2cdd64c

KVM: Disallow binding multiple irqfds to an eventfd with a priority waiter
Disallow binding an irqfd to an eventfd that already has a priority waiter, i.e. to an eventfd that already has an attached irqfd. KVM always operates in exclusive mode for EPOLL_IN (unconditionally returns '1'), i.e. only the first waiter will be notified.

KVM already disallows binding multiple irqfds to an eventfd in a single VM, but doesn't guard against multiple VMs binding to an eventfd. Adding the extra protection reduces the pain of a userspace VMM bug, e.g. if userspace fails to de-assign before re-assigning when transferring state for intra-host migration, then the migration will explicitly fail as opposed to dropping IRQs on the destination VM.

Temporarily keep KVM's manual check on irqfds.items, but add a WARN, e.g. to allow sanity checking the waitqueue enforcement.

Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: David Matlack <dmatlack@google.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250522235223.3178519-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
1 parent 0d09582 commit 2cdd64c

1 file changed

Lines changed: 37 additions & 18 deletions

virt/kvm/eventfd.c

@@ -291,38 +291,57 @@ static void kvm_irqfd_register(struct file *file, wait_queue_head_t *wqh,
 	struct kvm_kernel_irqfd *tmp;
 	struct kvm *kvm = p->kvm;
 
+	/*
+	 * Note, irqfds.lock protects the irqfd's irq_entry, i.e. its routing,
+	 * and irqfds.items. It does NOT protect registering with the eventfd.
+	 */
 	spin_lock_irq(&kvm->irqfds.lock);
 
-	list_for_each_entry(tmp, &kvm->irqfds.items, list) {
-		if (irqfd->eventfd != tmp->eventfd)
-			continue;
-		/* This fd is used for another irq already. */
-		p->ret = -EBUSY;
-		spin_unlock_irq(&kvm->irqfds.lock);
-		return;
-	}
-
+	/*
+	 * Initialize the routing information prior to adding the irqfd to the
+	 * eventfd's waitqueue, as irqfd_wakeup() can be invoked as soon as the
+	 * irqfd is registered.
+	 */
 	irqfd_update(kvm, irqfd);
 
-	list_add_tail(&irqfd->list, &kvm->irqfds.items);
-
 	/*
 	 * Add the irqfd as a priority waiter on the eventfd, with a custom
 	 * wake-up handler, so that KVM *and only KVM* is notified whenever the
-	 * underlying eventfd is signaled. Temporarily lie to lockdep about
-	 * holding irqfds.lock to avoid a false positive regarding potential
-	 * deadlock with irqfd_wakeup() (see irqfd_wakeup() for details).
+	 * underlying eventfd is signaled.
 	 */
 	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
 
+	/*
+	 * Temporarily lie to lockdep about holding irqfds.lock to avoid a
+	 * false positive regarding potential deadlock with irqfd_wakeup()
+	 * (see irqfd_wakeup() for details).
+	 *
+	 * Adding to the wait queue will fail if there is already a priority
+	 * waiter, i.e. if the eventfd is associated with another irqfd (in any
+	 * VM). Note, kvm_irqfd_deassign() waits for all in-flight shutdown
+	 * jobs to complete, i.e. ensures the irqfd has been removed from the
+	 * eventfd's waitqueue before returning to userspace.
+	 */
 	spin_release(&kvm->irqfds.lock.dep_map, _RET_IP_);
-	irqfd->wait.flags |= WQ_FLAG_EXCLUSIVE;
-	add_wait_queue_priority(wqh, &irqfd->wait);
+	p->ret = add_wait_queue_priority_exclusive(wqh, &irqfd->wait);
 	spin_acquire(&kvm->irqfds.lock.dep_map, 0, 0, _RET_IP_);
+	if (p->ret)
+		goto out;
 
-	spin_unlock_irq(&kvm->irqfds.lock);
+	list_for_each_entry(tmp, &kvm->irqfds.items, list) {
+		if (irqfd->eventfd != tmp->eventfd)
+			continue;
 
-	p->ret = 0;
+		WARN_ON_ONCE(1);
+		/* This fd is used for another irq already. */
+		p->ret = -EBUSY;
+		goto out;
+	}
+
+	list_add_tail(&irqfd->list, &kvm->irqfds.items);
+
+out:
+	spin_unlock_irq(&kvm->irqfds.lock);
 }
 
 #if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
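
Note for readers: the rule the new comment describes ("adding to the wait queue will fail if there is already a priority waiter") can be illustrated with a small userspace model. This is only a sketch and not kernel code. The names `toy_wait_queue`, `toy_waiter`, `toy_add_priority_exclusive()` and `toy_remove()` are hypothetical, the flag values are arbitrary, and locking is omitted. It assumes that a priority waiter always sits at the head of the queue, so checking the head is enough to detect an existing one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Arbitrary flag values for this model only. */
#define TOY_FLAG_EXCLUSIVE 0x1u
#define TOY_FLAG_PRIORITY  0x2u

struct toy_waiter {
	unsigned int flags;
	struct toy_waiter *next;
};

struct toy_wait_queue {
	struct toy_waiter *head;	/* priority waiters are kept at the head */
};

/*
 * Model of "add as an exclusive priority waiter, or fail": returns -EBUSY
 * if the queue already has a priority waiter at its head, 0 on success.
 */
int toy_add_priority_exclusive(struct toy_wait_queue *wq, struct toy_waiter *w)
{
	assert(wq != NULL && w != NULL);

	w->flags |= TOY_FLAG_EXCLUSIVE | TOY_FLAG_PRIORITY;

	if (wq->head && (wq->head->flags & TOY_FLAG_PRIORITY))
		return -EBUSY;

	/* Insert at the head so this waiter is notified first. */
	w->next = wq->head;
	wq->head = w;
	return 0;
}

/* Unlink a waiter, modelling what de-assigning the irqfd must complete. */
void toy_remove(struct toy_wait_queue *wq, struct toy_waiter *w)
{
	struct toy_waiter **pp;

	assert(wq != NULL && w != NULL);

	for (pp = &wq->head; *pp; pp = &(*pp)->next) {
		if (*pp == w) {
			*pp = w->next;
			w->next = NULL;
			return;
		}
	}
}
```

In this model, a second `toy_add_priority_exclusive()` on the same queue returns `-EBUSY` until the first waiter has been removed. That mirrors the migration scenario in the commit message: re-assigning without first de-assigning fails explicitly instead of silently losing wake-ups.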
