Commit 4de520f

Merge tag 'io_uring-futex-2023-10-30' of git://git.kernel.dk/linux
Pull io_uring futex support from Jens Axboe:
 "This adds support for using futexes through io_uring - first futex
  wake and wait, and then the vectored variant of waiting, futex waitv.

  For both wait/wake/waitv, we support the bitset variant, as the
  'normal' variants can be easily implemented on top of that.

  PI and requeue are not supported through io_uring, just the above
  mentioned parts. This may change in the future, but in the spirit of
  keeping this small (and based on what people have been asking for),
  this is what we currently have.

  Wake support is pretty straightforward; most of the thought has gone
  into the wait side to avoid needing to offload wait operations to a
  blocking context. Instead, we rely on the usual callbacks to retry and
  post a completion event, when appropriate.

  As far as I can recall, the first request for futex support with
  io_uring came from Andres Freund, working on postgres. His aio rework
  of postgres was one of the early adopters of io_uring, and futex
  support was a natural extension for that. This is relevant from both a
  usability point of view, as well as for efficiency and performance.

  In Andres's words, for the former:

     Futex wait support in io_uring makes it a lot easier to avoid
     deadlocks in concurrent programs that have their own buffer pool:
     Obviously pages in the application buffer pool have to be locked
     during IO. If the initiator of IO A needs to wait for a held lock
     B, the holder of lock B might wait for the IO A to complete. The
     ability to wait for a lock and IO completions at the same time
     provides an efficient way to avoid such deadlocks

  and in terms of efficiency, even without unlocking the full potential
  yet, Andres says:

     Futex wake support in io_uring is useful because it allows for
     more efficient directed wakeups. For some "locks" postgres has
     queues implemented in userspace, with wakeup logic that cannot
     easily be implemented with FUTEX_WAKE_BITSET on a single "futex
     word" (imagine waiting for journal flushes to have completed up to
     a certain point).

     Thus a "lock release" sometimes needs to wake up many processes in
     a row. A quick-and-dirty conversion to doing these wakeups via
     io_uring led to a 3% throughput increase, with 12% fewer context
     switches, albeit in a fairly extreme workload"

* tag 'io_uring-futex-2023-10-30' of git://git.kernel.dk/linux:
  io_uring: add support for vectored futex waits
  futex: make the vectored futex operations available
  futex: make futex_parse_waitv() available as a helper
  futex: add wake_data to struct futex_q
  io_uring: add support for futex wake and wait
  futex: abstract out a __futex_wake_mark() helper
  futex: factor out the futex wake handling
  futex: move FUTEX2_VALID_MASK to futex.h
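
The remark that the 'normal' variants can be implemented on top of the bitset variant can be illustrated with the classic futex(2) syscall, independent of io_uring. The sketch below is not part of this merge: the helper names `futex_call`, `futex_wait_any` and `futex_wake_any` are hypothetical, and it assumes a Linux system whose `<linux/futex.h>` provides `FUTEX_WAIT_BITSET`, `FUTEX_WAKE_BITSET` and `FUTEX_BITSET_MATCH_ANY`. Passing the match-any mask makes the bitset operations behave like a plain wait and wake.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the raw futex(2) syscall
 * (glibc does not export one). Returns -1 and sets errno on error. */
static long futex_call(uint32_t *uaddr, int op, uint32_t val,
                       const struct timespec *timeout, uint32_t mask)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, NULL, mask);
}

/* "Normal" wait built on the bitset variant: sleep only if
 * *uaddr == expected, and accept a wakeup with any mask.
 * No timeout is passed here; note that FUTEX_WAIT_BITSET would
 * interpret a timeout as absolute, unlike FUTEX_WAIT. */
static long futex_wait_any(uint32_t *uaddr, uint32_t expected)
{
    return futex_call(uaddr, FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG,
                      expected, NULL, FUTEX_BITSET_MATCH_ANY);
}

/* "Normal" wake built on the bitset variant: wake up to nr waiters
 * regardless of the mask they waited with. Returns the number woken. */
static long futex_wake_any(uint32_t *uaddr, int nr)
{
    return futex_call(uaddr, FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG,
                      (uint32_t)nr, NULL, FUTEX_BITSET_MATCH_ANY);
}
```

Calling `futex_wait_any(&word, 0)` while `word` holds 1 fails with `EAGAIN` instead of sleeping, and `futex_wake_any(&word, INT_MAX)` reports 0 when nobody is waiting. A narrower mask on either side would restrict which waiters a wake can reach, which is the extra expressiveness the bitset form adds.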
2 parents f5277ad + 8f35019 commit 4de520f

13 files changed

Lines changed: 545 additions & 27 deletions

include/linux/io_uring_types.h

Lines changed: 5 additions & 0 deletions
@@ -321,6 +321,11 @@ struct io_ring_ctx {
 
 	struct hlist_head waitid_list;
 
+#ifdef CONFIG_FUTEX
+	struct hlist_head futex_list;
+	struct io_alloc_cache futex_cache;
+#endif
+
 	const struct cred *sq_creds; /* cred used for __io_sq_thread() */
 	struct io_sq_data *sq_data; /* if using sq thread polling */

include/uapi/linux/io_uring.h

Lines changed: 4 additions & 0 deletions
@@ -70,6 +70,7 @@ struct io_uring_sqe {
 		__u32 msg_ring_flags;
 		__u32 uring_cmd_flags;
 		__u32 waitid_flags;
+		__u32 futex_flags;
 	};
 	__u64 user_data; /* data to be passed back at completion time */
 	/* pack this to avoid bogus arm OABI complaints */
@@ -249,6 +250,9 @@ enum io_uring_op {
 	IORING_OP_SENDMSG_ZC,
 	IORING_OP_READ_MULTISHOT,
 	IORING_OP_WAITID,
+	IORING_OP_FUTEX_WAIT,
+	IORING_OP_FUTEX_WAKE,
+	IORING_OP_FUTEX_WAITV,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,

io_uring/Makefile

Lines changed: 1 addition & 0 deletions
@@ -10,3 +10,4 @@ obj-$(CONFIG_IO_URING) += io_uring.o xattr.o nop.o fs.o splice.o \
 					cancel.o kbuf.o rsrc.o rw.o opdef.o \
 					notif.o waitid.o
 obj-$(CONFIG_IO_WQ) += io-wq.o
+obj-$(CONFIG_FUTEX) += futex.o

io_uring/cancel.c

Lines changed: 5 additions & 0 deletions
@@ -16,6 +16,7 @@
 #include "poll.h"
 #include "timeout.h"
 #include "waitid.h"
+#include "futex.h"
 #include "cancel.h"
 
 struct io_cancel {
@@ -124,6 +125,10 @@ int io_try_cancel(struct io_uring_task *tctx, struct io_cancel_data *cd,
 	if (ret != -ENOENT)
 		return ret;
 
+	ret = io_futex_cancel(ctx, cd, issue_flags);
+	if (ret != -ENOENT)
+		return ret;
+
 	spin_lock(&ctx->completion_lock);
 	if (!(cd->flags & IORING_ASYNC_CANCEL_FD))
 		ret = io_timeout_cancel(ctx, cd);

io_uring/cancel.h

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,6 @@
11
// SPDX-License-Identifier: GPL-2.0
2+
#ifndef IORING_CANCEL_H
3+
#define IORING_CANCEL_H
24

35
#include <linux/io_uring_types.h>
46

@@ -22,3 +24,5 @@ void init_hash_table(struct io_hash_table *table, unsigned size);
2224

2325
int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg);
2426
bool io_cancel_req_match(struct io_kiocb *req, struct io_cancel_data *cd);
27+
28+
#endif
