Commit 34d2bfe
io_uring: improve task work cache utilization
While profiling task_work intensive workloads, I noticed that most of the time in tctx_task_work() is spent stalled on loading 'req'. This is one of the unfortunate side effects of using linked lists, particularly when they end up being passed around.

Prefetch the next request, if there is one. There's a sufficient amount of work in between that this makes it available for the next loop.

While fiddling with the cache layout, move the link outside of the hot completion cacheline. It's rarely used in hot workloads, so better to bring in kbuf, which is used for networked loads with provided buffers.

This reduces tctx_task_work() overhead from ~3% to 1-1.5% in my testing.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
1 parent a73825b commit 34d2bfe

1 file changed: fs/io_uring.c

Lines changed: 5 additions & 1 deletion
@@ -928,7 +928,6 @@ struct io_kiocb {
 	struct io_wq_work_node		comp_list;
 	atomic_t			refs;
 	atomic_t			poll_refs;
-	struct io_kiocb			*link;
 	struct io_task_work		io_task_work;
 	/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
 	struct hlist_node		hash_node;
@@ -939,6 +938,7 @@ struct io_kiocb {
 	/* custom credentials, valid IFF REQ_F_CREDS is set */
 	/* stores selected buf, valid IFF REQ_F_BUFFER_SELECTED is set */
 	struct io_buffer		*kbuf;
+	struct io_kiocb			*link;
 	const struct cred		*creds;
 	struct io_wq_work		work;
 };
@@ -2451,6 +2451,8 @@ static void handle_prev_tw_list(struct io_wq_work_node *node,
 		struct io_kiocb *req = container_of(node, struct io_kiocb,
 						    io_task_work.node);
 
+		prefetch(container_of(next, struct io_kiocb, io_task_work.node));
+
 		if (req->ctx != *ctx) {
 			if (unlikely(!*uring_locked && *ctx))
 				ctx_commit_and_unlock(*ctx);
@@ -2483,6 +2485,8 @@ static void handle_tw_list(struct io_wq_work_node *node,
 		struct io_kiocb *req = container_of(node, struct io_kiocb,
 						    io_task_work.node);
 
+		prefetch(container_of(next, struct io_kiocb, io_task_work.node));
+
 		if (req->ctx != *ctx) {
 			ctx_flush_and_put(*ctx, locked);
 			*ctx = req->ctx;
