author     Linus Torvalds <torvalds@linux-foundation.org>    2024-09-24 11:11:38 -0700
committer  Linus Torvalds <torvalds@linux-foundation.org>    2024-09-24 11:11:38 -0700
commit     3147a0689dd9793990ff954369ffcdf2de984b46 (patch)
tree       8c09e627f830e204fdc2009e46c3509b76d6b189 /io_uring/io_uring.c
parent     172d513936c707e991c3eca1b79cd8a153171862 (diff)
parent     eac2ca2d682f94f46b1973bdf5e77d85d77b8e53 (diff)
Merge tag 'for-6.12/io_uring-20240922' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:
"Mostly just a set of fixes in here, or little changes that didn't get
included in the initial pull request. This contains:
- Move the SQPOLL napi polling outside the submission lock (Olivier)
- Rename of the "copy buffers" API that got added in the 6.12 merge
window. There's really no copying going on; it's just referencing
the buffers. After a bit of consideration, decided that it was
better to simply rename this to avoid potential confusion (me).
A userspace sketch of the renamed API follows this message.
- Shrink struct io_mapped_ubuf from 48 to 32 bytes, by changing it to
start + len tracking rather than having start / end in there, and
by removing the caching of folio_mask when we can just calculate it
from folio_shift when we need it (me). A sketch of this layout also
follows this message.
- Fixes for the SQPOLL affinity checking (me, Felix)
- Fix for how cqring waiting checks for the presence of task_work.
Just check it directly rather than check for a specific
notification mechanism (me)
- Tweak to how request linking is represented in tracing (me)
- Fix a syzbot report that deliberately sets up a huge list of
overflow entries, and then hits rcu stalls when flushing this list.
Just check for the need to preempt, and drop/reacquire locks in the
loop. There's no state maintained over the loop itself, and each
entry is yanked from head-of-list (me)"
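For the userspace side of the "copy buffers" rename above: the register opcode became IORING_REGISTER_CLONE_BUFFERS, and liburing exposes a matching helper. A minimal sketch, assuming liburing 2.8+ and its io_uring_clone_buffers() wrapper (the helper is liburing's, not part of this kernel diff):

#include <liburing.h>

/*
 * Make dst reference src's registered buffers; no data is copied,
 * which is exactly why the API was renamed from "copy" to "clone".
 * Hedged sketch: assumes liburing >= 2.8.
 */
static int share_registered_buffers(struct io_uring *dst, struct io_uring *src)
{
	return io_uring_clone_buffers(dst, src);	/* 0 on success, -errno on failure */
}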
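And a rough sketch of the io_mapped_ubuf shrink described above: a start plus a length replaces the start/end pair, and the folio mask is recomputed from folio_shift instead of being cached. This is a standalone illustration; field names and types are simplified, not the exact kernel layout.

#include <stdint.h>

/* Illustrative layout only, not the exact struct io_mapped_ubuf. */
struct mapped_ubuf_sketch {
	uint64_t ubuf;		/* start address of the user buffer */
	uint32_t len;		/* start + len replaces a separate end field */
	uint32_t folio_shift;	/* enough to derive the mask on demand */
};

/* Recomputing the mask from the shift is what lets the cached
 * folio_mask field be dropped from the struct. */
static inline uint64_t folio_mask_of(uint32_t folio_shift)
{
	return (UINT64_C(1) << folio_shift) - 1;
}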
* tag 'for-6.12/io_uring-20240922' of git://git.kernel.dk/linux:
io_uring: check if we need to reschedule during overflow flush
io_uring: improve request linking trace
io_uring: check for presence of task_work rather than TIF_NOTIFY_SIGNAL
io_uring/sqpoll: do the napi busy poll outside the submission block
io_uring: clean up a type in io_uring_register_get_file()
io_uring/sqpoll: do not put cpumask on stack
io_uring/sqpoll: retain test for whether the CPU is valid
io_uring/rsrc: change ubuf->ubuf_end to length tracking
io_uring/rsrc: get rid of io_mapped_ubuf->folio_mask
io_uring: rename "copy buffers" to "clone buffers"
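The two sqpoll commits above land in io_uring/sqpoll.c and so fall outside the io_uring/io_uring.c diffstat below. For context, the usual fix for "do not put cpumask on stack" is to heap-allocate via cpumask_var_t, since a large NR_CPUS can make an on-stack mask oversized. A hypothetical sketch of that pattern combined with the retained CPU-validity test (the function name is illustrative, not the actual patch):

#include <linux/cpumask.h>
#include <linux/gfp.h>

/*
 * Hypothetical illustration, not the actual patch: with large NR_CPUS a
 * struct cpumask can be too big for the stack, so it is heap-allocated
 * via cpumask_var_t (stack-sized only when CONFIG_CPUMASK_OFFSTACK=n).
 */
static int sq_cpu_is_usable(int cpu)
{
	cpumask_var_t allowed;
	int ret;

	/* Retained validity test: reject out-of-range CPU numbers first */
	if (cpu < 0 || cpu >= nr_cpu_ids)
		return -EINVAL;

	if (!alloc_cpumask_var(&allowed, GFP_KERNEL))
		return -ENOMEM;

	cpumask_copy(allowed, cpu_online_mask);
	ret = cpumask_test_cpu(cpu, allowed) ? 0 : -EINVAL;

	free_cpumask_var(allowed);
	return ret;
}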
Diffstat (limited to 'io_uring/io_uring.c')
-rw-r--r--	io_uring/io_uring.c	21
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index f3570e81ecb4..feb61d68dca6 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -635,6 +635,21 @@ static void __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool dying)
 		}
 		list_del(&ocqe->list);
 		kfree(ocqe);
+
+		/*
+		 * For silly syzbot cases that deliberately overflow by huge
+		 * amounts, check if we need to resched and drop and
+		 * reacquire the locks if so. Nothing real would ever hit this.
+		 * Ideally we'd have a non-posting unlock for this, but hard
+		 * to care for a non-real case.
+		 */
+		if (need_resched()) {
+			io_cq_unlock_post(ctx);
+			mutex_unlock(&ctx->uring_lock);
+			cond_resched();
+			mutex_lock(&ctx->uring_lock);
+			io_cq_lock(ctx);
+		}
 	}
 
 	if (list_empty(&ctx->cq_overflow_list)) {
@@ -2164,7 +2179,7 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	 * conditions are true (normal request), then just queue it.
 	 */
 	if (unlikely(link->head)) {
-		trace_io_uring_link(req, link->head);
+		trace_io_uring_link(req, link->last);
 		link->last->link = req;
 		link->last = req;
 
@@ -2472,7 +2487,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 		return 1;
 	if (unlikely(!llist_empty(&ctx->work_llist)))
 		return 1;
-	if (unlikely(test_thread_flag(TIF_NOTIFY_SIGNAL)))
+	if (unlikely(task_work_pending(current)))
 		return 1;
 	if (unlikely(task_sigpending(current)))
 		return -EINTR;
@@ -2579,9 +2594,9 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 	 * If we got woken because of task_work being processed, run it
 	 * now rather than let the caller do another wait loop.
 	 */
-	io_run_task_work();
 	if (!llist_empty(&ctx->work_llist))
 		io_run_local_work(ctx, nr_wait);
+	io_run_task_work();
 
 	/*
 	 * Non-local task_work will be run on exit to userspace, but
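A note on the io_cqring_wait_schedule hunk above: TIF_NOTIFY_SIGNAL covers only one of the ways task_work can be queued, while task_work_pending() looks at the queued callback list itself. Its definition in include/linux/task_work.h is essentially:

/* True if any task_work callbacks are queued for @task, regardless of
 * which notification mechanism queued them. */
static inline bool task_work_pending(struct task_struct *task)
{
	return READ_ONCE(task->task_works);
}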