Commit 4960626

namhyung authored and Peter Zijlstra committed
perf/core: Fix slow perf_event_task_exit() with LBR callstacks
I got a report that a task is stuck in perf_event_exit_task() waiting
for global_ctx_data_rwsem. On large systems with lots of threads, it'd
have performance issues when it grabs the lock to iterate all threads
in the system to allocate the context data.

And it'd block the task exit path, which is problematic especially
under memory pressure.

  perf_event_open
    perf_event_alloc
      attach_perf_ctx_data
        attach_global_ctx_data
          percpu_down_write (global_ctx_data_rwsem)
            for_each_process_thread
              alloc_task_ctx_data
                                        do_exit
                                          perf_event_exit_task
                                            percpu_down_read (global_ctx_data_rwsem)

It should not hold the global_ctx_data_rwsem on the exit path. Let's
skip allocation for exiting tasks and free the data carefully.

Reported-by: Rosalie Fang <rosaliefang@google.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260112165157.1919624-1-namhyung@kernel.org
1 parent eebe644 commit 4960626

1 file changed

Lines changed: 18 additions & 2 deletions

kernel/events/core.c

@@ -5421,9 +5421,20 @@ attach_task_ctx_data(struct task_struct *task, struct kmem_cache *ctx_cache,
 		return -ENOMEM;
 
 	for (;;) {
-		if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
+		if (try_cmpxchg(&task->perf_ctx_data, &old, cd)) {
 			if (old)
 				perf_free_ctx_data_rcu(old);
+			/*
+			 * Above try_cmpxchg() pairs with try_cmpxchg() from
+			 * detach_task_ctx_data() such that
+			 * if we race with perf_event_exit_task(), we must
+			 * observe PF_EXITING.
+			 */
+			if (task->flags & PF_EXITING) {
+				/* detach_task_ctx_data() may free it already */
+				if (try_cmpxchg(&task->perf_ctx_data, &cd, NULL))
+					perf_free_ctx_data_rcu(cd);
+			}
 			return 0;
 		}

@@ -5469,6 +5480,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
 	/* Allocate everything */
 	scoped_guard (rcu) {
 		for_each_process_thread(g, p) {
+			if (p->flags & PF_EXITING)
+				continue;
 			cd = rcu_dereference(p->perf_ctx_data);
 			if (cd && !cd->global) {
 				cd->global = 1;
@@ -14562,8 +14575,11 @@ void perf_event_exit_task(struct task_struct *task)
 
 	/*
 	 * Detach the perf_ctx_data for the system-wide event.
+	 *
+	 * Done without holding global_ctx_data_rwsem; typically
+	 * attach_global_ctx_data() will skip over this task, but otherwise
+	 * attach_task_ctx_data() will observe PF_EXITING.
 	 */
-	guard(percpu_read)(&global_ctx_data_rwsem);
 	detach_task_ctx_data(task);
 }
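
The handshake described in the commit message and the new comments can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: struct task, struct ctx_data, model_attach(), model_detach() and run_interleaving() are hypothetical names, C11 atomics stand in for try_cmpxchg(), an atomic flag stands in for PF_EXITING, and the refcounting and RCU-deferred freeing of the real code are left out. The demo runs the two orderings sequentially, so it shows the intended outcome (the data is freed exactly once and the pointer ends up NULL) rather than proving anything about true concurrency.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the kernel's types. */
struct ctx_data { int global; };

struct task {
	atomic_bool exiting;             /* models PF_EXITING in task->flags */
	_Atomic(struct ctx_data *) ctx;  /* models task->perf_ctx_data */
};

static int freed; /* number of frees; the demo is single-threaded */

static void free_ctx(struct ctx_data *cd)
{
	free(cd);
	freed++;
}

/* Exit side: the caller has already set t->exiting. No lock is taken. */
static void model_detach(struct task *t)
{
	struct ctx_data *cd = atomic_load(&t->ctx);

	/* Losing the exchange means the attach side already cleaned up. */
	if (cd && atomic_compare_exchange_strong(&t->ctx, &cd, NULL))
		free_ctx(cd);
}

/* Attach side: publish the data first, then re-check the exiting flag. */
static int model_attach(struct task *t)
{
	struct ctx_data *cd = calloc(1, sizeof(*cd));
	struct ctx_data *old = NULL;

	if (!cd)
		return -1;

	/* On failure 'old' is refreshed with the current value; retry. */
	while (!atomic_compare_exchange_strong(&t->ctx, &old, cd))
		;
	if (old)
		free_ctx(old);

	if (atomic_load(&t->exiting)) {
		struct ctx_data *expected = cd;

		/* The exit side may have freed it already; free only if we win. */
		if (atomic_compare_exchange_strong(&t->ctx, &expected, NULL))
			free_ctx(cd);
	}
	return 0;
}

/* Runs one sequential ordering and returns how many objects were freed. */
static int run_interleaving(bool exit_first)
{
	struct task t;

	atomic_init(&t.exiting, false);
	atomic_init(&t.ctx, NULL);
	freed = 0;

	if (exit_first) {
		atomic_store(&t.exiting, true);
		model_detach(&t);
		model_attach(&t);
	} else {
		model_attach(&t);
		atomic_store(&t.exiting, true);
		model_detach(&t);
	}
	assert(atomic_load(&t.ctx) == NULL);
	return freed;
}
```

Calling run_interleaving(true) models an attach that arrives after the task has started exiting, and run_interleaving(false) models an exit that follows a completed attach. In both cases whichever side wins the final compare-and-exchange is the one that frees the data, which is why the model's exit path does not need to take a lock.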
