
Commit 0fb4827

pidfs: improve multi-threaded exec and premature thread-group leader exit polling
This is another attempt at making pidfd polling consistent for multi-threaded exec and premature thread-group leader exit.

A quick recap of these two cases:

(1) During a multi-threaded exec by a subthread, i.e., a non-thread-group leader thread, all other threads in the thread-group, including the thread-group leader, are killed and the struct pid of the thread-group leader is taken over by the subthread that called exec. IOW, two tasks change their TIDs.

(2) A premature thread-group leader exit means that the thread-group leader exited before all of the other subthreads in the thread-group have exited.

Both cases lead to inconsistencies for pidfd polling with PIDFD_THREAD. Any caller that holds a PIDFD_THREAD pidfd to the current thread-group leader may or may not see an exit notification on the file descriptor depending on when poll is performed. If the poll is performed before the exec of the subthread has concluded, an exit notification is generated for the old thread-group leader. If the poll is performed after the exec of the subthread has concluded, no exit notification is generated for the old thread-group leader.

The correct behavior would be to simply not generate an exit notification on the struct pid of a subthread exec because the struct pid is taken over by the subthread and thus remains alive.

But this is difficult to handle because a thread-group leader may exit prematurely as mentioned in (2). In that case an exit notification is reliably generated, but the subthreads may continue to run for an indeterminate amount of time and thus may also exec at some point.

So far there was no way to distinguish between (1) and (2) internally. This tiny series tries to address this problem by discarding PIDFD_THREAD notification on premature thread-group leader exit. If that works correctly, then no exit notifications are generated for a PIDFD_THREAD pidfd for a thread-group leader until all subthreads have been reaped. If a subthread should exec afterwards, no exit notification will be generated until that task exits or it creates subthreads and repeats the cycle.

Co-Developed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-1-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
1 parent 68db272 commit 0fb4827

3 files changed

Lines changed: 9 additions & 9 deletions

fs/pidfs.c

Lines changed: 5 additions & 4 deletions
@@ -210,20 +210,21 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
 static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
 {
 	struct pid *pid = pidfd_pid(file);
-	bool thread = file->f_flags & PIDFD_THREAD;
 	struct task_struct *task;
 	__poll_t poll_flags = 0;
 
 	poll_wait(file, &pid->wait_pidfd, pts);
 	/*
-	 * Depending on PIDFD_THREAD, inform pollers when the thread
-	 * or the whole thread-group exits.
+	 * Don't wake waiters if the thread-group leader exited
+	 * prematurely. They either get notified when the last subthread
+	 * exits or not at all if one of the remaining subthreads execs
+	 * and assumes the struct pid of the old thread-group leader.
 	 */
 	guard(rcu)();
 	task = pid_task(pid, PIDTYPE_PID);
 	if (!task)
 		poll_flags = EPOLLIN | EPOLLRDNORM | EPOLLHUP;
-	else if (task->exit_state && (thread || thread_group_empty(task)))
+	else if (task->exit_state && !delay_group_leader(task))
 		poll_flags = EPOLLIN | EPOLLRDNORM;
 
 	return poll_flags;
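
To make the effect of the new pidfd_poll() condition easier to see, here is a minimal user-space model of the old and new checks. It is a sketch, not kernel code: struct task_model, the M_* flag values, and every function name below are hypothetical, and model_delay_group_leader() assumes that delay_group_leader() means "thread-group leader whose thread group is not yet empty".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task state that pidfd_poll() inspects. */
struct task_model {
	bool present;     /* pid_task() found a task for the struct pid */
	bool exited;      /* task->exit_state is non-zero */
	bool is_leader;   /* task is the thread-group leader */
	bool group_empty; /* what thread_group_empty() would report */
};

/* Model-only flag values; the kernel returns EPOLL* bits instead. */
enum { M_IN = 1, M_RDNORM = 2, M_HUP = 4 };

/* Assumed semantics: a leader that still has subthreads in its group. */
static bool model_delay_group_leader(const struct task_model *t)
{
	return t->is_leader && !t->group_empty;
}

/* Check before the patch: PIDFD_THREAD pollers saw every exit. */
static unsigned int poll_old(const struct task_model *t, bool pidfd_thread)
{
	if (!t->present)
		return M_IN | M_RDNORM | M_HUP;
	if (t->exited && (pidfd_thread || t->group_empty))
		return M_IN | M_RDNORM;
	return 0;
}

/* Check after the patch: a prematurely exited leader stays silent. */
static unsigned int poll_new(const struct task_model *t)
{
	if (!t->present)
		return M_IN | M_RDNORM | M_HUP;
	if (t->exited && !model_delay_group_leader(t))
		return M_IN | M_RDNORM;
	return 0;
}

/* Usage: the only case whose result changes is the delayed leader. */
void model_selfcheck(void)
{
	struct task_model delayed = { .present = true, .exited = true,
				      .is_leader = true, .group_empty = false };

	assert(poll_old(&delayed, true) == (M_IN | M_RDNORM));
	assert(poll_new(&delayed) == 0);
}
```

In this model, an exited subthread and an exited leader without subthreads are reported the same way by both versions; only a PIDFD_THREAD poller on a prematurely exited leader stops seeing a notification.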

kernel/exit.c

Lines changed: 3 additions & 3 deletions
@@ -743,10 +743,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 
 	tsk->exit_state = EXIT_ZOMBIE;
 	/*
-	 * sub-thread or delay_group_leader(), wake up the
-	 * PIDFD_THREAD waiters.
+	 * Ignore thread-group leaders that exited before all
+	 * subthreads did.
 	 */
-	if (!thread_group_empty(tsk))
+	if (!delay_group_leader(tsk))
 		do_notify_pidfd(tsk);
 
 	if (unlikely(tsk->ptrace)) {
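
The wakeup decision in exit_notify() can be modeled the same way. The sketch below is a hypothetical user-space illustration, not kernel code: struct exiting_task and both function names are invented for this example, and the new check again assumes that delay_group_leader() is true only for a thread-group leader whose group is not empty.

```c
#include <stdbool.h>

/* Hypothetical summary of an exiting task; not a kernel structure. */
struct exiting_task {
	bool is_leader;   /* task is the thread-group leader */
	bool group_empty; /* what thread_group_empty() would report */
};

/* Before the patch: wake pidfd waiters whenever the group is not empty. */
static bool wake_in_exit_notify_old(const struct exiting_task *t)
{
	return !t->group_empty;
}

/* After the patch: skip only leaders that exit ahead of their subthreads. */
static bool wake_in_exit_notify_new(const struct exiting_task *t)
{
	return !(t->is_leader && !t->group_empty);
}
```

Under these assumptions, an exiting subthread wakes waiters in both versions, while a leader that exits with subthreads still alive woke waiters before the patch and no longer does.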

kernel/signal.c

Lines changed: 1 addition & 2 deletions
@@ -2180,8 +2180,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	WARN_ON_ONCE(!tsk->ptrace &&
 	       (tsk->group_leader != tsk || !thread_group_empty(tsk)));
 	/*
-	 * tsk is a group leader and has no threads, wake up the
-	 * non-PIDFD_THREAD waiters.
+	 * Notify for thread-group leaders without subthreads.
 	 */
 	if (thread_group_empty(tsk))
 		do_notify_pidfd(tsk);
