fix(concurrent): resolve ThreadPoolExecutor shutdown deadlock in signal handlers #151482

Open
Synteri wants to merge 1 commit into python:main from Synteri:harden/threadpool-executor-signal-deadlock
Conversation

@Synteri Synteri commented Jun 15, 2026

Description

This PR resolves a deadlock that occurs when a ThreadPoolExecutor is shut down from within an OS signal handler (for example, a SIGTERM or SIGINT handler) that interrupts the main thread while it is executing submit(). At that point the interrupted code already holds the executor's internal _shutdown_lock.

Root Cause

Because _shutdown_lock is a standard, non-reentrant threading.Lock(), a signal handler that runs synchronously on the main thread deadlocks when it tries to acquire the lock that the interrupted code on the same thread already holds.
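The difference between the two lock types can be checked in isolation. The following is a minimal sketch, not code from the patch; the helper name can_reacquire is hypothetical, and a timeout is used so the non-reentrant case returns instead of hanging.

```python
import threading


def can_reacquire(lock, timeout=0.1):
    """Return True if the calling thread can take `lock` while already holding it.

    Hypothetical helper for illustration only.
    """
    with lock:
        # Second acquisition on the same thread: a plain Lock blocks until the
        # timeout expires, while an RLock succeeds immediately.
        got = lock.acquire(timeout=timeout)
        if got:
            lock.release()
        return got


if __name__ == "__main__":
    print("Lock: ", can_reacquire(threading.Lock()))
    print("RLock:", can_reacquire(threading.RLock()))
```

Without the timeout, the plain Lock case would block forever, which is the same situation a signal handler ends up in when it interrupts code that holds the lock.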

Simply changing the lock to an RLock() is insufficient because it introduces correctness and task leakage issues:

  1. Reentrant Join Deadlocks: If the signal handler calls shutdown(wait=True) reentrantly, it blocks the main thread to join worker threads, which can deadlock if those worker threads are waiting for the GIL.
  2. Task/Memory Leakage: If the handler's shutdown() call runs to completion and the worker threads exit, submit() resumes once the handler returns and places its task on the queue. With no workers left to consume it, that task never runs: its future never completes and the queued work item stays referenced.
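The second problem can be illustrated with a toy queue and worker. This is a simplified sketch under the assumption that a None sentinel stops the worker; it is not ThreadPoolExecutor's real code, and demo_leaked_task is a hypothetical name.

```python
import queue
import threading
from concurrent.futures import Future
from concurrent.futures import TimeoutError as FutureTimeout


def demo_leaked_task():
    """Enqueue a task after the only worker has exited and report what happens."""
    work_queue = queue.SimpleQueue()

    def worker():
        while True:
            item = work_queue.get()
            if item is None:  # shutdown sentinel
                return
            future, fn = item
            future.set_result(fn())

    thread = threading.Thread(target=worker)
    thread.start()

    # Stand-in for the reentrant shutdown(): wake the worker and let it exit.
    work_queue.put(None)
    thread.join()

    # Stand-in for the interrupted submit() resuming after the handler returns.
    late = Future()
    work_queue.put((late, lambda: 42))

    try:
        late.result(timeout=0.1)
        timed_out = False
    except FutureTimeout:
        timed_out = True
    return thread.is_alive(), late.done(), timed_out, work_queue.qsize()


if __name__ == "__main__":
    print(demo_leaked_task())
```

The late item stays on the queue and its future never resolves, so any caller waiting on it without a timeout would block indefinitely.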

Solution

This patch implements a safe, reentrancy-aware shutdown mechanism:

  1. Convert _shutdown_lock to threading.RLock(): Prevents self-deadlocks on lock acquisition when executing inside signal handlers.
  2. Reentrancy Detection in shutdown(): Uses self._shutdown_lock._is_owned() to check if the lock is already owned by the calling thread. If reentrant, it skips the synchronous t.join() loop to prevent blocking the interrupted thread. (Includes a safety fallback if the Python runtime does not implement _is_owned()).
  3. Queue Insertion Safeguards & Reference Clearing in submit(): If self._shutdown is set to True reentrantly (detected at the end of the submit() critical section), we cancel the future, set w.task = None to clear references to user task positional/keyword arguments (avoiding memory leaks), and raise a RuntimeError.
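The three steps above can be sketched on a toy single-worker executor. This is an illustrative approximation, not the patch itself: TinyExecutor and the _interrupt parameter (which stands in for a signal handler firing inside the critical section) are hypothetical, and _is_owned() is a private RLock method, so the sketch falls back to "not reentrant" if it is missing.

```python
import queue
import threading
from concurrent.futures import Future


class TinyExecutor:
    """Hypothetical single-worker executor; not CPython's implementation."""

    def __init__(self):
        self._shutdown_lock = threading.RLock()  # step 1: reentrant lock
        self._shutdown = False
        self._work_queue = queue.SimpleQueue()
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def _worker(self):
        while True:
            item = self._work_queue.get()
            if item is None:  # shutdown sentinel
                return
            future, fn = item
            # Skip items whose future was cancelled by submit().
            if fn is not None and future.set_running_or_notify_cancel():
                try:
                    future.set_result(fn())
                except BaseException as exc:
                    future.set_exception(exc)

    def submit(self, fn, _interrupt=None):
        with self._shutdown_lock:
            if self._shutdown:
                raise RuntimeError('cannot schedule new futures after shutdown')
            future = Future()
            item = [future, fn]
            if _interrupt is not None:
                _interrupt()  # simulated signal handler, runs while the lock is held
            self._work_queue.put(item)
            # Step 3: re-check after enqueueing; shutdown may have run reentrantly.
            if self._shutdown:
                future.cancel()
                item[1] = None  # drop the reference to the user's callable
                raise RuntimeError('cannot schedule new futures after shutdown')
            return future

    def shutdown(self, wait=True):
        # Step 2: detect a reentrant call via the private _is_owned(), if present.
        is_owned = getattr(self._shutdown_lock, '_is_owned', None)
        reentrant = bool(is_owned()) if callable(is_owned) else False
        with self._shutdown_lock:
            self._shutdown = True
            self._work_queue.put(None)
        if wait and not reentrant:
            self._thread.join()


if __name__ == "__main__":
    ex = TinyExecutor()
    print(ex.submit(lambda: 1).result(timeout=5))
    try:
        ex.submit(lambda: 2, _interrupt=ex.shutdown)
    except RuntimeError as exc:
        print("rejected:", exc)
```

In this sketch the interrupted submit() raises instead of returning a future that can never complete, and the reentrant shutdown() returns without joining the worker.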

Verification

We verified the fix by delivering SIGINT repeatedly in a tight loop while a high-throughput submit() loop was running.

  • On unmodified CPython, this consistently triggers a deadlock on _shutdown_lock and hangs the process.
  • With this patch applied, the executor shuts down cleanly, cancels pending futures, clears references, raises the expected RuntimeError, and the process terminates successfully with exit code 0.
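The lock behaviour behind these two outcomes can be reproduced on a small scale without an executor. The sketch below is not the PR's test script: handler_acquires is a hypothetical helper, it assumes it runs on the main thread (a requirement of signal.signal), and the handler uses a bounded wait so that the non-reentrant case reports failure rather than hanging.

```python
import signal
import threading


def handler_acquires(lock, timeout=0.2):
    """Hold `lock`, raise SIGINT, and record whether the handler could take it."""
    outcomes = []

    def handler(signum, frame):
        got = lock.acquire(timeout=timeout)  # bounded wait instead of a hang
        outcomes.append(got)
        if got:
            lock.release()

    previous = signal.signal(signal.SIGINT, handler)
    try:
        with lock:
            signal.raise_signal(signal.SIGINT)
            # Spin briefly so the interpreter has a chance to run the Python-level
            # handler while the lock is still held.
            for _ in range(10_000):
                if outcomes:
                    break
    finally:
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
    return outcomes


if __name__ == "__main__":
    print("Lock: ", handler_acquires(threading.Lock()))
    print("RLock:", handler_acquires(threading.RLock()))
```

With a plain Lock the handler's acquisition times out, which corresponds to the hang on unmodified CPython; with an RLock it succeeds because the handler runs on the thread that already owns the lock.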

Copilot AI review requested due to automatic review settings June 15, 2026 05:00
@python-cla-bot

The following commit authors need to sign the Contributor License Agreement:

CLA not signed

bedevere-app Bot commented Jun 15, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates ThreadPoolExecutor shutdown/submission behavior to better handle shutdown races and avoid potential deadlocks by introducing reentrancy-aware shutdown logic.

Changes:

  • Switch _shutdown_lock from Lock to RLock to allow re-entrant shutdown paths.
  • Add a post-enqueue shutdown check in submit() that cancels/raises if shutdown has begun.
  • Detect “reentrant” shutdown and skip synchronous thread joining in that case.

Comment thread Lib/concurrent/futures/thread.py Outdated
Comment on lines 214 to 218
    self._work_queue.put(w)
    if self._shutdown or _shutdown:
        f.cancel()
        raise RuntimeError('cannot schedule new futures after shutdown')
    self._adjust_thread_count()
Comment thread Lib/concurrent/futures/thread.py Outdated
Comment on lines +258 to +260
# Detect if we are called reentrantly (e.g. from a signal handler on a thread
# already holding self._shutdown_lock)
reentrant = self._shutdown_lock._is_owned()
Comment on lines +281 to 283
    if wait and not reentrant:
        for t in self._threads:
            t.join()
Comment on lines 277 to +279
self._work_queue.put(None)
if wait:

# If we are reentrant, we cannot join threads synchronously because the current
Comment on lines +215 to +217
    if self._shutdown or _shutdown:
        f.cancel()
        raise RuntimeError('cannot schedule new futures after shutdown')
@Synteri Synteri force-pushed the harden/threadpool-executor-signal-deadlock branch from e9ab197 to 10c03d3 Compare June 15, 2026 05:11