Skip to content

fix: thread safety issues#614

Open
leakonvalinka wants to merge 6 commits into
open-feature:mainfrom
leakonvalinka:fix/thread-safety-adjustments
Open

fix: thread safety issues#614
leakonvalinka wants to merge 6 commits into
open-feature:mainfrom
leakonvalinka:fix/thread-safety-adjustments

Conversation

@leakonvalinka

@leakonvalinka leakonvalinka commented Jun 19, 2026

Copy link
Copy Markdown
Member

This PR

  • aims to improve the thread-safety of the python-sdk
  • some of these changes still need to be discussed -> see comments (mentioning potential issues and options)

Related Issues

#96

Notes

  • i'm not a python expert, please let me know if i got anything wrong :)

Signed-off-by: Lea Konvalinka <lea.konvalinka@dynatrace.com>
Signed-off-by: Lea Konvalinka <lea.konvalinka@dynatrace.com>
Signed-off-by: Lea Konvalinka <lea.konvalinka@dynatrace.com>
@codecov

codecov Bot commented Jun 19, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.36%. Comparing base (79bb01b) to head (027a326).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #614      +/-   ##
==========================================
+ Coverage   98.34%   98.36%   +0.02%     
==========================================
  Files          45       45              
  Lines        2483     2514      +31     
==========================================
+ Hits         2442     2473      +31     
  Misses         41       41              
Flag Coverage Δ
unittests 98.36% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e4d26a76-58ca-41b3-a903-b5f4c00f357c

📥 Commits

Reviewing files that changed from the base of the PR and between 22d9583 and 027a326.

📒 Files selected for processing (2)
  • openfeature/client.py
  • openfeature/hook/__init__.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • openfeature/hook/init.py
  • openfeature/client.py

📝 Walkthrough

Walkthrough

The PR hardens shared-state access across the Python SDK: hook lists (global and per-client) now mutate under RLocks, transaction-context propagator reads/writes run under a module-level lock, _event_support.clear() acquires both locks atomically, AbstractProvider.emit uses getattr/None instead of hasattr, and _assert_provider_status is refactored to accept an explicit provider and query provider_registry directly, eliminating a TOCTOU race.

Changes

Shared-state hardening

Layer / File(s) Summary
Hook list locking
openfeature/hook/__init__.py, openfeature/client.py
Global _hooks list mutations now run under a module-level _hooks_lock (RLock); OpenFeatureClient gains _hooks_lock and wraps add_hooks similarly. Comments document the no-GIL / tight-scheduling loss scenario.
Event handler clearing and lock-order comments
openfeature/_event_support.py
clear() acquires _global_lock and _client_lock together in one with statement. Comments in add_client_handler and add_global_handler explain why the immediate-fire call runs outside the lock to avoid lock-order inversion.
Transaction context propagator locking
openfeature/transaction_context/__init__.py
Introduces _propagator_lock; set_transaction_context_propagator, get_transaction_context, and set_transaction_context all acquire this lock before accessing or mutating the global propagator.
Provider emit guard and client TOCTOU fix
openfeature/provider/__init__.py, openfeature/client.py, tests/test_client.py
AbstractProvider.emit uses getattr(..., None) instead of hasattr. _assert_provider_status now takes an explicit provider and calls provider_registry.get_provider_status(provider); both async and sync evaluation paths pass the resolved provider. Tests are updated to patch provider_registry.get_provider_status and a new regression test verifies the TOCTOU fix.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.82% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: fixing thread-safety issues in the Python SDK.
Description check ✅ Passed The description is directly related to the changes and states the thread-safety goal of the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

Comment @coderabbitai help to get the list of available commands.

@leakonvalinka leakonvalinka changed the title Fix/thread safety adjustments fix: thread safety issues Jun 19, 2026
Signed-off-by: Lea Konvalinka <lea.konvalinka@dynatrace.com>
@leakonvalinka leakonvalinka force-pushed the fix/thread-safety-adjustments branch from 0c9eb51 to 5d11dbc Compare June 19, 2026 10:51
@leakonvalinka leakonvalinka marked this pull request as ready for review June 25, 2026 12:32
@leakonvalinka leakonvalinka requested review from a team as code owners June 25, 2026 12:32

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
openfeature/provider/_registry.py (1)

99-105: 🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win

Avoid clearing event handlers while holding the registry lock.

clear_event_handlers() acquires _client_lock; concurrently, handler dispatch holds _client_lock while resolving client.provider, which now acquires ProviderRegistry._lock. This creates a registry-lock → client-lock path here and a client-lock → registry-lock path during dispatch, so clear_providers() can deadlock with provider event dispatch.

🔒 Proposed fix
     def clear_providers(self) -> None:
         self.shutdown()
         with self._lock:
             self._providers.clear()
             self._default_provider = NoOpProvider()
             self._provider_status = {
                 self._default_provider: ProviderStatus.READY,
             }
-            clear_event_handlers()
+
+        clear_event_handlers()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openfeature/provider/_registry.py` around lines 99 - 105, clear_providers()
is calling clear_event_handlers() while still holding ProviderRegistry._lock,
which creates a lock-order cycle with event dispatch. Move the
clear_event_handlers() call out of the locked section in
ProviderRegistry.clear_providers(), keeping only provider state updates under
the lock and then clearing handlers after the lock is released. Use the
ProviderRegistry._lock, clear_providers(), and clear_event_handlers() symbols to
update the flow so the registry lock is never held when acquiring the
client/event-handler lock.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openfeature/transaction_context/__init__.py`:
- Around line 26-50: The transaction-context helpers can self-deadlock because
_propagator_lock is a non-reentrant Lock while get_transaction_context and
set_transaction_context invoke user-provided TransactionContextPropagator
methods under that lock. Update the locking strategy in
openfeature/transaction_context/__init__.py so re-entrant calls from custom
propagators do not block the same thread, for example by switching
_propagator_lock to an RLock and keeping the existing serialization around
set_transaction_context_propagator, get_transaction_context, and
set_transaction_context.

---

Outside diff comments:
In `@openfeature/provider/_registry.py`:
- Around line 99-105: clear_providers() is calling clear_event_handlers() while
still holding ProviderRegistry._lock, which creates a lock-order cycle with
event dispatch. Move the clear_event_handlers() call out of the locked section
in ProviderRegistry.clear_providers(), keeping only provider state updates under
the lock and then clearing handlers after the lock is released. Use the
ProviderRegistry._lock, clear_providers(), and clear_event_handlers() symbols to
update the flow so the registry lock is never held when acquiring the
client/event-handler lock.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8e01e6e3-2b84-4dab-a926-10ec95fae6b5

📥 Commits

Reviewing files that changed from the base of the PR and between 79bb01b and 7f14e05.

📒 Files selected for processing (8)
  • openfeature/_event_support.py
  • openfeature/api.py
  • openfeature/client.py
  • openfeature/hook/__init__.py
  • openfeature/provider/__init__.py
  • openfeature/provider/_registry.py
  • openfeature/transaction_context/__init__.py
  • tests/test_client.py
💤 Files with no reviewable changes (1)
  • openfeature/api.py

Comment thread openfeature/transaction_context/__init__.py Outdated
Signed-off-by: Lea Konvalinka <lea.konvalinka@dynatrace.com>
@leakonvalinka leakonvalinka force-pushed the fix/thread-safety-adjustments branch from 7f14e05 to 22d9583 Compare June 25, 2026 12:55
@aepfli

aepfli commented Jun 25, 2026

Copy link
Copy Markdown
Member

I am curious, can we add a tool to detect this like vmlens in java? In the sense of automated testing?

Comment thread openfeature/provider/_registry.py Outdated

@gruebel gruebel left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks 🍻 overall it looks pretty good and adds more thread safety especially when using free-threaded CPython

Comment thread openfeature/hook/__init__.py Outdated
Comment thread openfeature/client.py Outdated
Comment on lines -264 to +266
if hasattr(self, "_on_emit"):
self._on_emit(self, event, details)
on_emit = getattr(self, "_on_emit", None)
if on_emit is not None:
on_emit(self, event, details)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this change needed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because of what was pointed out in the original issue:
"
AbstractProvider._on_emit (provider/__init__.py) - emit() does if hasattr(self, "_on_emit"): self._on_emit(...) which is a TOCTOU; detach() during shutdown can delete _on_emit while a background thread is between the check and the call
"

Comment thread openfeature/provider/_registry.py Outdated
Comment thread openfeature/provider/_registry.py Outdated
Comment thread openfeature/provider/_registry.py Outdated
Comment thread openfeature/provider/_registry.py Outdated
Comment thread openfeature/provider/_registry.py Outdated

def get_transaction_context() -> EvaluationContext:
return _evaluation_transaction_context_propagator.get_transaction_context()
with _propagator_lock:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure about this one here, if this is really needed or not

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I tried to ask Claude but the answer was sometimes yes and sometimes no 😅

Comment thread openfeature/api.py
Signed-off-by: Lea Konvalinka <lea.konvalinka@dynatrace.com>
@leakonvalinka

Copy link
Copy Markdown
Member Author

@aepfli I don't think so, unfortunetly. I did try to write a few tests and also have Claude write some with threading.Barrier & run them a few thousand times with pytest repeat, but they didn't reliably catch any issues even before the fixes were implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants