fix: Redact SSO PII before deletion #38425
@robrap SSO records can be deleted in several ways: through Django admin, through user actions such as unlinking accounts, and through bulk retirement scripts. We don't control all of these paths, so we can't reliably add PII redaction directly to each one. Instead, we've set up a two-layer approach.
The first layer is a Django signal that runs automatically right before any SSO record is deleted. It acts as a safety net: however the deletion is triggered, whether from admin or a user action, the signal ensures sensitive fields like the UID and extra data are redacted. It's centralized and consistent, and it's safe to run more than once.
The second layer is used only in cases we fully control, like user retirement flows. There, we proactively run a bulk redaction step before deleting records. This is much faster because it redacts with a bulk database update instead of handling records one at a time. When the delete happens afterward, the signal still fires, but it detects that the data is already redacted and exits without doing extra work.
Together, these two layers cover both safety and performance. The signal guarantees we never miss redaction, even in code we don't control, while the explicit bulk step keeps large-scale operations efficient.
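The two-layer flow in this comment can be modeled without Django. The following is a minimal sketch, not the PR's code: the `Record` class, the `redacted-` prefix, and every function name are hypothetical stand-ins, and a plain function call takes the place of the real `pre_delete` signal.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("sso_redaction_sketch")

# Hypothetical placeholder; the real format is whatever
# get_redacted_social_auth_uid() produces, which is not shown here.
REDACTED_PREFIX = "redacted-"


@dataclass
class Record:
    """In-memory stand-in for a UserSocialAuth row."""
    id: int
    user_id: int
    uid: str
    extra_data: dict = field(default_factory=dict)


def is_redacted(record):
    return record.uid.startswith(REDACTED_PREFIX) and not record.extra_data


def redact(record):
    record.uid = f"{REDACTED_PREFIX}{record.id}"
    record.extra_data = {}


def on_pre_delete(record):
    """Layer 1 (safety net). Returns True only if it had to redact."""
    if is_redacted(record):
        return False  # already handled by the bulk path; nothing to do
    log.warning("Safety net redacting record %s before delete", record.id)
    redact(record)
    return True


def delete(store, record):
    """Every delete path goes through the pre-delete hook first."""
    redacted_by_safety_net = on_pre_delete(record)
    store.remove(record)
    return redacted_by_safety_net


def bulk_redact_then_delete(store, user_id):
    """Layer 2 (controlled path). Returns how often the safety net had to act."""
    records = [r for r in store if r.user_id == user_id]
    for record in records:
        redact(record)
    return sum(delete(store, record) for record in records)
```

In this model, `bulk_redact_then_delete` should always return 0, because the hook finds every record already redacted. A direct `delete` of an untouched record returns `True` and logs a warning, which is the fallback behavior.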
robrap left a comment
I still need to look at tests, but some minor comments. Looking good so far.
robrap left a comment
Mostly test clean-up comments at this point.
    captured_states = []

    def capture_state_before_delete(sender, instance, **kwargs):  # pylint: disable=unused-argument
        instance.refresh_from_db()
        captured_states.append({
            'id': instance.id,
            'uid': instance.uid,
            'extra_data': dict(instance.extra_data) if instance.extra_data else {},
        })
I think using a `pre_delete` signal for testing a `pre_delete` signal makes this confusing. Is that what is being done here? How do you know what order the `pre_delete` signals will get called? I'd rather it wasn't confusing in this way, and you used some other mechanism to test, like checking that there is an appropriate UPDATE query before the DELETE query, as we did in the earlier PR. You can retain the not exists assertion at the end.
Also, if this were needed, you've got a lot of code redundancy. You could use `setUpClass` or `setUp` and `tearDownClass` or `tearDown`, or helper functions to keep things DRY (Don't Repeat Yourself).
There was a problem hiding this comment.
Refactored to use `CaptureQueriesContext` to assert that the UPDATE precedes the DELETE, and moved the signal tests to a new `test_signals.py`.
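The UPDATE-before-DELETE check could be expressed as a small helper over the captured SQL. This is a sketch only: the helper name is hypothetical, and it assumes queries shaped like `CaptureQueriesContext.captured_queries` (a list of dicts with an `sql` key). The table name is passed in because the one used by the actual tests is not shown in this thread.

```python
def assert_update_precedes_delete(captured_queries, table):
    """Fail unless an UPDATE on `table` runs before the first DELETE on it."""
    statements = [query["sql"].lstrip().upper() for query in captured_queries]
    table = table.upper()

    # Positions of the statements that touch the table of interest.
    updates = [
        index for index, sql in enumerate(statements)
        if sql.startswith("UPDATE") and table in sql
    ]
    deletes = [
        index for index, sql in enumerate(statements)
        if sql.startswith("DELETE") and table in sql
    ]

    assert updates, f"no UPDATE on {table} was captured"
    assert deletes, f"no DELETE on {table} was captured"
    assert updates[0] < deletes[0], "the first DELETE ran before any UPDATE"
```

In a Django test, the list would come from `ctx.captured_queries` after running the code under test inside `with CaptureQueriesContext(connection) as ctx:`.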
    Safety-net signal handler that redacts PII on any UserSocialAuth before deletion.

    Records deleted via ``redact_and_delete_social_auth`` will already be redacted;
    this handler is a fallback for any other deletion path.
Suggested change:
    - this handler is a fallback for any other deletion path.
    + this handler is a fallback for any missed deletion path.
    Redaction happens before deletion so that any observers see only sanitised data.
    Downstream copies of data may use soft-deletes, and redacting before deleting
    ensures PII for retired users (or future retirements) is not retained.
    The uid format matches ``get_redacted_social_auth_uid()``.
Moving this below...
    - The uid format matches ``get_redacted_social_auth_uid()``.
    """
    social_auth_queryset = UserSocialAuth.objects.filter(user_id=user_id)
    social_auth_queryset.update(
        uid=Concat(
Moved (and edited) comment:
    - uid=Concat(
    + # Important: this redacted uid must match the format used by ``get_redacted_social_auth_uid()``.
    + uid=Concat(
There was a problem hiding this comment.
Added the comment
    """
    Redact PII from all UserSocialAuth records for the given user, then delete them.

    Redaction happens before deletion so that any observers see only sanitised data.
Consider dropping this comment. The comment about soft-deletes is probably enough.
    - Redaction happens before deletion so that any observers see only sanitised data.
    @skip_unless_lms
    class RedactUserSocialAuthPIITest(TestCase):
- The signal tests belong in a `test_signals.py` file with an appropriate class name. Some reasonable signal tests:
  - Does the signal warn and redact if not already redacted?
  - Does the signal skip warning (and redaction) if already redacted?
  - Optional: Using mock, confirm `redact_and_delete_social_auth` is called with `skip_delete=True`.
- For utils tests of direct calls to `redact_and_delete_social_auth`, you can cover any items you didn't cover in signals (like maybe `test_delete_redacts_multiple_sso_providers`), and this shouldn't require signal setup and teardown.
Note: You have much of what you need, so hopefully this is minor refactoring and clean-up.
There was a problem hiding this comment.
Signal tests moved to `test_signals.py` using `mock.patch`, and utils tests now call `redact_and_delete_social_auth` directly without any signal setup/teardown.
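The first two signal tests suggested in the review (warn and redact when needed, skip both when already redacted) might look roughly like this. It is a sketch, not the PR's tests: `redact_before_delete`, the `redacted-` prefix, and the `SimpleNamespace` stand-in for a model instance are all hypothetical, and the handler is called directly rather than through Django's signal dispatch.

```python
import logging
from types import SimpleNamespace
from unittest import TestCase, mock

log = logging.getLogger("sso_redaction_sketch")

# Hypothetical placeholder for the real redacted-uid format.
REDACTED_PREFIX = "redacted-"


def redact_before_delete(sender, instance, **kwargs):  # pylint: disable=unused-argument
    """Hypothetical handler: redact unless the instance is already redacted."""
    if instance.uid.startswith(REDACTED_PREFIX) and not instance.extra_data:
        return
    log.warning("Redacting social auth %s in pre_delete fallback", instance.id)
    instance.uid = f"{REDACTED_PREFIX}{instance.id}"
    instance.extra_data = {}


class RedactBeforeDeleteSignalTest(TestCase):
    def test_warns_and_redacts_when_not_redacted(self):
        instance = SimpleNamespace(
            id=7, uid="alice@idp", extra_data={"email": "a@example.com"}
        )
        with mock.patch.object(log, "warning") as mock_warning:
            redact_before_delete(sender=None, instance=instance)
        mock_warning.assert_called_once()
        self.assertEqual(instance.uid, "redacted-7")
        self.assertEqual(instance.extra_data, {})

    def test_skips_warning_when_already_redacted(self):
        instance = SimpleNamespace(id=7, uid="redacted-7", extra_data={})
        with mock.patch.object(log, "warning") as mock_warning:
            redact_before_delete(sender=None, instance=instance)
        mock_warning.assert_not_called()
        self.assertEqual(instance.uid, "redacted-7")
```

Mocking the logger gives a direct signal for whether the fallback path ran, which is the same property the review proposes checking to detect interference from the real receiver.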
    captured_states = []

    def capture_state_before_delete(sender, instance, **kwargs):  # pylint: disable=unused-argument
- You may want to use the same UPDATE/DELETE query assertion you set up for the other test. See other comment for details.
- You'll also want to ensure that the real receiver you set up is not interfering with this test. For example, if you deleted the redaction from `retire_user.py`, would this test still pass because the signal is taking care of the redaction for you? One way to fix this would be to disconnect that signal in `setUpClass` (with an appropriate comment) and to re-connect it in `tearDownClass`. An alternative is to mock logging and ensure that there is no `log.warn` from the signal (about redacting). You can test that these assertions work by temporarily removing the redaction you are testing.
There was a problem hiding this comment.
Switched to `CaptureQueriesContext` for UPDATE-before-DELETE assertions, and disconnected the safety-net `pre_delete` signal handler around both tests so they'd fail if `retire_user` itself stopped redacting.
3f3977a to 667de73
7fc7ec0 to ebb2f96
2fa49b0 to 2af3cb4
373d581 to 9a8ba84
    @skip_unless_lms
    class RedactAndDeleteSocialAuthTest(TestCase):
Since this test module already uses ddt, consider parameterizing the multiple SSO provider scenarios in test_redact_and_delete_redacts_multiple_sso_providers using @ddt.data / @ddt.unpack to reduce repetitive setup and make it easier to extend with additional providers later.
    """
    social_auth_queryset = UserSocialAuth.objects.filter(user_id=user_id)
    # Important: this redacted uid must match the format used by ``get_redacted_social_auth_uid()``.
    social_auth_queryset.update(
Consider wrapping the update + delete operations in transaction.atomic() so the redaction and deletion happen atomically. This would avoid partial completion scenarios where the UPDATE succeeds but DELETE fails.
There was a problem hiding this comment.
No need for this. We did the same as we did for the user_retirement_Status table: the transaction atomic handling is already present.
Description
Implements automatic PII redaction for UserSocialAuth records before deletion to prevent personally identifiable information from persisting after records are removed.
Jira Ticket
https://2u-internal.atlassian.net/browse/BOMS-514