REFACTOR: Migrate Connection string sanitization from regex to parser-based#522
REFACTOR: Migrate Connection string sanitization from regex to parser-based#522bewithgaurav wants to merge 4 commits intomainfrom
Conversation
…-based approach - move sanitize_connection_string() to connection_string_parser.py using _ConnectionStringParser for correct ODBC braced-value handling - helpers.py retains thin delegate for backward compatibility - connection.py imports directly from connection_string_parser - on parse failure, redact entire string instead of regex fallback - add 5 new tests for braced values, escaped braces, and edge cases
9a9c4c7 to
9916cb1
Compare
📊 Code Coverage Report
Diff CoverageDiff: main...HEAD, staged and unstaged changes
Summary
📋 Files Needing Attention📉 Files with overall lowest coverage (click to expand)mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 67.8%
mssql_python.row.py: 70.5%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 74.2%
mssql_python.pybind.connection.connection.cpp: 75.3%
mssql_python.__init__.py: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.connection.py: 85.3%🔗 Quick Links
|
There was a problem hiding this comment.
Pull request overview
Refactors connection string sanitization to use the existing ODBC connection string parser/builder instead of a regex, improving correctness for braced values and escaped braces per MS-ODBCSTR.
Changes:
- Added a parser-based
sanitize_connection_string()implementation inconnection_string_parser.pywith full redaction on parse failure. - Kept
helpers.sanitize_connection_string()as a thin delegate for backwards compatibility and adjustedconnection.pyto import the new implementation directly. - Expanded logging sanitization tests to cover braced values, escaped braces, end-of-string PWD, and malformed strings.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
tests/test_007_logging.py |
Adds/updates unit tests for password masking (including braced and malformed cases). |
mssql_python/helpers.py |
Replaces regex sanitizer with a delegation wrapper to the parser-based implementation. |
mssql_python/connection_string_parser.py |
Introduces the new parser-based sanitizer and sensitive key list. |
mssql_python/connection.py |
Updates imports to use the new sanitizer directly for connection logging. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for key, value in params.items(): | ||
| canonical = _ConnectionStringParser.normalize_key(key) | ||
| display_key = canonical if canonical else key | ||
| if key in _SENSITIVE_KEYS: | ||
| sanitized_params[display_key] = "***" | ||
| else: | ||
| sanitized_params[display_key] = value |
There was a problem hiding this comment.
sanitize_connection_string() claims to mask both PWD and Password, but when the input uses the key "Password" the parser lowercases it to "password" and normalize_key() returns None (since "password" isn’t in _ALLOWED_CONNECTION_STRING_PARAMS). This causes the sanitized output key to be "password=", which won’t satisfy existing expectations that look for "Password=" (e.g., logging integration tests). Consider adding a canonical mapping for "password" (or special-casing it here) so the emitted key is consistently "Password" or "PWD".
| def test_no_pwd_unchanged(self, cleanup_logger): | ||
| """Connection string without PWD should be returned intact.""" | ||
| from mssql_python.helpers import sanitize_connection_string | ||
|
|
||
| conn_str = "Server=localhost;Database=test;UID=user" | ||
| sanitized = sanitize_connection_string(conn_str) | ||
|
|
||
| assert "Server=" in sanitized | ||
| assert "Database=" in sanitized | ||
| assert "UID=" in sanitized |
There was a problem hiding this comment.
The docstring says the connection string without PWD should be returned "intact", but this sanitizer rebuilds via _ConnectionStringBuilder which sorts keys and may change formatting/casing even when no secrets are present. Either tighten the assertion to actually verify equality (if that’s intended behavior) or update the test/docstring to reflect that only sensitive values are masked and the string may be reformatted.
| def test_no_pwd_unchanged(self, cleanup_logger): | |
| """Connection string without PWD should be returned intact.""" | |
| from mssql_python.helpers import sanitize_connection_string | |
| conn_str = "Server=localhost;Database=test;UID=user" | |
| sanitized = sanitize_connection_string(conn_str) | |
| assert "Server=" in sanitized | |
| assert "Database=" in sanitized | |
| assert "UID=" in sanitized | |
| def test_no_pwd_preserves_non_sensitive_fields(self, cleanup_logger): | |
| """Connection string without PWD should preserve non-sensitive fields, even if reformatted.""" | |
| from mssql_python.helpers import sanitize_connection_string | |
| conn_str = "Server=localhost;Database=test;UID=user" | |
| sanitized = sanitize_connection_string(conn_str) | |
| assert "Server=localhost" in sanitized | |
| assert "Database=test" in sanitized | |
| assert "UID=user" in sanitized | |
| assert "PWD=***" not in sanitized | |
| assert "redacted" not in sanitized.lower() |
Work Item / Issue Reference
Summary
Replaces the regex-based
sanitize_connection_string()with a parser-based implementationthat uses
_ConnectionStringParserto correctly handle all ODBC connection string valueformats including braced values per ODBC spec.
Changes
sanitize_connection_string()toconnection_string_parser.pywhere it naturallybelongs alongside the parser it depends on — eliminates circular import between helpers
and parser modules
helpers.pyretains a thin delegate for backward compatibilityconnection.pyimports directly fromconnection_string_parserTesting
TestPasswordSanitizationtests pass