Skip to content

REFACTOR: Migrate Connection string sanitization from regex to parser-based#522

Open
bewithgaurav wants to merge 4 commits intomainfrom
bewithgaurav/sanitize-connstr-parser
Open

REFACTOR: Migrate Connection string sanitization from regex to parser-based#522
bewithgaurav wants to merge 4 commits intomainfrom
bewithgaurav/sanitize-connstr-parser

Conversation

@bewithgaurav
Copy link
Copy Markdown
Collaborator

@bewithgaurav bewithgaurav commented Apr 14, 2026

Work Item / Issue Reference

AB#43979


Summary

Replaces the regex-based sanitize_connection_string() with a parser-based implementation
that uses _ConnectionStringParser to correctly handle all ODBC connection string value
formats including braced values per ODBC spec.

Changes

  • Moved sanitize_connection_string() to connection_string_parser.py where it naturally
    belongs alongside the parser it depends on — eliminates circular import between helpers
    and parser modules
  • helpers.py retains a thin delegate for backward compatibility
  • connection.py imports directly from connection_string_parser
  • Added 5 new test cases covering braced values, escaped braces, and edge cases

Testing

  • All existing TestPasswordSanitization tests pass
  • All connection string parser and allowlist tests pass (no regressions)

…-based approach

- move sanitize_connection_string() to connection_string_parser.py
  using _ConnectionStringParser for correct ODBC braced-value handling
- helpers.py retains thin delegate for backward compatibility
- connection.py imports directly from connection_string_parser
- on parse failure, redact entire string instead of regex fallback
- add 5 new tests for braced values, escaped braces, and edge cases
@bewithgaurav bewithgaurav force-pushed the bewithgaurav/sanitize-connstr-parser branch from 9a9c4c7 to 9916cb1 Compare April 14, 2026 18:57
@github-actions github-actions bot added the pr-size: medium Moderate update size label Apr 14, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 14, 2026

📊 Code Coverage Report

🔥 Diff Coverage

100%


🎯 Overall Coverage

79%


📈 Total Lines Covered: 6658 out of 8422
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

  • mssql_python/connection.py (100%)
  • mssql_python/connection_string_parser.py (100%)
  • mssql_python/helpers.py (100%)

Summary

  • Total: 21 lines
  • Missing: 0 lines
  • Coverage: 100%

📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 67.8%
mssql_python.row.py: 70.5%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 74.2%
mssql_python.pybind.connection.connection.cpp: 75.3%
mssql_python.__init__.py: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.connection.py: 85.3%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

View Azure DevOps Build

Browse Full Coverage Report

Comment thread tests/test_007_logging.py Dismissed
Comment thread tests/test_007_logging.py Dismissed
Comment thread tests/test_007_logging.py Dismissed
Comment thread tests/test_007_logging.py Dismissed
Comment thread tests/test_007_logging.py Dismissed
Comment thread tests/test_007_logging.py Dismissed
Comment thread tests/test_007_logging.py Dismissed
Comment thread tests/test_007_logging.py Dismissed
@bewithgaurav bewithgaurav marked this pull request as ready for review April 16, 2026 08:49
Copilot AI review requested due to automatic review settings April 16, 2026 08:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors connection string sanitization to use the existing ODBC connection string parser/builder instead of a regex, improving correctness for braced values and escaped braces per MS-ODBCSTR.

Changes:

  • Added a parser-based sanitize_connection_string() implementation in connection_string_parser.py with full redaction on parse failure.
  • Kept helpers.sanitize_connection_string() as a thin delegate for backwards compatibility and adjusted connection.py to import the new implementation directly.
  • Expanded logging sanitization tests to cover braced values, escaped braces, end-of-string PWD, and malformed strings.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
tests/test_007_logging.py Adds/updates unit tests for password masking (including braced and malformed cases).
mssql_python/helpers.py Replaces regex sanitizer with a delegation wrapper to the parser-based implementation.
mssql_python/connection_string_parser.py Introduces the new parser-based sanitizer and sensitive key list.
mssql_python/connection.py Updates imports to use the new sanitizer directly for connection logging.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +409 to +415
for key, value in params.items():
canonical = _ConnectionStringParser.normalize_key(key)
display_key = canonical if canonical else key
if key in _SENSITIVE_KEYS:
sanitized_params[display_key] = "***"
else:
sanitized_params[display_key] = value
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitize_connection_string() claims to mask both PWD and Password, but when the input uses the key "Password" the parser lowercases it to "password" and normalize_key() returns None (since "password" isn’t in _ALLOWED_CONNECTION_STRING_PARAMS). This causes the sanitized output key to be "password=", which won’t satisfy existing expectations that look for "Password=" (e.g., logging integration tests). Consider adding a canonical mapping for "password" (or special-casing it here) so the emitted key is consistently "Password" or "PWD".

Copilot uses AI. Check for mistakes.
Comment thread tests/test_007_logging.py
Comment on lines +373 to +382
def test_no_pwd_unchanged(self, cleanup_logger):
"""Connection string without PWD should be returned intact."""
from mssql_python.helpers import sanitize_connection_string

conn_str = "Server=localhost;Database=test;UID=user"
sanitized = sanitize_connection_string(conn_str)

assert "Server=" in sanitized
assert "Database=" in sanitized
assert "UID=" in sanitized
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring says the connection string without PWD should be returned "intact", but this sanitizer rebuilds via _ConnectionStringBuilder which sorts keys and may change formatting/casing even when no secrets are present. Either tighten the assertion to actually verify equality (if that’s intended behavior) or update the test/docstring to reflect that only sensitive values are masked and the string may be reformatted.

Suggested change
def test_no_pwd_unchanged(self, cleanup_logger):
"""Connection string without PWD should be returned intact."""
from mssql_python.helpers import sanitize_connection_string
conn_str = "Server=localhost;Database=test;UID=user"
sanitized = sanitize_connection_string(conn_str)
assert "Server=" in sanitized
assert "Database=" in sanitized
assert "UID=" in sanitized
def test_no_pwd_preserves_non_sensitive_fields(self, cleanup_logger):
"""Connection string without PWD should preserve non-sensitive fields, even if reformatted."""
from mssql_python.helpers import sanitize_connection_string
conn_str = "Server=localhost;Database=test;UID=user"
sanitized = sanitize_connection_string(conn_str)
assert "Server=localhost" in sanitized
assert "Database=test" in sanitized
assert "UID=user" in sanitized
assert "PWD=***" not in sanitized
assert "redacted" not in sanitized.lower()

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-size: medium Moderate update size

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants