Skip to content

Commit 7388593

Browse files
authored
FIX: False positive qmark detection for ? inside bracketed identifiers, string literals and comments (#465)
### Work Item / Issue Reference <!-- IMPORTANT: Please follow the PR template guidelines below. For mssql-python maintainers: Insert your ADO Work Item ID below For external contributors: Insert Github Issue number below Only one reference is required - either GitHub issue OR ADO Work Item. --> <!-- mssql-python maintainers: ADO Work Item --> > [AB#42937](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/42937) <!-- External contributors: GitHub Issue --> > GitHub Issue: #464 ------------------------------------------------------------------- ### Summary This pull request introduces a robust fix for detecting real parameter placeholders in SQL statements, specifically addressing false positives caused by question marks inside bracketed identifiers, string literals, quoted identifiers, and comments. The changes add context-aware scanning logic and comprehensive tests, ensuring that only actual parameter placeholders are flagged and handled. This resolves a bug where SQL containing `?` inside brackets (e.g., `[q?marks]`) would incorrectly trigger parameter mismatch errors. ### Core logic improvements * Added `_skip_quoted_context` helper in `parameter_helper.py` to accurately skip over single-line comments, multi-line comments, single-quoted string literals (with escaped quotes), double-quoted identifiers, and bracketed identifiers when scanning SQL for placeholders. * Added `_has_unquoted_question_marks` function to detect real `?` placeholders only outside quoted contexts, using the new context-skipping logic. * Updated `detect_and_convert_parameters` to use `_has_unquoted_question_marks` for parameter style mismatch detection, preventing false positives when `?` appears inside bracketed identifiers or other quoted contexts. ### Testing improvements * Added extensive unit tests for `_skip_quoted_context` and `_has_unquoted_question_marks`, covering all relevant SQL quoting and commenting scenarios, including edge cases like unclosed quotes/brackets. * Added integration tests verifying that SQL with `?` inside bracketed identifiers, string literals, and comments does not trigger parameter style mismatch errors, and that real placeholders are still detected correctly. ### Test harness updates * Imported the new helper functions in the test module to facilitate direct testing.
1 parent 95eef16 commit 7388593

2 files changed

Lines changed: 649 additions & 14 deletions

File tree

mssql_python/parameter_helper.py

Lines changed: 145 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,9 @@
55
Parameter style conversion helpers for mssql-python.
66
77
Supports both qmark (?) and pyformat (%(name)s) parameter styles.
8-
Simple character scanning approach - does NOT parse SQL contexts.
8+
Includes context-aware scanning for qmark and pyformat detection,
9+
skipping characters inside bracketed identifiers, string literals,
10+
quoted identifiers, and SQL comments.
911
1012
Reference: https://www.python.org/dev/peps/pep-0249/#paramstyle
1113
"""
@@ -18,13 +20,143 @@
1820
_ESCAPED_PERCENT_MARKER = "__MSSQL_PYFORMAT_ESCAPED_PERCENT_PLACEHOLDER__"
1921

2022

23+
def _skip_quoted_context(sql: str, i: int, length: int) -> int:
24+
"""
25+
If position i starts a SQL quoted context, skip past it and return the new position.
26+
Returns -1 if no quoted context starts at position i.
27+
28+
Handles:
29+
- Single-line comments: -- ... (to end of line)
30+
- Multi-line comments: /* ... */ (to closing delimiter)
31+
- Single-quoted string literals: '...' (with '' escape handling)
32+
- Double-quoted identifiers: "..."
33+
- Bracketed identifiers: [...]
34+
35+
Args:
36+
sql: Full SQL query string
37+
i: Current scan position
38+
length: Length of sql (len(sql))
39+
40+
Returns:
41+
New position after the quoted context, or -1 if position i
42+
does not start a quoted context.
43+
"""
44+
ch = sql[i]
45+
46+
# Single-line comment: skip to end of line
47+
if ch == "-" and i + 1 < length and sql[i + 1] == "-":
48+
i += 2
49+
while i < length and sql[i] != "\n":
50+
i += 1
51+
return i
52+
53+
# Multi-line comment: skip to closing */
54+
# SQL Server supports nested block comments, so we track nesting depth.
55+
if ch == "/" and i + 1 < length and sql[i + 1] == "*":
56+
i += 2
57+
depth = 1
58+
while i < length and depth > 0:
59+
if i + 1 < length and sql[i] == "/" and sql[i + 1] == "*":
60+
depth += 1
61+
i += 2
62+
elif i + 1 < length and sql[i] == "*" and sql[i + 1] == "/":
63+
depth -= 1
64+
i += 2
65+
else:
66+
i += 1
67+
return min(i, length) # already past final */, or at end if unterminated
68+
69+
# Single-quoted string literal: skip to closing '
70+
# Handles escaped quotes ('') inside strings
71+
if ch == "'":
72+
i += 1
73+
while i < length:
74+
if sql[i] == "'":
75+
if i + 1 < length and sql[i + 1] == "'":
76+
i += 2 # skip escaped quote
77+
continue
78+
break
79+
i += 1
80+
return min(i + 1, length) # skip closing quote
81+
82+
# Double-quoted identifier: skip to closing "
83+
# Handles escaped quotes ("") inside identifiers
84+
if ch == '"':
85+
i += 1
86+
while i < length:
87+
if sql[i] == '"':
88+
if i + 1 < length and sql[i + 1] == '"':
89+
i += 2 # skip escaped quote
90+
continue
91+
break
92+
i += 1
93+
return min(i + 1, length) # skip closing quote
94+
95+
# Bracketed identifier: skip to closing ]
96+
# Handles escaped brackets (]]) inside identifiers
97+
if ch == "[":
98+
i += 1
99+
while i < length:
100+
if sql[i] == "]":
101+
if i + 1 < length and sql[i + 1] == "]":
102+
i += 2 # skip escaped bracket
103+
continue
104+
break
105+
i += 1
106+
return min(i + 1, length) # skip closing bracket
107+
108+
return -1
109+
110+
111+
def _has_unquoted_question_marks(sql: str) -> bool:
112+
"""
113+
Check if SQL contains ? characters that are actual qmark parameter placeholders.
114+
115+
Uses _skip_quoted_context to skip ? characters that appear inside
116+
bracketed identifiers, string literals, quoted identifiers, and comments.
117+
118+
Args:
119+
sql: SQL query string to check
120+
121+
Returns:
122+
True if the SQL contains at least one unquoted/unbracketed ? character
123+
124+
Examples:
125+
>>> _has_unquoted_question_marks("SELECT * FROM t WHERE id = ?")
126+
True
127+
128+
>>> _has_unquoted_question_marks("SELECT [q?marks] FROM t")
129+
False
130+
131+
>>> _has_unquoted_question_marks("SELECT 'what?' FROM t")
132+
False
133+
"""
134+
i = 0
135+
length = len(sql)
136+
137+
while i < length:
138+
# Skip any quoted context (brackets, strings, comments)
139+
skipped = _skip_quoted_context(sql, i, length)
140+
if skipped >= 0:
141+
i = skipped
142+
continue
143+
144+
# Unquoted question mark — this is a real placeholder
145+
if sql[i] == "?":
146+
return True
147+
148+
i += 1
149+
150+
return False
151+
152+
21153
def parse_pyformat_params(sql: str) -> List[str]:
22154
"""
23155
Extract %(name)s parameter names from SQL string.
24156
25-
Uses simple character scanning approach - does NOT parse SQL contexts
26-
(strings, comments, identifiers). This means %(name)s patterns inside SQL
27-
string literals or comments WILL be detected as parameters.
157+
Uses context-aware scanning to skip %(name)s patterns inside SQL
158+
string literals, quoted identifiers, bracketed identifiers, and comments.
159+
Only %(name)s patterns in executable SQL are detected as parameters.
28160
29161
Args:
30162
sql: SQL query string with %(name)s placeholders
@@ -52,6 +184,12 @@ def parse_pyformat_params(sql: str) -> List[str]:
52184
length = len(sql)
53185

54186
while i < length:
187+
# Skip any quoted context (brackets, strings, comments)
188+
skipped = _skip_quoted_context(sql, i, length)
189+
if skipped >= 0:
190+
i = skipped
191+
continue
192+
55193
# Look for %(
56194
if i + 2 < length and sql[i] == "%" and sql[i + 1] == "(":
57195
# Find the closing )
@@ -317,7 +455,9 @@ def detect_and_convert_parameters(
317455
)
318456

319457
# Check if SQL appears to have qmark placeholders
320-
if "?" in sql and not parse_pyformat_params(sql):
458+
# Fast short-circuit: skip the O(n) context-aware scan if no ? exists at all
459+
# Then use context-aware check that ignores ? inside brackets, quotes, and comments
460+
if "?" in sql and _has_unquoted_question_marks(sql) and not parse_pyformat_params(sql):
321461
logger.error(
322462
"detect_and_convert_parameters: Parameter style mismatch - SQL has ? placeholders but received dict"
323463
)

0 commit comments

Comments
 (0)