Commit db7ff4a

Use platform ?sql param for SQL SELECT (#14)
* Use platform ?sql param for SQL SELECT
* fix sql parsing, add tests

---------

Co-authored-by: Tim Pellissier <tpellissier@microsoft.com>
1 parent 28c45f8 commit db7ff4a

9 files changed

Lines changed: 184 additions & 59 deletions

README.md

Lines changed: 8 additions & 8 deletions
@@ -2,7 +2,7 @@

 A minimal Python SDK to use Microsoft Dataverse as a database for Azure AI Foundry–style apps.

-- Read (SQL) — Execute read-only T‑SQL via the McpExecuteSqlQuery Custom API. Returns `list[dict]`.
+- Read (SQL) — Execute constrained read-only SQL via the Dataverse Web API `?sql=` parameter. Returns `list[dict]`.
 - OData CRUD — Thin wrappers over Dataverse Web API (create/get/update/delete).
 - Bulk create — Pass a list of records to `create(...)` to invoke the bound `CreateMultiple` action; returns `list[str]` of GUIDs. If `@odata.type` is absent the SDK resolves the logical name from metadata (cached).
 - Bulk update — Call `update_multiple(entity_set, records)` to invoke the bound `UpdateMultiple` action; returns nothing. Each record must include the real primary key attribute (e.g. `accountid`).
@@ -14,7 +14,7 @@ A minimal Python SDK to use Microsoft Dataverse as a database for Azure AI Found
 ## Features

 - Simple `DataverseClient` facade for CRUD, SQL (read-only), and table metadata.
-- SQL-over-API: T-SQL routed through Custom API endpoint (no ODBC / TDS driver required).
+- SQL-over-API: Constrained SQL (single SELECT with limited WHERE/TOP/ORDER BY) via native Web API `?sql=` parameter.
 - Table metadata ops: create simple custom tables with primitive columns (string/int/decimal/float/datetime/bool) and delete them.
 - Bulk create via `CreateMultiple` (collection-bound) by passing `list[dict]` to `create(entity_set, payloads)`; returns list of created IDs.
 - Bulk update via `UpdateMultiple` by calling `update_multiple(entity_set, records)` with primary key attribute present in each record; returns nothing.
@@ -35,12 +35,12 @@ Create and activate a Python 3.13+ environment, then install dependencies:
 python -m pip install -r requirements.txt
 ```

-Direct TDS via ODBC is not used; SQL reads are executed via the Custom API over OData.
+Direct TDS via ODBC is not used; SQL reads are executed via the Web API using the `?sql=` query parameter.

 ## Configuration Notes

 - For Web API (OData), tokens target your Dataverse org URL scope: https://yourorg.crm.dynamics.com/.default. The SDK requests this scope from the provided TokenCredential.
-- For complete functionalities, please use one of the PREPROD BAP environments, otherwise McpExecuteSqlQuery might not work.
+(Preprod environments may surface newest SQL subset capabilities sooner than production.)

 ### Configuration (DataverseConfig)

@@ -50,7 +50,7 @@ Pass a `DataverseConfig` or rely on sane defaults:
 from dataverse_sdk import DataverseClient
 from dataverse_sdk.config import DataverseConfig

-cfg = DataverseConfig() # defaults: language_code=1033, sql_api_name="McpExecuteSqlQuery"
+cfg = DataverseConfig() # defaults: language_code=1033
 client = DataverseClient(base_url="https://yourorg.crm.dynamics.com", config=cfg)

 # Optional HTTP tunables (timeouts/retries)
@@ -70,7 +70,7 @@ The quickstart demonstrates:
 - Creating, reading, updating, and deleting records (OData)
 - Bulk create (CreateMultiple) to insert many records in one call
 - Retrieve multiple with paging (contrasting `$top` vs `page_size`)
-- Executing a read-only SQL query
+- Executing a read-only SQL query (Web API `?sql=`)

 ## Examples

@@ -112,7 +112,7 @@ print({"bulk_update": "ok"})
 # Delete
 client.delete("accounts", account_id)

-# SQL (read-only) via Custom API
+# SQL (read-only) via Web API `?sql=`
 rows = client.query_sql("SELECT TOP 3 accountid, name FROM account ORDER BY createdon DESC")
 for r in rows:
     print(r.get("accountid"), r.get("name"))
@@ -271,7 +271,7 @@ Notes:
 - Passing a list of payloads to `create` triggers bulk create and returns `list[str]` of IDs.
 - Use `get_multiple` for paging through result sets; prefer `select` to limit columns.
 - For CRUD methods that take a record id, pass the GUID string (36-char hyphenated). Parentheses around the GUID are accepted but not required.
-- SQL is routed through the Custom API named in `DataverseConfig.sql_api_name` (default: `McpExecuteSqlQuery`).
+* SQL queries are executed directly against entity set endpoints using the `?sql=` parameter. Supported subset only (single SELECT, optional WHERE/TOP/ORDER BY, alias). Unsupported constructs will be rejected by the service.

 ### Pandas helpers
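
Reviewer note: the README changes above describe SQL reads as a `GET` on an entity set endpoint with a `sql` query parameter. The following is a minimal sketch of how such a request URL could be composed with the standard library only. The helper name `build_sql_url` and the `/api/data/v9.2` root are assumptions for illustration; the commit only shows `self.api` and passes `params={"sql": sql}` to its own request helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sql_url(api_root: str, entity_set: str, sql: str) -> str:
    """Compose GET /{entity_set}?sql=<encoded select statement> (hypothetical helper)."""
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    query = urlencode({"sql": sql.strip()})
    return f"{api_root.rstrip('/')}/{entity_set}?{query}"


# Assumed API root; substitute whatever root your environment actually exposes.
url = build_sql_url(
    "https://yourorg.crm.dynamics.com/api/data/v9.2",
    "accounts",
    "SELECT TOP 3 accountid, name FROM account ORDER BY createdon DESC",
)
print(url)

# Round-trip check: decoding the query string recovers the original statement.
print(parse_qs(urlsplit(url).query)["sql"][0])
```

Note that the path segment is the entity set name (`accounts`) while the `FROM` clause uses the logical name (`account`), which is why the commit adds a logical-name to entity-set lookup.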

examples/quickstart.py

Lines changed: 10 additions & 8 deletions
@@ -320,15 +320,17 @@ def print_line_summaries(label: str, summaries: list[dict]) -> None:
 except Exception as e:
     print(f"Bulk update failed: {e}")

-# 4) Query records via SQL Custom API
-print("Query (SQL via Custom API):")
+# 4) Query records via SQL (?sql parameter))
+print("Query (SQL via ?sql query parameter):")
 try:
     import time
     pause("Execute SQL Query")

     def _run_query():
-        log_call(f"client.query_sql(\"SELECT TOP 2 * FROM {logical} ORDER BY {attr_prefix}_amount DESC\")")
-        return client.query_sql(f"SELECT TOP 2 * FROM {logical} ORDER BY {attr_prefix}_amount DESC")
+        cols = f"{id_key}, {code_key}, {amount_key}, {when_key}"
+        query = f"SELECT TOP 2 {cols} FROM {logical} ORDER BY {attr_prefix}_amount DESC"
+        log_call(f"client.query_sql(\"{query}\") (Web API ?sql=)")
+        return client.query_sql(query)

     def _retry_if(ex: Exception) -> bool:
         msg = str(ex) if ex else ""
@@ -338,9 +340,9 @@ def _retry_if(ex: Exception) -> bool:
     id_key = f"{logical}id"
     ids = [r.get(id_key) for r in rows if isinstance(r, dict) and r.get(id_key)]
     print({"entity": logical, "rows": len(rows) if isinstance(rows, list) else 0, "ids": ids})
-    tds_summaries = []
+    record_summaries = []
     for row in rows if isinstance(rows, list) else []:
-        tds_summaries.append(
+        record_summaries.append(
             {
                 "id": row.get(id_key),
                 "code": row.get(code_key),
@@ -349,9 +351,9 @@ def _retry_if(ex: Exception) -> bool:
                 "when": row.get(when_key),
             }
         )
-    print_line_summaries("TDS record summaries (top 2 by amount):", tds_summaries)
+    print_line_summaries("SQL record summaries (top 2 by amount):", record_summaries)
 except Exception as e:
-    print(f"SQL via Custom API failed: {e}")
+    print(f"SQL query failed: {e}")

 # Pause between SQL query and retrieve-multiple demos
 pause("Retrieve multiple (OData paging demos)")

examples/quickstart_pandas.py

Lines changed: 6 additions & 4 deletions
@@ -183,8 +183,8 @@ def backoff_retry(op, *, delays=(0, 2, 5, 10, 20), retry_http_statuses=(400, 403
     print(f"Update/verify failed: {e}")
     sys.exit(1)

-# 4) Query records via SQL Custom API
-print("(Pandas) Query (SQL via Custom API):")
+# 4) Query records via SQL (Web API ?sql=)
+print("(Pandas) Query (SQL via Web API ?sql=):")
 try:
     # Try singular logical name first, then plural entity set, with short backoff
     import time
@@ -196,7 +196,9 @@ def backoff_retry(op, *, delays=(0, 2, 5, 10, 20), retry_http_statuses=(400, 403
     df_rows = None
     for name in candidates:
         def _run_query():
-            return PANDAS.query_sql_df(f"SELECT TOP 3 * FROM {name} ORDER BY createdon DESC")
+            id_key = f"{logical}id"
+            cols = f"{id_key}, {attr_prefix}_code, {attr_prefix}_amount, {attr_prefix}_when"
+            return PANDAS.query_sql_df(f"SELECT TOP 3 {cols} FROM {name} ORDER BY {attr_prefix}_amount DESC")
         def _retry_if(ex: Exception) -> bool:
             msg = str(ex) if ex else ""
             return ("Invalid table name" in msg) or ("Invalid object name" in msg)
@@ -211,7 +213,7 @@ def _retry_if(ex: Exception) -> bool:
 except SystemExit:
     pass
 except Exception as e:
-    print(f"SQL via Custom API failed: {e}")
+    print(f"SQL query failed: {e}")

 # 5) Delete record
 print("(Pandas) Delete (OData via Pandas wrapper):")

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -3,9 +3,9 @@ requires = ["setuptools>=61.0"]
 build-backend = "setuptools.build_meta"

 [project]
-name = "dataverse-sdk-poc"
+name = "dataverse-python-client"
 version = "0.1.0"
-description = "POC: Dataverse Python SDK with TDS reads and OData CRUD via SQL router"
+description = "Dataverse Python client"
 authors = [{ name = "POC" }]
 readme = "README.md"
 requires-python = ">=3.10"

src/dataverse_sdk/client.py

Lines changed: 12 additions & 8 deletions
@@ -14,7 +14,7 @@ class DataverseClient:

     This client exposes a simple, stable surface for:
     - OData CRUD: create, get, update, delete records
-    - SQL (read-only): execute T-SQL via Dataverse Custom API (no ODBC/TDS driver)
+    - SQL (read-only): query SQL via ?sql parameter in Web API
     - Table metadata: create, inspect, and delete simple custom tables

     The client owns authentication (Azure Identity) and configuration, and delegates
@@ -182,21 +182,25 @@ def get_multiple(
             page_size=page_size,
         )

-    # SQL via Custom API
-    def query_sql(self, tsql: str):
-        """Execute a read-only SQL query via the configured Custom API.
+    # SQL via Web API sql parameter
+    def query_sql(self, sql: str):
+        """Execute a read-only SQL query using the Dataverse Web API `?sql=` capability.
+
+        The query must follow the currently supported subset: single SELECT with optional WHERE,
+        TOP (integer), ORDER BY (columns only), and simple alias after FROM. Example:
+            ``SELECT TOP 3 accountid, name FROM account ORDER BY name DESC``

         Parameters
         ----------
-        tsql : str
-            A SELECT-only T-SQL statement (e.g., ``"SELECT TOP 3 * FROM account"``).
+        sql : str
+            Supported single SELECT statement.

         Returns
         -------
         list[dict]
-            Rows as a list of dictionaries.
+            Result rows (empty list if none).
         """
-        return self._get_odata().query_sql(tsql)
+        return self._get_odata().query_sql(sql)

     # Table metadata helpers
     def get_table_info(self, tablename: str) -> Optional[Dict[str, Any]]:

src/dataverse_sdk/config.py

Lines changed: 0 additions & 2 deletions
@@ -7,7 +7,6 @@
 @dataclass(frozen=True)
 class DataverseConfig:
     language_code: int = 1033
-    sql_api_name: str = "McpExecuteSqlQuery"

     # Optional HTTP tuning (not yet wired everywhere; reserved for future use)
     http_retries: Optional[int] = None
@@ -19,7 +18,6 @@ def from_env(cls) -> "DataverseConfig":
         # Environment-free defaults
         return cls(
             language_code=1033,
-            sql_api_name="McpExecuteSqlQuery",
             http_retries=None,
             http_backoff=None,
             http_timeout=None,
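
Reviewer note: because `sql_api_name` is removed from a frozen dataclass, callers that still pass it as a keyword argument would fail at construction time rather than being silently ignored. The sketch below uses a stand-in class that mirrors the fields visible in this diff; the types of `http_backoff` and `http_timeout` are assumptions (the diff only shows them defaulting to `None`), and `try_legacy_config` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataverseConfig:
    """Stand-in mirroring the post-commit fields; not the SDK class itself."""
    language_code: int = 1033
    http_retries: Optional[int] = None
    http_backoff: Optional[float] = None  # assumed type
    http_timeout: Optional[float] = None  # assumed type


def try_legacy_config() -> str:
    """Show what happens when old call sites still pass sql_api_name."""
    try:
        DataverseConfig(sql_api_name="McpExecuteSqlQuery")  # type: ignore[call-arg]
    except TypeError as exc:
        # dataclass-generated __init__ rejects unknown keyword arguments
        return f"rejected: {exc}"
    return "accepted"


print(try_legacy_config())
```

Call sites that relied on the default (`DataverseConfig()`) are unaffected, which matches the README change above.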

src/dataverse_sdk/odata.py

Lines changed: 103 additions & 23 deletions
@@ -29,6 +29,8 @@ def __init__(self, auth, base_url: str, config=None) -> None:
         )
         # Cache: entity set name -> logical name (resolved via metadata lookup)
         self._entityset_logical_cache = {}
+        # Cache: logical name -> entity set name (reverse lookup for SQL endpoint)
+        self._logical_to_entityset_cache: dict[str, str] = {}

     def _headers(self) -> Dict[str, str]:
         """Build standard OData headers with bearer auth."""
@@ -374,42 +376,120 @@ def _do_request(url: str, *, params: Optional[Dict[str, Any]] = None) -> Dict[st
             next_link = data.get("@odata.nextLink") or data.get("odata.nextLink") if isinstance(data, dict) else None

     # --------------------------- SQL Custom API -------------------------
-    def query_sql(self, tsql: str) -> list[dict[str, Any]]:
-        """Execute a read-only T-SQL query via the configured Custom API.
+    def query_sql(self, sql: str) -> list[dict[str, Any]]:
+        """Execute a read-only SQL query using the Dataverse Web API `?sql=` capability.
+
+        The platform supports a constrained subset of SQL SELECT statements directly on entity set endpoints:
+            GET /{entity_set}?sql=<encoded select statement>
+
+        This client extracts the logical table name from the query, resolves the corresponding
+        entity set name (cached) and invokes the Web API using the `sql` query parameter.

         Parameters
         ----------
-        tsql : str
-            SELECT-style Dataverse-supported T-SQL (read-only).
+        sql : str
+            Single SELECT statement within supported subset.

         Returns
         -------
         list[dict]
-            Rows materialised as list of dictionaries (empty list if no rows).
+            Result rows (empty list if none).

         Raises
         ------
+        ValueError
+            If the SQL is empty or malformed, or if the table logical name cannot be determined.
         RuntimeError
-            If the Custom API response is missing the expected ``queryresult`` property or type is unexpected.
+            If metadata lookup for the logical name fails.
         """
-        payload = {"querytext": tsql}
-        headers = self._headers()
-        api_name = self.config.sql_api_name
-        url = f"{self.api}/{api_name}"
-        r = self._request("post", url, headers=headers, json=payload)
+        if not isinstance(sql, str) or not sql.strip():
+            raise ValueError("sql must be a non-empty string")
+        sql = sql.strip()
+
+        # Extract logical table name via helper (robust to identifiers ending with 'from')
+        logical = self._extract_logical_table(sql)
+
+        entity_set = self._entity_set_from_logical(logical)
+        # Issue GET /{entity_set}?sql=<query>
+        headers = self._headers().copy()
+        url = f"{self.api}/{entity_set}"
+        params = {"sql": sql}
+        r = self._request("get", url, headers=headers, params=params)
+        try:
+            r.raise_for_status()
+        except Exception as e:
+            # Attach response snippet to aid debugging unsupported SQL patterns
+            resp_text = None
+            try:
+                resp_text = r.text[:500] if getattr(r, 'text', None) else None
+            except Exception:
+                pass
+            detail = f" SQL query failed (status={getattr(r, 'status_code', '?')}): {resp_text}" if resp_text else ""
+            raise RuntimeError(str(e) + detail) from e
+        try:
+            body = r.json()
+        except ValueError:
+            return []
+        if isinstance(body, dict):
+            value = body.get("value")
+            if isinstance(value, list):
+                # Ensure dict rows only
+                return [row for row in value if isinstance(row, dict)]
+        # Fallbacks: if body itself is a list
+        if isinstance(body, list):
+            return [row for row in body if isinstance(row, dict)]
+        return []
+
+    @staticmethod
+    def _extract_logical_table(sql: str) -> str:
+        """Extract the logical table name after the first standalone FROM.
+
+        Examples:
+            SELECT * FROM account
+            SELECT col1, startfrom FROM new_sampleitem WHERE col1 = 1
+
+        """
+        if not isinstance(sql, str):
+            raise ValueError("sql must be a string")
+        # Mask out single-quoted string literals to avoid matching FROM inside them.
+        masked = re.sub(r"'([^']|'')*'", "'x'", sql)
+        pattern = r"\bfrom\b\s+([A-Za-z0-9_]+)" # minimal, single-line regex
+        m = re.search(pattern, masked, flags=re.IGNORECASE)
+        if not m:
+            raise ValueError("Unable to determine table logical name from SQL (expected 'FROM <name>').")
+        return m.group(1).lower()
+
+    # ---------------------- Entity set resolution -----------------------
+    def _entity_set_from_logical(self, logical: str) -> str:
+        """Resolve entity set name (plural) from a logical (singular) name using metadata.
+
+        Caches results for subsequent SQL queries.
+        """
+        if not logical:
+            raise ValueError("logical name required")
+        cached = self._logical_to_entityset_cache.get(logical)
+        if cached:
+            return cached
+        url = f"{self.api}/EntityDefinitions"
+        logical_escaped = self._escape_odata_quotes(logical)
+        params = {
+            "$select": "LogicalName,EntitySetName",
+            "$filter": f"LogicalName eq '{logical_escaped}'",
+        }
+        r = self._request("get", url, headers=self._headers(), params=params)
         r.raise_for_status()
-        data = r.json()
-        if "queryresult" not in data:
-            raise RuntimeError(f"{api_name} response missing 'queryresult'.")
-        q = data["queryresult"]
-        if q is None:
-            parsed = []
-        elif isinstance(q, str):
-            s = q.strip()
-            parsed = [] if not s else json.loads(s)
-        else:
-            raise RuntimeError(f"Unexpected queryresult type: {type(q)}")
-        return parsed
+        try:
+            body = r.json()
+            items = body.get("value", []) if isinstance(body, dict) else []
+        except ValueError:
+            items = []
+        if not items:
+            raise RuntimeError(f"Unable to resolve entity set for logical name '{logical}'.")
+        es = items[0].get("EntitySetName")
+        if not es:
+            raise RuntimeError(f"Metadata response missing EntitySetName for logical '{logical}'.")
+        self._logical_to_entityset_cache[logical] = es
+        return es

     # ---------------------- Table metadata helpers ----------------------
     def _label(self, text: str) -> Dict[str, Any]:
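
Reviewer note: the two pure pieces of the new `query_sql` path (table-name extraction and response normalisation) can be exercised without a Dataverse environment. The sketch below lifts the regex and the `value` handling from the diff above into standalone functions; the function names `extract_logical_table` and `rows_from_body` are local to this example, not SDK API.

```python
import re
from typing import Any

# Same pattern as in the diff: first standalone FROM followed by a simple identifier.
_FROM_RE = re.compile(r"\bfrom\b\s+([A-Za-z0-9_]+)", re.IGNORECASE)


def extract_logical_table(sql: str) -> str:
    """Return the lower-cased identifier after the first standalone FROM."""
    # Replace single-quoted literals (with '' escapes) so FROM inside text is ignored.
    masked = re.sub(r"'([^']|'')*'", "'x'", sql)
    m = _FROM_RE.search(masked)
    if not m:
        raise ValueError("expected 'FROM <name>' in SQL")
    return m.group(1).lower()


def rows_from_body(body: Any) -> list[dict]:
    """Keep only dict rows from an OData-style {'value': [...]} body or a bare list."""
    if isinstance(body, dict):
        value = body.get("value")
        if isinstance(value, list):
            return [row for row in value if isinstance(row, dict)]
    if isinstance(body, list):
        return [row for row in body if isinstance(row, dict)]
    return []


print(extract_logical_table("SELECT col1, startfrom FROM new_sampleitem WHERE col1 = 1"))
print(extract_logical_table("SELECT 'copied from backup' AS note, name FROM Contact"))
print(rows_from_body({"value": [{"name": "Contoso"}, "not-a-row"]}))
```

One consequence of the `[A-Za-z0-9_]+` capture group is that bracketed names such as `FROM [account]` raise `ValueError`, and a schema-qualified `FROM dbo.account` would resolve to `dbo`. That seems acceptable for the constrained subset described in the docstring, but it may be worth a test case.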

src/dataverse_sdk/odata_pandas_wrappers.py

Lines changed: 5 additions & 4 deletions
@@ -14,7 +14,7 @@
   DataFrame summarizing success/failure.
 * get_ids: fetches a set of ids returning a DataFrame of the merged JSON
   objects (outer union of keys). Missing keys are NaN.
-* query_sql_df: runs a SQL query via Custom API and returns the result rows as
+* query_sql_df: runs a SQL query via the Web API `?sql=` parameter and returns the result rows as
   a DataFrame (empty DataFrame if no rows).

 Edge cases & behaviors:
@@ -139,12 +139,13 @@ def get_ids(self, entity_set: str, ids: Sequence[str] | pd.Series | pd.Index, se
         return pd.DataFrame(rows)

     # --------------------------- Query SQL -------------------------------
-    def query_sql_df(self, tsql: str) -> pd.DataFrame:
-        """Execute a SQL query via Custom API and return a DataFrame.
+    def query_sql_df(self, sql: str) -> pd.DataFrame:
+        """Execute a SQL query via the Dataverse Web API `?sql=` parameter and return a DataFrame.

+        The statement must adhere to the supported subset (single SELECT, optional WHERE/TOP/ORDER BY, no joins).
         Empty result -> empty DataFrame (columns inferred only if rows present).
         """
-        rows: Any = self._c.query_sql(tsql)
+        rows: Any = self._c.query_sql(sql)

         # If API returned a JSON string, parse it
         if isinstance(rows, str):
