
Commit 8e5e427

sagebree (Samson Gebre) and claude authored and committed
Fix silent data truncation in client.query.sql() — add @odata.nextLink pagination (#157) (#159)
- client.query.sql() silently returned only the first 5,000 rows regardless of result set size. The method now follows @odata.nextLink until all pages are exhausted.
- Added _extract_pagingcookie() helper to detect a confirmed server-side bug where the Dataverse SQL endpoint returns successive @odata.nextLink responses with pagenumber incrementing but the pagingcookie GUIDs (keyset cursor) never advancing, causing an infinite pagination loop. The SDK now detects this condition, breaks out of the loop, and emits a RuntimeWarning.
- Pagination is guarded against three failure modes: exact URL cycles, a stuck pagingcookie (server bug), and failed or non-JSON next-page responses. All three emit a RuntimeWarning with the partial row count and actionable guidance.

---------

Co-authored-by: Samson Gebre <sagebree@microsoft.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
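These guards surface through Python's standard `warnings` machinery, so a caller who would rather fail hard than consume a silently partial result can escalate RuntimeWarning to an exception. The sketch below is illustrative only: `run_strict` and `fake_sql_query` are hypothetical stand-ins, not SDK API.

```python
import warnings

def run_strict(fn, *args, **kwargs):
    # Promote any RuntimeWarning (e.g. a pagination guard firing) to an
    # exception so partial row sets are never consumed unnoticed.
    with warnings.catch_warnings():
        warnings.simplefilter("error", RuntimeWarning)
        return fn(*args, **kwargs)

def fake_sql_query():
    # Stand-in for client.query.sql() hitting a pagination guard: it warns
    # and returns the rows collected so far.
    warnings.warn("SQL pagination stopped after 5000 rows", RuntimeWarning, stacklevel=2)
    return [{"accountid": "..."}]

try:
    rows = run_strict(fake_sql_query)
except RuntimeWarning:
    rows = None  # the partial result was rejected instead of silently used
```

Called without `run_strict`, the same query would merely print the warning and hand back the partial list.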
1 parent 7a96a47 commit 8e5e427

File tree

3 files changed: +463 −10 lines changed


CHANGELOG.md

Lines changed: 18 additions & 0 deletions
@@ -5,6 +5,23 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [Unreleased]
+
+### Added
+- Batch API: `client.batch` namespace for deferred-execution batch operations that pack multiple Dataverse Web API calls into a single `POST $batch` HTTP request (#129)
+- Batch DataFrame integration: `client.batch.dataframe` namespace with pandas DataFrame wrappers for batch operations (#129)
+- `client.records.upsert()` and `client.batch.records.upsert()` backed by the `UpsertMultiple` bound action with alternate-key support (#129)
+- QueryBuilder: `client.query.builder("table")` with a fluent API, 20+ chainable methods (`select`, `filter_eq`, `filter_contains`, `order_by`, `expand`, etc.), and composable filter expressions using Python operators (`&`, `|`, `~`) (#118)
+- Memo/multiline column type support: `"memo"` (or `"multiline"`) can now be passed as a column type in `client.tables.create()` and `client.tables.add_columns()` (#155)
+
+### Changed
+- Picklist label-to-integer resolution now uses a single bulk `PicklistAttributeMetadata` API call for the entire table instead of per-attribute requests, with a 1-hour TTL cache (#154)
+
+### Fixed
+- `client.query.sql()` silently truncated results at 5,000 rows. The method now follows `@odata.nextLink` pagination and returns all matching rows (#157).
+- Alternate key fields were incorrectly merged into the `UpsertMultiple` request body, causing `400 Bad Request` on the create path (#129)
+- Docstring type annotations corrected for Microsoft Learn API reference compatibility (#153)
+
 ## [0.1.0b7] - 2026-03-17
 
 ### Added

@@ -91,6 +108,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Comprehensive error handling with specific exception types (`DataverseError`, `AuthenticationError`, etc.) (#22, #24)
 - HTTP retry logic with exponential backoff for resilient operations (#72)
 
+[Unreleased]: https://github.com/microsoft/PowerPlatform-DataverseClient-Python/compare/v0.1.0b7...HEAD
 [0.1.0b7]: https://github.com/microsoft/PowerPlatform-DataverseClient-Python/compare/v0.1.0b6...v0.1.0b7
 [0.1.0b6]: https://github.com/microsoft/PowerPlatform-DataverseClient-Python/compare/v0.1.0b5...v0.1.0b6
 [0.1.0b5]: https://github.com/microsoft/PowerPlatform-DataverseClient-Python/compare/v0.1.0b4...v0.1.0b5

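The "Fixed" entry for #157 amounts to a follow-the-nextLink loop. Stripped of SDK details, its shape (including the URL-cycle guard) can be sketched against an in-memory page store; the page payloads and link names below are entirely synthetic.

```python
import warnings

def collect_all_rows(first_page, fetch):
    """Accumulate rows across @odata.nextLink pages, guarding against cycles.

    first_page: the already-parsed first response body (a dict).
    fetch: callable mapping a nextLink URL to the next parsed body.
    """
    results = [r for r in first_page.get("value", []) if isinstance(r, dict)]
    next_link = first_page.get("@odata.nextLink")
    visited = set()
    while isinstance(next_link, str):
        if next_link in visited:  # same URL twice -> infinite pagination cycle
            warnings.warn(
                f"pagination stopped after {len(results)} rows (nextLink cycle)",
                RuntimeWarning,
                stacklevel=2,
            )
            break
        visited.add(next_link)
        body = fetch(next_link)
        if not isinstance(body, dict) or not body.get("value"):
            break
        results.extend(r for r in body["value"] if isinstance(r, dict))
        next_link = body.get("@odata.nextLink")
    return results

# Three synthetic pages; the last one omits @odata.nextLink, ending the loop.
pages = {
    "p2": {"value": [{"id": 3}, {"id": 4}], "@odata.nextLink": "p3"},
    "p3": {"value": [{"id": 5}]},
}
first = {"value": [{"id": 1}, {"id": 2}], "@odata.nextLink": "p2"}
rows = collect_all_rows(first, pages.__getitem__)
```

A page whose nextLink points back at itself would trip the `visited` guard, warn, and return the rows gathered so far rather than spin forever.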
src/PowerPlatform/Dataverse/data/_odata.py

Lines changed: 108 additions & 8 deletions
@@ -13,12 +13,13 @@
 import re
 import json
 import uuid
+import warnings
 from datetime import datetime, timezone
 import importlib.resources as ir
 from contextlib import contextmanager
 from contextvars import ContextVar
 
-from urllib.parse import quote as _url_quote
+from urllib.parse import quote as _url_quote, parse_qs, urlparse
 
 from ..core._http import _HttpClient
 from ._upload import _FileUploadMixin

@@ -55,6 +56,34 @@
 _MULTIPLE_BATCH_SIZE = 1000
 
 
+def _extract_pagingcookie(next_link: str) -> Optional[str]:
+    """Extract the raw pagingcookie value from a SQL ``@odata.nextLink`` URL.
+
+    The Dataverse SQL endpoint has a server-side bug where the pagingcookie
+    (containing first/last record GUIDs) does not advance between pages even
+    though ``pagenumber`` increments. Detecting a repeated cookie lets the
+    pagination loop break instead of looping indefinitely.
+
+    Returns the pagingcookie string if present, or ``None`` if not found.
+    """
+    try:
+        qs = parse_qs(urlparse(next_link).query)
+        skiptoken = qs.get("$skiptoken", [None])[0]
+        if not skiptoken:
+            return None
+        # parse_qs already URL-decodes the value once, giving the outer XML with
+        # pagingcookie still percent-encoded (e.g. pagingcookie="%3ccookie...").
+        # A second decode is intentionally omitted: decoding again would turn %22
+        # into " inside the cookie XML, breaking the regex and causing every page
+        # to extract the same truncated prefix regardless of the actual GUIDs.
+        m = re.search(r'pagingcookie="([^"]+)"', skiptoken)
+        if m:
+            return m.group(1)
+    except Exception:
+        pass
+    return None
+
+
 @dataclass
 class _RequestContext:
     """Structured request context used by ``_request`` to clarify payload and metadata."""

@@ -804,15 +833,86 @@ def _query_sql(self, sql: str) -> list[dict[str, Any]]:
             body = r.json()
         except ValueError:
             return []
-        if isinstance(body, dict):
-            value = body.get("value")
-            if isinstance(value, list):
-                # Ensure dict rows only
-                return [row for row in value if isinstance(row, dict)]
-        # Fallbacks: if body itself is a list
+
+        # Collect first page
+        results: list[dict[str, Any]] = []
         if isinstance(body, list):
             return [row for row in body if isinstance(row, dict)]
-        return []
+        if not isinstance(body, dict):
+            return results
+
+        value = body.get("value")
+        if isinstance(value, list):
+            results = [row for row in value if isinstance(row, dict)]
+
+        # Follow pagination links until exhausted
+        raw_link = body.get("@odata.nextLink") or body.get("odata.nextLink")
+        next_link: str | None = raw_link if isinstance(raw_link, str) else None
+        visited: set[str] = set()
+        seen_cookies: set[str] = set()
+        while next_link:
+            # Guard 1: exact URL cycle (same next_link returned twice)
+            if next_link in visited:
+                warnings.warn(
+                    f"SQL pagination stopped after {len(results)} rows — "
+                    "the Dataverse server returned the same nextLink URL twice, "
+                    "indicating an infinite pagination cycle. "
+                    "Returning the rows collected so far. "
+                    "To avoid pagination entirely, add a TOP clause to your query.",
+                    RuntimeWarning,
+                    stacklevel=4,
+                )
+                break
+            visited.add(next_link)
+            # Guard 2: server-side bug where pagingcookie does not advance between
+            # pages (pagenumber increments but cookie GUIDs stay the same), which
+            # causes an infinite loop even though URLs differ.
+            cookie = _extract_pagingcookie(next_link)
+            if cookie is not None:
+                if cookie in seen_cookies:
+                    warnings.warn(
+                        f"SQL pagination stopped after {len(results)} rows — "
+                        "the Dataverse server returned the same pagingcookie twice "
+                        "(pagenumber incremented but the paging position did not advance). "
+                        "This is a server-side bug. Returning the rows collected so far. "
+                        "To avoid pagination entirely, add a TOP clause to your query.",
+                        RuntimeWarning,
+                        stacklevel=4,
+                    )
+                    break
+                seen_cookies.add(cookie)
+            try:
+                page_resp = self._request("get", next_link)
+            except Exception as exc:
+                warnings.warn(
+                    f"SQL pagination stopped after {len(results)} rows — "
+                    f"the next-page request failed: {exc}. "
+                    "Add a TOP clause to your query to limit results to a single page.",
+                    RuntimeWarning,
+                    stacklevel=5,
+                )
+                break
+            try:
+                page_body = page_resp.json()
+            except ValueError as exc:
+                warnings.warn(
+                    f"SQL pagination stopped after {len(results)} rows — "
+                    f"the next-page response was not valid JSON: {exc}. "
+                    "Add a TOP clause to your query to limit results to a single page.",
+                    RuntimeWarning,
+                    stacklevel=5,
+                )
+                break
+            if not isinstance(page_body, dict):
+                break
+            page_value = page_body.get("value")
+            if not isinstance(page_value, list) or not page_value:
+                break
+            results.extend(row for row in page_value if isinstance(row, dict))
+            raw_link = page_body.get("@odata.nextLink") or page_body.get("odata.nextLink")
+            next_link = raw_link if isinstance(raw_link, str) else None
+
+        return results
 
     @staticmethod
     def _extract_logical_table(sql: str) -> str:

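As a quick illustration of what `_extract_pagingcookie()` pulls out of a nextLink, the sketch below runs the same `parse_qs` + regex steps on a synthetic URL. The host, path, and cookie payload are made up; only the single-decode behaviour mirrors the helper in the diff.

```python
import re
from urllib.parse import parse_qs, urlparse

def extract_pagingcookie(next_link: str):
    # Same shape as the SDK helper: one URL-decode via parse_qs, then a
    # regex over the still-partially-encoded skiptoken XML.
    qs = parse_qs(urlparse(next_link).query)
    skiptoken = qs.get("$skiptoken", [None])[0]
    if not skiptoken:
        return None
    m = re.search(r'pagingcookie="([^"]+)"', skiptoken)
    return m.group(1) if m else None

# Synthetic nextLink: the $skiptoken decodes once to
#   <cookie pagenumber="2" pagingcookie="%3ccookie%20page%3d%221%22%3e" />
# leaving the cookie payload percent-encoded, as the helper expects.
page2 = (
    "https://org.example.crm.dynamics.com/api/data/v9.2/sql?"
    "$skiptoken=%3Ccookie%20pagenumber%3D%222%22%20"
    "pagingcookie%3D%22%253ccookie%2520page%253d%25221%2522%253e%22%20%2F%3E"
)
# A "stuck" page 3: pagenumber advanced but the cookie did not, which is
# exactly the condition Guard 2 in the pagination loop detects.
page3 = page2.replace("pagenumber%3D%222%22", "pagenumber%3D%223%22")
cookie2, cookie3 = extract_pagingcookie(page2), extract_pagingcookie(page3)
stuck = cookie2 is not None and cookie2 == cookie3
```

Comparing raw cookie strings (rather than fully decoding them) is sufficient here: duplicate detection only needs equality, not the cookie's contents.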