
Commit 4809dca

Authored by Copilot and saurabhrb
Fix array-like pd.notna crash, normalize NumPy scalars, add update/delete input validation, export DataFrameOperations (#145)
Addresses four unresolved review comments from PR #98 against the `client.dataframe` namespace: a crash on array-valued cells, silent NumPy serialization failures, missing ID validation in `update()` and `delete()`, and missing exports/tests.

## `utils/_pandas.py`

- **Fix `pd.notna()` crash on array-like cells**: guard with `pd.api.types.is_scalar(v)` before calling `pd.notna()`; non-scalar values (lists, dicts, numpy arrays) pass through directly. Previously this raised `ValueError: The truth value of an array is ambiguous`.
- **Normalize NumPy scalar types**: a new `_normalize_scalar(v)` helper converts `np.integer` → `int`, `np.floating` → `float`, `np.bool_` → `bool`, and `pd.Timestamp` → ISO string. DataFrames with integer columns produce `np.int64` by default, which `json.dumps()` cannot serialize.

```python
# Before: would crash or produce non-serializable values
df = pd.DataFrame([{"tags": ["a", "b"]}, {"count": np.int64(5)}])
dataframe_to_records(df)  # ValueError / TypeError at serialization time

# After: safe
[{"tags": ["a", "b"]}, {"count": 5}]
```

## `operations/dataframe.py`

- **`update()` — validate `id_column` values**: after extracting IDs, raises `ValueError` listing the offending row indices if any value is not a non-empty string (catches `NaN`, `None`, and numeric IDs).
- **`update()` — validate non-empty change columns**: raises `ValueError` if the DataFrame contains only the `id_column` and no fields to update.
- **`delete()` — validate the `ids` Series**: returns `None` immediately for an empty Series; raises `ValueError` listing the offending indices for any non-string or blank value.

## `operations/__init__.py`

- Exports `DataFrameOperations` so consumers can use it in type annotations.

## Tests

- `tests/unit/test_pandas_helpers.py` — 11 isolated tests for `dataframe_to_records()` covering NaN handling, NumPy type normalization, Timestamp conversion, list/dict passthrough, and empty input.
- `tests/unit/test_dataframe_operations.py` — 35 tests covering the full `DataFrameOperations` namespace, including all new validation paths.

<details>
<summary>Original prompt</summary>

## Context

This PR addresses unresolved review comments from PR #98 ("add dataframe methods") and adds comprehensive test coverage for the DataFrame operations namespace (`client.dataframe`). The base branch is `users/zhaodongwang/dataFrameExtensionClaude`, which contains the current state of the DataFrame operations code from PR #98.

## Files to modify

### 1. `src/PowerPlatform/Dataverse/utils/_pandas.py`

Current code at the HEAD of the PR branch (`8838bb69533dd8830bac8724c44696771a6704e7`):

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""Internal pandas helpers"""

from __future__ import annotations

from typing import Any, Dict, List

import pandas as pd


def dataframe_to_records(df: pd.DataFrame, na_as_null: bool = False) -> List[Dict[str, Any]]:
    """Convert a DataFrame to a list of dicts, converting Timestamps to ISO strings.

    :param df: Input DataFrame.
    :param na_as_null: When False (default), missing values are omitted from each dict.
        When True, missing values are included as None (sends null to Dataverse, clearing the field).
    """
    records = []
    for row in df.to_dict(orient="records"):
        clean = {}
        for k, v in row.items():
            if pd.notna(v):
                clean[k] = v.isoformat() if isinstance(v, pd.Timestamp) else v
            elif na_as_null:
                clean[k] = None
        records.append(clean)
    return records
```

**Required changes:**

#### Fix A: `pd.notna()` crash on array-like values (unresolved comment in #98)

`pd.notna(v)` raises `ValueError: The truth value of an array is ambiguous` when a cell contains a list, dict, numpy array, etc.
Fix by guarding with `pd.api.types.is_scalar(v)`:

```python
for k, v in row.items():
    if pd.api.types.is_scalar(v):
        if pd.notna(v):
            clean[k] = _normalize_scalar(v)
        elif na_as_null:
            clean[k] = None
    else:
        clean[k] = v  # pass through lists, dicts, etc.
```

#### Fix B: NumPy scalar types not normalized (acknowledged but deferred by the author in #98)

NumPy scalars (`np.int64`, `np.float64`, `np.bool_`) are NOT JSON-serializable by default `json.dumps()`. DataFrames with integer columns produce `np.int64` values.

Add a helper function `_normalize_scalar(v)` that:

- Converts `pd.Timestamp` to `.isoformat()`
- Converts `numpy.integer` types to Python `int`
- Converts `numpy.floating` types to Python `float`
- Converts `numpy.bool_` to Python `bool`
- Passes everything else through

Use `import numpy as np` and `isinstance` checks.

### 2. `src/PowerPlatform/Dataverse/operations/dataframe.py`

Current code at the HEAD of the PR branch:

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""DataFrame CRUD operations namespace for the Dataverse SDK."""

from __future__ import annotations

from typing import List, Optional, TYPE_CHECKING

import pandas as pd

from ..utils._pandas import dataframe_to_records

if TYPE_CHECKING:
    from ..client import DataverseClient

__all__ = ["DataFrameOperations"]


class DataFrameOperations:
    """Namespace for pandas DataFrame CRUD operations.

    ...
    """

    def __init__(self, client: DataverseClient) -> None:
        self._client = client

    def get(self, table, record_id=None, select=None, filter=None, orderby=None,
            top=None, expand=None, page_size=None) -> pd.DataFrame:
        # ... (current code)
        pass

    def create(self, table, records) -> pd.Series:
        # ... (current code with empty DataFrame check and ID count validation)
        pass

    def update(self, table, changes, id_column, clear_nulls=False) -> None:
        if not isinstance(changes, pd.DataFrame):
            raise TypeError("changes must be a pandas DataFrame")
        if id_column not in changes.columns:
            raise ValueError(f"id_column '{id_column}' not found in DataFrame columns")

        ids = changes[id_column].tolist()
        change_columns = [column for column in changes.columns if column != id_column]
        change_list = dataframe_to_records(changes[change_columns], na_as_null=clear_nulls)

        if len(ids) == 1:
            self._client.records.update(table, ids[0], change_list[0])
        else:
            self._client.records.update(table, ids, change_list)

    def delete(self, table, ids, use_bulk_delete=True) -> Optional[str]:
        if not isinstance(ids, pd.Series):
            raise TypeError("ids must be a pandas Series")

        id_list = ids.tolist()
        if len(id_list) == 1:
            return self._client.records.delete(table, id_list[0])
        else:
            return self._client.records.delete(table, id_list, use_bulk_delete=use_bulk_delete)
```

**Required changes:**

#### Fix C: Validate `id_column` values in...

</details>

*This pull request was created from Copilot chat.*

---

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: saurabhrb <32964911+saurabhrb@users.noreply.github.com>
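The patched `_pandas.py` behavior described above can be exercised end to end with a minimal standalone sketch; the two function bodies mirror the commit's diff, but the snippet is illustrative and assumes pandas and numpy are installed (no SDK required):

```python
import json

import numpy as np
import pandas as pd


def _normalize_scalar(v):
    """Convert numpy/pandas scalars to JSON-serializable Python natives."""
    if isinstance(v, pd.Timestamp):
        return v.isoformat()
    if isinstance(v, np.integer):
        return int(v)
    if isinstance(v, np.floating):
        return float(v)
    if isinstance(v, np.bool_):
        return bool(v)
    return v


def dataframe_to_records(df, na_as_null=False):
    """Row dicts with missing values dropped (or nulled) and scalars normalized."""
    records = []
    for row in df.to_dict(orient="records"):
        clean = {}
        for k, v in row.items():
            if pd.api.types.is_scalar(v):  # guard: pd.notna on array-likes raises
                if pd.notna(v):
                    clean[k] = _normalize_scalar(v)
                elif na_as_null:
                    clean[k] = None
            else:
                clean[k] = v  # lists, dicts, arrays pass through untouched
        records.append(clean)
    return records


# An array-valued cell no longer crashes, and np.int64 becomes a plain int:
df = pd.DataFrame([{"tags": ["a", "b"], "count": np.int64(5)}])
records = dataframe_to_records(df)
assert records == [{"tags": ["a", "b"], "count": 5}]
json.dumps(records)  # serializes cleanly
```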
1 parent 8838bb6 · commit 4809dca

5 files changed

Lines changed: 528 additions & 6 deletions


src/PowerPlatform/Dataverse/operations/__init__.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -8,4 +8,6 @@
 SDK operations into logical groups: records, query, and tables.
 """

-__all__ = []
+from .dataframe import DataFrameOperations
+
+__all__ = ["DataFrameOperations"]
```

src/PowerPlatform/Dataverse/operations/dataframe.py

Lines changed: 20 additions & 0 deletions
```diff
@@ -236,7 +236,18 @@ def update(
             raise ValueError(f"id_column '{id_column}' not found in DataFrame columns")

         ids = changes[id_column].tolist()
+        invalid = [i for i, v in enumerate(ids) if not isinstance(v, str) or not v.strip()]
+        if invalid:
+            raise ValueError(
+                f"id_column '{id_column}' contains invalid values at row index(es) {invalid}. "
+                "All IDs must be non-empty strings."
+            )
+
         change_columns = [column for column in changes.columns if column != id_column]
+        if not change_columns:
+            raise ValueError(
+                "No columns to update. The DataFrame must contain at least one column besides the id_column."
+            )
         change_list = dataframe_to_records(changes[change_columns], na_as_null=clear_nulls)

         if len(ids) == 1:
@@ -279,6 +290,15 @@ def delete(
             raise TypeError("ids must be a pandas Series")

         id_list = ids.tolist()
+        if not id_list:
+            return None
+
+        invalid = [i for i, v in enumerate(id_list) if not isinstance(v, str) or not v.strip()]
+        if invalid:
+            raise ValueError(
+                f"ids Series contains invalid values at index(es) {invalid}. All IDs must be non-empty strings."
+            )
+
         if len(id_list) == 1:
             return self._client.records.delete(table, id_list[0])
         else:
```
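Both hunks above use the same validity predicate on the extracted ID lists. A stdlib-only sketch of that check (the helper name `find_invalid_ids` is hypothetical; the commit inlines the comprehension):

```python
def find_invalid_ids(ids):
    """Return indices whose value is not a non-empty string.

    isinstance(v, str) rejects None, NaN, and numeric IDs;
    v.strip() rejects empty or whitespace-only strings.
    """
    return [i for i, v in enumerate(ids) if not isinstance(v, str) or not v.strip()]


# Only the non-empty strings at indexes 0 and 4 are valid:
assert find_invalid_ids(["rec-1", "", None, 42, "rec-2"]) == [1, 2, 3]
```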

src/PowerPlatform/Dataverse/utils/_pandas.py

Lines changed: 26 additions & 5 deletions
```diff
@@ -7,11 +7,29 @@

 from typing import Any, Dict, List

+import numpy as np
 import pandas as pd


+def _normalize_scalar(v: Any) -> Any:
+    """Convert numpy scalar types to their Python native equivalents.
+
+    :param v: A scalar value to normalize.
+    :return: The value converted to a JSON-serializable Python type.
+    """
+    if isinstance(v, pd.Timestamp):
+        return v.isoformat()
+    if isinstance(v, np.integer):
+        return int(v)
+    if isinstance(v, np.floating):
+        return float(v)
+    if isinstance(v, np.bool_):
+        return bool(v)
+    return v
+
+
 def dataframe_to_records(df: pd.DataFrame, na_as_null: bool = False) -> List[Dict[str, Any]]:
-    """Convert a DataFrame to a list of dicts, converting Timestamps to ISO strings.
+    """Convert a DataFrame to a list of dicts, normalizing values for JSON serialization.

     :param df: Input DataFrame.
     :param na_as_null: When False (default), missing values are omitted from each dict.
@@ -21,9 +39,12 @@ def dataframe_to_records(df: pd.DataFrame, na_as_null: bool = False) -> List[Dic
     for row in df.to_dict(orient="records"):
         clean = {}
         for k, v in row.items():
-            if pd.notna(v):
-                clean[k] = v.isoformat() if isinstance(v, pd.Timestamp) else v
-            elif na_as_null:
-                clean[k] = None
+            if pd.api.types.is_scalar(v):
+                if pd.notna(v):
+                    clean[k] = _normalize_scalar(v)
+                elif na_as_null:
+                    clean[k] = None
+            else:
+                clean[k] = v  # pass through lists, dicts, arrays, etc.
         records.append(clean)
     return records
```
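Why the `is_scalar` guard is needed can be seen in isolation: `pd.notna` on an array-like returns an elementwise boolean array, whose truth value is ambiguous in a branch (illustrative snippet, assumes pandas and numpy installed):

```python
import numpy as np
import pandas as pd

# pd.notna on a scalar returns a plain bool...
assert pd.notna(np.int64(5))

# ...but on an array-like it returns an elementwise boolean array,
# so `if pd.notna(v):` raises for list/array cells:
try:
    bool(pd.notna([1, 2]))
    crashed = False
except ValueError:  # "The truth value of an array is ambiguous"
    crashed = True
assert crashed

# The guard distinguishes the two cases before pd.notna is ever called:
assert pd.api.types.is_scalar(np.int64(5))
assert not pd.api.types.is_scalar([1, 2])
```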
