Commit 4809dca
Fix array-like pd.notna crash, normalize NumPy scalars, add update/delete input validation, export DataFrameOperations (#145)
Addresses four unresolved review comments from PR #98 against the
`client.dataframe` namespace: a crash on array-valued cells, silent
NumPy serialization failures, missing ID validation in `update()` and
`delete()`, and missing exports/tests.
## `utils/_pandas.py`
- **Fix `pd.notna()` crash on array-like cells**: Guard with
`pd.api.types.is_scalar(v)` before calling `pd.notna()`; non-scalar
values (lists, dicts, NumPy arrays) pass through unchanged. Previously
raised `ValueError: The truth value of an array is ambiguous`.
- **Normalize NumPy scalar types**: New `_normalize_scalar(v)` helper
converts `np.integer` → `int`, `np.floating` → `float`, `np.bool_` →
`bool`, `pd.Timestamp` → ISO string. DataFrames with integer columns
produce `np.int64` by default, which `json.dumps()` cannot serialize.
```python
# Before: crashed on the list cell and produced non-serializable np.int64
df = pd.DataFrame([{"tags": ["a", "b"]}, {"count": np.int64(5)}])
dataframe_to_records(df)  # ValueError / TypeError at serialization time
# After: returns safely
[{"tags": ["a", "b"]}, {"count": 5}]
```
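A minimal sketch of the two fixes combined, with the helper name taken from the commit description (the shipped implementation may differ in detail):

```python
import numpy as np
import pandas as pd


def _normalize_scalar(v):
    """Coerce NumPy/pandas scalar types to JSON-serializable Python values."""
    if isinstance(v, pd.Timestamp):
        return v.isoformat()
    if isinstance(v, np.integer):
        return int(v)
    if isinstance(v, np.floating):
        return float(v)
    if isinstance(v, np.bool_):
        return bool(v)
    return v


def dataframe_to_records(df: pd.DataFrame, na_as_null: bool = False) -> list:
    records = []
    for row in df.to_dict(orient="records"):
        clean = {}
        for k, v in row.items():
            if pd.api.types.is_scalar(v):  # guard: pd.notna is only safe on scalars
                if pd.notna(v):
                    clean[k] = _normalize_scalar(v)
                elif na_as_null:
                    clean[k] = None
            else:
                clean[k] = v  # lists, dicts, arrays pass through untouched
        records.append(clean)
    return records
```

One subtlety of the mixed-frame example above: the `count` column is promoted to `float64` by the missing value in the other row, so the normalized value compares equal to `5` but arrives as a Python `float`.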
## `operations/dataframe.py`
- **`update()` — validate `id_column` values**: After extracting IDs,
raises `ValueError` listing offending row indices if any value is not a
non-empty string (catches `NaN`, `None`, numeric IDs).
- **`update()` — validate non-empty change columns**: Raises
`ValueError` if the DataFrame contains only the `id_column` and no
fields to update.
- **`delete()` — validate `ids` Series**: Returns `None` immediately for
an empty Series; raises `ValueError` listing offending indices for any
non-string or blank value.
## `operations/__init__.py`
- Exports `DataFrameOperations` so consumers can use it for type
annotations.
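With the export in place, downstream code can name the class in annotations without reaching into private modules. A hypothetical sketch (the import path is assumed from the file layout quoted below):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Resolved only by static type checkers, never imported at runtime.
    from PowerPlatform.Dataverse.operations import DataFrameOperations


def bulk_sync(ops: "DataFrameOperations") -> None:
    """Accepts the `client.dataframe` namespace purely for type checking."""
```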
## Tests
- `tests/unit/test_pandas_helpers.py` — 11 isolated tests for
`dataframe_to_records()` covering NaN handling, NumPy type
normalization, Timestamp conversion, list/dict passthrough, and empty
input.
- `tests/unit/test_dataframe_operations.py` — 35 tests covering the full
`DataFrameOperations` namespace, including all new validation paths.
<details>
<summary>Original prompt</summary>
## Context
This PR addresses unresolved review comments from PR #98 ("add dataframe
methods") and adds comprehensive test coverage for the DataFrame
operations namespace (`client.dataframe`).
The base branch is `users/zhaodongwang/dataFrameExtensionClaude` which
contains the current state of the DataFrame operations code from PR #98.
## Files to modify
### 1. `src/PowerPlatform/Dataverse/utils/_pandas.py`
Current code at the HEAD of the PR branch
(`8838bb69533dd8830bac8724c44696771a6704e7`):
```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""Internal pandas helpers"""
from __future__ import annotations
from typing import Any, Dict, List
import pandas as pd
def dataframe_to_records(df: pd.DataFrame, na_as_null: bool = False) -> List[Dict[str, Any]]:
"""Convert a DataFrame to a list of dicts, converting Timestamps to ISO strings.
:param df: Input DataFrame.
:param na_as_null: When False (default), missing values are omitted from each dict.
When True, missing values are included as None (sends null to Dataverse, clearing the field).
"""
records = []
for row in df.to_dict(orient="records"):
clean = {}
for k, v in row.items():
if pd.notna(v):
clean[k] = v.isoformat() if isinstance(v, pd.Timestamp) else v
elif na_as_null:
clean[k] = None
records.append(clean)
return records
```
**Required changes:**
#### Fix A: `pd.notna()` crash on array-like values (unresolved comment
#98 (comment))
`pd.notna(v)` raises `ValueError: The truth value of an array is
ambiguous` when a cell contains a list, dict, numpy array, etc. Fix by
guarding with `pd.api.types.is_scalar(v)`:
```python
for k, v in row.items():
if pd.api.types.is_scalar(v):
if pd.notna(v):
clean[k] = _normalize_scalar(v)
elif na_as_null:
clean[k] = None
else:
clean[k] = v # pass through lists, dicts, etc.
```
#### Fix B: NumPy scalar types not normalized (acknowledged but deferred
by author in
#98 (comment))
NumPy scalars (`np.int64`, `np.float64`, `np.bool_`) are NOT
JSON-serializable by default `json.dumps()`. DataFrames with integer
columns produce `np.int64` values. Add a helper function
`_normalize_scalar(v)` that:
- Converts `pd.Timestamp` to `.isoformat()`
- Converts `numpy.integer` types to Python `int`
- Converts `numpy.floating` types to Python `float`
- Converts `numpy.bool_` to Python `bool`
- Passes everything else through
Use `import numpy as np` and `isinstance` checks.
### 2. `src/PowerPlatform/Dataverse/operations/dataframe.py`
Current code at the HEAD of the PR branch:
```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""DataFrame CRUD operations namespace for the Dataverse SDK."""
from __future__ import annotations
from typing import List, Optional, TYPE_CHECKING
import pandas as pd
from ..utils._pandas import dataframe_to_records
if TYPE_CHECKING:
from ..client import DataverseClient
__all__ = ["DataFrameOperations"]
class DataFrameOperations:
"""Namespace for pandas DataFrame CRUD operations.
...
"""
def __init__(self, client: DataverseClient) -> None:
self._client = client
def get(self, table, record_id=None, select=None, filter=None, orderby=None, top=None, expand=None, page_size=None) -> pd.DataFrame:
# ... (current code)
pass
def create(self, table, records) -> pd.Series:
# ... (current code with empty DataFrame check and ID count validation)
pass
def update(self, table, changes, id_column, clear_nulls=False) -> None:
if not isinstance(changes, pd.DataFrame):
raise TypeError("changes must be a pandas DataFrame")
if id_column not in changes.columns:
raise ValueError(f"id_column '{id_column}' not found in DataFrame columns")
ids = changes[id_column].tolist()
change_columns = [column for column in changes.columns if column != id_column]
change_list = dataframe_to_records(changes[change_columns], na_as_null=clear_nulls)
if len(ids) == 1:
self._client.records.update(table, ids[0], change_list[0])
else:
self._client.records.update(table, ids, change_list)
def delete(self, table, ids, use_bulk_delete=True) -> Optional[str]:
if not isinstance(ids, pd.Series):
raise TypeError("ids must be a pandas Series")
id_list = ids.tolist()
if len(id_list) == 1:
return self._client.records.delete(table, id_list[0])
else:
return self._client.records.delete(table, id_list, use_bulk_delete=use_bulk_delete)
```
**Required changes:**
#### Fix C: Validate `id_column` values in...
</details>
*This pull request was created from Copilot chat.*
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: saurabhrb <32964911+saurabhrb@users.noreply.github.com>

Parent: 8838bb6 · Commit: 4809dca
5 files changed: 528 additions, 6 deletions