Commit 8a6ef8c

zhaodongwang-msft and Saurabh Badenkal authored
Add client.dataframe namespace for pandas DataFrame CRUD operations (#98)
## Summary

Adds a `client.dataframe` namespace with pandas DataFrame/Series wrappers for all CRUD operations, plus two advanced example scripts, and a minor SDK enhancement for table metadata.

Users can now query, create, update, and delete Dataverse records using DataFrame-native inputs and outputs -- no manual dict conversion required.

## Quick Example

```python
import pandas as pd
from azure.identity import InteractiveBrowserCredential
from PowerPlatform.Dataverse.client import DataverseClient

credential = InteractiveBrowserCredential()

with DataverseClient("https://yourorg.crm.dynamics.com", credential) as client:
    # Query records as a DataFrame (all pages consolidated automatically)
    df = client.dataframe.get("account", select=["name", "telephone1"], top=5)

    # Create records from a DataFrame (returns Series of GUIDs)
    new_records = pd.DataFrame([
        {"name": "Acme Corp", "telephone1": "555-9000"},
        {"name": "Globex Inc", "telephone1": "555-9001"},
    ])
    new_records["accountid"] = client.dataframe.create("account", new_records)

    # Update records (NaN/None skipped by default; use clear_nulls=True to clear fields)
    new_records["telephone1"] = ["555-1111", "555-2222"]
    client.dataframe.update("account", new_records[["accountid", "telephone1"]], id_column="accountid")

    # Delete records
    client.dataframe.delete("account", new_records["accountid"])
```

## Changes

### DataFrame CRUD (`client.dataframe` namespace)

| File | Description |
|------|-------------|
| `src/.../operations/dataframe.py` | `DataFrameOperations` class: `get()`, `create()`, `update()`, `delete()` |
| `src/.../utils/_pandas.py` | `dataframe_to_records()` helper -- normalizes NumPy, datetime, NaN/None |
| `client.py` | Added `self.dataframe = DataFrameOperations(self)` |
| `pyproject.toml` | Added `pandas>=2.0.0` required dependency |
| `README.md` | DataFrame usage examples |
| `operations/__init__.py` | Cleanup (`__all__ = []`) |

### SDK Enhancement: TableInfo primary column metadata (fixes #148)

| File | Description |
|------|-------------|
| `src/.../data/_odata.py` | `_get_entity_by_table_schema_name()` and `_get_table_info()` now select `PrimaryNameAttribute` and `PrimaryIdAttribute` from EntityDefinitions |
| `src/.../models/table_info.py` | `TableInfo` includes `primary_name_attribute` and `primary_id_attribute` fields |
| `tests/unit/models/test_table_info.py` | Tests for new fields in `from_dict`, `from_api_response`, and legacy key access |

### Advanced Examples

| File | Description |
|------|-------------|
| `examples/advanced/dataframe_operations.py` | DataFrame CRUD walkthrough |
| `examples/advanced/prodev_quick_start.py` | Pro-dev: 4-table system with relationships, DataFrame CRUD, query/analyze. Uses `result.primary_name_attribute` from `tables.create()` |
| `examples/advanced/datascience_risk_assessment.py` | Data science: 5-step risk pipeline with 3 LLM provider options (Azure AI Inference, OpenAI, GitHub Copilot SDK), matplotlib charts |

### Test Files

| File | Tests |
|------|-------|
| `test_dataframe_operations.py` | 44 |
| `test_client_dataframe.py` | 26 |
| `test_pandas_helpers.py` | 33 |
| `test_table_info.py` | +1 (primary fields) |

## API Design

| Method | Input | Output | Underlying API |
|--------|-------|--------|----------------|
| `get(table, ...)` | OData params | `pd.DataFrame` | `records.get()` |
| `get(table, record_id=...)` | GUID | 1-row `pd.DataFrame` | `records.get()` |
| `create(table, df)` | `pd.DataFrame` | `pd.Series` of GUIDs | `CreateMultiple` |
| `update(table, df, id_column)` | `pd.DataFrame` | `None` | `UpdateMultiple` |
| `delete(table, ids)` | `pd.Series` | `Optional[str]` | `BulkDelete` |

### Design Decisions

- **`clear_nulls`**: Default `False` skips NaN (field unchanged). `True` sends null to clear.
- **Type normalization**: np.int64/float64/bool_/ndarray, datetime/date/np.datetime64, pd.Timestamp -- all auto-converted.
- **ID validation**: Strip whitespace, report DataFrame index labels in errors.
- **pandas required**: Core dependency by team decision.

## Test Results

```
396 passed, 8 warnings (pre-existing deprecation), 4 subtests passed
```

| Check | Result |
|-------|--------|
| Full test suite | 396 pass, 0 fail |
| mypy | 0 errors |
| black / isort | Clean |
| E2E prodev | PASS (4 tables, 3 relationships, 13 records, full CRUD cycle) |
| E2E datascience | PASS (7 accounts, 3 cases, 8 opportunities, risk scoring, charts) |
| PR review threads | 54/54 resolved |

## Issues Addressed

- Fixes #148: `tables.create()` now exposes `primary_name_attribute` via Dataverse metadata
- Followup #147: `QueryBuilder.to_dataframe()` tracked for future work

---------

Co-authored-by: Saurabh Badenkal <sbadenkal@microsoft.com>
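
The `clear_nulls` and type-normalization decisions in the commit message can be illustrated with a small offline sketch. This is not the SDK's `dataframe_to_records()` helper, whose source is not shown here; `normalize_value` and `to_records_sketch` are hypothetical names, and rendering timestamps as ISO 8601 strings is an assumption about what a JSON payload would need.

```python
import datetime as dt

import numpy as np
import pandas as pd


def normalize_value(value):
    """Convert NumPy/pandas scalars to plain Python values (illustrative only)."""
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, pd.Timestamp):  # checked before datetime: Timestamp subclasses it
        return value.isoformat()
    if isinstance(value, np.datetime64):  # checked before np.generic: datetime64 subclasses it
        return pd.Timestamp(value).isoformat()
    if isinstance(value, np.generic):  # np.int64, np.float64, np.bool_, ...
        return value.item()
    if isinstance(value, (dt.datetime, dt.date)):
        return value.isoformat()
    return value


def to_records_sketch(df, clear_nulls=False):
    """Turn each row into a dict, skipping missing values unless clear_nulls is set."""
    records = []
    for row in df.to_dict(orient="records"):
        record = {}
        for column, value in row.items():
            is_container = isinstance(value, (list, np.ndarray))
            if not is_container and pd.isna(value):
                if clear_nulls:
                    record[column] = None  # explicit null: ask the server to clear the field
                continue  # otherwise omit the key so the field stays unchanged
            record[column] = normalize_value(value)
        records.append(record)
    return records


df = pd.DataFrame(
    {
        "accountid": ["guid-1", "guid-2"],
        "telephone1": ["555-0100", None],
        "lastonholdtime": [pd.Timestamp("2024-06-15 10:30:00"), pd.NaT],
    }
)
skipped = to_records_sketch(df)                    # second record keeps only accountid
cleared = to_records_sketch(df, clear_nulls=True)  # second record carries explicit None values
```

Omitting the key versus sending `None` is the whole distinction: an omitted key leaves the server-side value alone, while an explicit null overwrites it.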
1 parent c357eff commit 8a6ef8c

18 files changed, +3205 -6 lines changed

.claude/skills/dataverse-sdk-use/SKILL.md

Lines changed: 40 additions & 1 deletion
@@ -30,6 +30,9 @@ The SDK supports Dataverse's native bulk operations: Pass lists to `create()`, `
 - Control page size with `page_size` parameter
 - Use `top` parameter to limit total records returned

+### DataFrame Support
+- DataFrame operations are accessed via the `client.dataframe` namespace: `client.dataframe.get()`, `client.dataframe.create()`, `client.dataframe.update()`, `client.dataframe.delete()`
+
 ## Common Operations

 ### Import
@@ -129,7 +132,7 @@ client.records.update("account", [id1, id2, id3], {"industry": "Technology"})
 ```

 #### Upsert Records
-Creates or updates records identified by alternate keys. Single item PATCH; multiple items `UpsertMultiple` bulk action.
+Creates or updates records identified by alternate keys. Single item -> PATCH; multiple items -> `UpsertMultiple` bulk action.
 > **Prerequisite**: The table must have an alternate key configured in Dataverse for the columns used in `alternate_key`. Without it, Dataverse will reject the request with a 400 error.
 ```python
 from PowerPlatform.Dataverse.models.upsert import UpsertItem
@@ -171,6 +174,42 @@ client.records.delete("account", account_id)
 client.records.delete("account", [id1, id2, id3], use_bulk_delete=True)
 ```

+### DataFrame Operations
+
+The SDK provides DataFrame wrappers for all CRUD operations via the `client.dataframe` namespace, using pandas DataFrames and Series as input/output.
+
+```python
+import pandas as pd
+
+# Query records -- returns a single DataFrame
+df = client.dataframe.get("account", filter="statecode eq 0", select=["name"])
+print(f"Got {len(df)} rows")
+
+# Limit results with top for large tables
+df = client.dataframe.get("account", select=["name"], top=100)
+
+# Fetch single record as one-row DataFrame
+df = client.dataframe.get("account", record_id=account_id, select=["name"])
+
+# Create records from a DataFrame (returns a Series of GUIDs)
+new_accounts = pd.DataFrame([
+    {"name": "Contoso", "telephone1": "555-0100"},
+    {"name": "Fabrikam", "telephone1": "555-0200"},
+])
+new_accounts["accountid"] = client.dataframe.create("account", new_accounts)
+
+# Update records from a DataFrame (id_column identifies the GUID column)
+new_accounts["telephone1"] = ["555-0199", "555-0299"]
+client.dataframe.update("account", new_accounts, id_column="accountid")
+
+# Clear a field by setting clear_nulls=True (by default, NaN/None fields are skipped)
+df = pd.DataFrame([{"accountid": "guid-1", "websiteurl": None}])
+client.dataframe.update("account", df, id_column="accountid", clear_nulls=True)
+
+# Delete records by passing a Series of GUIDs
+client.dataframe.delete("account", new_accounts["accountid"])
+```
+
 ### SQL Queries

 SQL queries are **read-only** and support limited SQL syntax. A single SELECT statement with optional WHERE, TOP (integer literal), ORDER BY (column names only), and a simple table alias after FROM is supported. But JOIN and subqueries may not be. Refer to the Dataverse documentation for the current feature set.
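
The commit message describes ID validation for the DataFrame methods as "strip whitespace, report DataFrame index labels in errors". A minimal sketch of that idea follows, assuming the check runs over a Series of GUID strings such as the one passed to `client.dataframe.delete()`. The function name `validate_ids_sketch` and the error wording are hypothetical and do not come from the SDK.

```python
import pandas as pd


def validate_ids_sketch(ids: pd.Series) -> list:
    """Return whitespace-stripped IDs, or raise naming the offending index labels."""
    cleaned = []
    bad_labels = []
    for label, value in ids.items():
        if isinstance(value, str) and value.strip():
            cleaned.append(value.strip())
        else:
            # None, NaN, non-strings, and blank strings are all rejected
            bad_labels.append(label)
    if bad_labels:
        raise ValueError(f"Missing or invalid IDs at index labels: {bad_labels}")
    return cleaned


good = pd.Series(["  guid-1 ", "guid-2"], index=["row-a", "row-b"])
clean_ids = validate_ids_sketch(good)  # surrounding whitespace removed

bad = pd.Series(["guid-1", None, "   "], index=["row-a", "row-b", "row-c"])
try:
    validate_ids_sketch(bad)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)  # names row-b and row-c, not positional offsets
```

Reporting index labels rather than positions lets a caller go straight to the failing rows with `df.loc[...]`, even after filtering or sorting has changed the row order.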

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -25,3 +25,4 @@ Thumbs.db

 # Claude local settings
 .claude/*.local.json
+.claude/*.local.md

README.md

Lines changed: 38 additions & 0 deletions
@@ -24,6 +24,7 @@ A Python client library for Microsoft Dataverse that provides a unified interfac
 - [Basic CRUD operations](#basic-crud-operations)
 - [Bulk operations](#bulk-operations)
 - [Upsert operations](#upsert-operations)
+- [DataFrame operations](#dataframe-operations)
 - [Query data](#query-data)
 - [Table management](#table-management)
 - [Relationship management](#relationship-management)
@@ -39,6 +40,7 @@ A Python client library for Microsoft Dataverse that provides a unified interfac
 - **📊 SQL Queries**: Execute read-only SQL queries via the Dataverse Web API `?sql=` parameter
 - **🏗️ Table Management**: Create, inspect, and delete custom tables and columns programmatically
 - **🔗 Relationship Management**: Create one-to-many and many-to-many relationships between tables with full metadata control
+- **🐼 DataFrame Support**: Pandas wrappers for all CRUD operations, returning DataFrames and Series
 - **📎 File Operations**: Upload files to Dataverse file columns with automatic chunking for large files
 - **🔐 Azure Identity**: Built-in authentication using Azure Identity credential providers with comprehensive support
 - **🛡️ Error Handling**: Structured exception hierarchy with detailed error context and retry guidance
@@ -232,6 +234,42 @@ client.records.upsert("account", [
 ])
 ```

+### DataFrame operations
+
+The SDK provides pandas wrappers for all CRUD operations via the `client.dataframe` namespace, using DataFrames and Series for input and output.
+
+```python
+import pandas as pd
+
+# Query records as a single DataFrame
+df = client.dataframe.get("account", filter="statecode eq 0", select=["name", "telephone1"])
+print(f"Found {len(df)} accounts")
+
+# Limit results with top for large tables
+df = client.dataframe.get("account", select=["name"], top=100)
+
+# Fetch a single record as a one-row DataFrame
+df = client.dataframe.get("account", record_id=account_id, select=["name"])
+
+# Create records from a DataFrame (returns a Series of GUIDs)
+new_accounts = pd.DataFrame([
+    {"name": "Contoso", "telephone1": "555-0100"},
+    {"name": "Fabrikam", "telephone1": "555-0200"},
+])
+new_accounts["accountid"] = client.dataframe.create("account", new_accounts)
+
+# Update records from a DataFrame (id_column identifies the GUID column)
+new_accounts["telephone1"] = ["555-0199", "555-0299"]
+client.dataframe.update("account", new_accounts, id_column="accountid")
+
+# Clear a field by setting clear_nulls=True (by default, NaN/None fields are skipped)
+df = pd.DataFrame([{"accountid": new_accounts["accountid"].iloc[0], "websiteurl": None}])
+client.dataframe.update("account", df, id_column="accountid", clear_nulls=True)
+
+# Delete records by passing a Series of GUIDs
+client.dataframe.delete("account", new_accounts["accountid"])
+```
+
 ### Query data

 ```python
Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+"""
+PowerPlatform Dataverse Client - DataFrame Operations Walkthrough
+
+This example demonstrates how to use the pandas DataFrame extension methods
+for CRUD operations with Microsoft Dataverse.
+
+Prerequisites:
+    pip install PowerPlatform-Dataverse-Client
+    pip install azure-identity
+"""
+
+import sys
+import uuid
+
+import pandas as pd
+from azure.identity import InteractiveBrowserCredential
+
+from PowerPlatform.Dataverse.client import DataverseClient
+
+
+def main():
+    # -- Setup & Authentication ------------------------------------
+    base_url = input("Enter Dataverse org URL (e.g. https://yourorg.crm.dynamics.com): ").strip()
+    if not base_url:
+        print("[ERR] No URL entered; exiting.")
+        sys.exit(1)
+    base_url = base_url.rstrip("/")
+
+    print("[INFO] Authenticating via browser...")
+    credential = InteractiveBrowserCredential()
+
+    with DataverseClient(base_url, credential) as client:
+        _run_walkthrough(client)
+
+
+def _run_walkthrough(client):
+    table = input("Enter table schema name to use [default: account]: ").strip() or "account"
+    print(f"[INFO] Using table: {table}")
+
+    # Unique tag to isolate test records from existing data
+    tag = uuid.uuid4().hex[:8]
+    test_filter = f"contains(name,'{tag}')"
+    print(f"[INFO] Using tag '{tag}' to identify test records")
+
+    select_cols = ["name", "telephone1", "websiteurl", "lastonholdtime"]
+
+    # -- 1. Create records from a DataFrame ------------------------
+    print("\n" + "-" * 60)
+    print("1. Create records from a DataFrame")
+    print("-" * 60)
+
+    new_accounts = pd.DataFrame(
+        [
+            {
+                "name": f"Contoso_{tag}",
+                "telephone1": "555-0100",
+                "websiteurl": "https://contoso.com",
+                "lastonholdtime": pd.Timestamp("2024-06-15 10:30:00"),
+            },
+            {"name": f"Fabrikam_{tag}", "telephone1": "555-0200", "websiteurl": None, "lastonholdtime": None},
+            {
+                "name": f"Northwind_{tag}",
+                "telephone1": None,
+                "websiteurl": "https://northwind.com",
+                "lastonholdtime": pd.Timestamp("2024-12-01 08:00:00"),
+            },
+        ]
+    )
+    print(f"  Input DataFrame:\n{new_accounts.to_string(index=False)}\n")
+
+    # client.dataframe.create() returns a Series of GUIDs aligned with the input rows
+    new_accounts["accountid"] = client.dataframe.create(table, new_accounts)
+    print(f"[OK] Created {len(new_accounts)} records")
+    print(f"  IDs: {new_accounts['accountid'].tolist()}")
+
+    # -- 2. Query records as a DataFrame -------------------------
+    print("\n" + "-" * 60)
+    print("2. Query records as a DataFrame")
+    print("-" * 60)
+
+    df_all = client.dataframe.get(table, select=select_cols, filter=test_filter)
+    print(f"[OK] Got {len(df_all)} records in one DataFrame")
+    print(f"  Columns: {list(df_all.columns)}")
+    print(f"{df_all.to_string(index=False)}")
+
+    # -- 3. Limit results with top ------------------------------
+    print("\n" + "-" * 60)
+    print("3. Limit results with top")
+    print("-" * 60)
+
+    df_top2 = client.dataframe.get(table, select=select_cols, filter=test_filter, top=2)
+    print(f"[OK] Got {len(df_top2)} records with top=2")
+    print(f"{df_top2.to_string(index=False)}")
+
+    # -- 4. Fetch a single record by ID ----------------------------
+    print("\n" + "-" * 60)
+    print("4. Fetch a single record by ID")
+    print("-" * 60)
+
+    first_id = new_accounts["accountid"].iloc[0]
+    print(f"  Fetching record {first_id}...")
+    single = client.dataframe.get(table, record_id=first_id, select=select_cols)
+    print(f"[OK] Single record DataFrame:\n{single.to_string(index=False)}")
+
+    # -- 5. Update records from a DataFrame ------------------------
+    print("\n" + "-" * 60)
+    print("5. Update records with different values per row")
+    print("-" * 60)
+
+    new_accounts["telephone1"] = ["555-1100", "555-1200", "555-1300"]
+    print(f"  New telephone numbers: {new_accounts['telephone1'].tolist()}")
+    client.dataframe.update(table, new_accounts[["accountid", "telephone1"]], id_column="accountid")
+    print("[OK] Updated 3 records")
+
+    # Verify the updates
+    verified = client.dataframe.get(table, select=select_cols, filter=test_filter)
+    print(f"  Verified:\n{verified.to_string(index=False)}")
+
+    # -- 6. Broadcast update (same value to all records) -----------
+    print("\n" + "-" * 60)
+    print("6. Broadcast update (same value to all records)")
+    print("-" * 60)
+
+    broadcast_df = new_accounts[["accountid"]].copy()
+    broadcast_df["websiteurl"] = "https://updated.example.com"
+    print(f"  Setting websiteurl to 'https://updated.example.com' for all {len(broadcast_df)} records")
+    client.dataframe.update(table, broadcast_df, id_column="accountid")
+    print("[OK] Broadcast update complete")
+
+    # Verify all records have the same websiteurl
+    verified = client.dataframe.get(table, select=select_cols, filter=test_filter)
+    print(f"  Verified:\n{verified.to_string(index=False)}")
+
+    # Default: NaN/None fields are skipped (not overridden on server)
+    print("\n  Updating with NaN values (default: clear_nulls=False, fields should stay unchanged)...")
+    sparse_df = pd.DataFrame(
+        [
+            {"accountid": new_accounts["accountid"].iloc[0], "telephone1": "555-9999", "websiteurl": None},
+        ]
+    )
+    client.dataframe.update(table, sparse_df, id_column="accountid")
+    verified = client.dataframe.get(table, select=select_cols, filter=test_filter)
+    print(f"  Verified (Contoso telephone1 updated, websiteurl unchanged):\n{verified.to_string(index=False)}")
+
+    # Opt-in: clear_nulls=True sends None as null to clear the field
+    print("\n  Clearing websiteurl for Contoso with clear_nulls=True...")
+    clear_df = pd.DataFrame([{"accountid": new_accounts["accountid"].iloc[0], "websiteurl": None}])
+    client.dataframe.update(table, clear_df, id_column="accountid", clear_nulls=True)
+    verified = client.dataframe.get(table, select=select_cols, filter=test_filter)
+    print(f"  Verified (Contoso websiteurl should be empty):\n{verified.to_string(index=False)}")
+
+    # -- 7. Delete records by passing a Series of GUIDs ------------
+    print("\n" + "-" * 60)
+    print("7. Delete records by passing a Series of GUIDs")
+    print("-" * 60)
+
+    print(f"  Deleting {len(new_accounts)} records...")
+    client.dataframe.delete(table, new_accounts["accountid"], use_bulk_delete=False)
+    print(f"[OK] Deleted {len(new_accounts)} records")
+
+    # Verify deletions - filter for our tagged records should return 0
+    remaining = client.dataframe.get(table, select=select_cols, filter=test_filter)
+    print(f"  Verified: {len(remaining)} test records remaining (expected 0)")
+
+    print("\n" + "=" * 60)
+    print("[OK] DataFrame operations walkthrough complete!")
+    print("=" * 60)
+
+
+if __name__ == "__main__":
+    main()
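
The walkthrough assigns the result of `client.dataframe.create()` straight into a new `accountid` column, which works because the returned Series is described as aligned with the input rows. The offline sketch below shows why that alignment matters in pandas. It uses a hypothetical stand-in, `fake_create`, that generates local GUIDs instead of calling Dataverse; that the real method reuses the input DataFrame's index is an assumption, not something this diff confirms.

```python
import uuid

import pandas as pd


def fake_create(df: pd.DataFrame) -> pd.Series:
    """Offline stand-in for a bulk create: one locally generated GUID per input row."""
    ids = [str(uuid.uuid4()) for _ in range(len(df))]
    # Reusing df.index is what keeps each ID attached to the row it was created from.
    return pd.Series(ids, index=df.index, name="id")


# A non-default index, as you might have after filtering a larger DataFrame
accounts = pd.DataFrame({"name": ["Contoso", "Fabrikam", "Northwind"]}, index=[10, 20, 30])
accounts["accountid"] = fake_create(accounts)

# Counter-example: a Series with a default 0..n-1 index shares no labels with accounts,
# so column assignment aligns on the index and fills every row with NaN.
misaligned = pd.Series(["id-a", "id-b", "id-c"])
broken = accounts[["name"]].assign(accountid=misaligned)
```

Column assignment in pandas aligns on index labels rather than position, so a positional list of IDs is only safe when the DataFrame still has its default index.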
