docs(security): expand RBAC skill from the Entra ID Learn article #14

Open
khelanmodi wants to merge 13 commits into main from docs/security-rbac

Conversation

@khelanmodi
Collaborator

Summary

The existing `security-entra-rbac.md` rule was a 50-line stub; this PR reworks the `security/` skill to faithfully capture the "Connect using role-based access control and Microsoft Entra ID" Learn article (~6,000 words).

The article describes a two-level access model that the kit was not previously teaching:

| Layer | Controls | Granted via |
|---|---|---|
| Azure RBAC (control-plane) | Cluster metadata, connection strings, firewall, private endpoints, registering Entra users | `Microsoft.DocumentDB/mongoClusters/*` actions |
| Database roles (data-plane) | Reading / writing documents, queries, collection ops | MongoDB roles mapped to a registered Entra principal or native user |

What's new

| File | Purpose |
|---|---|
| `skills/security/SKILL.md` | Two-level access model table + refreshed rule index |
| `skills/security/security-entra-rbac.md` (rewritten) | Auth modes, enabling Entra via `authConfig.allowedModes`, principal registration as `mongoClusters/users`, MONGODB-OIDC connection settings, Python / TypeScript / C# OIDC callback samples, replica auth-independence gotcha |
| `skills/security/security-azure-rbac-actions.md` (new) | Full `mongoClusters/*` action table, custom-role Bicep + Terraform, narrow-role pattern for CI/CD identity, `listConnectionStrings/action` secret-grade warning |
| `skills/security/security-database-roles.md` (new) | `readWriteAnyDatabase` + `clusterAdmin` must be granted together for cluster-wide read-write (most-easily-missed detail); `readAnyDatabase` for read-only; secondary-user management via mongo shell with `customData.IdentityProvider`; user-management permission matrix |
| `skills/security/security-token-lifetime-revocation.md` (new) | Up to ~90 minute token attack window after principal disable/delete; two-step revocation (Entra refresh-token revoke + delete cluster `users/<principal-id>` resource); incident-response checklist |
| `docs/SKILLS.md` | Refreshed security row |

Validation

pwsh scripts/validate-skills.ps1
# → documentdb-security -> skills/security/   ✅  (15 skills total)

Notes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

khelanmodi and others added 7 commits May 11, 2026 09:36
Existing `security-entra-rbac.md` was a 50-line stub. Reworked the security skill to faithfully capture the Azure DocumentDB role-based access control article, which is fundamentally a two-level model (Azure RBAC for the `mongoCluster` resource + Entra ID OIDC + MongoDB database roles for the data plane).

Changes:
- SKILL.md: two-level access model table; updated rule index.
- security-entra-rbac.md (rewritten): auth modes (NativeAuth / Entra / both), enabling Entra via `authConfig.allowedModes`, principal registration as `mongoClusters/users`, MONGODB-OIDC connection settings, Python / TypeScript / C# OIDC callback samples, replica auth-independence gotcha.
- security-azure-rbac-actions.md (new): full `Microsoft.DocumentDB/mongoClusters/*` action table, custom-role Bicep + Terraform, narrow-role example for CI/CD identity, listConnectionStrings warning.
- security-database-roles.md (new): readWriteAnyDatabase + clusterAdmin must be granted together; readAnyDatabase for read-only; secondary-user management via mongo shell with customData.IdentityProvider; user-management permission matrix.
- security-token-lifetime-revocation.md (new): up-to-90-minute token attack window after principal disable/delete; two-step revocation (Entra refresh-token revoke + delete cluster user resource); incident-response checklist.
- docs/SKILLS.md: refreshed security row.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds `security-firewall-rules.md` to cover the operational details from the Azure DocumentDB Configure Firewall Learn article that are not covered by `security-private-endpoint`:

- ~15-minute firewall-change propagation window (don't troubleshoot until it elapses).
- The 'Allow public access from Azure resources and services' toggle is separate from IP rules and is the right escape hatch for Azure Functions / Stream Analytics workloads.
- CIDR-form IP allow-listing for corporate / partner / CI egress.
- Warning against the `0.0.0.0-255.255.255.255` shortcut which lets every Azure tenant reach the cluster.
- Workflow guidance: portal 'Add current client IP', narrowest-CIDR preference, audit/cleanup cadence.
- Cross-links to `security-private-endpoint` and `security-entra-rbac` (defense-in-depth: identity stays a hard gate even if network controls are coarse).

Also updates `SKILL.md` rule index and `docs/SKILLS.md` security row.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ense-in-depth overview

Captures the two net-new bits from the 'Secure your cluster' Learn overview that aren't covered by existing rules:

1. `security-admin-password-and-identity-separation.md` (new):
   - Strong admin password policy (>=8 chars + upper + lower + digit + non-alphanumeric); generate from Key Vault, reference via @secure() param, rotate quarterly.
   - Explicit principle: use distinct Azure identities for control-plane (IaC/SRE) vs data-plane (runtime app). Worked Bicep example showing one identity with RBAC on mongoClusters/* and a *different* identity registered as a database user.
   - Bounded-blast-radius framing for why each side of the separation matters.

2. `SKILL.md`: added a 'Defense-in-depth checklist' table mirroring the structure of the Learn overview (network / transport / identity / control-plane / data-plane / encryption / backups / incident response) so readers get the same one-glance view, with each row linking to the rule that covers it.

All other points in the security overview article are already covered by existing rules (TLS, private endpoint, firewall, RBAC actions, database roles, CMK, backup retention) - no duplication added.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
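
The password policy in the commit above (at least 8 characters with an uppercase letter, a lowercase letter, a digit, and a non-alphanumeric character) can be sketched as a small check plus a generator. This is a minimal illustration using only the Python standard library; the function names and the use of `string.punctuation` as the symbol set are assumptions, and the service may accept a narrower set of symbols than this sketch generates.

```python
import secrets
import string


def meets_policy(password: str, min_length: int = 8) -> bool:
    """Check length plus one character from each class named in the policy."""
    return (
        len(password) >= min_length
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(not c.isalnum() for c in password)
    )


def generate_password(length: int = 24) -> str:
    """Draw random candidates until one satisfies meets_policy()."""
    if length < 8:
        raise ValueError("length must be at least 8 to satisfy the policy")
    # Assumed symbol set: ASCII punctuation. Adjust if the service rejects some symbols.
    alphabet = string.ascii_letters + string.digits + string.punctuation
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        if meets_policy(candidate):
            return candidate
```

In practice the generated value would be stored in Key Vault and referenced from the template rather than printed or committed.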
…nk how-to

Previous rule was a 38-line sketch. Rewritten from the Azure DocumentDB `how-to-private-link` Learn article to capture the operational specifics that make the difference between a working and broken Private Link setup:

- DNS / group ID cheat sheet: zone name `privatelink.mongocluster.cosmos.azure.com`, subresource `MongoCluster`, SRV record `_mongodb._tcp.<cluster>.mongocluster.cosmos.azure.com`, port 27017.
- Up-front caveat: Private Link does *not* prevent public DNS resolution; defense is reachability-based.
- Up-front caveat: private DNS integration must be enabled for `mongodb+srv` discovery to work.
- Full Azure CLI flow: VNet/subnet, `--disable-private-endpoint-network-policies true` on subnet, PE create, DNS zone create, VNet link, dns-zone-group bind.
- `publicNetworkAccess=Disabled` to lock down once apps verify connectivity (Bicep + az resource update).
- Replica cluster nuance: only self connection strings on replicas; replica networking is not inherited from primary (cross-link to ha-cross-region-replica).
- Verify + troubleshoot section: az CLI status check, Windows PowerShell + Linux/macOS SRV/A-record DNS tests, three common failure modes (DNS resolves to public IP, connection times out, no records in private zone) with actionable checks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
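
One of the failure modes listed above is DNS resolving to a public IP instead of the private endpoint address. A rough way to check this from a client inside the VNet is to resolve a host and test whether each address is in a private range. The sketch below uses only the Python standard library, which cannot query SRV records, so it assumes you pass in a target hostname already obtained from the SRV lookup; the function names are hypothetical.

```python
import ipaddress
import socket


def is_private_address(ip: str) -> bool:
    """True when the address falls in a private (non-internet-routable) range."""
    return ipaddress.ip_address(ip).is_private


def resolve_addresses(hostname: str, port: int = 27017) -> list[str]:
    """Resolve a hostname to its unique IP addresses using the system resolver."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def check_private_resolution(hostname: str) -> dict[str, bool]:
    """Map each resolved address to whether it is private."""
    return {ip: is_private_address(ip) for ip in resolve_addresses(hostname)}
```

If any value in the returned mapping is `False`, the client is likely resolving through public DNS rather than the linked private zone.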
Net-new content from the `how-to-public-access` Learn article, folded into the existing firewall rule (no new file - heavy overlap):

- **Default state is locked-down**: explicit statement that a cluster with no firewall rules and no private endpoint has public access effectively disabled. Opening the cluster is a deliberate action.
- **Start-IP / End-IP form** alongside CIDR: clarified that both are valid (portal uses Start/End; CIDR /32 == single IP == same address in Start and End).
- **Portal IP detection caveat**: corporate proxies / VPN / IPv6 transition can make the detected IP differ from real egress - verify before saving.
- **'Allow Azure services' toggle warning sharpened**: admits traffic from *any Azure service in any customer subscription*, not just yours. Identity becomes the only remaining gate. Cross-link to security-entra-rbac + security-database-roles.
- **New 'Disable public access entirely' recipe**: remove all rules + clear Azure-services checkbox + wait for propagation + verify.
- Added second References entry pointing at how-to-public-access.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
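
The equivalence between CIDR and Start-IP / End-IP notation described above can be illustrated with Python's standard `ipaddress` module. This is a sketch only; the helper names are hypothetical, and it assumes the firewall treats the Start/End range as inclusive.

```python
import ipaddress


def cidr_to_range(cidr: str) -> tuple[str, str]:
    """Convert a CIDR block to its inclusive (start_ip, end_ip) pair."""
    network = ipaddress.ip_network(cidr, strict=True)
    return str(network[0]), str(network[-1])


def is_open_to_everything(start_ip: str, end_ip: str) -> bool:
    """Flag the wide-open range that the firewall rule warns against."""
    return (start_ip, end_ip) == ("0.0.0.0", "255.255.255.255")


# A /32 is a single address, so Start and End are identical.
single = cidr_to_range("203.0.113.7/32")
# A /24 spans 256 addresses.
block = cidr_to_range("203.0.113.0/24")
```

A pre-merge check could run `is_open_to_everything` over every rule in a template and fail the build when it returns `True`.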
Replaces the 45-line stub with a full rewrite (~12 KB) sourced from the official `database-encryption-at-rest` Learn article.

Key additions:
- **Always-on encryption framing**: data is always encrypted at rest in both modes - the question is who owns the key.
- **SMK vs CMK decision table** with when-to-choose-which guidance.
- **Architectural pin**: SMK/CMK is a cluster-creation-time decision and CANNOT be changed for the lifetime of the cluster - choose CMK from day one if you might ever need it.
- **Pre-flight checklist for the Key Vault**: same Entra tenant, soft-delete, purge protection, RBAC permission model, `Disable public access` + trusted-services bypass, resource lock, logging, alerting, DR.
- **Key requirements**: RSA / RSA-HSM only, 2048 / 3072 / 4096 bits (recommend 4096), Enabled state, valid activation/expiry, import formats (.pfx, .byok, .backup).
- **User-assigned managed identity is required** (system-assigned not supported); `Key Vault Crypto Service Encryption User` role on RBAC vaults, or get/list/wrapKey/unwrapKey on legacy access-policy vaults.
- **Version-less keys + Key Vault autorotation**: recommended setup for production, no cluster action required on rotation.
- **Revocation and recovery runbooks**: explicit guidance to rehearse before relying on them.
- **Decision matrix** at the end summarising when CMK is the right call.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Net-new rule sourced from the official `how-to-database-encryption-troubleshoot` Learn article. Complements the architecture/setup rule security-cmk-encryption with the operational runbook for when CMK goes wrong.

Covers:
- **Two timing facts**: ~60 minutes from breakage to `Inaccessible`, ~60 minutes from fix to `Ready`. You cannot force revalidation - plan SLAs accordingly and don't edit-thrash during recovery.
- **Cause table** for `Inaccessible` state: key expired / disabled / deleted, vault deleted, identity deleted, RBAC role removed, access policy revoked, vault firewall too restrictive - each with its resolution.
- **Managed-identity deletion subtlety**: a new identity with the same name is NOT the same principal (Entra IDs are object-ID-keyed). Either soft-restore the original OR create new + update cluster's identity reference.
- **Triage procedure** for `Inaccessible` clusters - investigate before restarting/recreating/rotating.
- **CMK provisioning failure recovery**: walk the requirements checklist, delete the `Failed` cluster entity, re-provision.
- **Pre-emptive monitoring table**: Key Vault key-disable/delete events, RBAC removal, vault firewall changes, identity deletion, key-near-expiry - alerts to set up *before* you need this rule.

Updated SKILL.md to list the new rule.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Expands the skills/security/ documentation skill to reflect the Microsoft Learn guidance for Azure DocumentDB security, especially the two-level access model (Azure RBAC control-plane + Entra ID/OIDC + MongoDB roles data-plane), and adds new operational runbooks (RBAC action scoping, database-role mapping, token revocation, and CMK troubleshooting).

Changes:

  • Rewrites skills/security/SKILL.md to add a defense-in-depth checklist, a two-level access model summary, and an updated rule index.
  • Adds new security rules for token lifetime/revocation, Azure RBAC mongoClusters/* actions, database roles, admin-password/identity separation, and IP firewall guidance; expands Private Endpoint and Entra RBAC guidance.
  • Expands CMK encryption guidance and adds a CMK troubleshooting operational playbook; updates docs/SKILLS.md to reflect the richer security skill coverage.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 13 comments.

Show a summary per file
| File | Description |
|---|---|
| `skills/security/SKILL.md` | Updates the security skill overview with checklist, two-level access model, and rule index. |
| `skills/security/security-token-lifetime-revocation.md` | New rule documenting token validity window and revocation/runbook steps. |
| `skills/security/security-private-endpoint.md` | Expanded Private Endpoint setup + DNS troubleshooting guidance. |
| `skills/security/security-firewall-rules.md` | New rule documenting firewall behaviors, propagation delay, and safe patterns. |
| `skills/security/security-entra-rbac.md` | Major expansion covering auth modes, principal registration, and OIDC connection examples. |
| `skills/security/security-database-roles.md` | New rule documenting DocumentDB/MongoDB role mapping and user-management constraints. |
| `skills/security/security-cmk-troubleshooting.md` | New operational playbook for CMK-caused Inaccessible state recovery. |
| `skills/security/security-cmk-encryption.md` | Expanded CMK vs SMK guidance, vault/identity setup checklist, and rotation guidance. |
| `skills/security/security-azure-rbac-actions.md` | New rule listing `mongoClusters/*` control-plane actions and custom-role patterns. |
| `skills/security/security-admin-password-and-identity-separation.md` | New rule on admin password hygiene and control-plane vs data-plane identity separation. |
| `docs/SKILLS.md` | Refreshes the catalog entry for the security skill to match the expanded scope. |

Comment thread skills/security/SKILL.md Outdated
Comment thread skills/security/SKILL.md Outdated
| **Control-plane authorization** | Subscription-level Azure RBAC | Custom role scoped to `Microsoft.DocumentDB/mongoClusters/*` at resource-group scope | [`security-azure-rbac-actions`](security-azure-rbac-actions.md) |
| **Data-plane authorization** | One admin user | Per-database least-privilege roles; admin identity ≠ runtime identity | [`security-database-roles`](security-database-roles.md), [`security-admin-password-and-identity-separation`](security-admin-password-and-identity-separation.md) |
| **Encryption at rest** | Service-managed AES-256 | CMK for regulated workloads (Premium SSD v1 only — see `storage/`) | [`security-cmk-encryption`](security-cmk-encryption.md) |
| **Backups** | Automated, 35-day retention | Restore drills; understand 7-day post-deletion window | [`high-availability/ha-backup-retention`](../high-availability/ha-backup-retention.md) |
Comment thread skills/security/SKILL.md Outdated
---
name: documentdb-security
description: Security best practices for Azure DocumentDB — TLS enforcement, Private Endpoint / firewall configuration, Microsoft Entra ID + RBAC for authentication, and customer-managed keys (CMK) for encryption at rest. Use when reviewing production security posture, configuring networking, setting up authentication / authorization, or preparing for compliance audits.
description: Security best practices for Azure DocumentDB — TLS enforcement, Private Endpoint / firewall configuration, two-level access control (Azure RBAC on the `mongoCluster` resource + Microsoft Entra ID OIDC authentication with MongoDB database roles for data-plane access), token-lifetime / revocation handling, and customer-managed keys (CMK) for encryption at rest. Use when reviewing production security posture, configuring networking, setting up authentication / authorization, granting per-app least-privilege access, revoking compromised tokens, or preparing for compliance audits.
Comment on lines +7 to +14
When you authenticate to Azure DocumentDB with Microsoft Entra ID, the MongoDB driver presents an **OIDC access token** issued by Entra. That token has a finite lifetime — typically **up to ~90 minutes from issuance** — and remains valid for that full window even if:

- The Entra principal is **disabled** or **deleted** in the tenant.
- The associated **refresh token is revoked**.
- The cluster user resource (`mongoClusters/users/<principal-id>`) is **deleted**.

In other words, **the access-token lifetime is the maximum attack window if a token is compromised.** A malicious actor with a valid token can keep using it until expiry. This is a fundamental property of token-based auth — not specific to DocumentDB — but the response pattern is specific.
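
To make the window concrete, the arithmetic can be sketched as follows. This is an illustration only: the 90-minute constant is taken from the approximate figure quoted above and is not a guaranteed value, and the function names are hypothetical. It models the period during which a token remains cryptographically valid, not whether the cluster will still authorize it.

```python
from datetime import datetime, timedelta, timezone

# Assumed worst-case access-token lifetime, from the "~90 minutes" figure above.
MAX_TOKEN_LIFETIME = timedelta(minutes=90)


def exposure_ends_at(issued_at: datetime) -> datetime:
    """Latest moment a token issued at `issued_at` could still be unexpired."""
    return issued_at + MAX_TOKEN_LIFETIME


def remaining_exposure(issued_at: datetime, now: datetime) -> timedelta:
    """Time left before the token expires, never negative."""
    return max(timedelta(0), exposure_ends_at(issued_at) - now)


issued = datetime(2026, 5, 11, 9, 0, tzinfo=timezone.utc)
left_at_0930 = remaining_exposure(issued, issued + timedelta(minutes=30))
left_at_1100 = remaining_exposure(issued, issued + timedelta(hours=2))
```

During an incident, the issuance time is often unknown, so the conservative assumption is that a token was issued just before the principal was disabled.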

Comment thread skills/security/security-entra-rbac.md Outdated
Use the global SRV host so the connection automatically follows promotion in multi-cluster setups:

```
mongodb+srv://<client-id>@<cluster-name>.global.mongocluster.cosmos.azure.com/?tls=true&authMechanism=MONGODB-OIDC&retrywrites=false&maxIdleTimeMS=120000
```
Comment thread skills/security/security-entra-rbac.md Outdated
Comment on lines +172 to +173
AzureIdentityTokenHandler tokenHandler = new(credential, tenantId);

Comment on lines +80 to +85
permissions: [
  {
    actions: [
      'Microsoft.DocumentDb/mongoClusters/*'
    ]
  }
Comment thread skills/security/security-azure-rbac-actions.md Outdated
Comment thread skills/security/security-cmk-encryption.md Outdated
Comment thread skills/security/security-cmk-encryption.md
khelanmodi and others added 3 commits May 12, 2026 09:14
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
- SKILL.md: 'mongoCluster' -> 'mongoClusters' in description

- SKILL.md: replace broken link to ha-backup-retention.md with Learn URL (reliability-documentdb)

- security-azure-rbac-actions.md: fix Bicep 'Microsoft.DocumentDb' -> 'Microsoft.DocumentDB' casing

- security-cmk-encryption.md: References-section link now points at storage-service-encryption (not VM disk-encryption)

- security-token-lifetime-revocation.md: clarify cryptographic validity vs cluster authorization; remove false claim that deleting the user resource leaves the token 'valid' (it stays cryptographically valid but auth fails)

- security-entra-rbac.md: drop '<client-id>@' from connection-string template (OIDC carries principal in token, not URL)

- security-entra-rbac.md: fix TypeScript expiresInSeconds units bug (expiresOnTimestamp is ms-since-epoch, was being mixed with seconds)

- security-entra-rbac.md: replace undefined AzureIdentityTokenHandler/tenantId in C# sample with a working Func<OidcCallbackParameters,CancellationToken,Task<OidcAccessToken>> callback using DefaultAzureCredential

Note: retryWrites mismatch (table says false, Python/Node samples say true) left for maintainer guidance via PR comment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
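
The units fix noted above for `expiresInSeconds` comes down to converting a millisecond timestamp into a remaining duration in seconds. The sketch below shows that arithmetic in Python for illustration; it is not the code from the PR, and the function and parameter names are hypothetical stand-ins for the TypeScript fields.

```python
def expires_in_seconds(expires_on_timestamp_ms: int, now_ms: int) -> int:
    """Seconds until expiry, given both timestamps in milliseconds since epoch.

    Subtract first while both values share the same unit, then convert the
    difference to seconds. Already-expired tokens yield 0 rather than a
    negative number.
    """
    remaining_ms = expires_on_timestamp_ms - now_ms
    return max(0, remaining_ms // 1000)


# Token expiring 90 seconds after "now".
soon = expires_in_seconds(1_700_000_090_000, 1_700_000_000_000)
# Token that expired 5 seconds ago.
past = expires_in_seconds(1_700_000_000_000, 1_700_000_005_000)
```

Mixing a millisecond expiry with a "now" value in seconds would produce a remaining time that is far too large, so the driver would cache the token long after it expired.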
@khelanmodi
Collaborator Author

@gahllevy quick question on retryWrites for Entra OIDC (Copilot review on security-entra-rbac.md flagged a mismatch):

  • The Required driver settings table says retrywrites: false.
  • The Python sample passes retryWrites=True and the TypeScript sample passes retryWrites: true.
  • The C# sample uses RetryWrites = false (matches the table).

Which is the supported configuration for DocumentDB + MONGODB-OIDC? I held off on changing this in the PR pending your guidance — happy to align in either direction in a follow-up commit.

Collaborator

@gahl-levy gahl-levy left a comment

still reviewing

@khelanmodi
Collaborator Author

@copilot resolve the merge conflicts in this pull request

Copilot AI and others added 2 commits May 12, 2026 21:35
Co-authored-by: khelanmodi <141972056+khelanmodi@users.noreply.github.com>
…alse in all OIDC samples

Agent-Logs-Url: https://github.com/Azure/documentdb-agent-kit/sessions/affae1f8-dec2-4df4-82bd-a20cdb3ad34a

Co-authored-by: khelanmodi <141972056+khelanmodi@users.noreply.github.com>
Contributor

Copilot AI commented May 12, 2026

> @copilot resolve the merge conflicts in this pull request

Merge conflicts resolved in commit 82134cc. The only conflict was in docs/SKILLS.md — the resolution keeps the expanded security row from this PR while incorporating the new storage/ and updated high-availability/ rows from main. As a bonus, the merge also brought in ha-backup-retention.md, which fixes the previously broken link flagged in the review.

> @gahllevy quick question on retryWrites for Entra OIDC (Copilot review on security-entra-rbac.md flagged a mismatch):
>
>   • The Required driver settings table says retrywrites: false....

retryWrites=false is the correct setting for DocumentDB + MONGODB-OIDC — the connection-string example and the "Required driver settings" table both say false, and the C# sample already matched. The Python (retryWrites=True) and TypeScript (retryWrites: true) samples were wrong. Fixed in commit 7a176f3 — all three language samples now consistently use false.
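
One way to keep the samples from drifting apart again is to build the connection string from a single set of settings. The sketch below assembles the URI from the values quoted in this thread (global SRV host, `tls=true`, `authMechanism=MONGODB-OIDC`, `retryWrites=false`, `maxIdleTimeMS=120000`). The function name is hypothetical, the OIDC token callback is deliberately omitted, and only the Python standard library is used, so this does not open a connection.

```python
from urllib.parse import urlencode

# Settings taken from the connection-string example discussed in this PR.
OIDC_QUERY_SETTINGS = {
    "tls": "true",
    "authMechanism": "MONGODB-OIDC",
    "retryWrites": "false",
    "maxIdleTimeMS": "120000",
}


def build_oidc_uri(cluster_name: str) -> str:
    """Build the mongodb+srv URI for the global host with the OIDC settings."""
    host = f"{cluster_name}.global.mongocluster.cosmos.azure.com"
    return f"mongodb+srv://{host}/?{urlencode(OIDC_QUERY_SETTINGS)}"


uri = build_oidc_uri("mycluster")
```

A documentation test could assert that every language sample contains the same query settings as this single source.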

Collaborator

@gahl-levy gahl-levy left a comment

Left a question. With these samples, it's not practical to do a review like this. Best to test them out to ensure correctness. Copilot can test them out for you as well to speed this up.

Comment on lines +152 to +160
const credential = new DefaultAzureCredential();
const client = new MongoClient(
  `mongodb+srv://${clusterName}.global.mongocluster.cosmos.azure.com/`,
  {
    connectTimeoutMS: 120000,
    tls: true,
    retryWrites: true,
    authMechanism: 'MONGODB-OIDC',
    authMechanismProperties: {
Collaborator

@khelanmodi, was this already fixed? It should be false.

Development

Successfully merging this pull request may close these issues.

Add a skill for RBAC

4 participants