docs(security): expand RBAC skill from the Entra ID Learn article #14
khelanmodi wants to merge 13 commits into
Existing `security-entra-rbac.md` was a 50-line stub. Reworked the security skill to faithfully capture the Azure DocumentDB role-based access control article, which is fundamentally a two-level model (Azure RBAC for the `mongoCluster` resource + Entra ID OIDC + MongoDB database roles for the data plane).

Changes:

- SKILL.md: two-level access model table; updated rule index.
- security-entra-rbac.md (rewritten): auth modes (NativeAuth / Entra / both), enabling Entra via `authConfig.allowedModes`, principal registration as `mongoClusters/users`, MONGODB-OIDC connection settings, Python / TypeScript / C# OIDC callback samples, replica auth-independence gotcha.
- security-azure-rbac-actions.md (new): full `Microsoft.DocumentDB/mongoClusters/*` action table, custom-role Bicep + Terraform, narrow-role example for CI/CD identity, listConnectionStrings warning.
- security-database-roles.md (new): readWriteAnyDatabase + clusterAdmin must be granted together; readAnyDatabase for read-only; secondary-user management via mongo shell with customData.IdentityProvider; user-management permission matrix.
- security-token-lifetime-revocation.md (new): up-to-90-minute token attack window after principal disable/delete; two-step revocation (Entra refresh-token revoke + delete cluster user resource); incident-response checklist.
- docs/SKILLS.md: refreshed security row.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
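The database-role rule in the change list above reduces to a small mapping. A minimal TypeScript sketch, assuming a hypothetical `AccessLevel` input and that cluster-wide roles are granted on the `admin` database (the usual MongoDB convention, assumed here rather than taken from the article); `rolesFor` and `findRoleProblems` are hypothetical names:

```typescript
// Hypothetical access levels for a data-plane principal; these names are
// illustrative and are not part of any DocumentDB API.
type AccessLevel = "readOnly" | "readWrite";

interface RoleGrant {
  role: string;
  db: string; // assumption: cluster-wide roles are granted on "admin"
}

// Cluster-wide read-write requires readWriteAnyDatabase AND clusterAdmin
// together; read-only access uses readAnyDatabase alone.
function rolesFor(level: AccessLevel): RoleGrant[] {
  if (level === "readWrite") {
    return [
      { role: "readWriteAnyDatabase", db: "admin" },
      { role: "clusterAdmin", db: "admin" },
    ];
  }
  return [{ role: "readAnyDatabase", db: "admin" }];
}

// Flags the most easily missed mistake: granting only half of the pair.
function findRoleProblems(grants: RoleGrant[]): string[] {
  const names = new Set(grants.map((g) => g.role));
  const hasReadWrite = names.has("readWriteAnyDatabase");
  const hasClusterAdmin = names.has("clusterAdmin");
  return hasReadWrite !== hasClusterAdmin
    ? ["readWriteAnyDatabase and clusterAdmin must be granted together"]
    : [];
}

// Usage: a read-write grant built by rolesFor passes the check.
console.log(findRoleProblems(rolesFor("readWrite")));
```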
Adds `security-firewall-rules.md` to cover the operational details from the Azure DocumentDB Configure Firewall Learn article that are not covered by `security-private-endpoint`:

- ~15-minute firewall-change propagation window (don't troubleshoot until it elapses).
- The 'Allow public access from Azure resources and services' toggle is separate from IP rules and is the right escape hatch for Azure Functions / Stream Analytics workloads.
- CIDR-form IP allow-listing for corporate / partner / CI egress.
- Warning against the `0.0.0.0-255.255.255.255` shortcut, which lets every Azure tenant reach the cluster.
- Workflow guidance: portal 'Add current client IP', narrowest-CIDR preference, audit/cleanup cadence.
- Cross-links to `security-private-endpoint` and `security-entra-rbac` (defense-in-depth: identity stays a hard gate even if network controls are coarse).

Also updates `SKILL.md` rule index and `docs/SKILLS.md` security row.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
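Two of the firewall points above, the full-range shortcut and the ~15-minute propagation window, are easy to check mechanically during an audit. A minimal TypeScript sketch, assuming rules are available as hypothetical `{ name, startIp, endIp }` records (IPv4 only); `auditFirewallRule` and `propagationElapsed` are hypothetical helpers, and the /16 "wide range" threshold is an arbitrary illustration, not guidance from the article:

```typescript
// Hypothetical shape of a firewall rule in Start-IP / End-IP form.
interface FirewallRule {
  name: string;
  startIp: string;
  endIp: string;
}

const PROPAGATION_MINUTES = 15; // approximate window quoted above
const WIDE_RANGE_SIZE = 2 ** 16; // illustrative threshold (a /16), not official

// Parses a dotted-quad IPv4 address into an unsigned number (0 .. 2^32 - 1).
function ipv4ToNumber(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4) throw new Error(`not an IPv4 address: ${ip}`);
  return parts.reduce((acc, part) => {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) {
      throw new Error(`bad octet "${part}" in ${ip}`);
    }
    return acc * 256 + Number(part);
  }, 0);
}

// Returns human-readable warnings for one rule (empty array = no findings).
function auditFirewallRule(rule: FirewallRule): string[] {
  const start = ipv4ToNumber(rule.startIp);
  const end = ipv4ToNumber(rule.endIp);
  if (end < start) return [`${rule.name}: end IP is before start IP`];
  const size = end - start + 1;
  if (size === 2 ** 32) {
    return [`${rule.name}: 0.0.0.0-255.255.255.255 opens the cluster to every address`];
  }
  if (size > WIDE_RANGE_SIZE) {
    return [`${rule.name}: covers ${size} addresses; prefer the narrowest CIDR`];
  }
  return [];
}

// True once a change has had time to propagate, i.e. troubleshooting is worthwhile.
function propagationElapsed(changedAt: Date, now: Date): boolean {
  return now.getTime() - changedAt.getTime() >= PROPAGATION_MINUTES * 60000;
}
```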
…ense-in-depth overview

Captures the two net-new bits from the 'Secure your cluster' Learn overview that aren't covered by existing rules:

1. `security-admin-password-and-identity-separation.md` (new):
   - Strong admin password policy (>=8 chars + upper + lower + digit + non-alphanumeric); generate from Key Vault, reference via `@secure()` param, rotate quarterly.
   - Explicit principle: use distinct Azure identities for control-plane (IaC/SRE) vs data-plane (runtime app). Worked Bicep example showing one identity with RBAC on `mongoClusters/*` and a *different* identity registered as a database user.
   - Bounded-blast-radius framing for why each side of the separation matters.
2. `SKILL.md`: added a 'Defense-in-depth checklist' table mirroring the structure of the Learn overview (network / transport / identity / control-plane / data-plane / encryption / backups / incident response) so readers get the same one-glance view, with each row linking to the rule that covers it.

All other points in the security overview article are already covered by existing rules (TLS, private endpoint, firewall, RBAC actions, database roles, CMK, backup retention); no duplication added.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
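The password policy in the first item can be expressed as a simple check, for example in a pre-deployment validation step. A minimal TypeScript sketch; `adminPasswordViolations` is a hypothetical name, and it treats only ASCII letters and digits as alphanumeric (an assumption):

```typescript
// Checks a candidate admin password against the policy summarised above:
// at least 8 characters, plus at least one uppercase letter, one lowercase
// letter, one digit and one non-alphanumeric character.
// Assumption: "alphanumeric" means ASCII A-Z, a-z and 0-9.
function adminPasswordViolations(password: string): string[] {
  const violations: string[] = [];
  if (password.length < 8) violations.push("fewer than 8 characters");
  if (!/[A-Z]/.test(password)) violations.push("no uppercase letter");
  if (!/[a-z]/.test(password)) violations.push("no lowercase letter");
  if (!/[0-9]/.test(password)) violations.push("no digit");
  if (!/[^A-Za-z0-9]/.test(password)) violations.push("no non-alphanumeric character");
  return violations;
}

// Usage: an all-lowercase password fails three of the five checks.
console.log(adminPasswordViolations("abcdefgh"));
```

This only validates; generating the value in Key Vault and passing it through an `@secure()` parameter, as the rule recommends, keeps it out of source control.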
…nk how-to

Previous rule was a 38-line sketch. Rewritten from the Azure DocumentDB `how-to-private-link` Learn article to capture the operational specifics that make the difference between a working and a broken Private Link setup:

- DNS / group ID cheat sheet: zone name `privatelink.mongocluster.cosmos.azure.com`, subresource `MongoCluster`, SRV record `_mongodb._tcp.<cluster>.mongocluster.cosmos.azure.com`, port 27017.
- Up-front caveat: Private Link does *not* prevent public DNS resolution; defense is reachability-based.
- Up-front caveat: private DNS integration must be enabled for `mongodb+srv` discovery to work.
- Full Azure CLI flow: VNet/subnet, `--disable-private-endpoint-network-policies true` on subnet, PE create, DNS zone create, VNet link, dns-zone-group bind.
- `publicNetworkAccess=Disabled` to lock down once apps verify connectivity (Bicep + az resource update).
- Replica cluster nuance: only self connection strings on replicas; replica networking is not inherited from primary (cross-link to ha-cross-region-replica).
- Verify + troubleshoot section: az CLI status check, Windows PowerShell + Linux/macOS SRV/A-record DNS tests, three common failure modes (DNS resolves to public IP, connection times out, no records in private zone) with actionable checks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
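The DNS / group-ID cheat sheet and the "DNS resolves to public IP" failure mode can be captured in a couple of helpers. A minimal TypeScript sketch; `privateLinkNames` and `isPrivateIPv4` are hypothetical names, and the private-IP check assumes the VNet uses RFC 1918 address space (VNets can use other ranges, in which case adapt it):

```typescript
// Values from the DNS / group ID cheat sheet above.
const PRIVATE_DNS_ZONE = "privatelink.mongocluster.cosmos.azure.com";
const GROUP_ID = "MongoCluster";
const MONGO_PORT = 27017;

interface PrivateLinkNames {
  privateDnsZone: string;
  groupId: string;
  srvRecord: string;
  port: number;
}

// Collects the names to look for when wiring up or debugging a private endpoint.
function privateLinkNames(clusterName: string): PrivateLinkNames {
  return {
    privateDnsZone: PRIVATE_DNS_ZONE,
    groupId: GROUP_ID,
    srvRecord: `_mongodb._tcp.${clusterName}.mongocluster.cosmos.azure.com`,
    port: MONGO_PORT,
  };
}

// True if an IPv4 address is in an RFC 1918 private range. If the cluster
// host resolves to an address where this is false, the client is most likely
// not using the private DNS zone.
function isPrivateIPv4(ip: string): boolean {
  const octets = ip.split(".").map(Number);
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip) || octets.some((o) => o > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  const [a, b] = octets;
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// Usage
console.log(privateLinkNames("my-cluster").srvRecord);
```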
Net-new content from the `how-to-public-access` Learn article, folded into the existing firewall rule (no new file; heavy overlap):

- **Default state is locked-down**: explicit statement that a cluster with no firewall rules and no private endpoint has public access effectively disabled. Opening the cluster is a deliberate action.
- **Start-IP / End-IP form** alongside CIDR: clarified that both are valid (portal uses Start/End; CIDR /32 == single IP == same address in Start and End).
- **Portal IP detection caveat**: corporate proxies / VPN / IPv6 transition can make the detected IP differ from real egress; verify before saving.
- **'Allow Azure services' toggle warning sharpened**: admits traffic from *any Azure service in any customer subscription*, not just yours. Identity becomes the only remaining gate. Cross-link to security-entra-rbac + security-database-roles.
- **New 'Disable public access entirely' recipe**: remove all rules + clear Azure-services checkbox + wait for propagation + verify.
- Added second References entry pointing at how-to-public-access.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
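The Start-IP / End-IP and CIDR forms describe the same ranges. A minimal TypeScript sketch of the conversion (IPv4 only; `cidrToRange` and its helpers are hypothetical names), which also shows why a /32 puts the same address in Start and End:

```typescript
// Parses a dotted-quad IPv4 address into an unsigned number (0 .. 2^32 - 1).
function ipv4ToNumber(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// Formats an unsigned 32-bit number as a dotted-quad string.
function numberToIpv4(n: number): string {
  return [24, 16, 8, 0].map((shift) => Math.floor(n / 2 ** shift) % 256).join(".");
}

// Converts "a.b.c.d/prefix" into the Start IP / End IP pair the portal uses.
// Host bits in the input are ignored, so 10.1.2.3/24 becomes 10.1.2.0-10.1.2.255.
function cidrToRange(cidr: string): { startIp: string; endIp: string } {
  const [ip, prefixText] = cidr.split("/");
  if (!/^\d{1,2}$/.test(prefixText ?? "") || Number(prefixText) > 32) {
    throw new Error(`bad prefix length in ${cidr}`);
  }
  const size = 2 ** (32 - Number(prefixText)); // addresses in the block
  const base = ipv4ToNumber(ip);
  const start = base - (base % size); // clear the host bits
  return { startIp: numberToIpv4(start), endIp: numberToIpv4(start + size - 1) };
}

// Usage: a single corporate egress address.
console.log(cidrToRange("203.0.113.7/32"));
```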
Replaces the 45-line stub with a full rewrite (~12 KB) sourced from the official `database-encryption-at-rest` Learn article. Key additions:

- **Always-on encryption framing**: data is always encrypted at rest in both modes; the question is who owns the key.
- **SMK vs CMK decision table** with when-to-choose-which guidance.
- **Architectural pin**: SMK/CMK is a cluster-creation-time decision and CANNOT be changed for the lifetime of the cluster; choose CMK from day one if you might ever need it.
- **Pre-flight checklist for the Key Vault**: same Entra tenant, soft-delete, purge protection, RBAC permission model, `Disable public access` + trusted-services bypass, resource lock, logging, alerting, DR.
- **Key requirements**: RSA / RSA-HSM only, 2048 / 3072 / 4096 bits (recommend 4096), Enabled state, valid activation/expiry, import formats (.pfx, .byok, .backup).
- **User-assigned managed identity is required** (system-assigned not supported); `Key Vault Crypto Service Encryption User` role on RBAC vaults, or get/list/wrapKey/unwrapKey on legacy access-policy vaults.
- **Version-less keys + Key Vault autorotation**: recommended setup for production, no cluster action required on rotation.
- **Revocation and recovery runbooks**: explicit guidance to rehearse before relying on them.
- **Decision matrix** at the end summarising when CMK is the right call.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
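The key requirements listed above lend themselves to a pre-flight check before pointing a new cluster at a key. A minimal TypeScript sketch; the `CmkKeyInfo` shape and its field names are hypothetical (populate them from however you read key attributes from Key Vault), and only the requirements listed above are checked:

```typescript
// Hypothetical subset of key attributes relevant to the CMK requirements.
interface CmkKeyInfo {
  keyType: string; // e.g. "RSA", "RSA-HSM", "EC"
  keySizeBits: number;
  enabled: boolean;
  notBefore?: Date; // activation date, if set
  expiresOn?: Date; // expiry date, if set
}

const ALLOWED_KEY_TYPES = new Set(["RSA", "RSA-HSM"]);
const ALLOWED_KEY_SIZES = new Set([2048, 3072, 4096]);

// Returns the requirements this key fails (empty array = passes every check).
function cmkKeyProblems(key: CmkKeyInfo, now: Date = new Date()): string[] {
  const problems: string[] = [];
  if (!ALLOWED_KEY_TYPES.has(key.keyType)) {
    problems.push(`key type ${key.keyType} is not RSA or RSA-HSM`);
  }
  if (!ALLOWED_KEY_SIZES.has(key.keySizeBits)) {
    problems.push(`key size ${key.keySizeBits} is not 2048, 3072 or 4096 bits`);
  }
  if (!key.enabled) problems.push("key is not in the Enabled state");
  if (key.notBefore && key.notBefore.getTime() > now.getTime()) {
    problems.push("key activation date is in the future");
  }
  if (key.expiresOn && key.expiresOn.getTime() <= now.getTime()) {
    problems.push("key has expired");
  }
  return problems;
}

// Usage: the recommended configuration (RSA, 4096 bits, enabled) passes.
console.log(cmkKeyProblems({ keyType: "RSA", keySizeBits: 4096, enabled: true }));
```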
Net-new rule sourced from the official `how-to-database-encryption-troubleshoot` Learn article. Complements the architecture/setup rule security-cmk-encryption with the operational runbook for when CMK goes wrong. Covers:

- **Two timing facts**: ~60 minutes from breakage to `Inaccessible`, ~60 minutes from fix to `Ready`. You cannot force revalidation; plan SLAs accordingly and don't edit-thrash during recovery.
- **Cause table** for `Inaccessible` state: key expired / disabled / deleted, vault deleted, identity deleted, RBAC role removed, access policy revoked, vault firewall too restrictive, each with its resolution.
- **Managed-identity deletion subtlety**: a new identity with the same name is NOT the same principal (Entra IDs are object-ID-keyed). Either soft-restore the original OR create new + update cluster's identity reference.
- **Triage procedure** for `Inaccessible` clusters: investigate before restarting/recreating/rotating.
- **CMK provisioning failure recovery**: walk the requirements checklist, delete the `Failed` cluster entity, re-provision.
- **Pre-emptive monitoring table**: Key Vault key-disable/delete events, RBAC removal, vault firewall changes, identity deletion, key-near-expiry; alerts to set up *before* you need this rule.

Updated SKILL.md to list the new rule.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
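The two timing facts translate directly into a "wait, don't edit" rule during recovery. A minimal TypeScript sketch; `recoveryAdvice`, `expectedInaccessibleBy` and the `"wait" | "reinvestigate" | "done"` values are hypothetical, and the 60-minute figures are the approximate values quoted above, not guarantees:

```typescript
// Approximate timings from the runbook summary above ("~60 minutes" each).
const MINUTES_TO_INACCESSIBLE = 60;
const MINUTES_FIX_TO_READY = 60;
const MS_PER_MINUTE = 60000;

type RecoveryAdvice = "wait" | "reinvestigate" | "done";

// Revalidation cannot be forced, so inside the window after a fix the only
// sensible action is to wait rather than make further edits.
function recoveryAdvice(fixAppliedAt: Date, now: Date, clusterState: string): RecoveryAdvice {
  if (clusterState === "Ready") return "done";
  const elapsedMinutes = (now.getTime() - fixAppliedAt.getTime()) / MS_PER_MINUTE;
  return elapsedMinutes < MINUTES_FIX_TO_READY ? "wait" : "reinvestigate";
}

// Rough time by which a breakage (for example, a disabled key) is expected to
// surface as the Inaccessible state.
function expectedInaccessibleBy(brokenAt: Date): Date {
  return new Date(brokenAt.getTime() + MINUTES_TO_INACCESSIBLE * MS_PER_MINUTE);
}
```

Treat `"reinvestigate"` as a cue to walk the cause table again, not as a cue to restart, recreate, or rotate.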
Pull request overview
Expands the skills/security/ documentation skill to reflect the Microsoft Learn guidance for Azure DocumentDB security, especially the two-level access model (Azure RBAC control-plane + Entra ID/OIDC + MongoDB roles data-plane), and adds new operational runbooks (RBAC action scoping, database-role mapping, token revocation, and CMK troubleshooting).
Changes:
- Rewrites `skills/security/SKILL.md` to add a defense-in-depth checklist, a two-level access model summary, and an updated rule index.
- Adds new security rules for token lifetime/revocation, Azure RBAC `mongoClusters/*` actions, database roles, admin-password/identity separation, and IP firewall guidance; expands Private Endpoint and Entra RBAC guidance.
- Expands CMK encryption guidance and adds a CMK troubleshooting operational playbook; updates `docs/SKILLS.md` to reflect the richer security skill coverage.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| skills/security/SKILL.md | Updates the security skill overview with checklist, two-level access model, and rule index. |
| skills/security/security-token-lifetime-revocation.md | New rule documenting token validity window and revocation/runbook steps. |
| skills/security/security-private-endpoint.md | Expanded Private Endpoint setup + DNS troubleshooting guidance. |
| skills/security/security-firewall-rules.md | New rule documenting firewall behaviors, propagation delay, and safe patterns. |
| skills/security/security-entra-rbac.md | Major expansion covering auth modes, principal registration, and OIDC connection examples. |
| skills/security/security-database-roles.md | New rule documenting DocumentDB/MongoDB role mapping and user-management constraints. |
| skills/security/security-cmk-troubleshooting.md | New operational playbook for CMK-caused Inaccessible state recovery. |
| skills/security/security-cmk-encryption.md | Expanded CMK vs SMK guidance, vault/identity setup checklist, and rotation guidance. |
| skills/security/security-azure-rbac-actions.md | New rule listing mongoClusters/* control-plane actions and custom-role patterns. |
| skills/security/security-admin-password-and-identity-separation.md | New rule on admin password hygiene and control-plane vs data-plane identity separation. |
| docs/SKILLS.md | Refreshes the catalog entry for the security skill to match the expanded scope. |
| **Control-plane authorization** | Subscription-level Azure RBAC | Custom role scoped to `Microsoft.DocumentDB/mongoClusters/*` at resource-group scope | [`security-azure-rbac-actions`](security-azure-rbac-actions.md) |
| **Data-plane authorization** | One admin user | Per-database least-privilege roles; admin identity ≠ runtime identity | [`security-database-roles`](security-database-roles.md), [`security-admin-password-and-identity-separation`](security-admin-password-and-identity-separation.md) |
| **Encryption at rest** | Service-managed AES-256 | CMK for regulated workloads (Premium SSD v1 only; see `storage/`) | [`security-cmk-encryption`](security-cmk-encryption.md) |
| **Backups** | Automated, 35-day retention | Restore drills; understand 7-day post-deletion window | [`high-availability/ha-backup-retention`](../high-availability/ha-backup-retention.md) |
---
name: documentdb-security
description: Security best practices for Azure DocumentDB, covering TLS enforcement, Private Endpoint / firewall configuration, Microsoft Entra ID + RBAC for authentication, and customer-managed keys (CMK) for encryption at rest. Use when reviewing production security posture, configuring networking, setting up authentication / authorization, or preparing for compliance audits.
description: Security best practices for Azure DocumentDB, covering TLS enforcement, Private Endpoint / firewall configuration, two-level access control (Azure RBAC on the `mongoCluster` resource + Microsoft Entra ID OIDC authentication with MongoDB database roles for data-plane access), token-lifetime / revocation handling, and customer-managed keys (CMK) for encryption at rest. Use when reviewing production security posture, configuring networking, setting up authentication / authorization, granting per-app least-privilege access, revoking compromised tokens, or preparing for compliance audits.
When you authenticate to Azure DocumentDB with Microsoft Entra ID, the MongoDB driver presents an **OIDC access token** issued by Entra. That token has a finite lifetime (typically **up to ~90 minutes from issuance**) and remains cryptographically valid for that full window even if:

- The Entra principal is **disabled** or **deleted** in the tenant.
- The associated **refresh token is revoked**.
- The cluster user resource (`mongoClusters/users/<principal-id>`) is **deleted** (the token still verifies, but the cluster then refuses to authorize the principal).

In other words, **the access-token lifetime is the maximum attack window if a token is compromised.** A malicious actor with a valid token can keep using it until expiry unless the cluster user resource is also deleted. This is a fundamental property of token-based auth, not specific to DocumentDB, but the response pattern is specific.
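To size the exposure window for a leaked token during an incident, compare its expiry with the current time. A minimal TypeScript sketch; `exposureEndsAt` and `minutesOfExposureLeft` are hypothetical helpers, `iat`/`exp` are the standard JWT claims (seconds since the epoch), and the 90-minute fallback is only the approximate upper bound quoted above, used when `exp` is unknown:

```typescript
// Approximate upper bound on access-token lifetime quoted above.
const MAX_TOKEN_LIFETIME_MINUTES = 90;

interface TokenTimes {
  iat: number; // issued-at, seconds since the Unix epoch
  exp?: number; // expiry, seconds since the Unix epoch, if known
}

// Instant after which the token is no longer cryptographically valid.
// Prefers the real `exp` claim; falls back to iat + 90 minutes.
function exposureEndsAt(t: TokenTimes): Date {
  const endSeconds = t.exp ?? t.iat + MAX_TOKEN_LIFETIME_MINUTES * 60;
  return new Date(endSeconds * 1000);
}

// Minutes of exposure remaining, rounded up (0 once the token has expired).
// Deleting the cluster user resource closes the window on the cluster side sooner.
function minutesOfExposureLeft(t: TokenTimes, now: Date): number {
  const remainingMs = exposureEndsAt(t).getTime() - now.getTime();
  return Math.max(0, Math.ceil(remainingMs / 60000));
}
```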
Use the global SRV host so the connection automatically follows promotion in multi-cluster setups:

```
mongodb+srv://<cluster-name>.global.mongocluster.cosmos.azure.com/?tls=true&authMechanism=MONGODB-OIDC&retrywrites=false&maxIdleTimeMS=120000
```
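If you assemble this connection string in code, a small builder keeps the options in one place. A minimal TypeScript sketch; `oidcConnectionString` is a hypothetical helper, the options mirror the template, and, following the fix recorded later in this thread, there is no `<client-id>@` before the host because the OIDC token carries the principal:

```typescript
// Builds a MONGODB-OIDC connection string for the global SRV host.
// No user name goes before the host: the principal is carried in the token.
function oidcConnectionString(clusterName: string): string {
  if (!/^[a-z0-9-]+$/.test(clusterName)) {
    // Illustrative guard only; check the service's actual naming rules.
    throw new Error(`unexpected cluster name: ${clusterName}`);
  }
  const options = [
    "tls=true",
    "authMechanism=MONGODB-OIDC",
    "retrywrites=false",
    "maxIdleTimeMS=120000",
  ].join("&");
  return `mongodb+srv://${clusterName}.global.mongocluster.cosmos.azure.com/?${options}`;
}

console.log(oidcConnectionString("my-cluster"));
```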
AzureIdentityTokenHandler tokenHandler = new(credential, tenantId);
permissions: [
  {
    actions: [
      'Microsoft.DocumentDB/mongoClusters/*'
    ]
  }
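A wildcard such as `mongoClusters/*` grants every action under the resource type, including the secret-grade `listConnectionStrings` action the RBAC rule warns about. A minimal TypeScript sketch for checking that before assigning a custom role; the matcher is a simplification (it treats `*` as "any characters" and compares case-insensitively, both assumptions about how role definitions are evaluated), and every action name other than `Microsoft.DocumentDB/mongoClusters/*` is illustrative:

```typescript
// Simplified matcher: "*" matches any run of characters; comparison is
// case-insensitive (an assumption, so both casings seen in this thread match).
function actionMatches(pattern: string, action: string): boolean {
  const regexSource = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${regexSource}$`, "i").test(action);
}

function isGranted(allowedActions: string[], action: string): boolean {
  return allowedActions.some((pattern) => actionMatches(pattern, action));
}

// Illustrative action name, assumed from the listConnectionStrings warning.
const LIST_CONNECTION_STRINGS =
  "Microsoft.DocumentDB/mongoClusters/listConnectionStrings/action";

const broadRole = ["Microsoft.DocumentDB/mongoClusters/*"];
const narrowRole = ["Microsoft.DocumentDB/mongoClusters/read"]; // hypothetical CI/CD role

console.log(isGranted(broadRole, LIST_CONNECTION_STRINGS)); // true
console.log(isGranted(narrowRole, LIST_CONNECTION_STRINGS)); // false
```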
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
- SKILL.md: 'mongoCluster' -> 'mongoClusters' in description
- SKILL.md: replace broken link to ha-backup-retention.md with Learn URL (reliability-documentdb)
- security-azure-rbac-actions.md: fix Bicep 'Microsoft.DocumentDb' -> 'Microsoft.DocumentDB' casing
- security-cmk-encryption.md: References-section link now points at storage-service-encryption (not VM disk-encryption)
- security-token-lifetime-revocation.md: clarify cryptographic validity vs cluster authorization; remove false claim that deleting the user resource leaves the token 'valid' (it stays cryptographically valid but auth fails)
- security-entra-rbac.md: drop '<client-id>@' from connection-string template (OIDC carries principal in token, not URL)
- security-entra-rbac.md: fix TypeScript expiresInSeconds units bug (expiresOnTimestamp is ms-since-epoch, was being mixed with seconds)
- security-entra-rbac.md: replace undefined AzureIdentityTokenHandler/tenantId in C# sample with a working Func<OidcCallbackParameters,CancellationToken,Task<OidcAccessToken>> callback using DefaultAzureCredential

Note: retryWrites mismatch (table says false, Python/Node samples say true) left for maintainer guidance via PR comment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
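The TypeScript units fix in this commit comes down to one conversion: `expiresOnTimestamp` from `@azure/identity` is milliseconds since the epoch, while the MongoDB Node.js driver's OIDC callback reports `expiresInSeconds` as a duration. A minimal dependency-free TypeScript sketch; the two interfaces are local stand-ins with hypothetical names for the library types, so check the field names against the package versions you use:

```typescript
// Local stand-in for @azure/identity's AccessToken:
// expiresOnTimestamp is milliseconds since the Unix epoch.
interface AccessTokenLike {
  token: string;
  expiresOnTimestamp: number;
}

// Local stand-in for the object a MONGODB-OIDC callback returns to the
// MongoDB Node.js driver: expiresInSeconds is a duration, not a timestamp.
interface OidcResponseLike {
  accessToken: string;
  expiresInSeconds: number;
}

function toOidcResponse(token: AccessTokenLike, nowMs: number = Date.now()): OidcResponseLike {
  // Subtract milliseconds from milliseconds first, then convert to seconds once.
  const remainingSeconds = Math.floor((token.expiresOnTimestamp - nowMs) / 1000);
  return { accessToken: token.token, expiresInSeconds: Math.max(0, remainingSeconds) };
}

// Usage: a token expiring one hour after "now" reports 3600 seconds.
const exampleNowMs = 1700000000000;
console.log(toOidcResponse({ token: "<token>", expiresOnTimestamp: exampleNowMs + 3600000 }, exampleNowMs).expiresInSeconds);
```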
@gahllevy quick question on
Which is the supported configuration for DocumentDB + MONGODB-OIDC? I held off on changing this in the PR pending your guidance; happy to align in either direction in a follow-up commit.
@copilot resolve the merge conflicts in this pull request
Co-authored-by: khelanmodi <141972056+khelanmodi@users.noreply.github.com>
…alse in all OIDC samples

Agent-Logs-Url: https://github.com/Azure/documentdb-agent-kit/sessions/affae1f8-dec2-4df4-82bd-a20cdb3ad34a

Co-authored-by: khelanmodi <141972056+khelanmodi@users.noreply.github.com>
Merge conflicts resolved in commit
gahl-levy left a comment
Left a question. With these samples, it's not practical to do a review like this. It's best to test them to ensure correctness; Copilot can test them for you as well to speed this up.
const credential = new DefaultAzureCredential();
const client = new MongoClient(
  `mongodb+srv://${clusterName}.global.mongocluster.cosmos.azure.com/`,
  {
    connectTimeoutMS: 120000,
    tls: true,
    retryWrites: true,
    authMechanism: 'MONGODB-OIDC',
    authMechanismProperties: {
@khelanmodi, was this already fixed? It should be false.
Summary
The existing `security-entra-rbac.md` rule was a 50-line stub; this PR reworks the `security/` skill to faithfully capture the *Connect using role-based access control and Microsoft Entra ID* Learn article (~6,000 words).

The article describes a two-level access model that the kit was not previously teaching:

`Microsoft.DocumentDB/mongoClusters/*` actions

What's new
- `skills/security/SKILL.md`
- `skills/security/security-entra-rbac.md` (rewritten): `authConfig.allowedModes`, principal registration as `mongoClusters/users`, `MONGODB-OIDC` connection settings, Python / TypeScript / C# OIDC callback samples, replica auth-independence gotcha
- `skills/security/security-azure-rbac-actions.md` (new): `mongoClusters/*` action table, custom-role Bicep + Terraform, narrow-role pattern for CI/CD identity, `listConnectionStrings/action` secret-grade warning
- `skills/security/security-database-roles.md` (new): `readWriteAnyDatabase` + `clusterAdmin` must be granted together for cluster-wide read-write (most-easily-missed detail); `readAnyDatabase` for read-only; secondary-user management via mongo shell with `customData.IdentityProvider`; user-management permission matrix
- `skills/security/security-token-lifetime-revocation.md` (new): `users/<principal-id>` resource); incident-response checklist
- `docs/SKILLS.md`

Validation
Notes
`main` directly.
`high-availability/ha-cross-region-replica.md` for the "auth methods are managed independently on the replica" gotcha.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>