diff --git a/docs/SKILLS.md b/docs/SKILLS.md index 66865ea..a81a85e 100644 --- a/docs/SKILLS.md +++ b/docs/SKILLS.md @@ -26,7 +26,7 @@ why it matters → incorrect example → correct example → references. | [`full-text-search/`](../skills/full-text-search/) | `fts-` | `createSearchIndexes` + `$search` for BM25 keyword / phrase / fuzzy; custom analyzers (keyword + edgeGram) for prefix matching on IDs; `pathHierarchy` for hierarchical identifiers; multi-field search indexes; hybrid (BM25 + vector) with RRF | | [`high-availability/`](../skills/high-availability/) | `ha-` | Enabling HA + zone redundancy, cross-region replica, automatic backup retention, documented SLAs | | [`storage/`](../skills/storage/) | `storage-` | Premium SSD v2 high-performance storage: compute-tier-gated IOPS/bandwidth caps, v1 vs v2 selection, limitations (no CMK, migration paths), disk-hydration sequencing | -| [`security/`](../skills/security/) | `security-` | TLS, Private Endpoint, Microsoft Entra RBAC, CMK | +| [`security/`](../skills/security/) | `security-` | TLS, Private Endpoint, IP firewall rules (CIDR + propagation), Azure RBAC actions for `mongoClusters/*`, Microsoft Entra ID + OIDC authentication, MongoDB database roles for data-plane access (incl. 
`readWriteAnyDatabase`+`clusterAdmin` pairing), token-lifetime / revocation pattern, CMK | | [`monitoring/`](../skills/monitoring/) | `monitoring-` | Slow query logs, metrics & alerts | | [`local-deployment/`](../skills/local-deployment/) | `local-` | Docker image choice, Compose, TLS, env-driven config, dev/prod parity | diff --git a/skills/security/SKILL.md b/skills/security/SKILL.md index f216aeb..af8eba3 100644 --- a/skills/security/SKILL.md +++ b/skills/security/SKILL.md @@ -1,16 +1,49 @@ --- name: documentdb-security -description: Security best practices for Azure DocumentDB — TLS enforcement, Private Endpoint / firewall configuration, Microsoft Entra ID + RBAC for authentication, and customer-managed keys (CMK) for encryption at rest. Use when reviewing production security posture, configuring networking, setting up authentication / authorization, or preparing for compliance audits. +description: Security best practices for Azure DocumentDB — TLS enforcement, Private Endpoint / firewall configuration, two-level access control (Azure RBAC on the `mongoClusters` resource + Microsoft Entra ID OIDC authentication with MongoDB database roles for data-plane access), token-lifetime / revocation handling, and customer-managed keys (CMK) for encryption at rest. Use when reviewing production security posture, configuring networking, setting up authentication / authorization, granting per-app least-privilege access, revoking compromised tokens, or preparing for compliance audits. license: MIT --- # Security — Azure DocumentDB -Core controls: TLS on the wire, network isolation with Private Endpoint, Microsoft Entra ID for identity, and CMK for data-at-rest encryption on regulated workloads. +Core controls: TLS on the wire, network isolation with Private Endpoint, **two-level access control** (Azure RBAC for the cluster resource + Entra ID + MongoDB database roles for data), and CMK for data-at-rest encryption on regulated workloads. 
+ +## Defense-in-depth checklist + +A production cluster should have all eight layers in place: + +| Layer | Default | Production recommendation | Rule | +|---|---|---|---| +| **Network** | Public access + firewall rules | Private Endpoint; public access disabled; firewall ≠ `0.0.0.0/0` | [`security-private-endpoint`](security-private-endpoint.md), [`security-firewall-rules`](security-firewall-rules.md) | +| **Transport** | TLS up to 1.3 (always on) | TLS verified at client; `tlsAllowInvalidCertificates` never set | [`security-tls-required`](security-tls-required.md) | +| **Identity** | One built-in native admin | Entra ID enabled; managed identities per workload; admin password strong + rotated | [`security-entra-rbac`](security-entra-rbac.md), [`security-admin-password-and-identity-separation`](security-admin-password-and-identity-separation.md) | +| **Control-plane authorization** | Subscription-level Azure RBAC | Custom role scoped to `Microsoft.DocumentDB/mongoClusters/*` at resource-group scope | [`security-azure-rbac-actions`](security-azure-rbac-actions.md) | +| **Data-plane authorization** | One admin user | Per-database least-privilege roles; admin identity ≠ runtime identity | [`security-database-roles`](security-database-roles.md), [`security-admin-password-and-identity-separation`](security-admin-password-and-identity-separation.md) | +| **Encryption at rest** | Service-managed AES-256 | CMK for regulated workloads (Premium SSD v1 only — see `storage/`) | [`security-cmk-encryption`](security-cmk-encryption.md) | +| **Backups** | Automated, 35-day retention | Restore drills; understand 7-day post-deletion window | [Reliability in Azure DocumentDB](https://learn.microsoft.com/azure/reliability/reliability-documentdb) | +| **Incident response** | Audit + activity logs available | Token revocation playbook ready; monitoring alerts wired up | [`security-token-lifetime-revocation`](security-token-lifetime-revocation.md), [`monitoring/`](../monitoring/) | + 
+## Two-level access model + +Azure DocumentDB separates **who can manage the cluster as an Azure resource** from **who can read/write data inside it**: + +| Layer | What it controls | Granted via | +|---|---|---| +| **Azure RBAC** (control-plane) | Read cluster metadata, list connection strings, manage firewall rules, manage private endpoints, register/remove Entra users | Role assignments on `Microsoft.DocumentDB/mongoClusters/*` actions | +| **Database roles** (data-plane) | Read/write documents, run queries, create collections | MongoDB roles (`readWriteAnyDatabase`, `clusterAdmin`, `readAnyDatabase`, `root`) mapped to a registered Entra principal or native user | + +A principal needs both layers for end-to-end access, and they are managed independently. **Use different principals for the two layers** wherever practical — see [`security-admin-password-and-identity-separation`](security-admin-password-and-identity-separation.md). ## Rules - [security-tls-required](security-tls-required.md) — Always connect with TLS; never disable certificate validation in production. - [security-private-endpoint](security-private-endpoint.md) — Use Private Endpoint / firewall rules; disable public network access where possible. -- [security-entra-rbac](security-entra-rbac.md) — Prefer Microsoft Entra ID + RBAC over long-lived passwords; create per-app secondary users with least privilege. +- [security-firewall-rules](security-firewall-rules.md) — IP firewall rules in CIDR form; "Allow Azure services" toggle; ~15-minute propagation delay; avoid the `0.0.0.0-255.255.255.255` shortcut. +- [security-entra-rbac](security-entra-rbac.md) — Enable Microsoft Entra ID authentication, register principals as `mongoClusters/users`, connect with `MONGODB-OIDC`; prefer managed identities over passwords. 
+- [security-azure-rbac-actions](security-azure-rbac-actions.md) — Azure resource-level RBAC: actions exposed by `Microsoft.DocumentDB/mongoClusters/*`, custom-role pattern, control-plane least-privilege. +- [security-database-roles](security-database-roles.md) — MongoDB database roles for data-plane access: `readWriteAnyDatabase` + `clusterAdmin` must be granted together for read-write; `readAnyDatabase` for read-only; secondary-user management via mongo shell. +- [security-admin-password-and-identity-separation](security-admin-password-and-identity-separation.md) — Strong admin password policy (≥8 chars + complexity); use distinct Azure identities for control-plane vs data-plane to bound blast radius. +- [security-token-lifetime-revocation](security-token-lifetime-revocation.md) — Entra access tokens are valid up to ~90 minutes from issuance even after the principal is disabled; revoke data-plane access immediately by deleting the `mongoClusters/users/` resource. - [security-cmk-encryption](security-cmk-encryption.md) — Use customer-managed keys (CMK) for data-at-rest encryption on regulated workloads. +- [security-cmk-troubleshooting](security-cmk-troubleshooting.md) — CMK operational runbook: causes of `Inaccessible` cluster state, ~60-minute revalidation window, managed-identity / key / vault recovery procedures, and provisioning-failure triage. + diff --git a/skills/security/security-admin-password-and-identity-separation.md b/skills/security/security-admin-password-and-identity-separation.md new file mode 100644 index 0000000..3a6f41f --- /dev/null +++ b/skills/security/security-admin-password-and-identity-separation.md @@ -0,0 +1,166 @@ +# security-admin-password-and-identity-separation + +**Category:** Security · **Priority:** MEDIUM + +## Why it matters + +Two related habits move an Azure DocumentDB cluster from "default-secure" to "production-hardened": + +1. The cluster's built-in administrative account uses a **password**. 
That password is the fallback path that bypasses Entra ID, so its strength and rotation hygiene matter — even when most workloads use managed identities. +2. The Azure identity that **manages** the cluster (creates, scales, deletes, lists connection strings) and the identity that **uses** the cluster's data (reads, writes, queries) should be **different principals**. Sharing one identity across both planes is a classic privilege-escalation path — a data-plane bug that yields code execution now also yields cluster-management rights. + +This rule captures both habits because the Learn security overview groups them as identity-management best practices. + +## Admin password policy + +Azure DocumentDB enforces a minimum password policy on administrative accounts: **at least 8 characters, with all four of upper-case, lower-case, digits, and non-alphanumeric characters.** Treat the floor as the floor, not the target — generate longer passwords from a password manager and store them in Key Vault. + +## Identity separation: control plane vs data plane + +Recall the two-level access model (see `SKILL.md`): + +- **Control plane** — Azure RBAC on `Microsoft.DocumentDB/mongoClusters/*` (resize, firewall, list connection strings, register users). +- **Data plane** — Entra ID + MongoDB database roles (read/write documents). + +Use **distinct Azure identities** for these two layers wherever practical. The principle is the same as separating "deploy" identities from "runtime" identities elsewhere in Azure: a single compromised identity should not be able to both modify infrastructure and access data. + +## Incorrect + +Weak admin password: + +```bicep +// Fails policy if shorter than 8 characters, but even a password that meets the minimum (`Pa55!abc`) is too weak. +administrator: { + userName: 'clusteradmin' + password: 'Pa55word!' 
// ← 9 chars, dictionary-derived, easily guessed +} +``` + +Hard-coding admin credentials in a connection string for runtime use: + +```javascript +// Anti-pattern — the admin account is now exposed to every host that runs this code. +const uri = `mongodb+srv://clusteradmin:${PROD_ADMIN_PASSWORD}@.global.mongocluster.cosmos.azure.com/?tls=true`; +``` + +Using the **same** managed identity for IaC (control plane) and for the application (data plane): + +```bicep +// Anti-pattern — the app's managed identity has both: +// 1. Contributor on the cluster (control plane) +// 2. mongoClusters/users registration as readWrite (data plane) +// A code-execution bug in the app can now resize, delete, or exfiltrate keys. +resource appIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' existing = { + name: 'app-identity' +} + +resource controlPlaneRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = { + // … + properties: { + roleDefinitionId: contributorRoleId + principalId: appIdentity.properties.principalId // same identity + } +} + +resource dataPlaneUser 'Microsoft.DocumentDB/mongoClusters/users@2025-09-01' = { + // … + properties: { + identityProvider: { /* … */ } + roles: [ { db: 'orders', role: 'readWrite' } ] // same identity + } +} +``` + +## Correct + +### Generate strong admin passwords from Key Vault + +Sample workflow: + +```bash +# Generate a 32-char password and store in Key Vault. 
+# 28 random alphanumerics plus a fixed suffix, so all four required character classes are present. +NEW_PWD="$(openssl rand -base64 48 | tr -d '/+=' | head -c 28)Aa1!" +az keyvault secret set \ + --vault-name "" \ + --name "docdb-admin-password" \ + --value "$NEW_PWD" +``` + +Reference it from Bicep instead of inlining a literal: + +```bicep +@secure() +param adminPassword string // sourced from Key Vault at deploy time + +resource cluster 'Microsoft.DocumentDB/mongoClusters@2025-09-01' = { + name: clusterName + location: location + properties: { + administrator: { + userName: 'clusteradmin' + password: adminPassword + } + // … + } +} +``` + +```bash +az deployment group create \ + --resource-group "" \ + --template-file cluster.bicep \ + --parameters adminPassword="$(az keyvault secret show --vault-name --name docdb-admin-password --query value -o tsv)" +``` + +Rotate the admin password on a schedule (e.g. quarterly) and after any incident. Use managed identities for everyday workload access so admin-password rotation is not on the critical path. + +### Use two separate identities + +Pattern: one identity for IaC / SRE, a different identity for each workload that consumes data. + +```bicep +// Control-plane identity — used by your deploy pipeline. +resource sreIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' existing = { + name: 'sre-deploy-identity' +} + +resource controlPlaneRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = { + name: guid(resourceGroup().id, sreIdentity.id, 'control-plane') + scope: resourceGroup() + properties: { + roleDefinitionId: docdbRbacOwnerRoleId // see security-azure-rbac-actions + principalId: sreIdentity.properties.principalId + } +} + +// Data-plane identity — used by the application at runtime. 
+resource appIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' existing = { + name: 'orders-api-identity' +} + +resource dataPlaneUser 'Microsoft.DocumentDB/mongoClusters/users@2025-09-01' = { + // Child-resource name: one segment per type segment ('<cluster>/<user>'), without repeating 'users'. + name: '${clusterName}/${appIdentity.properties.principalId}' + properties: { + identityProvider: { + type: 'Microsoft.EntraID' + properties: { principalType: 'ManagedIdentity' } + } + roles: [ + { db: 'orders', role: 'readWrite' } + ] + } +} +``` + +The SRE identity can resize / configure / register users but has **no** database role and cannot read data directly. The app identity can read and write `orders` but has **no** Azure RBAC role and cannot scale, list connection strings, or delete the cluster. A compromise of either is bounded. + +### Why this matters in practice + +- A leaked CI/CD identity that holds Contributor on the cluster but no database role can still cause damage (delete the cluster, change firewall, list connection strings, register a malicious user) — but it cannot directly exfiltrate data, buying the responder time. +- A leaked app identity that holds `readWrite` on one database cannot resize, delete, or reconfigure the cluster — the blast radius is the database it owns. 
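The admin password floor described under "Admin password policy" (at least 8 characters, all four character classes) can be checked in a pipeline before a value ever reaches Key Vault. The following is a minimal sketch in JavaScript: `meetsAdminPasswordFloor` and its `minLength` parameter are hypothetical helpers, not part of any Azure SDK, the character classes are assumed to be ASCII-only, and the service's own validation remains authoritative.

```javascript
// Hypothetical pre-flight check mirroring the documented floor:
// >= 8 characters with upper-case, lower-case, digit, and non-alphanumeric.
// Assumption: character classes are treated as ASCII-only.
// Passing this check does not make a password strong; it only avoids a
// deployment-time rejection.
function meetsAdminPasswordFloor(password, minLength = 8) {
  if (typeof password !== 'string' || password.length < minLength) {
    return false;
  }
  const hasUpper = /[A-Z]/.test(password);
  const hasLower = /[a-z]/.test(password);
  const hasDigit = /[0-9]/.test(password);
  const hasSymbol = /[^A-Za-z0-9]/.test(password);
  return hasUpper && hasLower && hasDigit && hasSymbol;
}

// 'Pa55word!' clears the floor even though it is dictionary-derived.
console.log(meetsAdminPasswordFloor('Pa55word!'));
// 'abcdefgh' is long enough but lacks three of the four classes.
console.log(meetsAdminPasswordFloor('abcdefgh'));
```

Raising `minLength` in such a check (for example to 24) is one way to treat the floor as the floor rather than the target.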
+ +## References + +- [Secure your cluster — Azure DocumentDB](https://learn.microsoft.com/azure/documentdb/security) +- [Create secondary users](https://learn.microsoft.com/azure/documentdb/secondary-users) +- Related: [security-entra-rbac](security-entra-rbac.md), [security-azure-rbac-actions](security-azure-rbac-actions.md), [security-database-roles](security-database-roles.md) diff --git a/skills/security/security-azure-rbac-actions.md b/skills/security/security-azure-rbac-actions.md new file mode 100644 index 0000000..ab7134b --- /dev/null +++ b/skills/security/security-azure-rbac-actions.md @@ -0,0 +1,151 @@ +# security-azure-rbac-actions + +**Category:** Security · **Priority:** MEDIUM + +## Why it matters + +Azure DocumentDB exposes its cluster as an Azure resource of type `Microsoft.DocumentDB/mongoClusters`. **Azure role-based access control** governs *resource-level* operations — reading cluster metadata, listing connection strings, managing firewall rules, managing private endpoints, and registering/removing users. This is independent of the database-level roles that control data-plane access (see [security-database-roles](security-database-roles.md)). + +Following the two-level access model: + +- **Use built-in Azure roles** (Reader, Contributor, Owner) for broad personas. +- **Use a custom role** scoped to the `Microsoft.DocumentDB/mongoClusters/*` actions when a persona needs cluster-management rights without subscription-wide access. +- **Never** rely on Azure role assignments alone to grant data access — data-plane authorization is governed by Entra principal registration + MongoDB database roles. 
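Because the guidance above turns on how broad a role assignment's scope is, a review script can classify scope strings before an assignment is approved. The sketch below assumes the standard ARM resource-ID layout (`/subscriptions/<id>/resourceGroups/<name>/providers/...`). `classifyScope` and `isTooBroadForAppRole` are hypothetical helpers, not Azure SDK functions.

```javascript
// Hypothetical helper: classify an Azure RBAC scope string by breadth.
// Assumes the standard ARM resource-ID layout; anything it does not
// recognise (for example management-group scopes) is reported as 'unknown'.
function classifyScope(scope) {
  const parts = String(scope).split('/').filter(Boolean);
  const lower = parts.map((p) => p.toLowerCase()); // ARM IDs are case-insensitive

  if (lower[0] !== 'subscriptions' || parts.length < 2) return 'unknown';
  if (parts.length === 2) return 'subscription';
  if (lower[2] !== 'resourcegroups' || parts.length < 4) return 'unknown';
  if (parts.length === 4) return 'resourceGroup';
  // providers/<namespace>/<type>/<name> adds at least four more segments.
  if (lower[4] === 'providers' && parts.length >= 8) return 'resource';
  return 'unknown';
}

// An application role should be scoped to a resource group or a single resource.
function isTooBroadForAppRole(scope) {
  const level = classifyScope(scope);
  return level !== 'resourceGroup' && level !== 'resource';
}

const sub = '/subscriptions/00000000-0000-0000-0000-000000000000';
console.log(classifyScope(sub));
console.log(classifyScope(`${sub}/resourceGroups/rg-data`));
console.log(isTooBroadForAppRole(sub));
```

Treating unrecognised scopes as too broad is a deliberately conservative choice: the script errs toward a manual review rather than a silent approval.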
+ +## Actions available on `Microsoft.DocumentDB/mongoClusters` + +| Action | Description | +|---|---| +| `Microsoft.DocumentDB/mongoClusters/read` | Read a cluster resource or list clusters | +| `Microsoft.DocumentDB/mongoClusters/write` | Create / update cluster properties or tags | +| `Microsoft.DocumentDB/mongoClusters/delete` | Delete a cluster | +| `Microsoft.DocumentDB/mongoClusters/listConnectionStrings/action` | List connection strings (read access to credentials!) | +| `Microsoft.DocumentDB/mongoClusters/PrivateEndpointConnectionsApproval/action` | Approve a private endpoint connection | +| `Microsoft.DocumentDB/mongoClusters/firewallRules/{read,write,delete}` | Manage firewall rules | +| `Microsoft.DocumentDB/mongoClusters/privateEndpointConnections/{read,write,delete}` | Manage private endpoint connections | +| `Microsoft.DocumentDB/mongoClusters/privateEndpointConnectionProxies/{read,write,delete,validate/action}` | Manage private endpoint connection proxies | +| `Microsoft.DocumentDB/mongoClusters/privateLinkResources/read` | Read private link resources | +| `Microsoft.DocumentDB/mongoClusters/users/{read,write,delete}` | Register / remove Entra principals on the cluster | + +> ⚠️ `listConnectionStrings/action` returns the **administrator connection string** including the password (for native-auth clusters). Treat it as a secret-grade action and grant it sparingly. + +## Incorrect + +Giving an application's service principal the subscription-scoped Contributor role just to "let it read its own connection string": + +```bash +# Massively over-broad — grants write on everything in the subscription. 
+az role assignment create \ + --assignee "" \ + --role "Contributor" \ + --scope "/subscriptions/" +``` + +Using subscription-scoped custom roles when resource-group or resource scope would do: + +```bicep +assignableScopes: [ subscription().id ] // too broad for an app role +``` + +## Correct + +### Pattern 1 — built-in roles for the common cases + +| Persona | Built-in Azure role | Scope | +|---|---|---| +| Read-only operator (dashboards, audits) | **Reader** | Cluster resource | +| Cluster operator (resize, firewall) | **Contributor** | Cluster resource | +| Full owner (delete, role assignments) | **Owner** | Resource group | + +### Pattern 2 — custom role for cluster-management only + +For an SRE or automation principal that needs to manage clusters but not other Azure resources, define a custom role at resource-group scope: + +```bicep +metadata description = 'RBAC definition for Azure DocumentDB cluster management.' + +@description('Name of the role definition.') +param roleDefinitionName string = 'Azure DocumentDB RBAC Owner' + +@description('Description of the role definition.') +param roleDefinitionDescription string = 'Can perform all Azure role-based access control actions for Azure DocumentDB clusters.' 
+ +resource definition 'Microsoft.Authorization/roleDefinitions@2022-04-01' = { + name: guid(subscription().id, resourceGroup().id, roleDefinitionName) + scope: resourceGroup() + properties: { + roleName: roleDefinitionName + description: roleDefinitionDescription + type: 'CustomRole' + permissions: [ + { + actions: [ + 'Microsoft.DocumentDB/mongoClusters/*' + ] + } + ] + assignableScopes: [ + resourceGroup().id + ] + } +} + +output definitionId string = definition.id +``` + +Assign it to a principal: + +```bicep +resource assignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = { + name: guid(subscription().id, resourceGroup().id, roleDefinitionId, identityId) + scope: resourceGroup() + properties: { + roleDefinitionId: roleDefinitionId + principalId: identityId + } +} +``` + +Terraform equivalent: + +```terraform +resource "azurerm_role_definition" "control_plane" { + name = "Azure DocumentDB RBAC Owner" + scope = data.azurerm_resource_group.existing.id + description = "Can perform all Azure role-based access control actions for Azure DocumentDB clusters." 
+ + permissions { + actions = [ "Microsoft.DocumentDB/mongoClusters/*" ] + } + + assignable_scopes = [ data.azurerm_resource_group.existing.id ] +} + +resource "azurerm_role_assignment" "control_plane" { + scope = data.azurerm_resource_group.existing.id + role_definition_id = azurerm_role_definition.control_plane.role_definition_resource_id + principal_id = var.identity_id +} +``` + +### Pattern 3 — narrow custom role for an automation principal + +If a CI/CD identity should only register users on existing clusters (not create or delete clusters), restrict the actions: + +```bicep +permissions: [ + { + actions: [ + 'Microsoft.DocumentDB/mongoClusters/read' + 'Microsoft.DocumentDB/mongoClusters/users/read' + 'Microsoft.DocumentDB/mongoClusters/users/write' + 'Microsoft.DocumentDB/mongoClusters/users/delete' + ] + } +] +``` + +## References + +- [Azure RBAC custom roles](https://learn.microsoft.com/azure/role-based-access-control/custom-roles) +- [Connect using role-based access control and Microsoft Entra ID](https://learn.microsoft.com/azure/documentdb/how-to-connect-role-based-access-control) diff --git a/skills/security/security-cmk-encryption.md b/skills/security/security-cmk-encryption.md index 4ce275b..8b5876c 100644 --- a/skills/security/security-cmk-encryption.md +++ b/skills/security/security-cmk-encryption.md @@ -4,41 +4,183 @@ ## Why it matters -Azure DocumentDB encrypts data at rest by default with Microsoft-managed keys. For regulated workloads (finance, healthcare, government), compliance often requires **customer-managed keys (CMK)** backed by Azure Key Vault so that your organization controls the key lifecycle and can revoke access. +**Every Azure DocumentDB cluster encrypts data at rest, always.** That includes user databases, system databases, temporary files, logs, and backups. The question isn't *whether* data is encrypted — it's **who controls the key**. 
-Decisions to make up front: -- CMK must be enabled on cluster creation (migration paths are limited). -- The cluster's managed identity needs `Key Vault Crypto Service Encryption User` on the key. -- Key rotation and expiration policies must be defined and monitored — a revoked/expired key can make the cluster unreachable. +Azure DocumentDB offers two encryption modes: + +| Mode | Key owner | When to choose it | +|------|-----------|--------------------| +| **Service-managed keys (SMK)** — default | Microsoft | You want zero key-management overhead and have no regulatory requirement to hold the key yourself. | +| **Customer-managed keys (CMK)** | You, via your Azure Key Vault | Regulated workload (finance, healthcare, government), separation-of-duties policy, or you need to be able to **revoke** the key to make a database inaccessible on demand. | + +Under the hood, both modes rely on [server-side encryption of Azure Storage](https://learn.microsoft.com/azure/storage/common/storage-service-encryption). In CMK mode, **Azure Storage wraps the root Data Encryption Key (DEK) with your key in Key Vault** — your key encrypts a key, not the data directly. Data stays encrypted at all times; switching the wrapping key has no effect on the ciphertext. + +### The non-negotiables + +1. **CMK vs. SMK is a cluster-creation-time decision and cannot be changed for the lifetime of the cluster.** Pick correctly the first time. If you might ever need CMK, create the cluster as CMK from day one — even if the initial key is owned by you with minimal restrictions. +2. **With CMK, you own the responsibility for every component required to keep the cluster decryptable**: the Key Vault, the user-assigned managed identity, the key itself, the network configuration of the vault, and auditing. A misconfiguration on any of these can make the cluster unreachable. +3. 
**Revocation is a feature, not a bug.** Removing the identity's access to the key, disabling the key, or deleting the key will make the cluster inaccessible. Build that into your runbooks intentionally; don't trip over it accidentally. +4. **Performance is not affected** by either mode. CMK is not slower than SMK at runtime — Storage wraps/unwraps the DEK on key-rotation events, not on every read/write. ## Incorrect -Enabling CMK in production without a documented key-rotation and recovery plan — a missing or expired key will render the database unavailable. +```text +☐ Cluster created with SMK because "we'll switch to CMK later when audit asks for it." + → Not possible. The mode is fixed at creation time. + +☐ Key Vault used for CMK has soft-delete and purge-protection disabled. + → An accidental delete of the key or the vault permanently destroys the cluster's + ability to decrypt. There is no recovery. + +☐ Key Vault and DocumentDB cluster live in different Microsoft Entra tenants. + → Unsupported. The cluster's managed identity cannot read the key across tenants. + +☐ Cluster uses a system-assigned managed identity for CMK. + → CMK requires a user-assigned managed identity. System-assigned cannot be used. + +☐ Key Vault firewall is set to "Allow public access from all networks" in production. + → Works, but defeats the purpose of CMK. Use "Disable public access" + "Allow + trusted Microsoft services to bypass this firewall" instead. + +☐ Symmetric key, or RSA-1024, or EC key used for wrapping. + → Only asymmetric RSA / RSA-HSM keys at 2048, 3072, or 4096 bits are supported. + +☐ Key has an activation date in the future, or an expiry in the past, or is Disabled. + → The wrap operation will fail; cluster operations on encrypted data stop. + +☐ No Azure resource lock on the Key Vault; no logging; no alerting. + → A misclick in the portal or a runaway script can delete the vault. No telemetry + means you find out by way of cluster outage. 
+ +☐ No backup of the key kept outside the vault. + → If Key Vault generates the key for you, take a key backup *before* first use. + The backup can only be restored to Key Vault, but it survives vault loss. +``` ## Correct -1. Create (or reuse) a Key Vault in the same region with **Soft Delete** and **Purge Protection** enabled. -2. Create a key intended for DocumentDB encryption. -3. Grant the DocumentDB cluster's managed identity the Crypto Service Encryption User role on the key. -4. Configure CMK at cluster creation, referencing the Key Vault URI. -5. Define rotation cadence and alerting on key expiration / deletion. -6. Test the revocation/restore runbook in a non-production environment before go-live. - -```bicep -// sketch — confirm current property names/versions in the docs -resource ddb 'Microsoft.DocumentDB/...@...' = { - identity: { type: 'SystemAssigned' } - properties: { - encryption: { - type: 'CustomerManaged' - keyVaultKeyUri: kv.keyUri - } - } -} +### 1. Decide at architecture time + +Before opening the portal: + +- **Compliance review**: does the workload require customer-controlled keys? If unsure, default to SMK — switching from SMK to CMK later requires recreating the cluster and migrating data. +- **Key ownership**: who in the org owns the Key Vault? Typically not the same team as the DocumentDB cluster — CMK is the right tool when you want **separation of duties** between database administrators and security/key custodians. + +### 2. 
Stand up the Key Vault correctly + +Pre-flight checklist for the vault that will hold the CMK: + +| Setting | Required value | Why | +|---------|---------------|-----| +| Tenant | Same Microsoft Entra tenant as the DocumentDB cluster | Cross-tenant managed-identity access is not supported | +| Soft-delete | **Enabled** (90-day retention recommended) | Lets you recover an accidentally deleted key or vault | +| Purge protection | **Enabled** | Enforces mandatory retention even against malicious deletes | +| Days to retain deleted vaults | **90** (set at vault creation — cannot be changed later) | Maximum safety window | +| Permission model | **RBAC** (preferred) or access policies (legacy) | RBAC is the modern model — use it for new vaults | +| Public network access | **Disabled** + **Allow trusted Microsoft services to bypass this firewall** | Closes the public surface while letting DocumentDB reach the key | +| Resource lock | `CanNotDelete` | Belt-and-suspenders against accidental deletion | +| Logging | Diagnostic settings → Log Analytics / SIEM | You need an audit trail of every key access | +| Alerting | Alerts on key delete, key disable, role-assignment removal | Detect revocation events fast | +| Availability / redundancy | Review and configure per [Key Vault DR guidance](https://learn.microsoft.com/azure/key-vault/general/disaster-recovery-guidance) | Vault unavailability ≈ cluster outage | + +After enabling the firewall lockdown above, the portal may surface the warning **"You enabled the network access control. Only allowed networks have access to this key vault."** when you try to administer the vault from your laptop. **This is expected** and does not block DocumentDB from fetching the key during cluster operations — the cluster reaches the vault via the "trusted Microsoft services" exception. + +### 3. 
Generate or import the encryption key + +Requirements that the cluster *will not* relax: + +- Algorithm: **RSA** or **RSA-HSM** (asymmetric only — no symmetric, no EC). +- Size: **2048**, **3072**, or **4096** bits. **Recommendation: 4096** for better security. +- State: **Enabled**. +- Activation date: past or unset. +- Expiry: future or unset. +- If importing an existing key: supported file formats are `.pfx`, `.byok`, or `.backup`. + +If Key Vault generates the key, immediately take a [key backup](https://learn.microsoft.com/azure/key-vault/general/backup) before any encryption operation runs against it. The backup can only be restored to Key Vault, but it protects against the catastrophic case of a vault being lost entirely. Store a copy of the key (or the backup) in a separate secure location, or use a key-escrow service — your call, but document the location. + +### 4. Create the user-assigned managed identity and grant it on the key + +CMK on DocumentDB **requires a user-assigned managed identity**. Create one in the same subscription/region as the cluster. + +Grant the identity access to the key: + +**Preferred (RBAC permission model on the vault):** + +- Role: **Key Vault Crypto Service Encryption User** +- Scope: the specific key (preferred) or the vault. + +```azurecli +az role assignment create \ + --assignee-object-id \ + --assignee-principal-type ServicePrincipal \ + --role "Key Vault Crypto Service Encryption User" \ + --scope ``` +**Legacy (access-policy permission model on the vault):** + +Grant the managed identity these key permissions: + +| Permission | Used for | +|------------|----------| +| `get` | Read the public part and properties of the key | +| `list` | Iterate and discover keys in the vault | +| `wrapKey` | Encrypt the DEK with the customer key | +| `unwrapKey` | Decrypt the DEK with the customer key | + +`wrapKey` and `unwrapKey` are the operational permissions DocumentDB uses; `get` and `list` are used during initial setup. + +### 5. 
Create the cluster with CMK + +CMK must be specified at cluster creation. The cluster references: + +- The user-assigned managed identity (by resource ID). +- The Key Vault key URI (either versioned or version-less — see next section). + +After creation, verify in the portal that the cluster reports CMK as active and that the key URI matches what you provisioned. + +### 6. Take advantage of version-less keys + autorotation + +Azure DocumentDB CMK supports **automatic key-version updates**, a.k.a. **version-less keys**. When the underlying key rolls to a new version, DocumentDB picks it up automatically and re-wraps the DEK — no cluster action required. + +Combine this with **Key Vault [autorotation](https://learn.microsoft.com/azure/key-vault/keys/how-to-configure-key-rotation)** to fully automate key rotation: + +1. Reference the key in the cluster by its version-less URI (no `/` suffix). +2. Configure a rotation policy on the key in Key Vault (e.g., rotate every 90 days, expire after 1 year). +3. DocumentDB will follow the active version automatically. + +This is the recommended setup for production: rotation is automatic, the cluster never goes through a stale-key window, and you get key freshness without operational toil. + +### 7. Plan revocation and recovery + +Document and rehearse: + +- **Revoke** — disable the key in Key Vault, or remove the role assignment from the managed identity. This intentionally renders the cluster inaccessible to the data plane. Use for compromise response. +- **Restore** — re-enable the key (or reassign the role). Access resumes within the propagation window. Cluster does not need to be restarted. +- **Vault DR** — if the Key Vault is lost, restore from the key backup you took in step 3 into a new vault (same tenant, same key material). Update the cluster to reference the new vault if necessary. + +Test all three runbooks in a non-production environment before relying on them. + +## Decision: should I use CMK? 
+ +| Signal | Choice | +|--------|--------| +| Regulatory mandate (PCI-DSS, HIPAA BAA addendum, FedRAMP High, etc.) requires customer-controlled keys | **CMK** | +| Internal policy mandates separation of duties between DBAs and key custodians | **CMK** | +| You need to be able to **immediately revoke access** to the database by disabling a key | **CMK** | +| You want to centrally manage all encryption keys in Key Vault alongside other workloads | **CMK** | +| No regulatory or policy driver, no key-revocation requirement | **SMK** (default) | +| You're unsure but suspect CMK might be needed within the cluster's lifetime | **CMK from day one** — the choice is fixed at creation | + ## References -- [Data encryption at rest](https://learn.microsoft.com/azure/documentdb/database-encryption-at-rest) -- [Configure customer-managed key encryption](https://learn.microsoft.com/azure/documentdb/how-to-data-encryption) -- [Troubleshoot CMK encryption](https://learn.microsoft.com/azure/documentdb/how-to-database-encryption-troubleshoot) +- [Encryption at rest in Azure DocumentDB](https://learn.microsoft.com/azure/documentdb/database-encryption-at-rest) +- [Server-side encryption of Azure Storage](https://learn.microsoft.com/azure/storage/common/storage-service-encryption) +- [Azure Key Vault basic concepts](https://learn.microsoft.com/azure/key-vault/general/basic-concepts) +- [Key Vault RBAC guide](https://learn.microsoft.com/azure/key-vault/general/rbac-guide) +- [Key Vault soft-delete](https://learn.microsoft.com/azure/key-vault/general/soft-delete-overview) +- [Key Vault best practices — purge protection](https://learn.microsoft.com/azure/key-vault/general/best-practices#turn-on-data-protection-for-your-vault) +- [Key Vault key autorotation](https://learn.microsoft.com/azure/key-vault/keys/how-to-configure-key-rotation) +- [Key Vault disaster recovery](https://learn.microsoft.com/azure/key-vault/general/disaster-recovery-guidance) +- [User-assigned managed 
identities](https://learn.microsoft.com/entra/identity/managed-identities-azure-resources/overview#managed-identity-types) +- Related: [security-entra-rbac](security-entra-rbac.md), [security-private-endpoint](security-private-endpoint.md) diff --git a/skills/security/security-cmk-troubleshooting.md b/skills/security/security-cmk-troubleshooting.md new file mode 100644 index 0000000..80aaf8e --- /dev/null +++ b/skills/security/security-cmk-troubleshooting.md @@ -0,0 +1,155 @@ +# security-cmk-troubleshooting + +**Category:** Security · **Priority:** MEDIUM + +## Why it matters + +With CMK, the cluster's ability to decrypt data depends on a chain of resources you own: a **user-assigned managed identity** → a **role assignment / access policy** on a **Key Vault** → a specific **key** in that vault, reachable over the network. **Break any link and the cluster transitions to the `Inaccessible` state and refuses all connections.** This is by design — it's the security feature you opted into when you chose CMK over service-managed keys. But it means CMK clusters need a sharp operational playbook that SMK clusters do not. + +Two timing facts to memorize: + +- After a key becomes disabled / deleted / expired / unreachable, the cluster transitions to **`Inaccessible`** within **~60 minutes** (not instantly — there's a revalidation cadence). +- After the underlying problem is fixed, the cluster takes **up to ~60 minutes** to revalidate the key and return to **`Ready`**. **You cannot force this.** No restart, no manual revalidation knob — you wait. + +So the worst case is an outage window of ~2 hours from misconfiguration to full recovery, even if you detect and fix in seconds. Plan SLAs accordingly, and make sure operators **don't keep "fixing" things mid-revalidation** under the assumption that the first fix didn't work — they'll thrash the configuration during the recovery window. 
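To help operators resist re-editing the configuration mid-revalidation, a runbook can compute when a second look is actually warranted. The following is a minimal sketch, assuming the ~60-minute figures above are upper bounds; `REVALIDATION_WINDOW`, `earliest_recheck`, and `worst_case_recovery` are hypothetical helper names, not part of any Azure SDK.

```python
from datetime import datetime, timedelta

# Assumption: ~60 minutes is treated as an upper bound for both transitions.
REVALIDATION_WINDOW = timedelta(minutes=60)


def earliest_recheck(fix_applied_at: datetime) -> datetime:
    """Time after which a still-Inaccessible cluster warrants another look."""
    return fix_applied_at + REVALIDATION_WINDOW


def worst_case_recovery(broken_at: datetime, time_to_fix: timedelta) -> datetime:
    """Latest expected return to Ready under the assumptions above.

    The cluster may take up to one window to go Inaccessible, the fix takes
    `time_to_fix` after that, and recovery takes up to one more window.
    """
    went_inaccessible = broken_at + REVALIDATION_WINDOW
    fixed_at = went_inaccessible + time_to_fix
    return fixed_at + REVALIDATION_WINDOW
```

Logging the value of `earliest_recheck` next to the fix gives the on-call engineer a concrete time before which further configuration changes are unlikely to help.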
+ +The companion architecture/setup rule is [security-cmk-encryption](security-cmk-encryption.md); this rule covers what to do when it goes wrong. + +## Common causes of `Inaccessible` state + +The cluster goes `Inaccessible` when the managed identity can no longer perform key operations against the configured key. Every cause reduces to one of these: + +| Cause | What changed | Resolution | +|-------|--------------|------------| +| **Key expired** | The key in Key Vault hit its configured expiry date/time | **Extend the expiry date on the existing key** and wait for revalidation. ⚠️ Don't rotate to a new key version or create a new key while the cluster is `Inaccessible` — wait for it to return to `Ready`, *then* rotate. | +| **Key disabled** | Someone toggled the key's state to Disabled | Re-enable the key in Key Vault. Wait for revalidation. | +| **Key deleted** | Someone deleted the key (soft-delete catches this) | Recover the key from soft-delete in Key Vault. Wait for revalidation. (This is why soft-delete + purge-protection are non-negotiable — see [security-cmk-encryption](security-cmk-encryption.md).) | +| **Key Vault deleted** | The vault itself was deleted | [Recover the Key Vault](https://learn.microsoft.com/azure/key-vault/general/key-vault-recovery) from soft-delete. Wait for revalidation. | +| **Managed identity deleted** | The user-assigned identity referenced by the cluster was removed from Entra ID | See "Recovering from managed identity deletion" below — this one has a subtlety. | +| **RBAC role removed** | The `Key Vault Crypto Service Encryption User` assignment was deleted from the identity (or from the vault scope) | **Re-grant the role** to the same identity. Wait for revalidation. **Or**, grant the role to a *different* managed identity and update the cluster to use that identity. 
| +| **Access policy revoked** (legacy permission model) | One of `list`, `get`, `wrapKey`, `unwrapKey` was removed from the identity's access policy | Re-grant the missing permissions, **or** grant them to a different identity and update the cluster. | +| **Vault firewall too restrictive** | Vault networking was tightened in a way that blocks DocumentDB | Either set vault to **Disable public access** + **Allow trusted Microsoft services to bypass this firewall**, or allow public access from all networks. The trusted-services bypass is the right answer for production. | + +## Incorrect + +```text +☐ Reacting to "Inaccessible" by recreating the cluster. + → Pointless and destructive. Fix the key/identity/vault problem and wait the + revalidation window. The data is still encrypted and intact. + +☐ Rotating the key (creating a new version) while the cluster is Inaccessible. + → The cluster's pointer is still bound to the old (expired/disabled) version's + metadata until revalidation succeeds. Fix the existing key first, get the + cluster back to Ready, *then* rotate. + +☐ Tightening Key Vault networking without selecting "Allow trusted Microsoft + services to bypass this firewall." + → The cluster will lose access at the next revalidation. The vault doesn't + fail loudly when you save the change - failure happens an hour later when + the cluster goes Inaccessible. + +☐ "Recovering" a deleted managed identity by creating a new identity with the + same name. + → Entra ID identities are identified by object ID, not name. A new identity + with the same name is NOT the same principal. Either soft-restore the + original identity, OR create a new one and update the cluster to reference + the new identity's resource ID. + +☐ Repeatedly editing the cluster configuration during the ~60-minute recovery + window because "it's not working yet." + → Revalidation runs on its own cadence. Edit-thrashing extends the outage. + Make the fix, log it, walk away for an hour, then verify. 
+ +☐ No alerting on Key Vault key-disable / role-removal / vault-delete events. + → You'll find out about CMK problems via a cluster outage, ~60 minutes after + the fact. Configure Key Vault diagnostic logs + alerts (see + security-cmk-encryption). +``` + +## Correct + +### Triage: cluster is reported as `Inaccessible` + +1. **Don't restart, recreate, or rotate keys yet.** Investigate first. +2. Check the cluster's CMK configuration in the portal: note the **Key Vault URI**, **key name / version (if versioned)**, and the **user-assigned managed identity** resource ID. You'll need all three. +3. In Key Vault, verify the key: + - Does it still exist? (Check soft-deleted items if not.) + - Is it **Enabled**? + - Activation date in the past, expiry in the future (or unset)? +4. Verify the managed identity exists in Entra ID and has the expected role assignment / access policy on the vault. +5. Verify Key Vault networking allows the cluster to reach it (public-from-all-networks **or** disabled-public + trusted-services bypass). +6. Once you find and fix the broken link, **wait up to ~60 minutes** for the periodic revalidation to flip the cluster back to `Ready`. Don't keep poking. + +### Recovering from managed identity deletion (the subtle one) + +If the user-assigned managed identity was deleted from Entra ID: + +1. **Try to recover the original identity** from soft-delete in Entra ID first ([Entra recovery guidance](https://learn.microsoft.com/azure/active-directory/fundamentals/recover-from-deletions)). If recovery succeeds, the original object ID is restored — no cluster reconfiguration needed. +2. **If recovery is not possible**, create a **new** user-assigned managed identity. Then: + - Grant it the `Key Vault Crypto Service Encryption User` role (RBAC) or the `get` / `list` / `wrapKey` / `unwrapKey` access policy (legacy) on the same key. 
+ - **Update the cluster's `identity` properties to reference the new identity's resource ID.** This step is mandatory: the cluster does not auto-discover the new identity. +3. Wait ~60 minutes for revalidation. + +> ⚠️ **Creating a new identity with the same name as the deleted one does NOT recover the original principal.** Entra ID identities are keyed by object ID (a GUID), not name. The new identity is a different principal, and the cluster will not use it until you explicitly reconfigure the cluster to reference it. + +### Recovering from key or Key Vault deletion + +1. In Key Vault, navigate to **Manage deleted vaults** (subscription level) or **Manage deleted keys** (vault level). +2. Recover the deleted resource. Soft-delete retention is 90 days by default; after purge, recovery is impossible, and your only fallback is the key backup you took when the key was generated (see [security-cmk-encryption](security-cmk-encryption.md), step 3). +3. Verify the recovered key is **Enabled** and the cluster's managed identity still has the required role / access policy. +4. Wait ~60 minutes for revalidation. + +### Recovering from over-restrictive Key Vault firewall + +Symptoms: the vault and the key are fine, the identity and role are fine, but the cluster still goes `Inaccessible` after a recent Key Vault networking change. + +Fix: + +- Open the Key Vault → **Networking** → set **Allow access from** to either: + - **All networks** (works, but loses the public-surface lockdown), or + - **Disable public access** + tick **Allow trusted Microsoft services to bypass this firewall** ✅ (recommended). +- Save. Wait for the firewall change to propagate (a few minutes) and then for cluster revalidation (~60 minutes). + +The "trusted Microsoft services" bypass is what lets DocumentDB reach the vault without you needing to enumerate cluster egress IPs. It's the production-correct setting.
+ +### When CMK provisioning fails at cluster creation + +If you see the error: + +> *"Couldn't get access to the key. It might be missing, the provided user identity doesn't have GET permissions on it, or the key vault hasn't enabled access to the public internet."* + +then one of the CMK requirements isn't met. **The failed cluster entity stays around with `clusterStatus: Failed`** — you must clean it up. Procedure: + +1. Walk through the CMK requirements checklist in [security-cmk-encryption](security-cmk-encryption.md): + - Key Vault and DocumentDB in the **same Microsoft Entra tenant**. + - Key Vault firewall allows the cluster to reach the key (public-from-all-networks, or disabled-public + trusted-services bypass). + - Key is **RSA / RSA-HSM**, **Enabled**, 2048 / 3072 / 4096 bits, valid activation and expiry. + - User-assigned managed identity exists. + - Identity has `Key Vault Crypto Service Encryption User` role (RBAC) **or** `get` / `list` / `wrapKey` / `unwrapKey` (legacy access policies) on the key. +2. **Delete the failed cluster** (you can find `clusterStatus = Failed` on the **Overview** blade). +3. Re-provision the cluster, referencing the verified identity and key URI. + +The error message is intentionally vague because the failure mode is on the caller side — DocumentDB cannot tell you which of the requirements is missing, only that the wrap operation failed. Walk the full checklist; don't guess. 
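Because the error does not say which requirement failed, it can help to check the key-level items mechanically before re-provisioning. The sketch below covers only the key attributes from the checklist above (type, size, state, activation, expiry). `KeyFacts` and `cmk_key_problems` are hypothetical names, and the values are assumed to be copied in by hand or by your own tooling rather than read from any specific SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

ALLOWED_KEY_TYPES = {"RSA", "RSA-HSM"}
ALLOWED_KEY_SIZES = {2048, 3072, 4096}


@dataclass
class KeyFacts:
    """Hypothetical snapshot of the key attributes the checklist cares about."""
    key_type: str
    key_size: int
    enabled: bool
    not_before: Optional[datetime] = None  # activation date; None if unset
    expires_on: Optional[datetime] = None  # expiry date; None if unset


def cmk_key_problems(key: KeyFacts, now: datetime) -> list[str]:
    """Return every key-level requirement that is not met (empty list = OK)."""
    problems = []
    if key.key_type not in ALLOWED_KEY_TYPES:
        problems.append(f"key type {key.key_type} is not RSA or RSA-HSM")
    if key.key_size not in ALLOWED_KEY_SIZES:
        problems.append(f"key size {key.key_size} is not 2048, 3072, or 4096")
    if not key.enabled:
        problems.append("key is not Enabled")
    if key.not_before is not None and key.not_before > now:
        problems.append("activation date is in the future")
    if key.expires_on is not None and key.expires_on <= now:
        problems.append("key has expired")
    return problems
```

The function reports all failures at once instead of stopping at the first, which matches the advice to walk the full checklist. Tenant, identity, role, and vault-networking checks are deliberately out of scope here and still need to be verified separately.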
+ +## Reference: monitoring to set up *before* you need this rule + +Bake these alerts into the Key Vault that holds the CMK so you find out *before* the cluster goes `Inaccessible`: + +| Signal | Source | Why | +|--------|--------|-----| +| Key delete / disable event | Key Vault diagnostic logs (`AuditEvent`) | Catches the most common cause of CMK outage | +| Role assignment removed from the cluster's identity on the vault | Azure Activity Log (Authorization category) | Catches RBAC-revocation outage | +| Key Vault firewall configuration change | Azure Activity Log (resource-write on the vault) | Catches networking-tightening mistakes before revalidation | +| User-assigned managed identity deleted | Azure Activity Log on the identity resource | Catches the highest-impact, hardest-to-recover failure mode | +| Key approaching expiry (e.g., 30 days out) | Key Vault key-expiry events / custom alerting | Lets you extend or rotate before automatic outage | + +## References + +- [Troubleshoot CMK encryption — Azure DocumentDB](https://learn.microsoft.com/azure/documentdb/how-to-database-encryption-troubleshoot) +- [Encryption at rest in Azure DocumentDB](https://learn.microsoft.com/azure/documentdb/database-encryption-at-rest) +- [Recover a deleted Key Vault](https://learn.microsoft.com/azure/key-vault/general/key-vault-recovery) +- [Recover from Entra ID deletions](https://learn.microsoft.com/azure/active-directory/fundamentals/recover-from-deletions) +- [Manage user-assigned managed identities](https://learn.microsoft.com/azure/active-directory/managed-identities-azure-resources/how-manage-user-assigned-managed-identities) +- [Key Vault trusted services](https://learn.microsoft.com/azure/key-vault/general/overview-vnet-service-endpoints#trusted-services) +- [Key Vault RBAC roles](https://learn.microsoft.com/azure/key-vault/general/rbac-guide#azure-built-in-roles-for-key-vault-data-plane-operations) +- Related: [security-cmk-encryption](security-cmk-encryption.md), 
[security-private-endpoint](security-private-endpoint.md) diff --git a/skills/security/security-database-roles.md b/skills/security/security-database-roles.md new file mode 100644 index 0000000..204cf19 --- /dev/null +++ b/skills/security/security-database-roles.md @@ -0,0 +1,151 @@ +# security-database-roles + +**Category:** Security · **Priority:** HIGH + +## Why it matters + +Once an Entra principal (or a native user) is registered on an Azure DocumentDB cluster, **MongoDB database roles** decide what it can actually do with data. Azure DocumentDB exposes the standard MongoDB role names but applies them with a few cluster-specific rules that are easy to get wrong: + +- **Full read-write at cluster scope requires *two* roles together**: `readWriteAnyDatabase` *and* `clusterAdmin`. You cannot grant either of them alone for full read-write — they must both be present. +- `readAnyDatabase` is the read-only equivalent at cluster scope. +- For per-database least privilege, use `readWrite` or `read` scoped to a specific `db`. +- Administrative privileges = `{ db: 'admin', role: 'root' }`. Reserve this for genuine cluster administrators. +- **Non-admin (secondary) users — including Entra ones — cannot create, delete, or update other users.** Only admin principals can. Native non-admin users can change their own password but nothing else. 
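The pairing rule in the list above is easy to violate in templates, so a small lint step in CI can catch it before deployment. This is a minimal sketch, assuming role entries are dictionaries with `db` and `role` keys in the same shape as the Bicep examples in this rule; `role_pairing_problems` is a hypothetical helper, not an Azure or MongoDB API.

```python
def role_pairing_problems(roles: list[dict[str, str]]) -> list[str]:
    """Flag role lists that break the cluster-scope rules described above."""
    # Only roles granted on the admin database count as cluster-scope roles.
    admin_roles = {r["role"] for r in roles if r.get("db") == "admin"}
    problems = []

    has_read_write_any = "readWriteAnyDatabase" in admin_roles
    has_cluster_admin = "clusterAdmin" in admin_roles
    if has_read_write_any != has_cluster_admin:
        problems.append(
            "readWriteAnyDatabase and clusterAdmin must be granted together"
        )

    if "root" in admin_roles:
        problems.append(
            "root grants administrative privileges; reserve it for cluster administrators"
        )
    return problems
```

A per-database grant such as `{"db": "orders", "role": "readWrite"}` produces no findings, which is the least-privilege shape this rule recommends for applications.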
+ +| Provider | Role(s) | CreateUser | DeleteUser | UpdateUser | ListUser | +|---|---|---|---|---|---| +| Microsoft Entra ID | `readWriteAnyDatabase` + `clusterAdmin` | ❌ | ❌ | ❌ | ✔️ | +| Microsoft Entra ID | `readAnyDatabase` | ❌ | ❌ | ❌ | ✔️ | +| Native DocumentDB | `readWriteAnyDatabase` + `clusterAdmin` | ❌ | ❌ | Own password only | ✔️ | +| Native DocumentDB | `readAnyDatabase` | ❌ | ❌ | Own password only | ✔️ | + +## Incorrect + +Granting `readWriteAnyDatabase` alone and expecting full write access: + +```bicep +roles: [ + { db: 'admin', role: 'readWriteAnyDatabase' } // ← incomplete, must also include clusterAdmin +] +``` + +Granting `root` for an app that only needs to write to one database: + +```bicep +// Anti-pattern — single-DB workloads should never be assigned root. +roles: [ + { db: 'admin', role: 'root' } +] +``` + +Trying to drop a user from a non-admin session: + +```javascript +// Will fail — only admin principals can manage users. +db.runCommand({ dropUser: "" }); +``` + +## Correct + +### Per-database least privilege (preferred for apps) + +```bicep +resource appUser 'Microsoft.DocumentDB/mongoClusters/users@2025-09-01' = { + name: '${clusterName}/users/${appPrincipalId}' + properties: { + identityProvider: { + type: 'Microsoft.EntraID' + properties: { principalType: 'ManagedIdentity' } + } + roles: [ + { db: 'orders', role: 'readWrite' } // only the database the app actually needs + ] + } +} +``` + +### Cluster-wide read-write (operational roles, migration tools) + +Both roles must be present: + +```bicep +roles: [ + { db: 'admin', role: 'readWriteAnyDatabase' } + { db: 'admin', role: 'clusterAdmin' } // required alongside readWriteAnyDatabase +] +``` + +### Read-only (reporting, BI, audit) + +```bicep +roles: [ + { db: 'admin', role: 'readAnyDatabase' } +] +``` + +### Administrative (rarely used for non-humans) + +```bicep +roles: [ + { db: 'admin', role: 'root' } +] +``` + +## Manage secondary (non-admin) users via the mongo shell + +Sign in 
as an admin principal first, then run management commands. The `customData.IdentityProvider` field marks the principal type for Entra users; native users omit it. + +Add a non-admin Entra user with cluster-wide read-write: + +```javascript +db.runCommand({ + createUser: "", + roles: [ + { role: "clusterAdmin", db: "admin" }, + { role: "readWriteAnyDatabase", db: "admin" } + ], + customData: { + IdentityProvider: { + type: "MicrosoftEntraID", + properties: { principalType: "user" } // or "servicePrincipal" / "ManagedIdentity" + } + } +}); +``` + +Add a non-admin Entra user with cluster-wide read-only: + +```javascript +db.runCommand({ + createUser: "", + roles: [ + { role: "readAnyDatabase", db: "admin" } + ], + customData: { + IdentityProvider: { + type: "MicrosoftEntraID", + properties: { principalType: "user" } + } + } +}); +``` + +Remove a non-admin user: + +```javascript +db.runCommand({ dropUser: "" }); +``` + +List all users on the cluster (Entra + native, admin + non-admin): + +```javascript +db.runCommand({ usersInfo: 1 }); +``` + +> Admin Entra principals are also registered as Azure resources under `Microsoft.DocumentDB/mongoClusters/users` and replicated to the database. Non-admin Entra principals managed via the mongo shell are **not** registered as Azure resources and won't appear in the Azure portal user list — they only show up in `usersInfo`. 
+ +## References + +- [Connect using role-based access control and Microsoft Entra ID](https://learn.microsoft.com/azure/documentdb/how-to-connect-role-based-access-control) +- [Create secondary users](https://learn.microsoft.com/azure/documentdb/secondary-users) +- MongoDB docs: [Built-in roles](https://www.mongodb.com/docs/manual/reference/built-in-roles/) diff --git a/skills/security/security-entra-rbac.md b/skills/security/security-entra-rbac.md index d60b03d..d1b5343 100644 --- a/skills/security/security-entra-rbac.md +++ b/skills/security/security-entra-rbac.md @@ -1,50 +1,203 @@ # security-entra-rbac -**Category:** Security · **Priority:** MEDIUM +**Category:** Security · **Priority:** HIGH ## Why it matters -Long-lived database passwords in app config or Key Vault entries are a persistent attack surface: they leak, get checked into code, and rotate poorly. Azure DocumentDB supports **Microsoft Entra ID (Azure AD) authentication with role-based access control (RBAC)**, so apps can authenticate with a managed identity and receive short-lived tokens — no secrets to rotate. +Long-lived database passwords in app config or Key Vault entries are a persistent attack surface — they leak, get checked into code, and rotate poorly. Azure DocumentDB supports **Microsoft Entra ID** authentication (via OIDC) so apps can authenticate with a **managed identity** and receive short-lived tokens — no secrets to rotate. Entra-authenticated principals also benefit from centralized credential management, MFA, passwordless sign-in, and uniform identity across Azure services. -For shared-secret scenarios, create **secondary users** with least-privilege roles instead of using the admin account from applications. +Authentication on Azure DocumentDB is **non-disruptive to toggle** — you can enable or change auth methods on a running cluster without a restart. 
Every cluster is created with native authentication enabled and one built-in admin user; the supported configurations are: + +- **NativeAuth only** (default at create time — native must always be enabled when the cluster is created) +- **MicrosoftEntraID only** (native can be disabled *after* the cluster is provisioned) +- **NativeAuth + MicrosoftEntraID** (most common; recommended during migration) + +Once Entra is enabled, you register Entra principals (users, service principals, system- or user-assigned managed identities, workload identities) on the cluster as Azure resources of type `Microsoft.DocumentDB/mongoClusters/users` and map them to MongoDB database roles. Multiple admin principals of different types can coexist. ## Incorrect +Hard-coded admin credentials in a connection string: + ```javascript -// Hard-coded admin creds in config -const uri = `mongodb+srv://admin:SuperSecret123@prod-ddb.mongocluster.documentdb.azure.com/?tls=true`; +const uri = `mongodb+srv://admin:SuperSecret123@prod-ddb.mongocluster.cosmos.azure.com/?tls=true`; +``` + +Registering an app's service principal with `root` on `admin` "just in case": + +```bicep +// Over-privileged — any data-plane bug now executes with cluster-wide admin. +resource user 'Microsoft.DocumentDB/mongoClusters/users@2025-09-01' = { + name: '${clusterName}/users/${appPrincipalId}' + properties: { + identityProvider: { + type: 'Microsoft.EntraID' + properties: { principalType: 'ServicePrincipal' } + } + roles: [ { db: 'admin', role: 'root' } ] // ← grants everything + } +} ``` ## Correct -Use a managed identity via Entra + RBAC: +### 1. 
Enable Entra auth on the cluster -```javascript -// Node example — use the driver's Entra auth mechanism -// (verify current driver support/spec in the official docs) -const client = new MongoClient(uri, { - authMechanism: "MONGODB-OIDC", // or the current documented mechanism - // credentialProvider: Azure Managed Identity - tls: true -}); +Add `MicrosoftEntraID` to `authConfig.allowedModes` (keep `NativeAuth` during migration; remove it later if policy allows): + +```bicep +resource cluster 'Microsoft.DocumentDB/mongoClusters@2025-09-01' = { + name: clusterName + location: location + properties: { + authConfig: { + allowedModes: [ + 'MicrosoftEntraID' + 'NativeAuth' + ] + } + } +} ``` -Or, if you must use SCRAM auth, create a dedicated least-privilege user: +Or via Azure CLI: + +```bash +az resource patch \ + --resource-group "" \ + --name "" \ + --resource-type "Microsoft.DocumentDB/mongoClusters" \ + --properties '{"authConfig":{"allowedModes":["MicrosoftEntraID","NativeAuth"]}}' \ + --latest-include-preview +``` + +### 2. Register the principal on the cluster with a least-privilege database role + +```bicep +@allowed([ 'User', 'ServicePrincipal', 'ManagedIdentity' ]) +param principalType string = 'ManagedIdentity' +param principalId string // object ID of the Entra principal +param dbName string = 'sales' + +resource user 'Microsoft.DocumentDB/mongoClusters/users@2025-09-01' = { + name: '${clusterName}/users/${principalId}' + properties: { + identityProvider: { + type: 'Microsoft.EntraID' + properties: { principalType: principalType } + } + roles: [ + { db: dbName, role: 'readWrite' } // scoped to one database + ] + } +} +``` + +For role-shape details (`readWriteAnyDatabase` + `clusterAdmin` must be granted together; `readAnyDatabase` for read-only) see [security-database-roles](security-database-roles.md). + +### 3. 
Connect using `MONGODB-OIDC` + +Use the global SRV host so the connection automatically follows promotion in multi-cluster setups: -```javascript -// As admin, create a per-app user with only what it needs -db.adminCommand({ - createUser: "orders-api", - pwd: passwordFromKeyVault, - roles: [ - { role: "readWrite", db: "orders" } - ] -}); ``` +mongodb+srv://.global.mongocluster.cosmos.azure.com/?tls=true&authMechanism=MONGODB-OIDC&retrywrites=false&maxIdleTimeMS=120000 +``` + +Required driver settings: + +| Option | Value | +|---|---| +| `scheme` | `mongodb+srv` | +| `host` | `.global.mongocluster.cosmos.azure.com` (or `.mongocluster.cosmos.azure.com`) | +| `tls` | `true` | +| `authMechanism` | `MONGODB-OIDC` | +| `retrywrites` | `false` | +| `maxIdleTimeMS` | `120000` | + +#### Python: `DefaultAzureCredential` + OIDC callback + +```python +from azure.identity import DefaultAzureCredential +from pymongo import MongoClient +from pymongo.auth_oidc import OIDCCallback, OIDCCallbackContext, OIDCCallbackResult + +# Fetches an Entra ID access token each time the driver asks for one. +class AzureIdentityTokenCallback(OIDCCallback): + def __init__(self, credential): + self.credential = credential + + def fetch(self, context: OIDCCallbackContext) -> OIDCCallbackResult: + token = self.credential.get_token( + "https://ossrdbms-aad.database.windows.net/.default").token + return OIDCCallbackResult(access_token=token) + +credential = DefaultAzureCredential() +authProperties = {"OIDC_CALLBACK": AzureIdentityTokenCallback(credential)} + +client = MongoClient( + f"mongodb+srv://{clusterName}.global.mongocluster.cosmos.azure.com/", + connectTimeoutMS=120000, + tls=True, + retryWrites=False, + authMechanism="MONGODB-OIDC", + authMechanismProperties=authProperties, +) +``` + +#### TypeScript / Node + +```typescript +import { DefaultAzureCredential, TokenCredential } from '@azure/identity'; +import { MongoClient, OIDCCallbackParams, OIDCResponse } from 'mongodb'; + +const callback = async (params: OIDCCallbackParams, credential: TokenCredential): Promise<OIDCResponse> => { + const tokenResponse = await credential.getToken(['https://ossrdbms-aad.database.windows.net/.default']); + return { + accessToken: tokenResponse?.token || '', + expiresInSeconds: Math.max(0, Math.floor(((tokenResponse?.expiresOnTimestamp ??
0) - Date.now()) / 1000)), + }; +}; + +const credential = new DefaultAzureCredential(); +const client = new MongoClient( + `mongodb+srv://${clusterName}.global.mongocluster.cosmos.azure.com/`, + { + connectTimeoutMS: 120000, + tls: true, + retryWrites: false, + authMechanism: 'MONGODB-OIDC', + authMechanismProperties: { + OIDC_CALLBACK: (params) => callback(params, credential), + ALLOWED_HOSTS: ['*.azure.com'], + }, + }, +); +``` + +#### C# / .NET + +```csharp +DefaultAzureCredential credential = new(); + +Func> tokenCallback = + async (_, ct) => + { + AccessToken token = await credential.GetTokenAsync( + new TokenRequestContext(new[] { "https://ossrdbms-aad.database.windows.net/.default" }), + ct); + return new OidcAccessToken(token.Token, token.ExpiresOn - DateTimeOffset.UtcNow); + }; + +MongoUrl url = MongoUrl.Create($"mongodb+srv://{clusterName}.global.mongocluster.cosmos.azure.com/"); +MongoClientSettings settings = MongoClientSettings.FromUrl(url); +settings.UseTls = true; +settings.RetryWrites = false; +settings.MaxConnectionIdleTime = TimeSpan.FromMinutes(2); +settings.Credential = MongoCredential.CreateOidcCredential(tokenCallback); +settings.Freeze(); + +MongoClient client = new(settings); +``` + +The OIDC callback acquires a token for the scope `https://ossrdbms-aad.database.windows.net/.default` (same OAuth resource used by Azure Database for PostgreSQL / MySQL — this is by design and required). + +## Authentication on replica clusters -Rotate secondary-user passwords on a schedule; prefer Entra when available. +Authentication methods are managed **independently** on the primary and replica clusters. Users and managed identities are managed on the primary and synchronized to the replica; auth-mode toggles are not. **Gotcha:** if native auth is disabled on the primary at the moment the replica is created, you cannot enable native auth on the replica without first promoting it. See `high-availability/ha-cross-region-replica.md`. 
## References -- [Use Microsoft Entra ID and role-based access control](https://learn.microsoft.com/azure/documentdb/how-to-connect-role-based-access-control) +- [Connect using role-based access control and Microsoft Entra ID](https://learn.microsoft.com/azure/documentdb/how-to-connect-role-based-access-control) - [Create secondary users](https://learn.microsoft.com/azure/documentdb/secondary-users) +- Related: [security-azure-rbac-actions](security-azure-rbac-actions.md), [security-database-roles](security-database-roles.md), [security-token-lifetime-revocation](security-token-lifetime-revocation.md) diff --git a/skills/security/security-firewall-rules.md b/skills/security/security-firewall-rules.md new file mode 100644 index 0000000..ecb201f --- /dev/null +++ b/skills/security/security-firewall-rules.md @@ -0,0 +1,116 @@ +# security-firewall-rules + +**Category:** Security · **Priority:** MEDIUM + +## Why it matters + +When public network access is enabled on an Azure DocumentDB cluster, **firewall rules are the only thing standing between the database and the internet.** They allow inbound traffic from explicit IPv4 sources expressed as either **CIDR ranges** or **Start-IP / End-IP** pairs (the portal uses the latter; both are equivalent — a `/32` is a single IP, and the same address in Start and End fields does the same thing). Firewall rules are independent of (and complementary to) Private Endpoint — Private Endpoint is the strong-isolation control (see [security-private-endpoint](security-private-endpoint.md)); firewall rules are for cases where you must keep public access on but want to scope it to known sources. + +**Default state: locked down.** A newly created cluster with no firewall rules and no private endpoint has **public access effectively disabled** — nothing can reach the data plane until you either add a firewall rule or create a private endpoint. This is a default-deny posture; opening the cluster is a deliberate action, not the absence of one. 
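The equivalence between CIDR notation and Start-IP / End-IP pairs described above can be checked with Python's standard `ipaddress` module before a rule is saved. This is an illustrative sketch only; `cidr_to_range` and `range_to_cidrs` are hypothetical helper names and the addresses are documentation ranges, not values from any real cluster.

```python
import ipaddress


def cidr_to_range(cidr: str) -> tuple[str, str]:
    """Convert a CIDR block to the (start, end) pair the portal expects."""
    # strict=True rejects input such as 203.0.113.5/24 that has host bits set.
    network = ipaddress.ip_network(cidr, strict=True)
    return str(network.network_address), str(network.broadcast_address)


def range_to_cidrs(start: str, end: str) -> list[str]:
    """Convert a Start-IP / End-IP pair to the smallest list of CIDR blocks."""
    blocks = ipaddress.summarize_address_range(
        ipaddress.IPv4Address(start), ipaddress.IPv4Address(end)
    )
    return [str(block) for block in blocks]
```

For a `/32` both helpers agree that start and end are the same address, and `range_to_cidrs("0.0.0.0", "255.255.255.255")` collapses to a single `/0` block, which makes the "no firewall" shortcut easy to spot in a review.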
+ +Operational gotchas to bake into runbooks: + +1. **Firewall changes propagate in up to ~15 minutes** — during that window the firewall can behave inconsistently. Don't troubleshoot "connection refused" for the first 15 minutes after a change. +2. **"Allow public access from Azure resources and services" is a separate toggle** from IP rules. It grants access to Azure services (like Azure Functions or Stream Analytics) without listing their IPs — but ⚠️ **it admits connections from *any* Azure service in *any* customer subscription**, not just yours. Identity (Entra ID + database role) is the only remaining gate. +3. **The `0.0.0.0 - 255.255.255.255` shortcut** is essentially "no firewall." Don't use it for production. +4. **The portal's "current client IP" detection can be wrong** — corporate proxies, VPNs, or IPv6 transition can make the portal-detected IP differ from your actual egress. Verify with a "what is my IP" service before saving. + +## Incorrect + +Opening the firewall to the world during an incident and forgetting to close it: + +```text +Cluster: production-db +Firewall: 0.0.0.0 - 255.255.255.255 ← effectively no firewall +Public access: enabled +``` + +Whitelisting a developer's home IP on a production cluster: + +```text +Cluster: production-db +Firewall: 73.123.45.67/32 ← residential DHCP IP — rotates without warning +``` + +Treating a firewall change as immediate and rolling forward before propagation completes: + +```bash +# t+0 Add a CI/CD egress range +az documentdb mongo-cluster firewall-rule create ... + +# t+30s Trigger the deploy — may fail for up to ~15 minutes +ci-pipeline run +``` + +## Correct + +### 1. Add your current client IP for short-lived admin work + +Easiest path is the Azure portal: + +1. Open the cluster → **Networking**. +2. Select **+ Add current client IP address**. +3. **Verify** the detected IP matches your real egress (corporate proxies and VPN concentrators can shift it). +4. **Save**. 
+ +Remove the rule when you're done — don't leave temporary IPs in place. + +### 2. Allow Azure services without enumerating IPs + +For workloads like Azure Functions or Stream Analytics where listing the source IPs isn't practical: + +1. Cluster → **Networking**. +2. Toggle **Allow public access from Azure resources and services** (a.k.a. **Allow Azure services and resources to access this cluster**) on. +3. **Save**. + +> ⚠️ This toggle admits traffic from **any Azure service in any customer subscription** — not just yours. The network gate becomes coarse; **identity** (Entra ID + database role) is the only remaining gate that distinguishes your workload from someone else's. Pair this toggle with managed-identity auth and tight database roles (see [security-entra-rbac](security-entra-rbac.md), [security-database-roles](security-database-roles.md)). For sensitive workloads, prefer Private Endpoint instead. + +### 3. Allow specific CIDR ranges + +For corporate egress NAT, VPN concentrators, partner ranges, or a pinned CI/CD egress IP set: + +1. Cluster → **Networking** → **Firewall and virtual networks**. +2. Add entries in CIDR form, e.g. `203.0.113.0/24`, `198.51.100.42/32`. +3. **Save** and wait ~15 minutes before considering the change applied. + +Prefer `/32` for single hosts and the narrowest CIDR your environment supports for ranges. + +### 4. Plan around the propagation window + +- **Never** stack a firewall change in front of a critical deployment without ~15 minutes of slack. +- After a change, smoke-test connectivity from a representative source IP before declaring it done. +- During the window, expect intermittent failures, not a clean cutover. + +### 5. Avoid `0.0.0.0 - 255.255.255.255` + +The portal exposes a shortcut to allow all IPs *via Azure infrastructure*. The cluster help text labels it as a wide allowance and warns it limits the effectiveness of the firewall policy. 
The only legitimate use is short-lived debugging in non-production — and even then prefer a tighter rule. + +### 6. Prefer Private Endpoint where you can + +If the workload runs in Azure and doesn't need public access, the right answer is to disable public network access entirely and use Private Endpoint instead. Firewall rules are best thought of as a stopgap for hybrid or partner-access scenarios. See [security-private-endpoint](security-private-endpoint.md). + +### 7. Disable public access entirely + +To fully close the public path on an existing cluster: + +1. Cluster → **Networking**. +2. **Remove every firewall rule** in the Public access section. +3. **Clear** the **Allow Azure services and resources to access this cluster** checkbox. +4. **Save**. + +With no rules and the Azure-services toggle off, the public path is effectively closed — the cluster reverts to the default locked-down posture and only Private Endpoint (if configured) provides reachability. Confirm by attempting a connection from a previously-allowed public source after the ~15-minute propagation window — it should fail. 
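The confirmation step above (and the smoke test recommended for any firewall change) can be scripted as a plain TCP probe. This is a rough sketch under stated assumptions: the function names are hypothetical, the host and port are whatever your connection string actually uses, and a TCP connect only proves network reachability, not authentication. Because behavior can be intermittent during the propagation window, the sketch requires several consecutive matching results before it reports success.

```python
import socket
import time


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def wait_for_state(host: str, port: int, want_open: bool,
                   window_s: float = 15 * 60, interval_s: float = 30.0,
                   needed: int = 3, timeout_s: float = 5.0) -> bool:
    """Poll until reachability matches want_open `needed` times in a row.

    Returns False if that does not happen before window_s elapses.
    """
    deadline = time.monotonic() + window_s
    streak = 0
    while time.monotonic() < deadline:
        if can_connect(host, port, timeout_s) == want_open:
            streak += 1
            if streak >= needed:
                return True
        else:
            streak = 0  # a flapping result resets the count
        time.sleep(interval_s)
    return False
```

Run it from a representative source IP: `want_open=True` after adding a rule, `want_open=False` after closing the public path. The 15-minute default mirrors the propagation figure quoted in this document; the 30-second interval and the three-in-a-row threshold are arbitrary illustrative choices.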
+ +## Operational checklist + +| When | Do | +|---|---| +| Adding a rule | Use the narrowest CIDR possible; document why | +| After saving | Wait ~15 minutes before troubleshooting connectivity | +| Quarterly | Audit firewall rules; remove residential / ad-hoc IPs | +| Production hardening | Disable public access; rely on Private Endpoint instead | + +## References + +- [Configure firewall — Azure DocumentDB](https://learn.microsoft.com/azure/documentdb/how-to-configure-firewall) +- [Enable and manage public access — Azure DocumentDB](https://learn.microsoft.com/azure/documentdb/how-to-public-access) +- Related: [security-private-endpoint](security-private-endpoint.md), [security-entra-rbac](security-entra-rbac.md), [security-database-roles](security-database-roles.md) diff --git a/skills/security/security-private-endpoint.md b/skills/security/security-private-endpoint.md index 5859907..2d24820 100644 --- a/skills/security/security-private-endpoint.md +++ b/skills/security/security-private-endpoint.md @@ -1,38 +1,219 @@ # security-private-endpoint -**Category:** Security · **Priority:** MEDIUM +**Category:** Security · **Priority:** HIGH ## Why it matters -By default a DocumentDB cluster can accept traffic from approved public IPs via its firewall. For regulated or production workloads, exposing the data plane to the public internet is unnecessary risk. A **Private Endpoint** attaches the cluster to your VNet with a private IP, and public network access can be **disabled** entirely so that only traffic from your VNet (and peered networks) can reach it. +A **Private Endpoint** attaches an Azure DocumentDB cluster to your virtual network with a **private IP**, so client traffic never traverses the public internet. Combined with disabling public network access and tight NSG rules, this is the strongest network-isolation posture for the cluster — far stronger than IP firewall rules (see [security-firewall-rules](security-firewall-rules.md)). 
+ +Two things to know up front so you don't fight the platform: + +- **Private Link does not prevent your cluster's FQDN from being resolved by public DNS.** The defense is at the application/connection level — clients can only reach the cluster's private IP, and only from networks that can route to that IP. The public DNS name is harmless without network reachability. +- **Private DNS integration must be enabled for the connection to resolve correctly.** The cluster's MongoDB `mongodb+srv` discovery uses SRV records, and those SRV records must resolve to private IPs from inside the VNet. + +Private Link works from: + +- The same virtual network as the private endpoint. +- **Peered virtual networks**. +- **On-premises networks** connected via VPN or ExpressRoute (private peering). + +## DNS, group ID, and subresource cheat sheet + +| Field | Value | +|---|---| +| Resource type | `Microsoft.DocumentDB/mongoClusters` | +| Group ID / target subresource | `MongoCluster` | +| Private DNS zone name | `privatelink.mongocluster.cosmos.azure.com` | +| SRV record for discovery | `_mongodb._tcp..mongocluster.cosmos.azure.com` | +| Public host (unchanged) | `.mongocluster.cosmos.azure.com` / `.global.mongocluster.cosmos.azure.com` | +| MongoDB driver port | 27017 | + +Inside the VNet, the public host resolves (via the private DNS zone) to a private IP. Outside the VNet, it resolves to the public IP — but cannot be reached unless public access is also enabled. 
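The split-horizon behavior described above (private IP inside the VNet, public IP outside) can be asserted from a VM or container inside the network. The sketch below is illustrative only: the function names are hypothetical, it checks A records but not the SRV record (the Python standard library has no SRV lookup), and it assumes the VNet uses RFC 1918 address space, which is common but not required.

```python
import ipaddress
import socket
from typing import Iterable, List

# RFC 1918 private ranges. Python's ip_address(...).is_private is broader
# (it also matches documentation and loopback ranges), so the ranges are
# listed explicitly here.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve hostname to its distinct IPv4 addresses using the system resolver."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def is_rfc1918(address: str) -> bool:
    """Return True if address falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918_NETWORKS)


def all_private(addresses: Iterable[str]) -> bool:
    """Return True only if at least one address is given and all are RFC 1918."""
    addresses = list(addresses)
    return bool(addresses) and all(is_rfc1918(a) for a in addresses)
```

Inside the VNet, `all_private(resolve_ipv4(cluster_host))` returning `False` would suggest the private DNS zone is not linked or the DNS zone group is missing; `cluster_host` stands for your cluster's public hostname and is not filled in here.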
## Incorrect +Leaving public access on and the firewall wide-open as a "fallback" after creating a private endpoint: + ```text -Cluster: production-db -Firewall: 0.0.0.0 – 255.255.255.255 (allow all, "temporarily") -Public access: enabled +Cluster: production-db +Public access: Enabled +Firewall: 0.0.0.0 - 255.255.255.255 +Private endpoint: Created (but bypassed by the open firewall) +``` + +Creating a private endpoint **without** linking the private DNS zone to your VNet — clients will still resolve the cluster to its public IP and fail: + +```bash +az network private-endpoint create ... # OK +az network private-dns zone create ... # OK +# Missing: az network private-dns link vnet create ... +# Missing: az network private-endpoint dns-zone-group create ... +# Result: app fails with DNS / connection errors inside the VNet. +``` + +Forgetting to disable subnet network policies — the private endpoint create fails: + +```bash +# Required on the target subnet before creating the private endpoint: +az network vnet subnet update \ + --vnet-name $VNetName \ + --name $SubnetName \ + --resource-group $ResourceGroupName \ + --disable-private-endpoint-network-policies true ``` ## Correct -1. Create a Private Endpoint for the cluster in the app subnet. -2. Configure Private DNS so the DocumentDB FQDN resolves to the private IP inside the VNet. -3. In the cluster's networking blade, **disable public network access** once the app has verified connectivity. -4. Restrict any remaining firewall rules to specific management/CI IPs only. +### Full Azure CLI flow + +```bash +ResourceGroupName="myResourceGroup" +ClusterName="myMongoCluster" +SubscriptionId="" +SubResourceType="MongoCluster" # group ID +VNetName="myVnet" +SubnetName="mySubnet" +PrivateEndpointName="myPrivateEndpoint" +PrivateConnectionName="myConnection" + +# 1. VNet + subnet (or use existing). +az network vnet create \ + --name $VNetName \ + --resource-group $ResourceGroupName \ + --subnet-name $SubnetName + +# 2. 
Disable PE network policies on the subnet. +az network vnet subnet update \ + --name $SubnetName \ + --resource-group $ResourceGroupName \ + --vnet-name $VNetName \ + --disable-private-endpoint-network-policies true + +# 3. Create the private endpoint. +az network private-endpoint create \ + --name $PrivateEndpointName \ + --resource-group $ResourceGroupName \ + --vnet-name $VNetName \ + --subnet $SubnetName \ + --private-connection-resource-id "/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroupName/providers/Microsoft.DocumentDB/mongoClusters/$ClusterName" \ + --group-ids $SubResourceType \ + --connection-name $PrivateConnectionName + +# 4. Private DNS zone (exact name matters). +zoneName="privatelink.mongocluster.cosmos.azure.com" +az network private-dns zone create \ + --resource-group $ResourceGroupName \ + --name $zoneName + +# 5. Link the DNS zone to the VNet. +az network private-dns link vnet create \ + --resource-group $ResourceGroupName \ + --zone-name $zoneName \ + --name "${VNetName}-link" \ + --virtual-network $VNetName \ + --registration-enabled false + +# 6. Bind the zone to the private endpoint so A records auto-populate. +az network private-endpoint dns-zone-group create \ + --resource-group $ResourceGroupName \ + --endpoint-name $PrivateEndpointName \ + --name "default" \ + --private-dns-zone $zoneName \ + --zone-name mongocluster +``` + +### Lock the data plane down to private only + +Once apps have verified connectivity through the private endpoint, disable public access and reset firewall rules: + +```bash +az resource update \ + --ids "/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroupName/providers/Microsoft.DocumentDB/mongoClusters/$ClusterName" \ + --set properties.publicNetworkAccess="Disabled" +``` + +Bicep sketch: ```bicep -// sketch — verify current property names -resource pe 'Microsoft.Network/privateEndpoints@...' = { /* ... */ } -resource ddb 'Microsoft.DocumentDB/...@...' 
= { +resource cluster 'Microsoft.DocumentDB/mongoClusters@2025-09-01' = { + name: clusterName properties: { publicNetworkAccess: 'Disabled' - // firewallRules: [] (empty or minimal) } } ``` +### Replica clusters: connection-string detail + +On a **replica cluster**, only **self** connection strings are exposed — there is no global read-write string on the replica. Apps that need to reach a replica via Private Link must use the replica's self connection string. (See `high-availability/ha-cross-region-replica.md` for replica networking — settings do **not** inherit from the primary, so the replica needs its own private endpoint and DNS link.) + +## Verify and troubleshoot + +### Verify the endpoint + +```bash +az network private-endpoint show \ + --resource-group $ResourceGroupName \ + --name $PrivateEndpointName \ + --query '{Name:name, PrivateIpAddress:customDnsConfigs[0].ipAddresses[0], FQDN:customDnsConfigs[0].fqdn, ProvisioningState:provisioningState}' \ + --output table +``` + +Expect `ProvisioningState = Succeeded` and a private IP in your subnet range. + +Also confirm in the portal that the connection state is **Approved** under the cluster's Networking → Private endpoint connections. + +### Test SRV-based discovery from inside the VNet + +Driver discovery uses an SRV record — DNS for the public hostname alone isn't enough. + +**Windows / PowerShell:** + +```powershell +Resolve-DnsName -Name _mongodb._tcp..mongocluster.cosmos.azure.com -Type SRV +Resolve-DnsName -Name .mongocluster.cosmos.azure.com +# A-record answer should be a 10.x.x.x / private RFC1918 address. + +# Alternative: +nslookup -type=SRV _mongodb._tcp..mongocluster.cosmos.azure.com +nslookup .mongocluster.cosmos.azure.com +``` + +**Linux / macOS:** + +```bash +dig _mongodb._tcp..mongocluster.cosmos.azure.com SRV +dig .mongocluster.cosmos.azure.com +# A-record answer should be a private IP in the subnet range. 
+ +# Alternative: +nslookup -type=SRV _mongodb._tcp..mongocluster.cosmos.azure.com +nslookup .mongocluster.cosmos.azure.com +``` + +### Common failure modes + +**DNS resolves to a public IP / fails inside the VNet** + +- Verify the private DNS zone is **linked to the VNet** (`az network private-dns link vnet list`). +- Verify the **DNS zone group** is bound to the private endpoint. +- Confirm the VNet's DNS settings are Azure-provided DNS (`168.63.129.16`) or a custom resolver that forwards to Azure DNS. +- Test from a resource actually inside the VNet (or a peered VNet) — not from the developer's laptop. + +**Connection times out** + +- Check **NSG rules** on the subnet — outbound to port **27017** must be allowed. +- Check that the private endpoint's NIC has the expected private IP. +- Verify the connection string uses `mongodb+srv` scheme. +- Confirm cluster-side firewall rules aren't blocking the source (only relevant if public access is still enabled). + +**Private DNS zone exists but no records appear** + +- The zone name must be **exactly** `privatelink.mongocluster.cosmos.azure.com`. +- The DNS zone group on the endpoint must be created (`az network private-endpoint dns-zone-group create …`) — that's the binding that populates A records. 
+ ## References -- [Configure firewall rules](https://learn.microsoft.com/azure/documentdb/how-to-configure-firewall) -- [Security guide](https://learn.microsoft.com/azure/documentdb/security) +- [Use Azure Private Link with Azure DocumentDB](https://learn.microsoft.com/azure/documentdb/how-to-private-link) +- [What is Azure Private Link?](https://learn.microsoft.com/azure/private-link/private-endpoint-overview) +- Related: [security-firewall-rules](security-firewall-rules.md), [security-entra-rbac](security-entra-rbac.md), [high-availability/ha-cross-region-replica](../high-availability/ha-cross-region-replica.md) diff --git a/skills/security/security-token-lifetime-revocation.md b/skills/security/security-token-lifetime-revocation.md new file mode 100644 index 0000000..6ff6e4c --- /dev/null +++ b/skills/security/security-token-lifetime-revocation.md @@ -0,0 +1,91 @@ +# security-token-lifetime-revocation + +**Category:** Security · **Priority:** HIGH + +## Why it matters + +When you authenticate to Azure DocumentDB with Microsoft Entra ID, the MongoDB driver presents an **OIDC access token** issued by Entra. That token has a finite lifetime — typically **up to ~90 minutes from issuance** — and remains **cryptographically valid** (signature checks pass, `exp` claim not yet reached) for that full window even if: + +- The Entra principal is **disabled** or **deleted** in the tenant. +- The associated **refresh token is revoked**. + +**Cryptographic validity ≠ authorization.** Two independent gates have to clear for an operation to succeed: + +1. **Token validity** (Entra side) — the JWT signature verifies and isn't expired. Entra-side actions above don't change this until expiry. +2. **Cluster authorization** (DocumentDB side) — the principal in the token must still be registered as a user on the cluster (`mongoClusters/users/`). 
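The two gates listed above can be pictured with a toy model. This is a conceptual sketch only: `Token`, `Cluster`, and `authorize` are hypothetical names, signature verification is omitted, and nothing here reflects how the service is actually implemented. It shows only why removing the cluster-side user record cuts off access while an unexpired token is still in circulation.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Token:
    principal_id: str   # object ID of the Entra principal (illustrative)
    expires_at: float   # Unix timestamp, standing in for the JWT exp claim


@dataclass
class Cluster:
    # Principals registered as users on the cluster (illustrative stand-in
    # for the cluster user resources).
    registered_users: Set[str] = field(default_factory=set)

    def authorize(self, token: Token, now: Optional[float] = None) -> bool:
        """Both gates must clear: unexpired token AND registered cluster user."""
        now = time.time() if now is None else now
        token_valid = now < token.expires_at                           # gate 1
        user_registered = token.principal_id in self.registered_users  # gate 2
        return token_valid and user_registered


if __name__ == "__main__":
    issued = 1_000_000.0
    token = Token("app-principal", expires_at=issued + 90 * 60)
    cluster = Cluster({"app-principal"})

    # Entra-side revocation alone does not change either gate before expiry.
    print(cluster.authorize(token, now=issued + 60))

    # Removing the cluster user closes gate 2 immediately.
    cluster.registered_users.discard("app-principal")
    print(cluster.authorize(token, now=issued + 60))
```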
+ +So **deleting the cluster user resource immediately revokes authorization** even if the token is still cryptographically valid — that's the operative control during incident response. Conversely, leaving the user resource in place but only revoking the principal in Entra leaves an attacker with up to ~90 minutes of usable token. **The access-token lifetime is the maximum attack window if you rely only on Entra-side revocation.** + +## Incorrect + +Assuming that removing a principal from Entra immediately ends all sessions: + +```bash +# This stops *new* tokens from being issued — it does NOT invalidate tokens +# already in flight, which can stay valid for up to ~90 minutes. +az ad user delete --id "" +``` + +Treating "user deleted in Entra" as the only step in incident response: + +```bash +# Missing the second half — the cluster user resource is still present, +# and any currently-valid token will still authenticate. +az ad sp delete --id "" +``` + +## Correct + +### Immediate revocation: a two-step pattern + +To shrink the attack window as far as possible, do **both** of these as fast as possible: + +1. **Revoke the principal's sign-in / refresh tokens in Entra** so no new access tokens can be issued — follow the [Microsoft Entra revoke-access guidance](https://learn.microsoft.com/entra/identity/users/users-revoke-access). +2. **Delete the cluster user resource** so the principal is no longer authorized as a DocumentDB user even if a still-valid access token is presented: + + ```bash + az resource delete \ + --resource-group "" \ + --name "/users/" \ + --resource-type "Microsoft.DocumentDB/mongoClusters/users" \ + --latest-include-preview + ``` + + Or via Bicep — deploy a template that omits the user resource (or use `existing` + `Microsoft.Authorization/locks` patterns for change control). + +3. 
**Drop any non-admin entries from the mongo shell** (those aren't represented as Azure resources): + + ```javascript + db.runCommand({ dropUser: "" }); + ``` + +After step 2, the user record is gone from the cluster metadata, so the principal is no longer a recognized DocumentDB user — even a still-valid Entra access token will fail authorization on subsequent operations. + +### Treat connection-string actions as secret-grade + +`Microsoft.DocumentDB/mongoClusters/listConnectionStrings/action` returns the administrator credentials for the native-auth admin user. Grant this action only to identities that absolutely need it, and audit its use. See [security-azure-rbac-actions](security-azure-rbac-actions.md). + +### Prefer managed identities over service-principal secrets + +Managed identities don't have client secrets that can leak. Use **system-assigned managed identity** when only one workload needs the identity; use **user-assigned managed identity** when several workloads share it. + +### Limit token attack window where you can + +- Use **Conditional Access policies** in Entra to require MFA / compliant device / corporate network on token issuance, raising the bar for an attacker even before revocation. +- Rotate workload identities periodically as defense-in-depth, even though they don't have static secrets. + +## Operational checklist for a compromised principal + +| Step | Where | Effect | +|---|---|---| +| 1. Revoke refresh tokens / disable account | Entra ID | No new access tokens issued | +| 2. Delete `mongoClusters/users/` resource | Azure RBAC | DocumentDB stops recognizing principal as a user | +| 3. `dropUser` non-admin entries via mongo shell | Database | Removes any shell-managed user records | +| 4. Rotate the cluster's native admin password if also exposed | Cluster | Closes the SCRAM/native fallback | +| 5. 
Audit `usersInfo` and Azure activity logs | Both | Confirm no residual access | + +## References + +- [Connect using role-based access control and Microsoft Entra ID — Access Token Validity](https://learn.microsoft.com/azure/documentdb/how-to-connect-role-based-access-control) +- [Revoke user access in Microsoft Entra ID](https://learn.microsoft.com/entra/identity/users/users-revoke-access) +- Related: [security-entra-rbac](security-entra-rbac.md), [security-database-roles](security-database-roles.md), [security-azure-rbac-actions](security-azure-rbac-actions.md)