Skip to content

Audit-event recording: align CRUD actions to one pattern #5246

Description

@philippthun

This document was generated with AI assistance and may not be 100% accurate. It is intended as a starting point for discussion: whether the call sites that drift from the dominant pattern should be aligned, and whether the resources currently without audit events should gain them.

CRUD actions in cloud_controller_ng record audit events with inconsistent ordering, transaction scope, or not at all. Today, a failure between mutate and record can leave a row mutated with no audit trail. The dominant pattern in app/actions/ already addresses this:

  1. Open a db.transaction do ... end block.
  2. Mutate the model (create / save / destroy).
  3. Record the audit event inside the same transaction, after the mutation.
  4. Run any post-commit side effects (notify Diego, broker calls, blobstore writes, ...) outside the transaction.

This issue catalogs every action-layer (and selected job/lib) audit-event call site that drifts from that pattern, plus actions with no audit event at all. V2 paths (app/controllers/runtime/* and app/actions/v2/*) are out of scope - they are being retired.

Inside transaction, mutate → record (dominant - reference pattern)

26 compliant call sites (click to expand)
Resource Action Reference
App create AppCreate#create
App update AppUpdate#update
Build create BuildCreate#create
Buildpack create BuildpackCreate#create
Buildpack update BuildpackUpdate#update
Deployment create DeploymentCreate#create - records via private record_audit_event in two distinct code paths (stopped-app early-exit and the regular path); both inside the transaction
Droplet copy DropletCopy#copy
Organization update OrganizationUpdate#update
Organization delete OrganizationDelete#delete
OrganizationQuota create OrganizationQuotasCreate#create
OrganizationQuota update OrganizationQuotasUpdate#update
Package create PackageCreate#create (via private record_audit_event)
Process create ProcessCreate#create
Process update ProcessUpdate#update
Revision create RevisionCreate#create (via private record_audit_event)
RoutePolicy create RoutePolicyCreate#create
RoutePolicy update RoutePolicyUpdate#update
RoutePolicy delete RoutePolicyDestroy#delete
ServiceBroker create V3::ServiceBrokerCreate#create
Space create SpaceCreate#create
Space update SpaceUpdate#update and SpaceUpdateIsolationSegment#update
Space delete SpaceDelete#delete
SpaceQuota create SpaceQuotasCreate#create
SpaceQuota update SpaceQuotaUpdate#update
Task create TaskCreate#create

Inside transaction, record → mutate

Audit event written before the mutation. The Sequel transaction rolls back both on failure, so observable behaviour today is equivalent to mutate → record. Re-ordering protects against regressions if either step grows side effects in a future refactor.

Resource Action Reference
App delete AppDelete#delete
Buildpack delete BuildpackDelete#delete
OrganizationQuota delete OrganizationQuotaDelete#delete
Process delete ProcessDelete#delete
Role delete RoleDelete#delete (via private record_event)
Route delete (underlying helper) VCAP::CloudController::RouteDelete#do_delete and #delete_unmapped_route in app/actions/routing/route_delete.rb - records record_route_delete_request before the deletion job runs
SpaceQuota delete SpaceQuotaDelete#delete
Stack delete StackDelete#delete

Outside transaction, mutate → record

A failure between the mutation and Event.create leaves the row mutated with no audit trail. Fix by wrapping the mutate-and-record pair in a db.transaction do ... end block (keep any external side effects - broker HTTP calls, blobstore writes, Diego notifies - outside). The nine groups below expand to ~17 individual call sites once DropletCreate (3), RoleCreate (8) and the two route-delete actions are counted.

Resource Action Reference
Droplet create DropletCreate#create, DropletCreate#create_docker_droplet, DropletCreate#find_or_create_buildpack_droplet - three call sites, all record after the transaction closes
Droplet delete DropletDelete#delete
Organization create OrganizationCreate#create
Package delete PackageDelete#delete
Role create RoleCreate - eight private create_* methods each call record_*_role_add then create the row, with no surrounding transaction
Route create RouteCreate#create
Route delete RouteDeleteAction#delete (V3 action in app/actions/route_delete.rb)
Stack create StackCreate#create
Task delete TaskDelete#delete - records record_task_cancel; cancellation is the only way to delete a task in V3

Special case: async + external side effects

The following actions record audit events outside any DB transaction. Unlike the entries above, a naive transaction wrap is not safe because at least one external side effect - a service-broker HTTP call, a Diego notification, or both - runs between the row write and the audit. Wrapping the side effect inside a DB transaction is incorrect: long-held transactions block other writers, and rolling back can't undo work already performed externally.

Resource Action Reference
ServiceInstance create / update / delete V3::ServiceInstanceCreateManaged, V3::ServiceInstanceUpdateManaged, V3::ServiceInstanceDelete - record_service_instance_event is called outside the transaction(s) that wrap the broker-state writes; broker HTTP calls in between
ServiceBinding (V3) create V3::ServiceBindingCreate#complete_binding_and_save (called from ServiceCredentialBindingAppCreate, ServiceCredentialBindingKeyCreate, ServiceRouteBindingCreate). The row write at save_with_attributes_and_new_operation happens after the broker call has returned, but post_bind_action (e.g. notify_diego for route bindings) sits between the row write and record_create
ServiceCredentialBinding delete ServiceCredentialBindingDelete (extends V3::ServiceBindingDelete) - record_delete and record_start_delete outside any transaction; the broker unbind call happens before perform_delete_actions runs
ServiceRouteBinding delete ServiceRouteBindingDelete#perform_delete_actions - record_delete before binding.destroy, then binding.notify_diego. Diego notification is an external side effect between the row write and post-record state, so the same constraint applies

This family needs its own design pass - a small transaction around the post-side-effect state write + audit, with the side effect clearly outside. Out of scope for the rest of this issue.

Audit events recorded outside app/actions/

Audit calls in jobs and lib/. These bypass the action-layer pattern entirely; listed for completeness, no action proposed.

Where Reference Notes
app/jobs/v3/buildpack_bits.rb record_buildpack_upload Async upload follow-up
lib/cloud_controller/diego/staging_completion_handler.rb record_build_failed, record_build_staged Worker-driven build state transitions

No audit event at all

Actions that mutate persistent state but never call any record_* method. Split by recommended response.

Add an audit event

Resources with no event repository today.

Resource Actions Notes
Domain DomainCreate, DomainUpdate, DomainDelete No domain_event_repository.rb exists. Every other top-level CC resource is audited. Shared domains are platform-wide (no owning org); private domains belong to an org. A new repository would need to decide which space/org an event row belongs to for each flavour, and which roles can read it
SecurityGroup SecurityGroupCreate, SecurityGroupUpdate, SecurityGroupDelete No event repository. Security group mutations directly affect egress networking
IsolationSegment IsolationSegmentCreate, IsolationSegmentUpdate, IsolationSegmentDelete Admin-only, infrequent; candidate for the same reason as Domain

Candidate for existing event repository

Resource Actions Notes
Sidecar SidecarCreate, SidecarUpdate, SidecarDelete Sidecars are subordinate to apps. AppEventRepository already records subordinate events like record_map_route, record_unmap_route, record_app_map_droplet; sidecar events would fit the same pattern
RouteMapping (destination) RouteDestinationUpdate#update Updates the protocol field on a route destination. AppEventRepository already records record_map_route / record_unmap_route for the same association - destination protocol updates would fit alongside
Organization (default isolation segment) SetDefaultIsolationSegment#set Mutates Organization.default_isolation_segment_guid. OrganizationEventRepository.record_organization_update already covers org updates; this could route through that
ServicePlanVisibility V3::ServicePlanVisibilityUpdate, ServicePlanVisibilityDelete Service-plan visibility changes are auditable via ServiceEventRepository.record_service_plan_visibility_event and similar (already used for the legacy controller path). Wire V3 actions through that repo

Needs design discussion

Resource Actions Notes
User UserCreate, UserUpdate, UserDelete user_event_repository.rb exists but only records role add/remove. UserCreate is dual-purpose: it can shadow-sync a UAA user OR register a guid-only stub during permission lookups. The latter is high-frequency and auditing it would flood the events table. Needs a "what counts as user-initiated" rule before adding
ServiceOffering / ServicePlan ServiceOfferingDelete, ServicePlanDelete Admin-initiated catalog changes. Design alongside any future broader catalog-audit work

Intentionally unaudited - leave as-is

Internal state-machine transitions driven by workers, or pure value updates with no security relevance.

Resource Actions Notes
AppFeature AppFeatureUpdate Feature-flag toggles; covered by app update audit
Build BuildUpdate, BuildDelete Build state transitions (STAGING → STAGED / FAILED) are recorded by the worker via record_build_staged / record_build_failed in staging_completion_handler.rb. Build create is audited (user-initiated)
Deployment DeploymentUpdate, DeploymentDelete Worker-driven state transitions. Deployment create is audited
Droplet DropletUpdate Worker-driven state transition. Droplet create and delete are audited
EnvironmentVariableGroup EnvironmentVariableGroupUpdate Affects new app pushes only; no per-app audit value
FeatureFlag FeatureFlagUpdate Global toggle; admin-only and rarely changed
Revision RevisionDelete Revisions are immutable artefacts; deletion is cleanup
Route RouteUpdate V3 path updates route.options (TCP/HTTP routing config) and metadata. Map/unmap surfaces are audited separately via record_map_route / record_unmap_route. Auditing RouteUpdate itself would be a small gain - promote to "high priority" only if the options-change story matters to operators

Suggested next steps

Independent pieces of work; each is its own PR.

  1. Reorder record → mutate in the eight entries listed under "Inside transaction, record → mutate". Pure cleanup, no behaviour change. Specs that assert call order via receive(...).ordered will need adjustment.
  2. Wrap mutate + record in a transaction for the nine groups under "Outside transaction, mutate → record" (~17 call sites; not the "Special case: async + external side effects" section). Keep all external side effects (notify_diego, broker calls, blobstore writes) outside the transaction. A spec asserting the audit row is absent when the model save raises would catch regressions.
  3. Scope out missing audit events - open separate issues per resource:
    • 3a. Sidecar (smallest; likely just extends AppEventRepository)
    • 3b. Domain (needs new event repository and a decision on which space/org an event belongs to, given shared vs. private domains have different ownership)
    • 3c. SecurityGroup (needs new event repository)
    • 3d. IsolationSegment (needs new event repository)
    • 3e. ServicePlanVisibility (route through existing ServiceEventRepository)
    • 3f. RouteDestinationUpdate, SetDefaultIsolationSegment (route through existing parent repos)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions