beholder: log partial delivery per-event errors with error_code and reason#2210

Draft
pkcll wants to merge 1 commit into main from chipingress-partial-delivery-cl-nodes

pkcll (Contributor) commented Jul 1, 2026

Summary

  • Type-assert the send callback error to *batch.PublishError so partial delivery failures (individual events rejected server-side while the batch RPC itself succeeded) are logged with structured error_code and reason fields instead of a generic error string.
  • Add a failure_type attribute (partial_delivery vs rpc_error) to the chip_ingress.events_dropped metric so dashboards can distinguish the two failure modes.
  • Whole-batch RPC failures keep the existing log message unchanged.
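
The error handling summarized above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: the `PublishError` struct here is a hypothetical stand-in for `*batch.PublishError` (its `Code` and `Reason` fields are assumed), the log messages and the `EXAMPLE_CODE` value are placeholders, and `errors.As` is used in place of a direct type assertion so that wrapped errors are matched too.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
	"os"
)

// PublishError is a hypothetical stand-in for *batch.PublishError.
// The Code and Reason field names are assumptions for this sketch.
type PublishError struct {
	Code   string
	Reason string
}

func (e *PublishError) Error() string {
	return fmt.Sprintf("publish rejected: code=%s reason=%s", e.Code, e.Reason)
}

// errRPC simulates a whole-batch RPC failure.
var errRPC = errors.New("rpc unavailable")

// failureType classifies a send callback error into one of the two
// failure modes named in the PR description.
func failureType(err error) string {
	var pe *PublishError
	if errors.As(err, &pe) {
		return "partial_delivery"
	}
	return "rpc_error"
}

// logSendError logs per-event rejections at WARN with structured fields
// and everything else at ERROR. The message strings are placeholders.
func logSendError(lg *slog.Logger, err error, domain, entity string) {
	var pe *PublishError
	if errors.As(err, &pe) {
		lg.Warn("event rejected by server",
			"error_code", pe.Code,
			"reason", pe.Reason,
			"domain", domain,
			"entity", entity)
		return
	}
	lg.Error("failed to send batch", "error", err, "domain", domain, "entity", entity)
}

func main() {
	lg := slog.New(slog.NewTextHandler(os.Stdout, nil))
	rejected := &PublishError{Code: "EXAMPLE_CODE", Reason: "example reason"}
	logSendError(lg, rejected, "example-domain", "example.Entity")
	logSendError(lg, errRPC, "example-domain", "example.Entity")
	fmt.Println(failureType(rejected), failureType(errRPC))
}
```

Keeping the classification in a small pure function such as `failureType` makes it easy to reuse the same decision for both the log level and the metric attribute.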

Jira: INFOPLAT-13901 | Epic: INFOPLAT-7177

Must merge before the chainlink PR that bumps this dep.

Changes

  • pkg/beholder/batch_emitter_service.go: type-assert *batch.PublishError; structured logging; dropMetricAttrsFor helper with failure_type attribute.
  • pkg/beholder/batch_emitter_service_test.go: TestChipIngressBatchEmitterService_PartialDeliveryError — covers log fields (error_code, reason, domain, entity) and metric attribute (failure_type=partial_delivery).

Test plan

  • go test ./pkg/beholder/... -count=1 passes (all existing + new tests)
  • Partial delivery log entry includes error_code and reason fields
  • chip_ingress.events_dropped{failure_type="partial_delivery"} increments for per-event errors
  • chip_ingress.events_dropped{failure_type="rpc_error"} increments for whole-batch failures

github-actions bot (Contributor) commented Jul 1, 2026

📊 API Diff Results

No changes detected for module github.com/smartcontractkit/chainlink-common

pkcll force-pushed the chipingress-partial-delivery-cl-nodes branch 2 times, most recently from bf62ff1 to bcb946f on July 2, 2026 22:07
…code label

- Type-assert send callback error to *batch.PublishError; log at WARN
  with error_code, reason, domain, entity for per-event server rejections.
- Add error_code label to chip_ingress.events_dropped metric so dashboards
  and alerts can distinguish rejection types without relying on logs.
- Whole-batch RPC failures keep the existing ERROR log; their drop metric
  omits error_code (no structured server code available).
- Also adds failure_type label (partial_delivery / rpc_error) to the
  dropped counter to distinguish the two failure modes.

Jira: INFOPLAT-13901
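
The labeling rules in this commit message could look something like the sketch below. It is illustrative only: `dropMetricAttrs` is a hypothetical simplification of the `dropMetricAttrsFor` helper and returns a plain map instead of whatever attribute type the real code uses, and `PublishError` with its `Code` field is an assumed stand-in for `*batch.PublishError`.

```go
package main

import (
	"errors"
	"fmt"
)

// PublishError is a hypothetical stand-in for *batch.PublishError.
type PublishError struct {
	Code   string
	Reason string
}

func (e *PublishError) Error() string {
	return "publish rejected: " + e.Code + ": " + e.Reason
}

// errRPC simulates a whole-batch RPC failure.
var errRPC = errors.New("rpc unavailable")

// dropMetricAttrs builds the labels for chip_ingress.events_dropped.
// Per-event rejections get failure_type=partial_delivery plus error_code;
// whole-batch failures get failure_type=rpc_error and no error_code,
// because no structured server code is available for them.
func dropMetricAttrs(err error) map[string]string {
	attrs := map[string]string{"failure_type": "rpc_error"}
	var pe *PublishError
	if errors.As(err, &pe) {
		attrs["failure_type"] = "partial_delivery"
		attrs["error_code"] = pe.Code
	}
	return attrs
}

func main() {
	fmt.Println(dropMetricAttrs(&PublishError{Code: "EXAMPLE_CODE", Reason: "example"}))
	// prints: map[error_code:EXAMPLE_CODE failure_type:partial_delivery]
	fmt.Println(dropMetricAttrs(errRPC))
	// prints: map[failure_type:rpc_error]
}
```

Omitting `error_code` entirely for RPC failures, rather than setting it to an empty string, keeps the label set for that series small and avoids a meaningless empty label value in dashboards.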
pkcll force-pushed the chipingress-partial-delivery-cl-nodes branch from bcb946f to 59011ac on July 2, 2026 22:18