
netty: Allow network errors to override graceful shutdown status #12822

Open
themechbro wants to merge 2 commits into grpc:master from themechbro:fix/shutdown-error-masking


@themechbro
Contributor

Fixes #12812

Motivation

Currently, ClientTransportLifecycleManager acts as a strict latch for the first shutdown status it receives. If a user initiates a graceful shutdown (a shutdownStatus with no cause) and the channel then experiences a hard network drop (channelInactive fires with a ClosedChannelException), the transport masks the physical network error and propagates the benign graceful-shutdown status to active streams instead. This hides real infrastructure drops from telemetry during shutdown windows.

Modifications

  • Updated ClientTransportLifecycleManager#notifyShutdown to introduce a status upgrade mechanism. If the existing shutdownStatus is a graceful intent (has no Throwable cause) and the incoming Status represents a hard error (has a Throwable cause), the manager now overwrites the cached status.
  • Added networkErrorOverridesGracefulShutdownStatus to NettyClientTransportTest. It reproduces the scenario from the issue by firing a graceful shutdown followed by fireChannelInactive(), and asserts that the ClosedChannelException is propagated to the transport listener.
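The upgrade rule described above can be illustrated with a small, self-contained sketch. This is not the actual grpc-java change: SketchStatus is a hypothetical stand-in for io.grpc.Status (just a code, a description, and an optional cause), and ShutdownLatchSketch is a hypothetical, heavily simplified model of ClientTransportLifecycleManager that only caches a status. It assumes, as the PR describes, that a status without a cause signals graceful intent and a status with a cause signals a hard error.

```java
import java.nio.channels.ClosedChannelException;

// Hypothetical stand-in for io.grpc.Status, kept minimal so the sketch is self-contained.
final class SketchStatus {
  final String code;
  final String description;
  final Throwable cause; // null means "graceful intent" in this sketch

  SketchStatus(String code, String description, Throwable cause) {
    this.code = code;
    this.description = description;
    this.cause = cause;
  }

  @Override
  public String toString() {
    return code + ": " + description + (cause == null ? "" : " (cause: " + cause + ")");
  }
}

// Hypothetical, simplified model of the lifecycle manager's shutdown-status latch.
final class ShutdownLatchSketch {
  private SketchStatus shutdownStatus;

  /** Records a shutdown status; returns true if the cached status changed. */
  synchronized boolean notifyShutdown(SketchStatus incoming) {
    if (shutdownStatus == null) {
      // First status always wins the latch.
      shutdownStatus = incoming;
      return true;
    }
    // Upgrade rule: a graceful status (no cause) may be replaced by a hard error (has a cause).
    if (shutdownStatus.cause == null && incoming.cause != null) {
      shutdownStatus = incoming;
      return true;
    }
    // Otherwise keep the existing status (e.g. a later error never replaces an earlier error).
    return false;
  }

  synchronized SketchStatus getShutdownStatus() {
    return shutdownStatus;
  }
}

public class ShutdownStatusUpgradeDemo {
  public static void main(String[] args) {
    ShutdownLatchSketch manager = new ShutdownLatchSketch();
    // 1. User-initiated graceful shutdown: no cause.
    manager.notifyShutdown(new SketchStatus("UNAVAILABLE", "Channel shutdown invoked", null));
    // 2. Hard network drop during the shutdown window: has a cause, so it overrides.
    manager.notifyShutdown(
        new SketchStatus("UNAVAILABLE", "channel closed", new ClosedChannelException()));
    // Prints: UNAVAILABLE: channel closed (cause: java.nio.channels.ClosedChannelException)
    System.out.println(manager.getShutdownStatus());
  }
}
```

In this sketch the override is one-directional (graceful to error only), so once a hard error has been cached, any later status, graceful or not, is ignored and the first real failure is preserved.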

Result

Active streams that are forcefully interrupted during a graceful shutdown window will now correctly fail with UNAVAILABLE: channel closed (with the underlying Netty exception) rather than UNAVAILABLE: Channel shutdown invoked.



Development

Successfully merging this pull request may close these issues.

channel shutdown error obtrudes on network errors
