Skip to content

Fix reused-connection hangs: HTTP/2 GOAWAY and HTTP/1.1 active-once stranding#886

Merged
benoitc merged 4 commits into
masterfrom
fix/h2-goaway-pooled-reuse
Jun 17, 2026
Merged

Fix reused-connection hangs: HTTP/2 GOAWAY and HTTP/1.1 active-once stranding#886
benoitc merged 4 commits into
masterfrom
fix/h2-goaway-pooled-reuse

Conversation

@benoitc

@benoitc benoitc commented Jun 17, 2026

Copy link
Copy Markdown
Owner

Two fixes for intermittent hangs on reused/pooled connections, both validated against a server behind an AWS ALB.

HTTP/2: don't reuse a connection after GOAWAY

A server that recycles a connection by sending GOAWAY and keeping the socket open (as AWS ALB does) made h2_on_goaway/2 abort in-flight streams but keep the connection connected and pooled, so checkout_h2/h2_conn_usable kept handing it out and every reused request opened a stream past last_stream_id that the peer ignored, parking the read until recv_timeout.

Fix: treat GOAWAY like a graceful close (abort in-flight, close the h2_connection, set no_reuse, transition to closed, mirroring h2_on_closed/2). The pool stops reusing it and new requests dial fresh; the conn self-cleans via the existing closed-grace and pool DOWN removal. A request racing a GOAWAY in-flight gets a clean retryable {error, goaway} instead of a hang.

HTTP/1.1: consume bytes stranded by active-once, don't drop the connection

Idle pooled sockets are put in {active, once} for #544 close detection, which delivers the next response's first bytes to the connection mailbox as a {tcp/ssl,Socket,Data} message and reverts the socket to passive. The code treated those bytes as a broken connection (close/drop) and recv_data only did a passive socket read, so on a reused connection the bytes were stranded and the status read blocked to recv_timeout with no body. Reproduces even with {pool, false}.

Fix: consume the bytes instead of discarding them. Buffer them in connected(info,...) and re-arm close detection; drain the mailbox into the read buffer at the active->passive transition and in recv_data; preserve the read buffer across request setup; is_ready keeps such a connection reusable. A server close still refuses reuse (#544); the fresh-connection path is unchanged.

Tests

hackney_http2_goaway_tests (GOAWAY-and-keep-open server; reuse dials fresh, no recv_timeout), hackney_h1_reuse_tests (50 reused requests stay prompt with intact bodies; bytes stranded by active-once are consumed), plus the frame-level repro harnesses. Full eunit (967) and dialyzer pass.

benoitc added 2 commits June 17, 2026 15:11
A server that recycles a connection by sending GOAWAY and keeping the socket
open (as AWS ALB does) left hackney looping: h2_on_goaway aborted in-flight
streams but kept the connection `connected` and pooled, so checkout_h2 kept
handing it out and every reused request opened a stream past last_stream_id
that the peer ignored, hanging to recv_timeout with no body.

Treat GOAWAY like a graceful close: abort in-flight, close the h2_connection,
mark no_reuse, and transition to `closed`. The pool then stops reusing it
(h2_conn_usable requires `connected`) and new requests dial a fresh connection;
the conn self-cleans via the existing closed-grace and pool DOWN removal. A
request that races the GOAWAY in-flight now gets a clean retryable
{error, goaway} instead of a recv_timeout hang.

Adds a self-contained regression test (a minimal frame-level h2 server that
GOAWAYs every few responses and stays open) plus the frame-level repro harness
used to localize the bug.
Idle pooled sockets are put in {active, once} for #544 close detection. That
delivers the next response's first bytes to the connection mailbox as a
{tcp/ssl,Socket,Data} message and reverts the socket to passive. The code
treated those bytes as a broken connection (close/drop) and recv_data only did
a passive socket read, so on a reused connection the bytes were stranded and
the status read blocked to recv_timeout returning nothing.

Consume the stranded bytes instead of discarding them: buffer them in
connected(info,...) and re-arm close detection, drain the mailbox into the read
buffer at the active->passive transition (sending/streaming_body enter) and in
recv_data, preserve the read buffer across request setup, and have is_ready
keep such a connection reusable. A server close still refuses reuse (#544);
the fresh-connection path is unchanged.

Adds hackney_h1_reuse_tests (50 reused requests stay prompt with intact bodies;
bytes stranded by active-once are consumed) and reworks the idle-data pool test
for the consume-not-drop behavior.
@benoitc benoitc changed the title Don't reuse an HTTP/2 connection after GOAWAY Fix reused-connection hangs: HTTP/2 GOAWAY and HTTP/1.1 active-once stranding Jun 17, 2026
benoitc added 2 commits June 17, 2026 17:36
…buffer

- HTTP/2 recv_timeout watchdog: RST_STREAM(CANCEL) the stalled stream via
  h2_connection:cancel_stream/2 before dropping it, so the peer stops sending
  and the h2 layer retires the stream. Otherwise the pooled connection (only
  gated on get_state == connected) could be reused with an orphaned stalled
  stream still open.
- HTTP/1.1 idle buffering: cap bytes buffered from an idle connection
  (#544 {active, once}) at 64 KB. A peer flooding an idle connection is now
  dropped rather than buffered unboundedly.
@benoitc benoitc merged commit 1fe387f into master Jun 17, 2026
9 of 12 checks passed
@benoitc benoitc deleted the fix/h2-goaway-pooled-reuse branch June 17, 2026 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant