Fix reused-connection hangs: HTTP/2 GOAWAY and HTTP/1.1 active-once stranding#886
Merged
Conversation
A server that recycles a connection by sending GOAWAY and keeping the socket
open (as AWS ALB does) left hackney looping: h2_on_goaway aborted in-flight
streams but kept the connection `connected` and pooled, so checkout_h2 kept
handing it out and every reused request opened a stream past last_stream_id
that the peer ignored, hanging to recv_timeout with no body.
Treat GOAWAY like a graceful close: abort in-flight, close the h2_connection,
mark no_reuse, and transition to `closed`. The pool then stops reusing it
(h2_conn_usable requires `connected`) and new requests dial a fresh connection;
the conn self-cleans via the existing closed-grace and pool DOWN removal. A
request that races the GOAWAY in-flight now gets a clean retryable
{error, goaway} instead of a recv_timeout hang.
Adds a self-contained regression test (a minimal frame-level h2 server that
GOAWAYs every few responses and stays open) plus the frame-level repro harness
used to localize the bug.
Idle pooled sockets are put in {active, once} for #544 close detection. That
delivers the next response's first bytes to the connection mailbox as a
{tcp/ssl,Socket,Data} message and reverts the socket to passive. The code
treated those bytes as a broken connection (close/drop) and recv_data only did
a passive socket read, so on a reused connection the bytes were stranded and
the status read blocked to recv_timeout returning nothing.
Consume the stranded bytes instead of discarding them: buffer them in
connected(info,...) and re-arm close detection, drain the mailbox into the read
buffer at the active->passive transition (sending/streaming_body enter) and in
recv_data, preserve the read buffer across request setup, and have is_ready
keep such a connection reusable. A server close still refuses reuse (#544);
the fresh-connection path is unchanged.
Adds hackney_h1_reuse_tests (50 reused requests stay prompt with intact bodies;
bytes stranded by active-once are consumed) and reworks the idle-data pool test
for the consume-not-drop behavior.
…buffer - HTTP/2 recv_timeout watchdog: RST_STREAM(CANCEL) the stalled stream via h2_connection:cancel_stream/2 before dropping it, so the peer stops sending and the h2 layer retires the stream. Otherwise the pooled connection (only gated on get_state == connected) could be reused with an orphaned stalled stream still open. - HTTP/1.1 idle buffering: cap bytes buffered from an idle connection (#544 {active, once}) at 64 KB. A peer flooding an idle connection is now dropped rather than buffered unboundedly.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two fixes for intermittent hangs on reused/pooled connections, both validated against a server behind an AWS ALB.
HTTP/2: don't reuse a connection after GOAWAY
A server that recycles a connection by sending
GOAWAYand keeping the socket open (as AWS ALB does) madeh2_on_goaway/2abort in-flight streams but keep the connectionconnectedand pooled, socheckout_h2/h2_conn_usablekept handing it out and every reused request opened a stream pastlast_stream_idthat the peer ignored, parking the read until recv_timeout.Fix: treat GOAWAY like a graceful close (abort in-flight, close the
h2_connection, setno_reuse, transition toclosed, mirroringh2_on_closed/2). The pool stops reusing it and new requests dial fresh; the conn self-cleans via the existing closed-grace and poolDOWNremoval. A request racing a GOAWAY in-flight gets a clean retryable{error, goaway}instead of a hang.HTTP/1.1: consume bytes stranded by active-once, don't drop the connection
Idle pooled sockets are put in
{active, once}for #544 close detection, which delivers the next response's first bytes to the connection mailbox as a{tcp/ssl,Socket,Data}message and reverts the socket to passive. The code treated those bytes as a broken connection (close/drop) andrecv_dataonly did a passive socket read, so on a reused connection the bytes were stranded and the status read blocked to recv_timeout with no body. Reproduces even with{pool, false}.Fix: consume the bytes instead of discarding them. Buffer them in
connected(info,...)and re-arm close detection; drain the mailbox into the read buffer at the active->passive transition and inrecv_data; preserve the read buffer across request setup;is_readykeeps such a connection reusable. A server close still refuses reuse (#544); the fresh-connection path is unchanged.Tests
hackney_http2_goaway_tests(GOAWAY-and-keep-open server; reuse dials fresh, no recv_timeout),hackney_h1_reuse_tests(50 reused requests stay prompt with intact bodies; bytes stranded by active-once are consumed), plus the frame-level repro harnesses. Full eunit (967) and dialyzer pass.