streamable_http: early response.aclose() poisons keepalive connection, causes ~260ms latency on every subsequent tool call

## Summary

In `mcp.client.streamable_http.StreamableHTTPTransport._handle_sse_response`, the client calls `await response.aclose()` immediately after receiving the first JSON-RPC response event. This early close leaves the underlying HTTP/1.1 keepalive connection in a state where the **next** request reusing the same connection blocks for ~260 ms before the server's response status arrives.

The result is that every `session.call_tool(...)` (and `send_ping`, `list_tools`, ...) over `streamable_http` pays a fixed ~260 ms penalty when calls are serial on a single connection.

Removing the early `aclose()` and draining the SSE stream to EOF eliminates the penalty entirely (**37× speedup**: 265 ms → 7 ms per call in my setup).

## Environment

- `mcp == 1.27.1`
- Python 3.12.8, Windows 11
- Server: `mcp.server.streamable_http` (also 1.27.1), localhost, SSE response mode
- Transport: streamable HTTP, single long-lived client session, sequential requests

## Symptom (numbers)

Same `tools/call`, same server, same `httpx.AsyncClient`, all on localhost:

| Path | Avg latency |
|---|---|
| Raw `httpx.AsyncClient.stream("POST", ...)` + `aiter_bytes()` to EOF | **~5 ms** |
| `ClientSession.call_tool(...)` (current code) | **~265 ms** |
| `ClientSession.call_tool(...)` (with `aclose()` removed) | **~7 ms** |

Status code arrival timing (measured with raw httpx on the same client/headers):
- Status: 1.5 ms
- First chunk: 4.7 ms
- EOF: 5 ms

So the server replies in single-digit ms. The 260 ms appears only after MCP's early `aclose()` on the previous request.
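The status / first-chunk / EOF split above can be reproduced with a small timing helper. This is a sketch, not the exact instrumentation behind the numbers: `measure_stream` is a hypothetical name. It only assumes an object shaped like `httpx.AsyncClient.stream(...)`, meaning an async context manager that yields a response with `aiter_bytes()`. Entering that context returns once the status line and headers have arrived, so that timestamp is the status time.

```python
import time
from typing import Any, AsyncContextManager, Callable


async def measure_stream(
    open_stream: Callable[[], AsyncContextManager[Any]],
) -> dict[str, float]:
    """Return ms from request start to status, first body chunk, and EOF.

    `open_stream` is assumed to behave like
    `lambda: client.stream("POST", url, ...)` in httpx: entering the context
    returns once the status line and headers have been received.
    """
    timings: dict[str, float] = {}
    t0 = time.perf_counter()
    async with open_stream() as response:
        # Headers, and therefore the status code, are available here.
        timings["status"] = (time.perf_counter() - t0) * 1000
        async for _chunk in response.aiter_bytes():
            # Record the first chunk only, but keep reading until EOF so the
            # connection is released fully drained.
            if "first_chunk" not in timings:
                timings["first_chunk"] = (time.perf_counter() - t0) * 1000
        # The loop only exits once the body has been read to EOF.
        timings["eof"] = (time.perf_counter() - t0) * 1000
    return timings
```

With a real client it would be called as `await measure_stream(lambda: client.stream("POST", URL, json=payload, headers=headers))`, where `payload` and `headers` are placeholders for the JSON-RPC body and the headers MCP sends (including the session ID and `Accept` headers). If the body is empty, `first_chunk` is simply absent from the result.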

## Reproducer

The reproducer assumes any reachable streamable_http MCP server that exposes one cheap read-only tool. Replace `URL`, `TOOL_NAME`, and (if the tool needs arguments) `TOOL_ARGS`:

```python
import asyncio
import time
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

URL = "http://localhost:PORT/mcp"
TOOL_NAME = "your_cheap_tool"
TOOL_ARGS = {}

async def main():
    async with streamablehttp_client(URL) as (r, w, _):
        async with ClientSession(r, w) as s:
            await s.initialize()
            # warm up
            for _ in range(2):
                await s.call_tool(TOOL_NAME, TOOL_ARGS)
            # measure
            times = []
            for _ in range(10):
                t0 = time.perf_counter()
                await s.call_tool(TOOL_NAME, TOOL_ARGS)
                times.append((time.perf_counter() - t0) * 1000)
            print(f"avg = {sum(times)/len(times):.2f} ms")

asyncio.run(main())
```

On my setup this prints `avg = 267.40 ms`. After the patch below it prints `avg = 7.28 ms`.
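Because the claim is that *every* serial call pays the penalty, rather than a few slow outliers pulling the mean up, it can help to print the minimum and median next to the average. A minimal stdlib sketch; `summarize_ms` is a hypothetical helper meant to replace the `print(f"avg = ...")` line in the reproducer with `print(summarize_ms(times))`:

```python
import statistics


def summarize_ms(times: list[float]) -> str:
    """Format min/median/avg/max for per-call latencies given in ms.

    If min and median sit close to the average, the extra latency is a
    fixed per-call cost rather than a handful of slow outliers.
    """
    if not times:
        raise ValueError("no samples")
    return (
        f"n = {len(times)}  "
        f"min = {min(times):.2f} ms  "
        f"median = {statistics.median(times):.2f} ms  "
        f"avg = {statistics.fmean(times):.2f} ms  "
        f"max = {max(times):.2f} ms"
    )
```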

## Root cause

In [`src/mcp/client/streamable_http.py`](https://github.com/modelcontextprotocol/python-sdk/blob/main/src/mcp/client/streamable_http.py), `_handle_sse_response`:

```python
async def _handle_sse_response(self, response, ctx, is_initialization=False):
    ...
    async for sse in event_source.aiter_sse():
        ...
        is_complete = await self._handle_sse_event(...)
        if is_complete:
            await response.aclose()   # <-- offending line
            return
```

After the response event is received, the SSE stream is force-closed before reaching EOF. The connection is then returned to the keepalive pool in a "not fully drained" state. The next POST attempting to reuse this connection blocks for ~260 ms before status arrives (likely a server-side SSE idle/reconnect window — `sse_starlette.EventSourceResponse` keeps the writer task alive after sending its only event).

Confirming evidence (instrumented timings across many runs):
- Time inside `_handle_post_request` from entry to first `_handle_sse_event` call: **266 ms** (always)
- Bare `client.stream(POST)` issued on the *same* `httpx.AsyncClient` and *same* event loop, *outside* the MCP call path: **5 ms**
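- Bare `client.stream("POST", ...)` + `aiter_bytes()` to EOF issued *inside* MCP's `post_writer` subtask, immediately followed by the original `_handle_post_request`: bare = 266 ms, orig = 5 ms. Here the bare request is the one that follows the previous call's early `aclose()`, so it absorbs the penalty; the original request then runs fast on the same connection because the bare request drained it to EOF.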

So the penalty is paid on the request *following* every early-aclose, not on the request that did the aclose.

## Proposed fix

Drain the SSE stream to EOF instead of aborting early:

```diff
@@ async def _handle_sse_response(self, response, ctx, is_initialization=False):
-        try:
-            event_source = EventSource(response)
-            async for sse in event_source.aiter_sse():
-                ...
-                is_complete = await self._handle_sse_event(...)
-                if is_complete:
-                    await response.aclose()
-                    return
-        except Exception as e:
-            logger.debug(f"SSE stream ended: {e}")
+        try:
+            event_source = EventSource(response)
+            async for sse in event_source.aiter_sse():
+                ...
+                await self._handle_sse_event(...)
+        except Exception as e:
+            logger.debug(f"SSE stream ended: {e}")
```

(The `last_event_id` / reconnect bookkeeping below is unaffected: we still observe every event and the loop now exits naturally on EOF.)

## Caveat

This relies on the server closing the SSE stream after sending the response (which `sse_starlette.EventSourceResponse` does once `sse_writer` exits via `break` on JSONRPCResponse — see `mcp/server/streamable_http.py`). If a server is configured to keep the stream open for multi-message responses, draining will wait for those too — which is the desired behavior. A `request_read_timeout_seconds`-aware variant could be added if needed.
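A `request_read_timeout_seconds`-aware variant could look roughly like the sketch below: read with no extra deadline until the response event has been handled, then keep draining so the stream normally ends at EOF, but give up after a short deadline and fall back to the current early close. This is an illustration under assumptions, not SDK code: `drain_events`, `handle_event`, `close`, and `drain_timeout_s` are hypothetical names, and it is written against a generic async iterator with `asyncio.wait_for` so it runs without `httpx`/`httpx_sse`. In the transport, the iterator would presumably be `event_source.aiter_sse()`, `handle_event` would wrap `self._handle_sse_event(...)`, `close` would be `response.aclose`, and since the SDK is built on anyio, `anyio.move_on_after` would likely be the natural replacement for `asyncio.wait_for`.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, TypeVar

T = TypeVar("T")


async def drain_events(
    events: AsyncIterator[T],
    handle_event: Callable[[T], Awaitable[bool]],
    close: Callable[[], Awaitable[None]],
    drain_timeout_s: float,
) -> bool:
    """Handle events until EOF; bound only the drain that follows the response.

    `handle_event` returns True once the JSON-RPC response has been handled
    (like `is_complete` in the transport). Returns True if the stream reached
    EOF, False if the drain timed out and `close()` was called instead.
    """
    # Phase 1: no extra deadline here, so a slow tool call is never cut short.
    async for event in events:
        if await handle_event(event):
            break
    else:
        return True  # EOF before any response event: nothing left to drain

    async def drain_rest() -> None:
        # Same iterator: continue after the response event, up to EOF.
        async for event in events:
            await handle_event(event)

    # Phase 2: keep reading so the connection is released fully drained,
    # but do not wait forever on a server that leaves the stream open.
    try:
        await asyncio.wait_for(drain_rest(), drain_timeout_s)
        return True
    except asyncio.TimeoutError:
        await close()  # fall back to the current early-close behaviour
        return False
```

The split keeps the two concerns apart: the overall request timeout still governs how long a tool call may take, while `drain_timeout_s` only caps the extra wait spent letting the server finish the stream.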

Happy to send a PR.
