Summary
In mcp.client.streamable_http.StreamableHTTPTransport._handle_sse_response, the client calls await response.aclose() immediately after receiving the first JSON-RPC response event. This early close leaves the underlying HTTP/1.1 keepalive connection in a state where the next request reusing the same connection blocks for ~260 ms before the server's response status arrives.
The result is that every session.call_tool(...) (and send_ping, list_tools, ...) over streamable_http pays a fixed ~260 ms penalty when calls are serial on a single connection.
Removing the early aclose() and draining the SSE stream to EOF eliminates the penalty entirely (37× speedup: 265 ms → 7 ms per call in my setup).
Environment
- mcp == 1.27.1
- Python 3.12.8, Windows 11
- Server: mcp.server.streamable_http (also 1.27.1), localhost, SSE response mode
- Transport: streamable HTTP, single long-lived client session, sequential requests
Symptom (numbers)
Same tools/call, same server, same httpx.AsyncClient, all on localhost:
| Path | Avg latency |
|---|---|
| Raw httpx.AsyncClient.stream("POST", ...) + aiter_bytes() to EOF | ~5 ms |
| ClientSession.call_tool(...) (current code) | ~265 ms |
| ClientSession.call_tool(...) (with aclose() removed) | ~7 ms |
Status code arrival timing (measured with raw httpx on the same client/headers):
- Status: 1.5 ms
- First chunk: 4.7 ms
- EOF: 5 ms
So the server replies in single-digit ms. The 260 ms appears only after MCP's early aclose() on the previous request.
Reproducer
This assumes any reachable streamable_http MCP server that exposes one cheap, read-only tool. Replace URL and TOOL_NAME:
```python
import asyncio
import time

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

URL = "http://localhost:PORT/mcp"
TOOL_NAME = "your_cheap_tool"
TOOL_ARGS = {}


async def main():
    async with streamablehttp_client(URL) as (r, w, _):
        async with ClientSession(r, w) as s:
            await s.initialize()

            # warm up
            for _ in range(2):
                await s.call_tool(TOOL_NAME, TOOL_ARGS)

            # measure
            times = []
            for _ in range(10):
                t0 = time.perf_counter()
                await s.call_tool(TOOL_NAME, TOOL_ARGS)
                times.append((time.perf_counter() - t0) * 1000)
            print(f"avg = {sum(times)/len(times):.2f} ms")


asyncio.run(main())
```
On my setup this prints avg = 267.40 ms. After the patch below it prints avg = 7.28 ms.
Root cause
In src/mcp/client/streamable_http.py, _handle_sse_response:
```python
async def _handle_sse_response(self, response, ctx, is_initialization=False):
    ...
    async for sse in event_source.aiter_sse():
        ...
        is_complete = await self._handle_sse_event(...)
        if is_complete:
            await response.aclose()  # <-- offending line
            return
```
After the response event is received, the SSE stream is force-closed before reaching EOF. The connection is then returned to the keepalive pool in a "not fully drained" state. The next POST attempting to reuse this connection blocks for ~260 ms before status arrives (likely a server-side SSE idle/reconnect window — sse_starlette.EventSourceResponse keeps the writer task alive after sending its only event).
Confirming evidence (instrumented timings across many runs):
- Time inside _handle_post_request from entry to the first _handle_sse_event call: 266 ms (always)
- Bare client.stream(POST) issued on the same httpx.AsyncClient and the same event loop, outside the MCP call path: 5 ms
- Bare client.stream(POST) + aiter_bytes() to EOF issued inside MCP's post_writer subtask, immediately followed by the original _handle_post_request: bare = 266 ms, orig = 5 ms (the bare request absorbs the penalty left by the previous call's early aclose(), and the original call that follows is fast because the bare request drained to EOF)
So the penalty is paid on the request following every early-aclose, not on the request that did the aclose.
Proposed fix
Drain the SSE stream to EOF instead of aborting early:
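```diff
@@ async def _handle_sse_response(self, response, ctx, is_initialization=False):
-        try:
-            event_source = EventSource(response)
-            async for sse in event_source.aiter_sse():
-                ...
-                is_complete = await self._handle_sse_event(...)
-                if is_complete:
-                    await response.aclose()
-                    return
-        except Exception as e:
-            logger.debug(f"SSE stream ended: {e}")
+        response_received = False
+        try:
+            event_source = EventSource(response)
+            async for sse in event_source.aiter_sse():
+                ...
+                if await self._handle_sse_event(...):
+                    # Do not aclose(): keep reading so the server ends the stream itself
+                    response_received = True
+        except Exception as e:
+            logger.debug(f"SSE stream ended: {e}")
+        if response_received:
+            return  # normal completion, same as before: skip the reconnect path below
```
(The last_event_id / reconnect bookkeeping below still observes every event, and the loop now exits naturally on EOF. The response_received flag preserves the original early return, so a stream that ends normally after delivering its response does not fall through to the reconnect path.)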
Caveat
This relies on the server closing the SSE stream after sending the response (which sse_starlette.EventSourceResponse does once sse_writer exits via break on JSONRPCResponse — see mcp/server/streamable_http.py). If a server is configured to keep the stream open for multi-message responses, draining will wait for those too — which is the desired behavior. A request_read_timeout_seconds-aware variant could be added if needed.
Happy to send a PR.