fix(datafeed): retry ClientPayloadError and reset _running so the loo…#391
Alex-Nalin wants to merge 1 commit
…p can restart

A truncated datafeed read raises aiohttp.ClientPayloadError, which was not classified as a transient error, so read_datafeed_retry re-raised it and the datafeed loop crashed. The loop then could not be restarted because AbstractDatafeedLoop._run_loop left self._running = True on an abnormal exit (only stop() reset it), so DatafeedLoopV2.start() raised "The datafeed service V2 is already started" on every restart. The combined effect turned a transient network blip into a permanent, silent event-loss outage.

Changes:
- strategy.is_client_timeout_error now treats ClientPayloadError as transient (also benefits datahose and auth-refresh paths).
- AbstractDatafeedLoop._run_loop resets self._running = False in a finally, so the loop is restartable after any exit (stop, cancellation, or escaped exception). The concurrent-start guard in V1/V2 start() is unchanged.
- Add regression tests for both behaviours.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
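The first change can be illustrated with a minimal sketch of a retry wrapper whose transient-error predicate includes a truncated-payload error. This is not the SDK's code: `ClientPayloadError` below is a local stand-in for `aiohttp.ClientPayloadError`, and `is_transient_error`, `read_with_retry`, and `make_flaky_read` are hypothetical names chosen for illustration only.

```python
import asyncio


class ClientPayloadError(Exception):
    """Local stand-in for aiohttp.ClientPayloadError (truncated response body)."""


def is_transient_error(exc: BaseException) -> bool:
    # Hypothetical predicate: timeouts, dropped connections, and truncated
    # payloads are treated as retryable; everything else is not.
    return isinstance(exc, (asyncio.TimeoutError, ConnectionError, ClientPayloadError))


async def read_with_retry(read, max_attempts: int = 3, delay: float = 0.0):
    """Call `read()` until it succeeds, retrying only transient errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await read()
        except Exception as exc:
            # Re-raise non-transient errors immediately, and transient ones
            # once the attempt budget is exhausted.
            if not is_transient_error(exc) or attempt == max_attempts:
                raise
            await asyncio.sleep(delay)


def make_flaky_read(failures: int):
    """Build a fake read that raises ClientPayloadError `failures` times, then succeeds."""
    state = {"calls": 0}

    async def read():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ClientPayloadError("simulated truncated payload")
        return ["event"]

    return read, state
```

With the payload error classified as transient, two simulated truncated reads are absorbed by the wrapper and the third attempt returns the events. If `ClientPayloadError` were left out of the predicate, the first failure would propagate to the caller, which is the crash described above.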
One or more co-authors of this pull request were not found. Co-authors must be specified in a commit message trailer. Alternatively, if the co-author should not be included, remove the co-author trailer from the commit message(s).
Problem

Net effect: a transient network blip becomes a permanent, silent event-loss outage. Observed in production as restart #1 (ClientPayloadError) followed by 500+ × "already started".
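The "already started" symptom and its fix can be sketched with a small stand-in loop. `LoopSketch` and `demo` are hypothetical and only mirror the shape described in this pull request (a `_running` flag, a guard in `start()`, and a reset in a `finally`); they are not the SDK's `AbstractDatafeedLoop`.

```python
import asyncio


class LoopSketch:
    """Hypothetical stand-in for a datafeed loop; not the SDK class."""

    def __init__(self, read_events):
        self._read_events = read_events
        self._running = False

    async def start(self):
        # Concurrent-start guard: refuse to start while already running.
        if self._running:
            raise RuntimeError("already started")
        self._running = True
        await self._run_loop()

    async def _run_loop(self):
        try:
            while self._running:
                await self._read_events()
        finally:
            # Runs on stop(), cancellation, or an escaped exception, so the
            # flag never stays True after the loop has actually exited.
            self._running = False

    async def stop(self):
        self._running = False


async def demo():
    calls = {"n": 0}
    loop = None

    async def read_events():
        calls["n"] += 1
        if calls["n"] == 1:
            raise ConnectionError("simulated crash")  # first run dies
        await loop.stop()  # second run stops cleanly

    loop = LoopSketch(read_events)
    try:
        await loop.start()
    except ConnectionError:
        pass
    flag_after_crash = loop._running
    await loop.start()  # restart succeeds because the flag was reset
    return flag_after_crash, calls["n"]
```

Without the `finally`, `_running` would still be `True` after the simulated crash and the second `start()` would raise from the guard on every attempt, which matches the repeated "already started" errors reported above.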