Fix controller hang when submission exits early #1695
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1695 +/- ##
=======================================
Coverage 53.89% 53.89%
=======================================
Files 341 341
Lines 27977 27978 +1
=======================================
+ Hits 15077 15078 +1
Misses 12900 12900
# We do not close p["c_to_u"][0] -- it only risks crashing the
# controller with SIGPIPE if the submission exits early.
i thought "manager gets SIGPIPE" is an issue that can happen anyway, and that properly written managers always need to call signal(SIGPIPE, SIG_IGN)? (at least, looking at managers from old communication tasks, most of them do that.)
i admit this isn't very well documented, but imo leaving file descriptors open in two processes is not the right way to solve this.
i suppose this is at least not a resource leak, given that interactive_keeper should also exit shortly after the solution.
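For reference, a minimal sketch of what ignoring SIGPIPE buys the writing side. It is written in Python rather than as a real C/C++ manager, and the function name `write_after_reader_closed` is made up for illustration. Closing the read end stands in for the submission exiting early; with the signal ignored, the write fails with EPIPE instead of killing the process.

```python
import errno
import os
import signal


def write_after_reader_closed() -> int:
    """Write to a pipe whose read end is closed; return the resulting errno."""
    # With SIGPIPE ignored, the failed write surfaces as an EPIPE error
    # instead of terminating the process. CPython already ignores SIGPIPE
    # at startup; it is set explicitly here to mirror what a C manager
    # does with signal(SIGPIPE, SIG_IGN).
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)

    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # stands in for the submission exiting early
    try:
        os.write(write_fd, b"x")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return 0  # not expected: the write should have failed


if __name__ == "__main__":
    print(write_after_reader_closed() == errno.EPIPE)
```

A manager that handles the error this way can still tell that the peer went away, which is the information the signal's default action would otherwise destroy.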
Having the u_to_c_w fd inherited by the controller means that when the submission exits, the controller will never read EOF, because it itself still has the write end of the pipe open. The easy fix is to make that fd not inheritable: nothing actually needs the inheritance, and in fact it's a bit of a security issue that each spawned submission gets access to extra fds.
Small additional related fixes:
Making c_to_u_r not inherit means that if the submission exits, a controller write will raise SIGPIPE unless the controller ignores that signal. We don't particularly want that to happen (it makes it impossible to detect which process exited first, which we really ought to do to give proper verdicts); better to leave the pipe unclosed.
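The EOF behaviour described above can be reproduced in a single process. This is only a sketch, not the code in this PR: `os.dup` stands in for the extra copy of the write end that the controller would inherit, and the name `eof_needs_all_writers_closed` is hypothetical. The reader sees EOF only once every copy of the write end is closed.

```python
import os


def eof_needs_all_writers_closed() -> tuple[bool, bytes]:
    """Return (reader_was_blocked, data_after_last_writer_closed)."""
    read_fd, write_fd = os.pipe()
    # Second handle on the write end, standing in for the inherited copy.
    stray_write_fd = os.dup(write_fd)
    # Non-blocking so that "no EOF yet" raises instead of hanging forever.
    os.set_blocking(read_fd, False)

    os.close(write_fd)  # the "submission" exits and drops its write end
    try:
        os.read(read_fd, 1)
        blocked = False
    except BlockingIOError:
        # No data and no EOF: a blocking reader would hang here.
        blocked = True

    os.close(stray_write_fd)    # drop the stray copy as well
    data = os.read(read_fd, 1)  # b"" now signals EOF
    os.close(read_fd)
    return blocked, data


if __name__ == "__main__":
    print(eof_needs_all_writers_closed())
```

With a blocking read, the first `os.read` is exactly where the controller would hang.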