Add statement_timeout on fetching available jobs#1263
Conversation
Follows up #1255 to add a `statement_timeout` in addition to the Go context timeout. `statement_timeout` will give us a better error message, and also minimizes the chances of accidentally locking rows that won't be work if we had an operation that ran long, succeeded, but then was immediately cancelled as Go's context timeout ran out. `statement_timeout` is Postgres only, so the code is a little gnarlier than would be desirable in that we add a `SetLocalStatementTimeout` function to driver `ExecutorTx`, but which is a no-op on some databases like SQLite. We try to clarify in documentation that it needs to be used in addition to context timeout, not instead of it, because it may no-op depending on the database. I also increased the timeout to 30 seconds. This matches our timeouts in the various maintenance modules, and seems a little safer as job locks on tables with huge numbers of dead rows could potentially take over 10 seconds, and maybe some users have this happening. IMO, it's too random still where we put this stuff in, but we'll have to figure that out on follow up changes. e.g. Why do we do a statement timeout for locking jobs but not for maintenance operations? Hard to justify.
36e3b25 to
d8bb1b6
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. Keep it up! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
bgentry
left a comment
There was a problem hiding this comment.
Having spent a bit more time thinking this over, I'm really going back and forth on whether this should be necessary. In the past we've talked about giving productionization guidance that includes bits like "always set a reasonable statement timeout on your connection pool", and as you mentioned we probably do want to have some sort of timeout for all operations just to prevent infinite hangs, right?
People will forget to apply customizations like that, though, even if they're aware of the need to do so.
| } | ||
|
|
||
| func (t *ExecutorTx) SetLocalStatementTimeout(ctx context.Context, timeout time.Duration) error { | ||
| return t.Exec(ctx, "SELECT set_config('statement_timeout', $1, true)", timeout.String()) |
There was a problem hiding this comment.
I think time.Duration.String will return Go duration values that aren't pg-compatible?
Follows up #1255 to add a
statement_timeoutin addition to the Gocontext timeout.
statement_timeoutwill give us a better errormessage, and also minimizes the chances of accidentally locking rows
that won't be work if we had an operation that ran long, succeeded, but
then was immediately cancelled as Go's context timeout ran out.
statement_timeoutis Postgres only, so the code is a little gnarlierthan would be desirable in that we add a
SetLocalStatementTimeoutfunction to driver
ExecutorTx, but which is a no-op on some databaseslike SQLite. We try to clarify in documentation that it needs to be used
in addition to context timeout, not instead of it, because it may no-op
depending on the database.
I also increased the timeout to 30 seconds. This matches our timeouts in
the various maintenance modules, and seems a little safer as job locks
on tables with huge numbers of dead rows could potentially take over 10
seconds, and maybe some users have this happening.
IMO, it's too random still where we put this stuff in, but we'll have to
figure that out on follow up changes. e.g. Why do we do a statement
timeout for locking jobs but not for maintenance operations? Hard to
justify.