Append commit instead of individual transactions to commitlog (#4140)
Changes the commitlog (and durability) write API so that the caller
decides how many transactions go into a single commit and must supply
the transaction offsets.
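
A minimal sketch of the caller side of this API. `Commit`, `min_tx_offset`, `Datastore` and `next_commit` are hypothetical names for illustration, not the actual SpacetimeDB types. The datastore owns the offset counter and hands the commitlog commits whose offsets it has already assigned.

```rust
/// Hypothetical commit shape: the caller supplies the offset of the first
/// transaction; the i-th record implicitly has offset `min_tx_offset + i`.
#[derive(Debug)]
pub struct Commit {
    pub min_tx_offset: u64,
    pub records: Vec<Vec<u8>>,
}

impl Commit {
    /// Offset at which the next commit must start.
    pub fn next_tx_offset(&self) -> u64 {
        self.min_tx_offset + self.records.len() as u64
    }
}

/// Hypothetical datastore-side state: the datastore, not the commitlog,
/// is the authority on transaction offsets.
pub struct Datastore {
    next_tx_offset: u64,
}

impl Datastore {
    pub fn new(start: u64) -> Self {
        Datastore { next_tx_offset: start }
    }

    /// Package the given transactions into one commit with contiguous
    /// offsets. The caller decides how many transactions go in.
    pub fn next_commit(&mut self, txs: Vec<Vec<u8>>) -> Commit {
        let commit = Commit { min_tx_offset: self.next_tx_offset, records: txs };
        self.next_tx_offset = commit.next_tx_offset();
        commit
    }
}
```

With one transaction per commit, each commit's `min_tx_offset` simply advances by one; batching just advances it by the batch size.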
The new API reduces the commitlog-side buffering logic to essentially a
`BufWriter` (which, of course, we must not forget to flush). This will
help throughput, but offers less opportunity to retry failed writes.
This is probably a good thing: disks can fail in erratic ways, and we
would rather crash and re-verify the commitlog (suffix) than continue
writing.
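
A minimal sketch of `BufWriter`-based buffering, assuming a hypothetical `LogWriter` and an illustrative byte layout (not the real commitlog encoding). Commits accumulate in the buffer and reach the OS in larger writes. The explicit `flush` matters because `BufWriter`'s `Drop` implementation ignores flush errors. Note that `flush` only hands bytes to the OS; durability on disk additionally needs something like `File::sync_data`.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical commit: offset of the first transaction plus payloads.
pub struct Commit {
    pub min_tx_offset: u64,
    pub records: Vec<Vec<u8>>,
}

/// Hypothetical commitlog writer: all buffering is delegated to `BufWriter`.
pub struct LogWriter<W: Write> {
    inner: BufWriter<W>,
}

impl<W: Write> LogWriter<W> {
    pub fn new(sink: W) -> Self {
        LogWriter { inner: BufWriter::new(sink) }
    }

    /// Serialize one commit into the buffer. Illustrative layout:
    /// min_tx_offset (u64 LE), record count (u32 LE), then each record
    /// as its length (u32 LE) followed by its bytes.
    pub fn append(&mut self, commit: &Commit) -> io::Result<()> {
        self.inner.write_all(&commit.min_tx_offset.to_le_bytes())?;
        self.inner.write_all(&(commit.records.len() as u32).to_le_bytes())?;
        for rec in &commit.records {
            self.inner.write_all(&(rec.len() as u32).to_le_bytes())?;
            self.inner.write_all(rec)?;
        }
        Ok(())
    }

    /// Hand buffered bytes to the OS, surfacing any error.
    /// Relying on `Drop` instead would silently discard it.
    pub fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }

    /// Flush and return the underlying sink.
    pub fn into_inner(self) -> io::Result<W> {
        self.inner.into_inner().map_err(|e| e.into_error())
    }
}
```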
To that end, this patch liberally panics whenever internal state could
have been "poisoned" by a partial write. This choice may be debatable.
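
A minimal sketch of the fail-stop behaviour, assuming a hypothetical `FailStop` wrapper and a test sink `FlakySink` that simulates a disk failing part-way through a write; neither is the actual SpacetimeDB code. After a failed `write_all`, some unknown prefix of the commit may already have reached the sink, so the wrapper panics instead of returning an error the caller might retry.

```rust
use std::io::{self, Write};

/// Hypothetical fail-stop writer: after a failed write we cannot know how
/// many bytes of the commit reached the sink, so we refuse to go on.
pub struct FailStop<W: Write> {
    inner: W,
}

impl<W: Write> FailStop<W> {
    pub fn new(inner: W) -> Self {
        FailStop { inner }
    }

    /// Write one encoded commit. On error, panic instead of returning
    /// `Err`, so no caller can retry on top of a possibly torn write.
    pub fn append(&mut self, encoded_commit: &[u8]) {
        if let Err(e) = self.inner.write_all(encoded_commit) {
            panic!("commitlog write failed, state may be poisoned: {}", e);
        }
    }

    pub fn sink(&self) -> &W {
        &self.inner
    }
}

/// Hypothetical test sink: accepts at most `capacity` bytes, then fails.
pub struct FlakySink {
    pub written: Vec<u8>,
    pub capacity: usize,
}

impl Write for FlakySink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let room = self.capacity - self.written.len();
        if room == 0 {
            return Err(io::Error::new(io::ErrorKind::Other, "simulated disk failure"));
        }
        // Accept only as much as fits, producing a short (torn) write.
        let n = room.min(buf.len());
        self.written.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Appending a second commit to a sink with only 10 bytes of room leaves a torn prefix behind and then panics, which is exactly the state that should trigger a crash and re-verification rather than a retry.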
# Motivation
The main motivation is to avoid maintaining the transaction offset in
two places in such a way that they could diverge. As ordering commits is
the responsibility of the datastore, we make it authoritative on this
matter. The commitlog still checks that offsets are contiguous, and
refuses to commit if they are not.
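
A minimal sketch of the contiguity check, assuming hypothetical `Commitlog`, `Commit` and `OffsetGap` types. The commitlog never assigns offsets; it only verifies that each commit starts where the previous one ended, and rejects it otherwise.

```rust
/// Hypothetical commit: offset of the first transaction plus payloads.
pub struct Commit {
    pub min_tx_offset: u64,
    pub records: Vec<Vec<u8>>,
}

/// Returned when a commit does not start where the previous one ended.
#[derive(Debug, PartialEq)]
pub struct OffsetGap {
    pub expected: u64,
    pub got: u64,
}

/// Hypothetical commitlog state: it checks offsets, it does not assign them.
pub struct Commitlog {
    /// `None` until the first commit. Simplification: a real log would
    /// recover this value from disk when reopened.
    next_tx_offset: Option<u64>,
    commits: Vec<Commit>,
}

impl Commitlog {
    pub fn new() -> Self {
        Commitlog { next_tx_offset: None, commits: Vec::new() }
    }

    pub fn append(&mut self, commit: Commit) -> Result<(), OffsetGap> {
        if let Some(expected) = self.next_tx_offset {
            if commit.min_tx_offset != expected {
                // Refuse to commit: offsets must be contiguous.
                return Err(OffsetGap { expected, got: commit.min_tx_offset });
            }
        }
        self.next_tx_offset = Some(commit.min_tx_offset + commit.records.len() as u64);
        self.commits.push(commit);
        Ok(())
    }

    pub fn len(&self) -> usize {
        self.commits.len()
    }
}
```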
A secondary, related motivation is the following:
A "commit" is an atomic unit of storage, meaning that a torn (partial)
write of a commit renders the entire commit corrupt. There hasn't been a
compelling case for putting several transactions into one commit (and
thus exposing all of them to a single torn write), so we have always
configured the server to write exactly one transaction per commit.
The code to handle buffering of transactions is, however, rather
complex, as it tries hard to allow the caller to retry writes at commit
boundaries. An unfortunate consequence of this is that we flush to the
OS very often, leaving throughput on the table.
So, if there is a compelling case for batching multiple transactions in
a commit, it should be the datastore's responsibility.
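
To illustrate why a commit is the atomic unit of storage, here is a minimal sketch of per-commit framing with a checksum. The layout (`u32` length prefixes, CRC-32 over the whole payload) and the names `encode_commit`, `decode_commit` and `crc32` are assumptions for illustration, not the actual commitlog format. Because the checksum covers the commit as a whole, a torn or corrupted write rejects every transaction in it, including ones whose bytes landed intact.

```rust
/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            // All ones if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

fn read_u32(buf: &[u8], at: usize) -> Option<u32> {
    let s = buf.get(at..at + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Hypothetical framing: [payload len][payload][crc32(payload)], where the
/// payload is every transaction as [len][bytes].
pub fn encode_commit(txs: &[Vec<u8>]) -> Vec<u8> {
    let mut payload = Vec::new();
    for tx in txs {
        payload.extend_from_slice(&(tx.len() as u32).to_le_bytes());
        payload.extend_from_slice(tx);
    }
    let mut out = Vec::with_capacity(payload.len() + 8);
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&payload);
    out.extend_from_slice(&crc32(&payload).to_le_bytes());
    out
}

/// Returns all transactions, or `None` if the commit is truncated or corrupt.
pub fn decode_commit(buf: &[u8]) -> Option<Vec<Vec<u8>>> {
    let len = read_u32(buf, 0)? as usize;
    let payload = buf.get(4..4 + len)?;
    if crc32(payload) != read_u32(buf, 4 + len)? {
        return None; // the whole commit is rejected, not just one transaction
    }
    let mut txs = Vec::new();
    let mut rest = payload;
    while !rest.is_empty() {
        let n = read_u32(rest, 0)? as usize;
        txs.push(rest.get(4..4 + n)?.to_vec());
        rest = &rest[4 + n..];
    }
    Some(txs)
}
```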
# API and ABI breaking changes
Breaks internal APIs only.
# Expected complexity level and risk
5 - Mostly for the risk
# Testing
Existing tests.