Skip to content

feat: transparent cursor caching (ENG-2129)#11

Open
prestwich wants to merge 17 commits intomainfrom
prestwich/eng-2129-cursor-caching
Open

feat: transparent cursor caching (ENG-2129)#11
prestwich wants to merge 17 commits intomainfrom
prestwich/eng-2129-cursor-caching

Conversation

@prestwich
Copy link
Copy Markdown
Member

@prestwich prestwich commented Apr 7, 2026

Summary

  • Extend Cache trait with cursor pointer storage (take_cursor/return_cursor/drain_cursors)
  • Add cache reference to CursorDrop returns raw pointer to cache instead of calling mdbx_cursor_close
  • Tx::cursor() checks cache before allocating; reuses cached pointers without FFI overhead
  • Drain and close all cached cursors on commit and abort paths inside with_txn_ptr
  • Applies to both sync (SharedCache) and unsync (RefCell<DbCache>) transaction variants
  • No public API changes

Eliminates ~200 ns per cursor open/close cycle. On the SLOAD hot path (1,000–10,000 get_dual calls per block), this saves ~100–1000 µs per block.

Closes ENG-2129

Review fixes (evalir)

  • Soundness: close_db/drop_db now drain and close cached cursors for the closed DBI before proceeding, preventing use-after-free
  • Soundness: Cursor::new_at_position no longer constructs Self before checking mdbx_cursor_copy result, preventing cache poisoning with unbound (NULL txn) cursor pointers
  • Correctness: Tx::cursor() calls mdbx_cursor_renew on cache-hit path to reset B-tree position, so reused cursors don't retain stale state
  • Docs: Added // NB: keep in sync cross-reference comments on duplicated drain logic in drain_cached_cursors and Tx::Drop
  • Tests: Added RO cursor-cache tests (reuse + 100-cycle stress in read-only transactions)
  • Tests: Added cursor reuse across writes test (interleaved read/write/read confirming cached cursors see B-tree COW updates)

Test plan

  • 3 new dual-variant cursor cache tests (6 test functions) covering reuse, multiple cursors, and 100-cycle stress
  • 2 new RO cursor cache tests (4 test functions) covering reuse and repeated cycles in read-only transactions
  • 1 new cross-write cursor cache test (2 test functions) covering interleaved read/write/read
  • All 211 tests pass (0 failures)
  • cargo clippy clean with --all-features and --no-default-features
  • Docker Linux test (docker build -t mdbx-linux-tests . && docker run --rm mdbx-linux-tests)

🤖 Generated with Claude Code

prestwich and others added 5 commits April 7, 2026 11:14
Extend DbCache with cursor pointer storage and add take_cursor,
return_cursor, drain_cursors to the Cache trait. Both RefCell<DbCache>
and SharedCache implement the new methods.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cursor now holds a &'tx K::Cache reference. Drop returns the raw
pointer to the cache instead of calling mdbx_cursor_close. Add
from_raw constructor for cache-hit path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tx::cursor() checks the cursor cache before allocating. Commit and
drop paths drain cached cursors inside with_txn_ptr to ensure all
FFI close calls are properly serialized.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verify cursor reuse across open/drop cycles, multiple cursors on the
same DB, and repeated cycle stability.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Document the unsafe Send + Sync impls on DbCache in CLAUDE.md with
full justification. Add code comment explaining why Tx::Drop inlines
drain logic instead of calling drain_cached_cursors (type system
constraint: Drop is on Tx<K, U>, helper is on Tx<K>).
@prestwich prestwich requested review from Evalir and Fraser999 April 7, 2026 15:38
Copy link
Copy Markdown
Member

@Evalir Evalir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Transparent Cursor Caching

Good optimization — pooling cursor pointers within a transaction to avoid mdbx_cursor_open/mdbx_cursor_close FFI churn is sound in principle, and the ~200ns savings per cycle compounds well on the SLOAD hot path. The lifetime model is correct: Rust's borrow checker ensures all Cursors are dropped before Tx::commit/Tx::Drop, so the drain-then-close sequence is properly ordered. The Send/Sync justification is solid and well-documented.

Two soundness concerns, one design consideration, and a few smaller items.


1. close_db / drop_db don't drain cached cursors for the closed DBI

close_db (impl.rs:232) and drop_db (impl.rs:489) call self.cache.remove_dbi(dbi) to clear the db cache, but don't touch the cursor cache. After close_db(5), a stale *mut MDBX_cursor for DBI 5 can still be returned by take_cursor(5) and wrapped via from_raw — use-after-free.

Both methods are unsafe with caller obligations, but once cursors are dropped their pointers land in the cache automatically — the caller has no way to drain this internal state. The safety docs also predate cursor caching and don't mention it.

Suggest adding a drain_cursors_for_dbi(dbi) that removes matching entries (cheap linear scan of SmallVec<8>), called from both methods.

(Can't add inline comments on close_db/drop_db since they're outside the diff hunks.)

2. Cursor::new_at_position failure path poisons the cache

See inline comment on cursor.rs.

3. Cached cursors retain their previous position

See inline comment on impl.rs cursor() method.

4. Minor items

  • Tx::Drop duplicates drain_cached_cursors logic — Well-explained comment about the type-system constraint. Consider a // NB: keep in sync with drain_cached_cursors marker at both sites.

  • Tests only exercise RW transactions — All three test functions accept _begin_ro but never use it. An RO variant (populate via RW, commit, then test cursor caching in an RO txn) would cover that path.

  • No test for cursor reuse across writes — The stated hot path is interleaved reads + writes. A test like cursor → read → drop → put → cursor → read would confirm cached cursors survive B-tree COW.


Bottom line: Items 1 and 2 are the ones I'd want addressed before merge. The rest is polish. Solid change overall — clean commit progression, good CLAUDE.md docs, and the SmallVec-based pool is a good fit for the expected cardinality.

Comment thread src/tx/cursor.rs Outdated
_kind: PhantomData,
};

mdbx_result(res)?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Soundness: failed mdbx_cursor_copy poisons the cursor cache.

Self is constructed at line 80 before the ? here. If mdbx_cursor_copy fails, s is dropped and Cursor::Drop pushes the unbound cursor pointer (created by mdbx_cursor_create(null_mut()) — has NULL txn) into the cache. A later txn.cursor(db) takes it via from_raw without rebinding — operations on it would dereference the NULL txn inside the MDBX cursor struct.

Before this PR the old Drop called mdbx_cursor_close, which safely frees an unbound cursor. Now it's cached and reused instead.

Fix: construct Self only on success, and close the cursor directly on the error path:

let res = ffi::mdbx_cursor_copy(other.cursor(), cursor);
if let Err(e) = mdbx_result(res) {
    ffi::mdbx_cursor_close(cursor);
    return Err(e);
}
Ok(Self { access: other.access, cache: other.cache, cursor, db: other.db, _kind: PhantomData })

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code]

Fixed in 33507af. Adopted the approach you suggested — Self is now constructed only after mdbx_cursor_copy succeeds. On the error path we call mdbx_cursor_close directly on the raw pointer, bypassing Drop entirely.

Comment thread src/tx/impl.rs Outdated
pub fn cursor(&self, db: Database) -> MdbxResult<Cursor<'_, K>> {
Cursor::new(&self.txn, db)
if let Some(raw) = self.cache.take_cursor(db.dbi()) {
Ok(Cursor::from_raw(&self.txn, &self.cache, raw, db))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design note: cached cursors retain their previous B-tree position.

A cursor from the cache via from_raw is not reset — it keeps whatever position it had when it was last used. This means:

  • cursor.first(), cursor.set(key), etc. work fine (they navigate from scratch).
  • cursor.get_current() or cursor.next() as the first operation would silently return stale data from the previous user of the cursor, rather than the EOF/error a freshly-opened cursor would give.

Callers in practice always position before reading, so this is unlikely to bite, but it's a semantic difference from mdbx_cursor_open. Two options:

  1. Document it in this docstring: "The returned cursor may be reused from a pool; always position it before reading."
  2. Call mdbx_cursor_renew on the cache-hit path for defensive correctness (~negligible cost, just rebinds same txn+DBI and resets position).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code]

Fixed in 9c6dc2b. Went with option 2 — Tx::cursor() now calls mdbx_cursor_renew(txn_ptr, raw) on the cache-hit path before returning the cursor. This resets the position to a clean state, matching fresh mdbx_cursor_open semantics. Cost is negligible (~single-digit ns).

prestwich and others added 6 commits April 9, 2026 09:30
Prevents use-after-free when a DBI is closed or dropped while stale
cursor pointers for it remain in the cursor cache.
Construct Cursor only after mdbx_cursor_copy succeeds. On failure,
close the raw pointer directly instead of letting Drop push an
unbound cursor into the cache.
Call mdbx_cursor_renew on the cache-hit path so reused cursors
start at a clean position rather than retaining stale state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cover cursor caching in read-only transactions (reuse and repeated
cycles), complementing existing RW-only coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verify cached cursors see updated data after B-tree COW from
interleaved put operations — the primary hot path for caching.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@prestwich prestwich requested a review from Evalir April 9, 2026 13:48
@prestwich
Copy link
Copy Markdown
Member Author

[Claude Code]

Thanks for the thorough review. All items addressed:

Soundness (blocking):

  1. close_db/drop_db now drain and close cached cursors for the closed DBI before proceeding (b97eff2). Added drain_cursors_for_dbi(dbi) to the Cache trait.
  2. new_at_position no longer constructs Self before checking mdbx_cursor_copy — on failure, raw pointer is closed directly (33507af).

Design:
3. Tx::cursor() calls mdbx_cursor_renew on cache-hit path to reset position (9c6dc2b). Went with option 2 (defensive correctness).

Minor:
4. Added // NB: keep in sync cross-references on both drain sites (4144c17).
5. Added RO cursor-cache tests — reuse + 100-cycle stress (8504d37).
6. Added cursor reuse across writes test — interleaved read/write/read (426be07).

Comment thread src/tx/impl.rs Outdated
// SAFETY: txn_ptr is valid from with_txn_ptr, raw is a
// valid cursor pointer returned by a prior Cursor::drop.
let rc = unsafe { ffi::mdbx_cursor_renew(txn_ptr, raw) };
mdbx_result(rc)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An early exit here leaks the cursor pointer - I guess it should be returned to the cache?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code]

Fixed in 14adac5. Chose defensive close over re-caching: on renew failure the raw pointer is closed via mdbx_cursor_close2 (the non-aborting variant) rather than returned to the cache, since a subsequent cursor() call would hit the same renew failure. Happy to switch to re-cache if you'd prefer.

Comment thread src/tx/impl.rs Outdated
Comment on lines +236 to +246
// Drain and close any cached cursors for this DBI before closing it.
let stale = self.cache.drain_cursors_for_dbi(dbi);
if !stale.is_empty() {
self.with_txn_ptr(|_| {
for cursor in stale {
// SAFETY: cursor pointers are valid — returned by
// Cursor::drop during the lifetime of this transaction.
unsafe { ffi::mdbx_cursor_close(cursor) };
}
});
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicated in Tx::Drop, drop_db and drain_cached_cursors. Seems like this could be extracted to a helper?

Also, seems like there's a convention to have FFI calls exclusively in the ops module which we're not following here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code]

Fixed in 14adac5. Extracted close_drained_cursors as a free fn (generic over TxPtrAccess so Tx::Drop's free U parameter can share it), and moved the cursor FFI calls into ops as cursor_renew_raw, cursor_close_raw, cursor_close2_raw.

Comment thread tests/cursor.rs
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the tests actually verify that caching worked since the caching is transparent by design. It might be worthwhile to add a couple of #[cfg(test)]-gated helpers to the DbCache to allow these tests (or new ones) to verify expected cache usage?

It would be nice to also include a test which covers Tx::cursor erroring at mdbx_cursor_renew to ensure the pointer leak is fixed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Code]

Fixed in 14adac5. Added #[cfg(test)] cursor_count and inject_cursor to the Cache trait, plus three tests in src/tx/impl.rs:

  • test_cursor_cache_counts — verifies dropped cursors enter the cache and are reused.
  • test_cursor_cache_close_db_drains — verifies close_db drains only its own DBI.
  • test_cursor_renew_error_does_not_leak — injects a null pointer, exercises the renew error path, and asserts the cache is empty after (covers the leak fix).

…cache tests

Addresses Fraser's review on PR #11:

- Extract drain-and-close logic into `close_drained_cursors` helper, used
  by `close_db`, `drop_db`, `drain_cached_cursors`, and `Tx::Drop`.
- Move all cursor FFI calls into `ops` (`cursor_renew_raw`,
  `cursor_close_raw`, `cursor_close2_raw`), matching the crate convention.
- On `mdbx_cursor_renew` failure in `Tx::cursor`, defensively close the
  raw pointer via `mdbx_cursor_close2` instead of leaking it. Not
  re-cached: a subsequent call would hit the same failure.
- Add `#[cfg(test)]` `cursor_count` and `inject_cursor` to the `Cache`
  trait so tests can verify cache state directly.
- Add tests: cache-count assertions, `close_db` drains its DBI only,
  and a renew-failure test that injects a null pointer to exercise the
  error path and confirm no leak.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@prestwich prestwich requested a review from Fraser999 May 8, 2026 08:11
prestwich and others added 2 commits May 8, 2026 04:16
Satisfies clippy::manual_checked_ops (new in Rust 1.95).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Satisfies clippy::manual_checked_ops (new in Rust 1.95).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Evalir
Copy link
Copy Markdown
Member

Evalir commented May 8, 2026

Eager-drain on clone Drop closes cursors other clones still need (sync only)

Tx::Drop unconditionally drains the cursor cache. For Tx<K, Arc<PtrSync>> (RoTxSync / RwTxSync), K::Cache = SharedCache = Arc<RwLock<DbCache>> and Clone shares the inner RwLock, so dropping any clone empties the cache for all surviving clones.

Reproducer (added to the unit test module in src/tx/impl.rs, ran locally on this branch):

let txn_a = RwTxSync::begin(env.clone()).unwrap();
let db = txn_a.create_db(None, DatabaseFlags::empty()).unwrap();
drop(txn_a.cursor(db).unwrap());
assert_eq!(txn_a.cache.cursor_count(), 1); // OK

let txn_b = txn_a.clone();
drop(txn_b);
assert_eq!(txn_a.cache.cursor_count(), 1); // FAILS: count is 0

Output: cache count: before clone-drop = 1, after clone-drop = 0.

This silently defeats caching whenever a sync txn is cloned across threads — exactly the SLOAD-style workload this PR targets if any caller holds clones. Unsync is unaffected (no Clone impl).

Suggested fix: add is_last_reference() to TxPtrAccess (default true, override for Arc<T> as Arc::strong_count(self) == 1) and gate the drain in Tx::Drop on it. commit_inner's explicit drain stays as-is — it consumes self so the check isn't needed there.

Race modes are both benign: a missed drain leaves mdbx_txn_abort to close the cursors and leaks a few bytes of in-Rust SmallVec storage until the last clone drops; a spurious drain costs the surviving clone one cache-miss mdbx_cursor_open.

… drain

`Tx::Drop` previously drained the shared cursor cache unconditionally.
For `TxSync`, the cache was `Arc<RwLock<DbCache>>` shared across clones,
so dropping any clone emptied the cache for all surviving clones. The
naive `Arc::strong_count == 1` gate would race on concurrent drops and
leak the cursor pointers (MDBX requires explicit `mdbx_cursor_close`
regardless of RO/RW).

Move the cache inside `PtrSync` (widening the existing `Mutex<()>` to
`Mutex<DbCache>`) and `PtrUnsync` (`RefCell<DbCache>`). The `Arc` that
controls the txn lifetime now also controls cache lifetime, so cursors
are drained-and-closed exactly once in the inner Drop when refcount
hits 0 — race-free by construction.

Removes `SharedCache`, drops `Clone+Default` from the `Cache` trait,
and trims the `cache` field/parameter from `Tx` and `Cursor`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@prestwich
Copy link
Copy Markdown
Member Author

[Claude Code]

Fixed in c78fd4b. Went with co-location rather than Arc::strong_count gating — the latter has a race when two clones drop concurrently (both observe >= 2 since each still holds an Arc in its own Tx::Drop), and per mdbx.h:5078 MDBX requires explicit mdbx_cursor_close regardless of RO/RW, so a missed drain leaks cursor handles.

Approach:

  • Cache moved inside PtrSync (widening the existing Mutex<()>Mutex<DbCache> — the same lock that already serialises txn-pointer access) and PtrUnsync (RefCell<DbCache>).
  • TxPtrAccess gains type Cache: Cache + fn cache(&self) -> &Self::Cache, forwarded through the Arc<T> impl.
  • The Arc controlling txn lifetime now also controls cache lifetime, so cursors are drained-and-closed exactly once in PtrSync::Drop / PtrUnsync::Drop when the refcount transition to 0 fires the inner Drop. Race-free by construction.
  • commit_inner keeps its explicit drain (must close cursors before mdbx_txn_commit_ex returns to the caller; the inner Drop fires later when the last Arc drops).
  • Tx::Drop is removed entirely; Tx::Clone no longer touches the cache; SharedCache is gone; K::Cache removed from SyncKind; Cursor lost its cache field and reaches it via self.access.cache().

Tests added in src/tx/impl.rs:

  • test_clone_drop_does_not_drain_shared_cache — direct regression for your reproducer (count stays at 1 after clone-drop).
  • test_concurrent_clone_drop_no_uaf — 200 iterations of two-thread concurrent clone-drop.

All existing tests pass on macOS and Linux (Docker).

prestwich and others added 2 commits May 8, 2026 09:56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants