feat: transparent cursor caching (ENG-2129) by prestwich · Pull Request #11 · init4tech/mdbx

prestwich · 2026-04-07T15:37:47Z

Summary

Extend Cache trait with cursor pointer storage (take_cursor/return_cursor/drain_cursors)
Add cache reference to Cursor — Drop returns raw pointer to cache instead of calling mdbx_cursor_close
Tx::cursor() checks cache before allocating; reuses cached pointers without FFI overhead
Drain and close all cached cursors on commit and abort paths inside with_txn_ptr
Applies to both sync (SharedCache) and unsync (RefCell<DbCache>) transaction variants
No public API changes

Eliminates ~200 ns per cursor open/close cycle. On the SLOAD hot path (1,000–10,000 get_dual calls per block), this saves ~100–1000 µs per block.

Closes ENG-2129

Review fixes (evalir)

Soundness: close_db/drop_db now drain and close cached cursors for the closed DBI before proceeding, preventing use-after-free
Soundness: Cursor::new_at_position no longer constructs Self before checking mdbx_cursor_copy result, preventing cache poisoning with unbound (NULL txn) cursor pointers
Correctness: Tx::cursor() calls mdbx_cursor_renew on cache-hit path to reset B-tree position, so reused cursors don't retain stale state
Docs: Added // NB: keep in sync cross-reference comments on duplicated drain logic in drain_cached_cursors and Tx::Drop
Tests: Added RO cursor-cache tests (reuse + 100-cycle stress in read-only transactions)
Tests: Added cursor reuse across writes test (interleaved read/write/read confirming cached cursors see B-tree COW updates)

Test plan

3 new dual-variant cursor cache tests (6 test functions) covering reuse, multiple cursors, and 100-cycle stress
2 new RO cursor cache tests (4 test functions) covering reuse and repeated cycles in read-only transactions
1 new cross-write cursor cache test (2 test functions) covering interleaved read/write/read
All 211 tests pass (0 failures)
cargo clippy clean with --all-features and --no-default-features
Docker Linux test (docker build -t mdbx-linux-tests . && docker run --rm mdbx-linux-tests)

🤖 Generated with Claude Code

Extend DbCache with cursor pointer storage and add take_cursor, return_cursor, drain_cursors to the Cache trait. Both RefCell<DbCache> and SharedCache implement the new methods. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Cursor now holds a &'tx K::Cache reference. Drop returns the raw pointer to the cache instead of calling mdbx_cursor_close. Add from_raw constructor for cache-hit path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Tx::cursor() checks the cursor cache before allocating. Commit and drop paths drain cached cursors inside with_txn_ptr to ensure all FFI close calls are properly serialized. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Verify cursor reuse across open/drop cycles, multiple cursors on the same DB, and repeated cycle stability. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Document the unsafe Send + Sync impls on DbCache in CLAUDE.md with full justification. Add code comment explaining why Tx::Drop inlines drain logic instead of calling drain_cached_cursors (type system constraint: Drop is on Tx<K, U>, helper is on Tx<K>).

Evalir

Review: Transparent Cursor Caching

Good optimization — pooling cursor pointers within a transaction to avoid mdbx_cursor_open/mdbx_cursor_close FFI churn is sound in principle, and the ~200ns savings per cycle compounds well on the SLOAD hot path. The lifetime model is correct: Rust's borrow checker ensures all Cursors are dropped before Tx::commit/Tx::Drop, so the drain-then-close sequence is properly ordered. The Send/Sync justification is solid and well-documented.

Two soundness concerns, one design consideration, and a few smaller items.

1. `close_db` / `drop_db` don't drain cached cursors for the closed DBI

close_db (impl.rs:232) and drop_db (impl.rs:489) call self.cache.remove_dbi(dbi) to clear the db cache, but don't touch the cursor cache. After close_db(5), a stale *mut MDBX_cursor for DBI 5 can still be returned by take_cursor(5) and wrapped via from_raw — use-after-free.

Both methods are unsafe with caller obligations, but once cursors are dropped their pointers land in the cache automatically — the caller has no way to drain this internal state. The safety docs also predate cursor caching and don't mention it.

Suggest adding a drain_cursors_for_dbi(dbi) that removes matching entries (cheap linear scan of SmallVec<8>), called from both methods.

(Can't add inline comments on close_db/drop_db since they're outside the diff hunks.)

2. `Cursor::new_at_position` failure path poisons the cache

See inline comment on cursor.rs.

3. Cached cursors retain their previous position

See inline comment on impl.rs cursor() method.

4. Minor items

Tx::Drop duplicates drain_cached_cursors logic — Well-explained comment about the type-system constraint. Consider a // NB: keep in sync with drain_cached_cursors marker at both sites.
Tests only exercise RW transactions — All three test functions accept _begin_ro but never use it. An RO variant (populate via RW, commit, then test cursor caching in an RO txn) would cover that path.
No test for cursor reuse across writes — The stated hot path is interleaved reads + writes. A test like cursor → read → drop → put → cursor → read would confirm cached cursors survive B-tree COW.

Bottom line: Items 1 and 2 are the ones I'd want addressed before merge. The rest is polish. Solid change overall — clean commit progression, good CLAUDE.md docs, and the SmallVec-based pool is a good fit for the expected cardinality.

Evalir · 2026-04-08T20:49:58Z

+                _kind: PhantomData,
+            };

            mdbx_result(res)?;


Soundness: failed mdbx_cursor_copy poisons the cursor cache.

Self is constructed at line 80 before the ? here. If mdbx_cursor_copy fails, s is dropped and Cursor::Drop pushes the unbound cursor pointer (created by mdbx_cursor_create(null_mut()) — has NULL txn) into the cache. A later txn.cursor(db) takes it via from_raw without rebinding — operations on it would dereference the NULL txn inside the MDBX cursor struct.

Before this PR the old Drop called mdbx_cursor_close, which safely frees an unbound cursor. Now it's cached and reused instead.

Fix: construct Self only on success, and close the cursor directly on the error path:

let res = ffi::mdbx_cursor_copy(other.cursor(), cursor); if let Err(e) = mdbx_result(res) { ffi::mdbx_cursor_close(cursor); return Err(e); } Ok(Self { access: other.access, cache: other.cache, cursor, db: other.db, _kind: PhantomData })

[Claude Code]

Fixed in 33507af. Adopted the approach you suggested — Self is now constructed only after mdbx_cursor_copy succeeds. On the error path we call mdbx_cursor_close directly on the raw pointer, bypassing Drop entirely.

Evalir · 2026-04-08T20:49:58Z

    pub fn cursor(&self, db: Database) -> MdbxResult<Cursor<'_, K>> {
-        Cursor::new(&self.txn, db)
+        if let Some(raw) = self.cache.take_cursor(db.dbi()) {
+            Ok(Cursor::from_raw(&self.txn, &self.cache, raw, db))


Design note: cached cursors retain their previous B-tree position.

A cursor from the cache via from_raw is not reset — it keeps whatever position it had when it was last used. This means:

cursor.first(), cursor.set(key), etc. work fine (they navigate from scratch).

cursor.get_current() or cursor.next() as the first operation would silently return stale data from the previous user of the cursor, rather than the EOF/error a freshly-opened cursor would give.

Callers in practice always position before reading, so this is unlikely to bite, but it's a semantic difference from mdbx_cursor_open. Two options:

Document it in this docstring: "The returned cursor may be reused from a pool; always position it before reading."

Call mdbx_cursor_renew on the cache-hit path for defensive correctness (~negligible cost, just rebinds same txn+DBI and resets position).

[Claude Code]

Fixed in 9c6dc2b. Went with option 2 — Tx::cursor() now calls mdbx_cursor_renew(txn_ptr, raw) on the cache-hit path before returning the cursor. This resets the position to a clean state, matching fresh mdbx_cursor_open semantics. Cost is negligible (~single-digit ns).

Prevents use-after-free when a DBI is closed or dropped while stale cursor pointers for it remain in the cursor cache.

Construct Cursor only after mdbx_cursor_copy succeeds. On failure, close the raw pointer directly instead of letting Drop push an unbound cursor into the cache.

Call mdbx_cursor_renew on the cache-hit path so reused cursors start at a clean position rather than retaining stale state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Cover cursor caching in read-only transactions (reuse and repeated cycles), complementing existing RW-only coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Verify cached cursors see updated data after B-tree COW from interleaved put operations — the primary hot path for caching. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

prestwich · 2026-04-09T13:59:51Z

[Claude Code]

Thanks for the thorough review. All items addressed:

Soundness (blocking):

close_db/drop_db now drain and close cached cursors for the closed DBI before proceeding (b97eff2). Added drain_cursors_for_dbi(dbi) to the Cache trait.
new_at_position no longer constructs Self before checking mdbx_cursor_copy — on failure, raw pointer is closed directly (33507af).

Design:
3. Tx::cursor() calls mdbx_cursor_renew on cache-hit path to reset position (9c6dc2b). Went with option 2 (defensive correctness).

Minor:
4. Added // NB: keep in sync cross-references on both drain sites (4144c17).
5. Added RO cursor-cache tests — reuse + 100-cycle stress (8504d37).
6. Added cursor reuse across writes test — interleaved read/write/read (426be07).

Fraser999 · 2026-04-21T10:14:13Z

+                // SAFETY: txn_ptr is valid from with_txn_ptr, raw is a
+                // valid cursor pointer returned by a prior Cursor::drop.
+                let rc = unsafe { ffi::mdbx_cursor_renew(txn_ptr, raw) };
+                mdbx_result(rc)?;


An early exit here leaks the cursor pointer - I guess it should be returned to the cache?

[Claude Code]

Fixed in 14adac5. Chose defensive close over re-caching: on renew failure the raw pointer is closed via mdbx_cursor_close2 (the non-aborting variant) rather than returned to the cache, since a subsequent cursor() call would hit the same renew failure. Happy to switch to re-cache if you'd prefer.

Fraser999 · 2026-04-21T10:31:10Z

+        // Drain and close any cached cursors for this DBI before closing it.
+        let stale = self.cache.drain_cursors_for_dbi(dbi);
+        if !stale.is_empty() {
+            self.with_txn_ptr(|_| {
+                for cursor in stale {
+                    // SAFETY: cursor pointers are valid — returned by
+                    // Cursor::drop during the lifetime of this transaction.
+                    unsafe { ffi::mdbx_cursor_close(cursor) };
+                }
+            });
+        }


This is duplicated in Tx::Drop, drop_db and drain_cached_cursors. Seems like this could be extracted to a helper?

Also, seems like there's a convention to have FFI calls exclusively in the ops module which we're not following here.

[Claude Code]

Fixed in 14adac5. Extracted close_drained_cursors as a free fn (generic over TxPtrAccess so Tx::Drop's free U parameter can share it), and moved the cursor FFI calls into ops as cursor_renew_raw, cursor_close_raw, cursor_close2_raw.

Fraser999 · 2026-04-21T11:01:37Z

None of the tests actually verify that caching worked since the caching is transparent by design. It might be worthwhile to add a couple of #[cfg(test)]-gated helpers to the DbCache to allow these tests (or new ones) to verify expected cache usage?

It would be nice to also include a test which covers Tx::cursor erroring at mdbx_cursor_renew to ensure the pointer leak is fixed.

[Claude Code]

Fixed in 14adac5. Added #[cfg(test)] cursor_count and inject_cursor to the Cache trait, plus three tests in src/tx/impl.rs:

test_cursor_cache_counts — verifies dropped cursors enter the cache and are reused.

test_cursor_cache_close_db_drains — verifies close_db drains only its own DBI.

test_cursor_renew_error_does_not_leak — injects a null pointer, exercises the renew error path, and asserts the cache is empty after (covers the leak fix).

…cache tests Addresses Fraser's review on PR #11: - Extract drain-and-close logic into `close_drained_cursors` helper, used by `close_db`, `drop_db`, `drain_cached_cursors`, and `Tx::Drop`. - Move all cursor FFI calls into `ops` (`cursor_renew_raw`, `cursor_close_raw`, `cursor_close2_raw`), matching the crate convention. - On `mdbx_cursor_renew` failure in `Tx::cursor`, defensively close the raw pointer via `mdbx_cursor_close2` instead of leaking it. Not re-cached: a subsequent call would hit the same failure. - Add `#[cfg(test)]` `cursor_count` and `inject_cursor` to the `Cache` trait so tests can verify cache state directly. - Add tests: cache-count assertions, `close_db` drains its DBI only, and a renew-failure test that injects a null pointer to exercise the error path and confirm no leak. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Satisfies clippy::manual_checked_ops (new in Rust 1.95). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Evalir · 2026-05-08T11:01:53Z

Eager-drain on clone Drop closes cursors other clones still need (sync only)

Tx::Drop unconditionally drains the cursor cache. For Tx<K, Arc<PtrSync>> (RoTxSync / RwTxSync), K::Cache = SharedCache = Arc<RwLock<DbCache>> and Clone shares the inner RwLock, so dropping any clone empties the cache for all surviving clones.

Reproducer (added to the unit test module in src/tx/impl.rs, ran locally on this branch):

let txn_a = RwTxSync::begin(env.clone()).unwrap();
let db = txn_a.create_db(None, DatabaseFlags::empty()).unwrap();
drop(txn_a.cursor(db).unwrap());
assert_eq!(txn_a.cache.cursor_count(), 1); // OK

let txn_b = txn_a.clone();
drop(txn_b);
assert_eq!(txn_a.cache.cursor_count(), 1); // FAILS: count is 0

Output: cache count: before clone-drop = 1, after clone-drop = 0.

This silently defeats caching whenever a sync txn is cloned across threads — exactly the SLOAD-style workload this PR targets if any caller holds clones. Unsync is unaffected (no Clone impl).

Suggested fix: add is_last_reference() to TxPtrAccess (default true, override for Arc<T> as Arc::strong_count(self) == 1) and gate the drain in Tx::Drop on it. commit_inner's explicit drain stays as-is — it consumes self so the check isn't needed there.

Race modes are both benign: a missed drain leaves mdbx_txn_abort to close the cursors and leaks a few bytes of in-Rust SmallVec storage until the last clone drops; a spurious drain costs the surviving clone one cache-miss mdbx_cursor_open.

… drain `Tx::Drop` previously drained the shared cursor cache unconditionally. For `TxSync`, the cache was `Arc<RwLock<DbCache>>` shared across clones, so dropping any clone emptied the cache for all surviving clones. The naive `Arc::strong_count == 1` gate would race on concurrent drops and leak the cursor pointers (MDBX requires explicit `mdbx_cursor_close` regardless of RO/RW). Move the cache inside `PtrSync` (widening the existing `Mutex<()>` to `Mutex<DbCache>`) and `PtrUnsync` (`RefCell<DbCache>`). The `Arc` that controls the txn lifetime now also controls cache lifetime, so cursors are drained-and-closed exactly once in the inner Drop when refcount hits 0 — race-free by construction. Removes `SharedCache`, drops `Clone+Default` from the `Cache` trait, and trims the `cache` field/parameter from `Tx` and `Cursor`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

prestwich · 2026-05-08T13:43:29Z

[Claude Code]

Fixed in c78fd4b. Went with co-location rather than Arc::strong_count gating — the latter has a race when two clones drop concurrently (both observe >= 2 since each still holds an Arc in its own Tx::Drop), and per mdbx.h:5078 MDBX requires explicit mdbx_cursor_close regardless of RO/RW, so a missed drain leaks cursor handles.

Approach:

Cache moved inside PtrSync (widening the existing Mutex<()> → Mutex<DbCache> — the same lock that already serialises txn-pointer access) and PtrUnsync (RefCell<DbCache>).
TxPtrAccess gains type Cache: Cache + fn cache(&self) -> &Self::Cache, forwarded through the Arc<T> impl.
The Arc controlling txn lifetime now also controls cache lifetime, so cursors are drained-and-closed exactly once in PtrSync::Drop / PtrUnsync::Drop when the refcount transition to 0 fires the inner Drop. Race-free by construction.
commit_inner keeps its explicit drain (must close cursors before mdbx_txn_commit_ex returns to the caller; the inner Drop fires later when the last Arc drops).
Tx::Drop is removed entirely; Tx::Clone no longer touches the cache; SharedCache is gone; K::Cache removed from SyncKind; Cursor lost its cache field and reaches it via self.access.cache().

Tests added in src/tx/impl.rs:

test_clone_drop_does_not_drain_shared_cache — direct regression for your reproducer (count stays at 1 after clone-drop).
test_concurrent_clone_drop_no_uaf — 200 iterations of two-thread concurrent clone-drop.

All existing tests pass on macOS and Linux (Docker).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

prestwich and others added 5 commits April 7, 2026 11:14

feat: add cursor caching methods to Cache trait

1cc149d

Extend DbCache with cursor pointer storage and add take_cursor, return_cursor, drain_cursors to the Cache trait. Both RefCell<DbCache> and SharedCache implement the new methods. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add cache field to Cursor, return pointer on drop

d5868e5

Cursor now holds a &'tx K::Cache reference. Drop returns the raw pointer to the cache instead of calling mdbx_cursor_close. Add from_raw constructor for cache-hit path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: add cursor caching tests

ab5817d

Verify cursor reuse across open/drop cycles, multiple cursors on the same DB, and repeated cycle stability. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

prestwich requested review from Evalir and Fraser999 April 7, 2026 15:38

Evalir reviewed Apr 8, 2026

View reviewed changes

prestwich and others added 6 commits April 9, 2026 09:30

fix: drain cached cursors on close_db and drop_db

b97eff2

Prevents use-after-free when a DBI is closed or dropped while stale cursor pointers for it remain in the cursor cache.

fix: prevent cache poisoning on mdbx_cursor_copy failure

33507af

Construct Cursor only after mdbx_cursor_copy succeeds. On failure, close the raw pointer directly instead of letting Drop push an unbound cursor into the cache.

fix: renew cached cursors to reset B-tree position

9c6dc2b

Call mdbx_cursor_renew on the cache-hit path so reused cursors start at a clean position rather than retaining stale state.

docs: add keep-in-sync comments on duplicated drain logic

4144c17

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test: add RO cursor-cache tests

8504d37

Cover cursor caching in read-only transactions (reuse and repeated cycles), complementing existing RW-only coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: add cursor reuse across writes test

426be07

Verify cached cursors see updated data after B-tree COW from interleaved put operations — the primary hot path for caching. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

prestwich requested a review from Evalir April 9, 2026 13:48

Fraser999 reviewed Apr 21, 2026

View reviewed changes

prestwich requested a review from Fraser999 May 8, 2026 08:11

prestwich and others added 2 commits May 8, 2026 04:16

fix(clippy): use checked_div for remaining_in_page calc

0da5858

Satisfies clippy::manual_checked_ops (new in Rust 1.95). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(clippy): use checked_div for remaining_in_page calc

73a6bb2

Satisfies clippy::manual_checked_ops (new in Rust 1.95). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

prestwich and others added 2 commits May 8, 2026 09:56

docs: fix broken Tx::drop reference in close_drained_cursors

9440579

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs: drop unresolvable intra-doc links on private PtrSync field

c0d5bef

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Conversation

prestwich commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Review fixes (evalir)

Test plan

Uh oh!

Evalir left a comment

Choose a reason for hiding this comment

Review: Transparent Cursor Caching

1. close_db / drop_db don't drain cached cursors for the closed DBI

2. Cursor::new_at_position failure path poisons the cache

3. Cached cursors retain their previous position

4. Minor items

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

prestwich commented Apr 9, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Evalir commented May 8, 2026

Uh oh!

prestwich commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

prestwich commented Apr 7, 2026 •

edited

Loading

1. `close_db` / `drop_db` don't drain cached cursors for the closed DBI

2. `Cursor::new_at_position` failure path poisons the cache