Fix client *Dma address translation (KeyExport / NvmAdd / NvmRead) and KeyCacheDma use-after-free#403
Open
Frauschi wants to merge 2 commits into
Open
Fix client *Dma address translation (KeyExport / NvmAdd / NvmRead) and KeyCacheDma use-after-free#403Frauschi wants to merge 2 commits into
Frauschi wants to merge 2 commits into
Conversation
… buffers
KeyExportDma, KeyExportPublicDma, NvmAddObjectDma and NvmReadDma put the raw
client pointer in the wire message instead of running the DMA translation
callback -- invisible on flat-memory ports (POSIX SHM) but rejected by a
split-address-space server. Each now runs the PRE translation before sending
and the matching POST on the Response.
KeyCacheDmaRequest also ran its POST inside the Request, before the server read
the buffer (a use-after-free on ports whose callback allocates); it now defers
the POST to KeyCacheDmaResponse.
The per-call-site async structs are consolidated into whClientDmaAsyncBuf with a
shared wh_Client_DmaAsyncPost() helper that returns the POST status (so a failed
unmap/copy-back is surfaced). Each Request clears its slot(s) before the PRE, so
an unpopulated slot (e.g. metadata-only NvmAddObjectDma) cannot POST stale union
memory, and forwards data_hostaddr = 0 when there is no data buffer. Requests
fail fast with WH_ERROR_REQUEST_PENDING on a busy transport, and
wh_{Client,Server}_DmaRegisterAllowList() now accept NULL to unregister.
8faf92f to
364edf6
Compare
The existing POSIX tests never registered a non-identity DMA callback, so a *Dma API that skipped translation looked identical to a correct one -- which is why the keystore/NVM translation bugs survived CI. Add a shared "bounce-pool" harness (test/wh_test_dma.c) modeling a split-address-space port: the client callback bounces each buffer through a pool, the server callback rejects any untranslated address, freed slots are poisoned so a premature POST (use-after-free) corrupts the data, and a POST matching no live slot is recorded as a stray/double free. - wh_test_clientserver.c drives the keystore/NVM *Dma APIs (including metadata-only and injected PRE-failure cases) in the single-thread pump harness, where ordering makes the use-after-free deterministic. - wh_test_crypto.c registers the same callback on the threaded harness so the whole crypto/cert *Dma suite runs through translation. - _testDma now unregisters its stack-local allow list before returning (a pre-existing dangling pointer ASAN flags once a later test uses DMA). Runs under `make DMA=1`.
364edf6 to
b611887
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Background
A few client *Dma APIs wrote the raw client pointer straight into the wire message instead of routing it through the DMA address-translation callback like the rest of the family.
This is invisible on flat-memory ports (the POSIX SHM reference, where client and server share an address space), so it survived CI. It breaks on a real split-address-space port — found during DMA bring-up on one of the new hardware platforms for wolfHSM, where the server runs on a separate security CPU with no view of host RAM, so it bounds-checks the untranslated pointer and rejects the request (
WH_ERROR_BADARGS).What this fixes
KeyExportDma,KeyExportPublicDma,NvmAddObjectDma,NvmReadDma. Each now runs*_PREtranslation before sending and the matching*_POSTon the response, mirroringRngGenerateDma.KeyCacheDmaRequest, which ran itsPOSTinside the request — releasing/recycling the scratch before the server read the buffer (benign on POSIX, a real UAF on any port whose callback allocates). It now defers thePOSTtoKeyCacheDmaResponse.Notable implementation details
whClientDmaAsyncBuf(carrying thePOSTdirection) plus a two-buffernvmAddvariant, with a sharedwh_Client_DmaAsyncPost()helper that returns thePOSTstatus (a failed unmap/copy-back is surfaced, not swallowed).PRE, so a path that leaves a slot unpopulated (e.g. metadata-onlyNvmAddObjectDma) can't make the responsePOSTstale union memory;data_hostaddris 0 when there is no data buffer, so a raw client pointer is never forwarded.WH_ERROR_REQUEST_PENDINGon a busy transport (no mapping acquired then leaked).wh_{Client,Server}_DmaRegisterAllowList()now acceptNULLto unregister, symmetric withDmaRegisterCb(NULL).Tests
The reason these bugs survived CI is that no POSIX test ever registered a non-identity DMA callback — a missing translation looked identical to a correct call. This PR adds a shared bounce-pool harness (
test/wh_test_dma.c) that models a split-address-space port: the client callback bounces each buffer through a dedicated pool and hands the server a pool address; the server callback rejects any untranslated address. Freed slots are poisoned (so a prematurePOSTcorrupts the data) and aPOSTmatching no live slot is recorded as a stray/double free.wh_test_clientserver.cdrives the keystore/NVM*DmaAPIs end-to-end — including metadata-only and injectedPRE-failure cases — in the single-thread pump harness, where ordering makes the use-after-free deterministic.wh_test_crypto.cregisters the same callback on the threaded harness, so the entire crypto/cert*Dmasuite (AES/SHA/CMAC/RNG/Ed25519/ML-DSA/ML-KEM/cert, plus KeyExportPublicDma) runs through translation automatically._testDmaleft a stack-local allow list registered after returning (a dangling pointer ASAN flags once a later test uses server DMA).Runs under
make DMA=1(already set by every CI job).Validation
make DMA=1 ASAN=1: passes, 0 ASAN findings.