RTL8814AU: devourer TX degrades to LIBUSB_ERROR_IO after USB passthrough cycles #36

@josephnef

Symptom

devourer-TX on RTL8814AU (0bda:8813) works reliably on a fresh chip — first cell of a test session typically sends ~250 frames with 0 submit failures. After the chip has been through one or more virsh USB passthrough cycles (host → VM → host) during a --full-matrix regression run, libusb_bulk_transfer starts failing on the majority of submits with LIBUSB_ERROR_IO (-1).

Observed pattern from PR #34 full-matrix runs. Each TX/RX pair gets 4 cells (1 baseline, 1 dvr-TX, 1 kernel-TX, 1 dvr-dvr), and the dvr-TX cells are where this surfaces:

[1/24] TX=RTL8814AU (kernel) → RX=RTL8812AU (kernel)
  → 99 hits / 277 TX / 11s ✓                        # works (kernel-side, no devourer)
[2/24] TX=RTL8814AU (devourer) → RX=RTL8812AU (kernel)
  → 0 hits / 2500 TX (2272 fail) / 10s ✗            # devourer-TX 8814 fails 91% of submits
[4/24] TX=RTL8814AU (devourer) → RX=RTL8812AU (devourer)
  → 0 hits / 2500 TX (2429 fail) / 10s ✗            # 97% failures
[6/24] TX=RTL8814AU (devourer) → RX=RTL8821AU (kernel)
  → 0 hits / 2500 TX (2322 fail) / 10s ✗
[8/24] TX=RTL8814AU (devourer) → RX=RTL8821AU (devourer)
  → 0 hits / 2500 TX (2321 fail) / 10s ✗

Single-pair runs of the same cell in a fresh session (PR #33 validation) get ~4000 successful TX submits, 0 failures. So the bug is not in devourer's 8814 TX path itself — it's in how the chip is left after passing through libvirt USB hot-plug back and forth.
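The failure percentages quoted in the excerpt can be recomputed from the summary lines. A minimal sketch, assuming the summary-line format is exactly what the excerpt shows; `parse_summary` is a hypothetical helper, not something from tests/regress.py:

```python
import re

# Assumed line format, taken from the excerpt above, e.g.
#   "  → 0 hits / 2500 TX (2272 fail) / 10s ✗"
# The "(N fail)" group is missing on cells with no submit failures.
SUMMARY_RE = re.compile(
    r"(?P<hits>\d+) hits / (?P<tx>\d+) TX(?: \((?P<fail>\d+) fail\))? / \d+s"
)


def parse_summary(line):
    """Return (hits, tx, fail, fail_pct) for a summary line, else None."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    hits = int(m.group("hits"))
    tx = int(m.group("tx"))
    fail = int(m.group("fail") or 0)
    fail_pct = 100.0 * fail / tx if tx else 0.0
    return hits, tx, fail, fail_pct


if __name__ == "__main__":
    excerpt = [
        "  → 99 hits / 277 TX / 11s ✓",
        "  → 0 hits / 2500 TX (2272 fail) / 10s ✗",
        "  → 0 hits / 2500 TX (2429 fail) / 10s ✗",
    ]
    for line in excerpt:
        hits, tx, fail, pct = parse_summary(line)
        print(f"{tx} TX, {fail} fail, {pct:.0f}% failed, {hits} hits")
```

Running the same parser over a whole matrix log would make it easy to see at which cell index the failure rate first jumps.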

Reproduce

On a Linux box with libvirt + the test rig from PR #34 set up (tests/setup_vm.sh + 8814AU plugged):

# Fresh state: 8814 attached to host kernel, devourer reachable
sudo tests/setup_vm.sh --status
lsusb | grep 8813   # confirms 8814 enumerated

# Single-cell baseline — works
sudo python3 tests/regress.py --tx-pid 0x8813 --rx-pid 0x8812 --channel 100 \
    --vm-name devourer-testrig --vm-ssh <user>@<VM-IP>
# Cell `devourer ↔ kernel` → ~4000 TX submits, 0 fail ✓

# Full matrix — third+ cell involving 8814 TX fails reproducibly
sudo python3 tests/regress.py --full-matrix --channel 100 \
    --vm-name devourer-testrig --vm-ssh <user>@<VM-IP>
# devourer-TX 8814 cells → 0 hits, ~2300+ fail per cell ✗

The triggering condition is one or more virsh attach-device + virsh detach-device cycles on the 8814AU. Each --full-matrix cell that needs the 8814 on the kernel side does a passthrough; the next cell that needs it back on host pulls it out. After ~1-2 such cycles, devourer can no longer drive it.

To isolate from --full-matrix orchestration entirely, the minimal repro is probably:

# Cycle the chip via virsh
sudo virsh attach-device devourer-testrig /tmp/usb-8814.xml --live
sleep 3
sudo virsh detach-device devourer-testrig /tmp/usb-8814.xml --live
sleep 3
# (optionally repeat 2-3 times)
# Now try devourer-TX
sudo DEVOURER_PID=0x8813 ./build/WiFiDriverTxDemo
# Expect: TX submits start failing with rc=-1

(haven't isolated the exact minimum cycle count needed — somewhere between 1 and 4 in practice.)
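To pin down the minimum cycle count, the manual steps could be generated for a given N rather than retyped. A rough sketch: the VM name, XML path, and TX demo command are copied from the snippet above, but the contents of /tmp/usb-8814.xml are not shown in this issue, so `hostdev_xml` is only an assumption about what that file holds (a standard libvirt USB hostdev element matching 0bda:8813). All function names are hypothetical.

```python
import shlex

VM_NAME = "devourer-testrig"       # VM name used in the commands above
HOSTDEV_XML = "/tmp/usb-8814.xml"  # path used in the commands above


def hostdev_xml(vendor_id=0x0BDA, product_id=0x8813):
    """ASSUMED contents of the hostdev XML (not shown in this issue):
    a libvirt <hostdev> element that selects the adapter by USB ID."""
    return (
        "<hostdev mode='subsystem' type='usb' managed='yes'>\n"
        "  <source>\n"
        f"    <vendor id='0x{vendor_id:04x}'/>\n"
        f"    <product id='0x{product_id:04x}'/>\n"
        "  </source>\n"
        "</hostdev>\n"
    )


def cycle_commands(cycles, settle_s=3):
    """Argv lists for `cycles` attach/detach round trips, with the same
    3-second settle time as the manual repro."""
    cmds = []
    for _ in range(cycles):
        for action in ("attach-device", "detach-device"):
            cmds.append(["sudo", "virsh", action, VM_NAME, HOSTDEV_XML, "--live"])
            cmds.append(["sleep", str(settle_s)])
    return cmds


def repro_script(cycles):
    """Shell text: N passthrough cycles followed by the devourer TX demo."""
    lines = [shlex.join(cmd) for cmd in cycle_commands(cycles)]
    lines.append("sudo DEVOURER_PID=0x8813 ./build/WiFiDriverTxDemo")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Dry run: print the commands for a 2-cycle repro instead of running them.
    print(repro_script(2), end="")
```

The script only prints commands. Since chip state seems to carry over between runs, each N should probably start from a freshly replugged adapter.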

What's not the cause

  • Not the 8814 TX code path itself. Single-cell tests in PR #33 ("tests: VM mode for kernel cells (aircrack-ng/rtl8812au on pinned kernel)") hit ~4000 successful submits.
  • Not USB hub bandwidth contention. 8812 and 8821 on the same hub at the same time are fine.
  • Not VM-side state. Failure is on the host side after the chip comes back to the host. The VM is no longer involved by the time devourer tries to claim it.
  • Not specific to the matrix orchestration timing. Single-pair --tx-pid 0x8813 --rx-pid <other> runs (which do 1-2 passthrough cycles internally per cell) also reproduce after the second or third cell.

Hypothesis on root cause

After virsh detach-device, the 8814AU appears to be left in a state that devourer cannot recover from. Three things combine:

  • The aircrack-ng kernel driver in the VM, where the chip was just bound, leaves its own initialization behind on the chip.
  • The aircrack-ng kernel driver on the host could probably reset the chip cleanly via its probe/remove path, but we explicitly unbind it on the host so devourer can call libusb_claim_interface.
  • Devourer's chip bring-up assumes a fresh chip (just plugged in, in CARDEMU state) and doesn't do the full reset cycle the kernel driver would.

Specifically: PR #29 noted "8814AU rmmod/sysfs-unbind actively deinits the chip (RF off, MAC DMA off)". So a kernel unbind cleans up. But virsh detach-device probably doesn't trigger that: it just pulls the USB device out of QEMU and reattaches it to the host bus. The host kernel may not re-probe (we unbind it explicitly), so the chip stays in whatever state QEMU's last URB left it.
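One cheap way to test the "host kernel may not re-probe" part of this hypothesis is to look at which driver, if any, is bound to the adapter's interfaces right after virsh detach-device. A minimal sketch using the standard Linux sysfs layout (`idVendor`/`idProduct` files on the device entry, a `driver` symlink on each bound interface entry); `interface_drivers` is a hypothetical helper and not part of devourer or the test rig:

```python
import os


def read_hex(path):
    """Read a sysfs attribute such as idVendor ("0bda\\n") as an int."""
    with open(path) as f:
        return int(f.read().strip(), 16)


def interface_drivers(vendor_id, product_id, sysfs_root="/sys/bus/usb/devices"):
    """Map each interface of the matching USB device to its bound driver
    name, or None when no kernel driver is bound."""
    result = {}
    names = sorted(os.listdir(sysfs_root))
    for name in names:
        if ":" in name:          # interface entry, e.g. "1-2:1.0"
            continue
        dev = os.path.join(sysfs_root, name)
        try:
            if (read_hex(os.path.join(dev, "idVendor")) != vendor_id
                    or read_hex(os.path.join(dev, "idProduct")) != product_id):
                continue
        except (OSError, ValueError):
            continue             # not a USB device directory
        for iface in names:
            if iface.startswith(name + ":"):
                link = os.path.join(sysfs_root, iface, "driver")
                result[iface] = (os.path.basename(os.readlink(link))
                                 if os.path.islink(link) else None)
    return result
```

Calling `interface_drivers(0x0BDA, 0x8813)` on the host after a detach would show `None` for every interface if the kernel driver never re-bound, which is the case where nothing on the host has had a chance to deinit the chip.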

Workarounds to try

  1. usbreset between cells in tests/regress.py's VM mode — after virsh detach-device, force a USB-level reset on the device before handing it to devourer. Should clear chip state.

  2. Explicit libusb_reset_device in devourer's Init() path on 8814AU, after libusb_open but before claim_interface. Devourer already does this for non-8814 paths; check if the 8814 path skips it or whether the reset isn't propagating to the chip's MAC/RF.

  3. Probe-then-unbind on host when re-attaching: instead of detaching from host kernel before passthrough, let the kernel driver fully probe (reset + init), then unbind. The probe will leave the chip in a known state.

  4. Investigate whether the chip needs _8051Reset8812() + power-cycle on attach. PR #29 ("RTL8814AU: end-to-end TX via opt-in kernel-driver init replay") noted that rmmod/sysfs-unbind actively deinits the chip; devourer might need to mirror that on attach.
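Workaround 1 could be prototyped directly in Python without shelling out to a usbreset binary. The sketch below locates the adapter's usbfs node through sysfs and issues the USBDEVFS_RESET ioctl on it, which is what the common usbreset utility does and, as far as I know, what libusb_reset_device ends up issuing on Linux. `find_devnode` and `usb_reset` are hypothetical names; nothing here is taken from tests/regress.py, and whether a USB-level reset actually clears the chip's MAC/RF state is exactly what would need testing.

```python
import fcntl
import os

# _IO('U', 20) from <linux/usbdevice_fs.h>
USBDEVFS_RESET = (ord("U") << 8) | 20


def _read(path):
    with open(path) as f:
        return f.read().strip()


def find_devnode(vendor_id, product_id,
                 sysfs_root="/sys/bus/usb/devices", dev_root="/dev/bus/usb"):
    """Return the usbfs path (e.g. /dev/bus/usb/001/007) of the first
    device matching vendor_id:product_id, or None if it is not present."""
    for name in sorted(os.listdir(sysfs_root)):
        base = os.path.join(sysfs_root, name)
        try:
            vid = int(_read(os.path.join(base, "idVendor")), 16)
            pid = int(_read(os.path.join(base, "idProduct")), 16)
            bus = int(_read(os.path.join(base, "busnum")))
            dev = int(_read(os.path.join(base, "devnum")))
        except (OSError, ValueError):
            continue  # interface entries and non-device dirs lack these files
        if (vid, pid) == (vendor_id, product_id):
            return os.path.join(dev_root, f"{bus:03d}", f"{dev:03d}")
    return None


def usb_reset(devnode):
    """Issue a USB port reset on the given usbfs node (needs root)."""
    fd = os.open(devnode, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, USBDEVFS_RESET, 0)
    finally:
        os.close(fd)
```

In the VM-mode flow this would sit right after virsh detach-device and before devourer opens the device, e.g. `usb_reset(find_devnode(0x0BDA, 0x8813))` once the adapter has re-enumerated on the host.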

Scope (what this issue is and isn't)

This isn't blocking 8814AU support — single-cell devourer-TX works. It blocks the full-matrix automated regression run from producing useful 8814 TX data after the first cell.

Possible fix audiences:

  • Devourer (add usbreset + chip-state-validation on bring-up to handle "chip in post-fwdl state from previous owner") — most robust
  • tests/regress.py (add USB reset between cells in VM mode) — cheapest if devourer fix is complex
  • Both

Refs #29 (8814AU TX bring-up), #34 (where this surfaced), #33 (VM mode that introduced the passthrough cycles).
