
tests: --full-matrix runs N-adapter cross-driver interop tables #34

Merged. josephnef merged 1 commit into master from feat/regress-full-matrix on May 23, 2026.

josephnef (Collaborator) commented on May 23, 2026

What this is

Extends tests/regress.py with --full-matrix: iterates every ordered (TX, RX) pair of plugged-in DUTs across all 4 driver-side combinations (kernel-only / dvr-TX×kernel-RX / kernel-TX×dvr-RX / devourer-only) and emits one NxN table per mode. Useful for catching cross-chipset interop regressions in PRs that touch shared HAL code.

sudo python3 tests/regress.py --full-matrix --channel 100 \
    --vm-name devourer-testrig --vm-ssh <user>@<VM-IP>

For N adapters → N×(N-1)×4 cells (24 for N=3). At ~30-40 s per cell that's ~12-16 min for N=3 in VM mode. Diagonal blanked (same physical adapter can't simultaneously TX and RX with one driver).
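The cell count is four driver-side modes times every ordered pair of distinct adapters. Below is a minimal Python sketch of that enumeration and the time estimate, assuming the ~30-40 s per cell quoted in this PR. The names `MODES`, `full_matrix_cells` and `estimate_minutes` are hypothetical and not taken from regress.py.

```python
from itertools import permutations

# The four driver-side combinations, as (tx_side, rx_side).
MODES = [
    ("kernel", "kernel"),
    ("devourer", "kernel"),
    ("kernel", "devourer"),
    ("devourer", "devourer"),
]


def full_matrix_cells(adapters):
    """Yield (tx_side, rx_side, tx, rx) for every mode and every ordered
    pair of distinct adapters; permutations(..., 2) never pairs an
    adapter with itself, which is the blanked diagonal."""
    for tx_side, rx_side in MODES:
        for tx, rx in permutations(adapters, 2):
            yield (tx_side, rx_side, tx, rx)


def estimate_minutes(n_adapters, secs_per_cell_range=(30, 40)):
    """Return (low, high) wall-clock minutes for N adapters."""
    cells = n_adapters * (n_adapters - 1) * len(MODES)
    low, high = secs_per_cell_range
    return cells * low / 60, cells * high / 60
```

For the three DUTs used here this gives 24 cells and `estimate_minutes(3) == (12.0, 16.0)`. The upper bound matches the ~16 min figure.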

Also scrubs hardcoded usernames from earlier docs/scripts (tests/setup_vm.sh now reads VM_USER from $SUDO_USER/$USER instead of hardcoding one; READMEs and docstrings use <user>@<VM-IP>).

Validation on a 3-adapter rig

Ubuntu 22.04 VM with aircrack-ng/88XXau, DUTs 0bda:8812 + 0bda:8813 + 2357:0120, channel 100, 10s/cell:

Kernel-only (rig sanity / cross-chipset kernel interop)

| TX \ RX | RTL8814AU | RTL8812AU | RTL8821AU |
|---|---|---|---|
| RTL8814AU |  | 99 hits ✓ | 271 hits ✓ |
| RTL8812AU | 243 hits ✓ |  | 270 hits ✓ |
| RTL8821AU | 260 hits ✓ | 88 hits ✓ |  |

6/6 cells pass. 88XXau in the pinned-kernel VM gives full cross-chipset kernel-side interop. Rig is sound.
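The commit message below says the results are stored in a dict keyed by (tx_side, rx_side, tx_vidpid, rx_vidpid) and rendered as four NxN tables. Here is a hypothetical Python sketch that renders one mode as an NxN markdown table with a blank diagonal. It is not the actual `emit_full_markdown`. Two things are assumptions: the VID:PID to chipset mapping, taken from the three DUTs listed above, and treating any non-zero hit count as a pass (regress.py may use a different threshold).

```python
# Assumed VID:PID -> chipset labels for the three DUTs in this PR.
CHIP_NAMES = {
    "0bda:8812": "RTL8812AU",
    "0bda:8813": "RTL8814AU",
    "2357:0120": "RTL8821AU",
}


def render_mode_table(results, tx_side, rx_side, adapters):
    """Render one NxN markdown table for a single (tx_side, rx_side) mode.

    `results` maps (tx_side, rx_side, tx_vidpid, rx_vidpid) -> hit count.
    Diagonal cells are left blank; missing cells render as "?".
    """
    names = [CHIP_NAMES.get(a, a) for a in adapters]
    lines = [
        "| TX \\ RX | " + " | ".join(names) + " |",
        "|---" * (len(adapters) + 1) + "|",
    ]
    for tx in adapters:
        cells = []
        for rx in adapters:
            if tx == rx:
                cells.append("")  # same adapter can't TX and RX at once
                continue
            hits = results.get((tx_side, rx_side, tx, rx))
            if hits is None:
                cells.append("?")
            else:
                # Assumption: any non-zero hit count counts as a pass.
                cells.append(f"{hits} {'✓' if hits > 0 else '✗'}")
        lines.append(f"| {CHIP_NAMES.get(tx, tx)} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Calling it once per entry in the four driver-side modes yields the four tables in this PR.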

devourer-TX → kernel-RX (does devourer emit valid frames?)

| TX \ RX | RTL8814AU | RTL8812AU | RTL8821AU |
|---|---|---|---|
| RTL8814AU |  | 0 ✗ (2272 fail) | 0 ✗ (2322 fail) |
| RTL8812AU | 4114 ✓ |  | 4693 ✓ |
| RTL8821AU | 4341 ✓ | 0 ✗ |  |

kernel-TX → devourer-RX (does devourer RX a known-good frame?)

| TX \ RX | RTL8814AU | RTL8812AU | RTL8821AU |
|---|---|---|---|
| RTL8814AU |  | 100 ✓ | 200 ✓ |
| RTL8812AU | 0 ✗ |  | 0 ✗ |
| RTL8821AU | 0 ✗ | 100 ✓ |  |

devourer ↔ devourer (end-to-end devourer)

All 0. Every cell includes at least one devourer side that already fails for the same (TX, RX) pair in one of the mixed-mode tables above (e.g. 8814 RX broken, 8814 TX degraded mid-run).
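The all-zero devourer ↔ devourer table is what the two mixed-mode tables predict. The small Python cross-check below copies the hit counts from those tables, keyed by (TX, RX) chip, and lists which devourer side already fails for each ordered pair. `explain_dvr_dvr` is a hypothetical helper and not part of regress.py. It also does not model 8814 TX degrading over the course of a run.

```python
# Hit counts copied from the two mixed-mode tables above, keyed by (tx, rx).
# 0 means the cell failed.
DVR_TX_KERNEL_RX = {
    ("8814", "8812"): 0, ("8814", "8821"): 0,
    ("8812", "8814"): 4114, ("8812", "8821"): 4693,
    ("8821", "8814"): 4341, ("8821", "8812"): 0,
}
KERNEL_TX_DVR_RX = {
    ("8814", "8812"): 100, ("8814", "8821"): 200,
    ("8812", "8814"): 0, ("8812", "8821"): 0,
    ("8821", "8814"): 0, ("8821", "8812"): 100,
}


def explain_dvr_dvr(pair):
    """Return the mixed-mode failures that doom this devourer<->devourer pair."""
    reasons = []
    if DVR_TX_KERNEL_RX[pair] == 0:
        reasons.append("devourer-TX side fails against kernel-RX")
    if KERNEL_TX_DVR_RX[pair] == 0:
        reasons.append("devourer-RX side fails against kernel-TX")
    return reasons


for pair in DVR_TX_KERNEL_RX:
    print(pair, explain_dvr_dvr(pair))
```

Every pair returns at least one reason, so the all-zero result is expected rather than a separate harness bug.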

Net new product signal

  1. devourer-TX 8821 is working (4341 hits at kernel-RX 8814, though 0 at kernel-RX 8812). This closes the gap that PR #30 (the RTL8821AU partial bring-up) documented as "wired but unvalidated".
  2. devourer-RX 8821 may actually be working: at least one cell (kernel-TX 8814) got 200 hits. PR #30's "RX silent" conclusion needs reopening.
  3. Chip state degrades through repeated passthrough — devourer-TX 8814 is reliable in single-cell tests but flakes after several virsh attach/detach cycles. Suggests we should consider a chip-reset / power-cycle between cells in the orchestrator, or accept this as a known limitation of the VM rig.
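One way to try the chip-reset idea is a USB port reset of the adapter between cells. Below is a minimal Python sketch, assuming a Linux host with sysfs and usbdevfs. `USBDEVFS_RESET` (`_IO('U', 20)` in `<linux/usbdevice_fs.h>`) is the ioctl the classic `usbreset` utility issues. `find_usb_devnode` and `usb_reset` are hypothetical helpers, not part of regress.py. Whether a port reset, as opposed to a full power cycle, clears the 8814's degraded state is untested. In VM mode the adapter must be attached to the host when the reset is issued.

```python
import fcntl
import os
from pathlib import Path

# USBDEVFS_RESET = _IO('U', 20) from <linux/usbdevice_fs.h>.
USBDEVFS_RESET = 0x5514


def find_usb_devnode(vid, pid, sysfs_root="/sys/bus/usb/devices"):
    """Return /dev/bus/usb/BBB/DDD for the first device matching vid:pid."""
    for dev in sorted(Path(sysfs_root).iterdir()):
        try:
            v = (dev / "idVendor").read_text().strip()
            p = (dev / "idProduct").read_text().strip()
        except OSError:
            continue  # interface directories have no idVendor/idProduct
        if (v, p) == (vid, pid):
            bus = int((dev / "busnum").read_text())
            num = int((dev / "devnum").read_text())
            return f"/dev/bus/usb/{bus:03d}/{num:03d}"
    return None


def usb_reset(vid, pid):
    """Issue a USB port reset to the adapter (requires root)."""
    node = find_usb_devnode(vid, pid)
    if node is None:
        raise FileNotFoundError(f"{vid}:{pid} not found")
    fd = os.open(node, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, USBDEVFS_RESET, 0)
    finally:
        os.close(fd)
```

For example, `usb_reset("0bda", "8813")` would target the 8814AU DUT listed in the validation section.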

What this PR doesn't touch

  • The --full-matrix extension doesn't change any existing code path; the single-pair mode (--tx-pid/--rx-pid) still works exactly as before. Pure addition.
  • Doesn't address the 8814 chip-state degradation (that's separate work — possibly a usbreset between cells)
  • Doesn't reopen PR #30 (the RTL8821AU partial bring-up); that's a follow-up: rerun the 8821 RX investigation with this rig

Test plan

  • --full-matrix runs end-to-end against 3 adapters in VM mode
  • All 4 NxN tables render correctly
  • Cell counts and per-pair timing match expected (~30-40s per cell)
  • Kernel-only baseline (6 cells) all pass
  • At least one cell in each non-trivial mode passes (validates harness logic)
  • Run on a different rig with different chipset combinations
  • Validate on 4+ adapters (script supports it; not exercised here)

🤖 Generated with Claude Code

Extends tests/regress.py with a --full-matrix mode that iterates every
ordered (TX, RX) pair of plugged-in DUTs across all four driver-side
combinations (kernel-only, devourer-TX/kernel-RX, kernel-TX/devourer-RX,
devourer-only) and emits one NxN table per mode instead of one 4-cell
table for a single pair. Useful for catching cross-chipset interop
regressions in PRs that touch shared HAL code.

Usage:
    sudo python3 tests/regress.py --full-matrix --channel 100 \
        --vm-name devourer-testrig --vm-ssh <user>@<VM-IP>

For N adapters, runs N*(N-1)*4 cells total: at ~30-40s per cell in VM
mode that's 24 cells and ~12-16 min for N=3, manageable. Diagonal is blanked (same
physical adapter can't simultaneously TX and RX with one driver). The
script reuses run_cell as-is; the addition is just the outer pair loop,
result dict keyed by (tx_side, rx_side, tx_vidpid, rx_vidpid), and a
new emit_full_markdown that renders four NxN tables.

Also scrubs personal identifiers from earlier docs/scripts (PR #33):
- tests/setup_vm.sh now reads VM_USER from $SUDO_USER / $USER instead
  of hardcoding a specific username
- tests/README.md + regress.py docstrings switch to <user>@<VM-IP>
  placeholders in example commands

Validation on a 3-adapter rig (Ubuntu 22.04 VM with aircrack-ng/88XXau,
0bda:8812 + 0bda:8813 + 2357:0120, channel 100, 10s/cell):

  ## Kernel-only (rig sanity)
  All 6 cross-chipset cells pass — 88XXau handles all three chipsets
  cleanly in the pinned-kernel VM (88-271 hits per cell).

  ## devourer-TX → kernel-RX
  devourer-TX confirmed for 8812 (4114, 4693 hits) AND 8821 (4341 hits
  reaching 8814 kernel RX). 8814 TX flaky after passthrough cycles
  (chip-state degradation across cell sequencing — known sensitive).

  ## kernel-TX → devourer-RX
  Surprise — devourer-RX 8821 caught 200 frames from kernel-TX 8814,
  contradicting PR #30's "RX silent" finding. devourer-RX 8812
  confirmed (100 hits from each of 8814, 8821 TX). devourer-RX 8814
  confirmed broken (0 hits all directions — known TODO).

  ## devourer ↔ devourer
  All 0 — every cell hits at least one broken side (8814 RX or 8814 TX
  degraded mid-run).

Net new product signal from the full matrix:
- devourer-TX 8821 actually works (was unvalidated since PR #30 had no
  peer sniffer in that session — VM mode is the peer)
- devourer-RX 8821 works under at least one TX condition — reopen PR
  #30's "RX silent" conclusion
- 8814 chip state degrades through repeated host↔VM passthrough — needs
  investigation, may want a chip reset between cells

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
josephnef force-pushed the feat/regress-full-matrix branch from f06bb98 to 368a198 on May 23, 2026 11:23
josephnef merged commit 7315024 into master on May 23, 2026
5 checks passed
josephnef deleted the feat/regress-full-matrix branch on May 23, 2026 11:25
josephnef added a commit that referenced this pull request on May 23, 2026:
… RX asymmetry) (#37)

## Status: partial (register-state parity with kernel, but doesn't fix #30's 8812-peer asymmetry)

Brings devourer's `CHIP_8821` init register-write set into parity with
`aircrack-ng/88XXau`'s post-fwdl monitor-mode bring-up, based on a
usbmon-trace diff. Adds 13 missing post-fwdl writes, 17 BB/AGC value
overrides, MAC-address programming, and corrects an earlier wrong guess
at REG_USB_HRPWM.

**This does NOT fix the asymmetry** (`devourer-RX 8821` catches
kernel-TX 8814 frames but drops kernel-TX 8812 frames at 0/280) that #34
surfaced. The cheap "what's different at the register level" patches
don't cross the threshold. Filing anyway because it's honest progress +
brings the chip's baseline state into kernel parity, which the next fix
attempt can build on.

## What this changes

| Where | Change |
|---|---|
| `HalModule.cpp::rtl8812au_hal_init` | REG_USB_HRPWM for CHIP_8821: 0x84 → 0x00 (kernel writes 0; my earlier "leave LPS wake" guess was wrong) |
| `HalModule.cpp::rtl8812au_hal_init` | CHIP_8821: program MAC address to REG_MACID (0x0610-0x0615). Kernel always does this even in monitor; some chip RX-path logic gates on MACID being non-zero. |
| `HalModule.cpp::rtl8812au_hal_init` | CHIP_8821: 13 trace-derived post-fwdl writes (REG_AUTO_LLT region, REG_TX_PTCL_CTRL, NAV-related, 8821 BB high-addr block 0x1874-0x187f) |
| `HalModule.cpp::rtl8812au_hal_init` | CHIP_8821: 17 BB/AGC value overrides (0x0830 PWED_TH, 0x0c20-0x0c44 AGC table, 0x0e90 TX power region) forced to kernel's observed runtime values |
| `tests/inject_beacon.py` | `--rate` CLI arg (added during rate-decode hypothesis testing; useful diagnostic knob) |
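The REG_MACID row above means six byte-sized register writes covering 0x0610-0x0615. Below is a small Python sketch of that layout. It assumes byte *i* of the MAC lands at `REG_MACID + i`, in transmission order. That is an assumption modeled on common Realtek driver practice, not copied from `HalModule.cpp`. `macid_writes` is a hypothetical helper.

```python
REG_MACID = 0x0610  # start of the 6-byte MAC ID block (0x0610-0x0615)


def macid_writes(mac):
    """Return (register, byte) pairs that program `mac` into REG_MACID.

    Assumes one 8-bit write per MAC octet, octet i at REG_MACID + i.
    """
    octets = bytes.fromhex(mac.replace(":", ""))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac!r}")
    return [(REG_MACID + i, b) for i, b in enumerate(octets)]
```

A zero MAC would leave MACID all zeros, which is the state the row above suggests some RX-path logic gates on.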

8812AU + 8814AU paths untouched.

## Why the RX gap probably remains

Aircrack-ng runs **phydm** — a runtime feedback loop that continually
adjusts AGC / RX-gain based on observed signal quality. This PR sets the
chip's *static* values to match kernel's post-init snapshot, but the
chip needs the active loop to *maintain* them during RX. Without phydm
running, the chip drifts back toward defaults that work for higher-SNR
peers (8814) but not for 8812.
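To make "runtime feedback loop" concrete, here is a toy Python sketch of a DIG-style (dynamic initial gain) step, one of the kinds of adjustment phydm performs. It is not phydm's code. The thresholds, bounds and step size are illustrative assumptions. The point it shows is that the gain setting is a moving target recomputed from live statistics, so a one-time snapshot of the register value cannot reproduce it.

```python
def dig_step(igi, false_alarms, fa_high=700, fa_low=250,
             igi_min=0x1E, igi_max=0x3E, step=2):
    """One iteration of a toy DIG-style loop (illustrative values only).

    Many false alarms -> raise the initial gain index (IGI), making the
    receiver less sensitive; few false alarms -> lower it. The result is
    clamped to [igi_min, igi_max].
    """
    if false_alarms > fa_high:
        igi += step
    elif false_alarms < fa_low:
        igi -= step
    return max(igi_min, min(igi_max, igi))
```

Fed a sequence of false-alarm counts, the IGI wanders within its bounds. A statically programmed chip has no equivalent of this.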

Three follow-up paths to actually fix the asymmetry:

1. **Port phydm runtime** for 8821 (substantial — most of
`aircrack-ng/hal/phydm/rtl8821a/`). Days of work, cleanest fix.
2. **Loop-replay**: capture kernel's full register sequence over a
5-second RX window (not just init) and replay periodically from
devourer. Hacky but cheap.
3. **Accept devourer-RX 8821 as "works for some peers, not others"**
until phydm is in. Document the limit in the adapter inventory.
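Option 2 (loop-replay) amounts to re-issuing a captured list of register writes on a timer. Below is a minimal Python sketch. `write32` stands in for whatever register-write path devourer exposes, and `sleep` is injectable so the loop can be tested without waiting. Both, and `replay_loop` itself, are hypothetical. It replays blindly and ignores the statistics the kernel's writes were reacting to, which is why the option is labeled hacky.

```python
import time
from typing import Callable, Iterable, Tuple

Write = Tuple[int, int]  # (register address, u32 value)


def replay_loop(writes: Iterable[Write],
                write32: Callable[[int, int], None],
                period_s: float = 5.0,
                rounds: int = 1,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Replay a captured register sequence `rounds` times, `period_s` apart.

    Returns the number of writes issued.
    """
    seq = list(writes)
    issued = 0
    for r in range(rounds):
        for addr, value in seq:
            write32(addr, value)
            issued += 1
        if r != rounds - 1:
            sleep(period_s)  # no sleep after the final round
    return issued
```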

## usbmon methodology (for future trace-diff work)

Capture kernel side (in VM with `aircrack-ng/88XXau`):

```bash
# On VM:
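sudo modprobe usbmon
# 0u captures every bus; cat keeps running in the background
sudo bash -c "cat /sys/kernel/debug/usb/usbmon/0u > /tmp/trace_kernel.txt &"
# Then attach DUT via virsh and bring up monitor mode. Stop the capture with:
sudo pkill -f 'usbmon/[0]u'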
```

Capture devourer side (on host):

```bash
# Replace N with the DUT's USB bus number (see lsusb); timeout ends the capture after 20 s
sudo bash -c "timeout 20 cat /sys/kernel/debug/usb/usbmon/Nu > /tmp/trace_dvr.txt" &
sudo ./build/WiFiDriverDemo ...
```

Filter Realtek control writes:

```bash
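# bmRequestType 0x40 (vendor, host-to-device) + bRequest 0x05 = Realtek register write
# $8 = wValue (register address), $13 = first data word (wire byte order)
grep -E "S Co:.* 40 05 " trace.txt | awk '{print $8, $13}'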
```
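The same filter in Python is handy when diffing two traces programmatically. The field positions below follow the usbmon text format described in the Linux kernel's usbmon documentation: tag, timestamp, event type, address, then for a setup packet `s`, bmRequestType, bRequest, wValue, wIndex, wLength, data length, `=`, and the data words. The checks are slightly stricter than the grep pattern. `realtek_writes` is a hypothetical helper.

```python
def realtek_writes(lines):
    """Yield (register, data_hex) for Realtek vendor register writes.

    Keeps submission ("S") lines of control-OUT transfers ("Co:") whose
    setup packet is bmRequestType 0x40 / bRequest 0x05, and returns wValue
    (the register address) plus the first data word, like awk $8 and $13.
    """
    for line in lines:
        f = line.split()
        # tag ts S Co:bus:dev:ep s bmRequestType bRequest wValue wIndex wLength len = data...
        if len(f) < 13 or f[2] != "S" or not f[3].startswith("Co:"):
            continue
        if f[4] != "s" or f[5] != "40" or f[6] != "05" or f[11] != "=":
            continue
        yield int(f[7], 16), f[12]
```

Use it as `realtek_writes(open("trace.txt"))` to get (address, raw word) pairs in capture order.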

**Important encoding gotcha:** usbmon shows data bytes in wire
(transmission) order, which for a 32-bit register write is the
little-endian byte sequence of the u32. To issue the same write via
`rtw_write32` on a LE host, read the trace word as LE: the trace text
`82824001` represents u32 `0x01408282` (NOT `0x82824001`). The first
attempt at the post-fwdl writes in this PR used wrong-endian values and
naturally had no effect; the fix was to reverse the order of the hex
byte pairs.
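The conversion is a single little-endian decode of the wire bytes. Below is a minimal Python sketch. `trace_word_to_u32` is a hypothetical helper name, and it also handles 1- and 2-byte writes.

```python
def trace_word_to_u32(word_hex):
    """Convert a usbmon data word (bytes in wire order) to the integer
    value that was written, reading the bytes as little-endian."""
    return int.from_bytes(bytes.fromhex(word_hex), "little")
```

For example, `trace_word_to_u32("82824001")` returns `0x01408282`, the value the gotcha above describes. Passing the raw `0x82824001` instead is exactly the wrong-endian mistake.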

## Validation

Single-pair test on the trainer rig (Arch host, Ubuntu 22.04 VM with
aircrack-ng/88XXau, channel 100, 10s):

| Cell | Hits | Note |
|---|---|---|
| kernel-TX 8812 → kernel-RX 8821 (baseline) | 283 ✓ | rig sane |
| devourer-TX 8821 → kernel-RX 8814 | 4663 ✓ | dvr-TX 8821 fine |
| kernel-TX 8812 → devourer-RX 8821 | **0 ✗** | the asymmetry, unresolved |
| devourer-TX 8821 → devourer-RX 8812 | **0 ✗** | same |

Other chips untouched.

## Test plan

- [x] Builds clean
- [x] CHIP_8812 + CHIP_8814A paths unchanged
- [x] CHIP_8821 init now writes the kernel-equivalent register set
- [x] devourer-TX 8821 still works → kernel-RX 8814 (4663 hits)
- [ ] dvr-RX 8821 ↔ 8812 — known *not* fixed by this PR
- [ ] phydm runtime ported (follow-up)

Refs #30 (initial 8821 port), #34 (full matrix that surfaced the
asymmetry).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>