(ignore) feat(n-vm): in-VM test infrastructure (testn absorption + e1000)#1583
Draft
daniel-noland wants to merge 38 commits into
Add a `dataplane-lifecycle` crate with `Shutdown` and `Subsystem` primitives, signal-handler installation, and a process-wide shutdown watchdog. `Shutdown` bundles a root `CancellationToken` and one `Subsystem` per long-lived component (workers, router, mgmt, metrics). Each subsystem exposes a per-subsystem cancel token, a `TaskTracker`, and a shared fatal flag. Subsystems drain in topological order with per-subsystem deadlines; the detached watchdog enforces an absolute upper bound on total shutdown duration. No consumers yet -- wired up in follow-on commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Plumb lifecycle Subsystems into the routing crate as the first step of the threading rewrite. The rest of `main`'s shutdown signaling (ctrlc + mpsc<i32> + start_mgmt + MetricsServer + old DriverKernel::start) stays in place; follow-on commits migrate each.
- `Router::new` takes `(mgmt, mgmt_handle, router)` Subsystems + runtime handle. Plumbed through `packet_processor::start_router`.
- `start_rio` takes `&Subsystem`; the IO loop observes its cancel between poll cycles (worst-case exit latency = 1s poll). Adds an ExitGuard so panic-unwind or unexpected loop exit reports fatal.
- `RouterCtlMsg::Finish` removed; `RioHandle::finish` becomes idempotent.
- `bmp::spawn_background` spawns onto the caller-provided runtime handle tracked under `mgmt`; no more leaked runtime.
- `runtime.rs` builds a multi-thread mgmt runtime (only BMP tenants it in this commit) and a `Shutdown` for plumbing into Router::new.
- `dataplane` and `mgmt` Cargo.toml gain `lifecycle` dep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Move mgmt + metrics from dedicated OS threads with private runtimes to tenants of the multi-thread mgmt runtime introduced in the prior commit. The kernel driver still uses the legacy ctrlc + mpsc<i32> path; the final unification commit migrates that.
- `run_mgmt` (renamed from `start_mgmt`): synchronous init on the caller-provided handle, then spawns the three long-lived tasks (config processor, status updater, config watcher) via `Subsystem::spawn_fatal_on_exit` so their unexpected exit flips fatal. Init observes `mgmt.root_token()` so SIGINT during k8s retries returns `LaunchError::Cancelled` within cancel latency.
- `LaunchError::Cancelled` is a clean-shutdown signal; the call site in `runtime.rs` forwards the existing mpsc stop channel with code 0 so the legacy shutdown path stays consistent.
- `spawn_metrics` replaces `MetricsServer`: HTTP endpoint, upkeep ticker, and stats collector all spawn onto `mgmt_handle` tracked under `metrics`. Uses plain `spawn_on` (not `spawn_fatal_on_exit`) -- a dead metrics endpoint should not take down the dataplane.
- Drop stale `LaunchError` variants no longer constructed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Replace the last legacy shutdown signaling (ctrlc handler + mpsc<i32> exit code + dedicated controller thread) with the single `lifecycle::Shutdown` path, and migrate the kernel driver to scoped threads with cancellation observation. After this commit there is one signaling path: SIGINT/SIGTERM -> shutdown.root, or any subsystem's report_fatal -> shutdown.root, with the watchdog as the absolute upper bound.
- `main`: install `spawn_signal_handler` and `spawn_shutdown_watchdog`, run everything inside `concurrency::thread::scope`, block on `root.cancelled()`, then drain subsystems in canonical order (workers -> router -> metrics -> mgmt). Exit code from `is_fatal()`.
- `DriverKernel::start`: takes `&Scope` and `&Subsystem`; workers spawn via `spawn_scoped` with an `ExitGuard` Drop pattern that reports fatal on panic-unwind, early `?`-return, and unexpected normal exit. Reader loops observe cancel between reads. Supervisor joins-and-logs.
- Drop `dataplane/src/drivers/tokio_util.rs` and its `run_in_local_tokio_runtime` helper (inlined where needed).
- Drop `ctrlc` and `mio` from `dataplane` dependencies; drop `ctrlc` from the workspace.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
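To make the `ExitGuard` Drop pattern concrete, here is a minimal std-only sketch. It is not the code from this commit: `FatalFlag`, `run_worker`, and the `disarm` method are hypothetical names, and an `AtomicBool` stands in for both the cancel token and the subsystem's fatal flag.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Hypothetical stand-in for a subsystem's shared fatal flag.
pub struct FatalFlag(AtomicBool);

impl FatalFlag {
    pub const fn new() -> Self {
        Self(AtomicBool::new(false))
    }
    pub fn report_fatal(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    pub fn is_fatal(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Reports fatal when dropped unless it was disarmed first, so a panic
/// unwind, an early return, or an unexpected normal exit all flip the flag.
pub struct ExitGuard<'a> {
    flag: &'a FatalFlag,
    armed: bool,
}

impl<'a> ExitGuard<'a> {
    pub fn new(flag: &'a FatalFlag) -> Self {
        Self { flag, armed: true }
    }
    /// Call only on the expected exit path (cancellation was observed).
    pub fn disarm(mut self) {
        self.armed = false;
    }
}

impl Drop for ExitGuard<'_> {
    fn drop(&mut self) {
        if self.armed {
            self.flag.report_fatal();
        }
    }
}

/// Runs one worker in a scoped thread and returns whether it reported fatal.
pub fn run_worker(cancel: &AtomicBool, fail_early: bool) -> bool {
    let fatal = FatalFlag::new();
    thread::scope(|s| {
        s.spawn(|| {
            let guard = ExitGuard::new(&fatal);
            if fail_early {
                return; // guard drops while armed -> fatal is reported
            }
            // Reader loop: observe cancel between units of work.
            while !cancel.load(Ordering::SeqCst) {
                thread::yield_now();
            }
            guard.disarm(); // clean, cancel-driven exit
        });
    });
    fatal.is_fatal()
}
```

The guard is armed by default, so every exit path the author forgot about counts as fatal; only the one path that saw the cancel signal opts out.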
Lets callers derive the rte_acl input-buffer size requirement from a const FIELD_DEFS array at compile time -- useful for feeding into a const generic (e.g. the STRIDE on a DPDK-backed Lookup type alias) rather than rediscovering it at runtime. The runtime path through AclBuildConfig::new still calls into the same helper and caches the result; only the surface broadens from fn to const fn (with the for-loop rewritten as a while-let-i index walk that const-fn-eval can chew on). Existing FieldExtentOverflow validation is preserved; in a const context overflow becomes a compile error instead of UB. Adds one test exercising the const path: const MIN_INPUT_SIZE = AclBuildConfig::compute_min_input_size(&DEFS); plus the assertion that the runtime path returns the same value. just fmt; cargo check -p dataplane-dpdk --all-targets passes.
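A rough sketch of the `const fn` shape described above, assuming a simplified `FieldDef` with only `offset` and `size` (the real rte_acl field definition carries more). `Scratch` and `Buf` are hypothetical names used to show the value feeding a const generic.

```rust
/// Simplified, hypothetical stand-in for an rte_acl field definition.
#[derive(Clone, Copy)]
pub struct FieldDef {
    pub offset: u32,
    pub size: u8,
}

/// Smallest input buffer that covers every field extent.
/// Written as an index walk because `for` loops are not allowed in const fn.
pub const fn compute_min_input_size(defs: &[FieldDef]) -> usize {
    let mut max_end: u32 = 0;
    let mut i = 0;
    while i < defs.len() {
        // In a const context this panic surfaces as a compile error.
        let end = match defs[i].offset.checked_add(defs[i].size as u32) {
            Some(end) => end,
            None => panic!("field extent overflow"),
        };
        if end > max_end {
            max_end = end;
        }
        i += 1;
    }
    max_end as usize
}

pub const DEFS: [FieldDef; 3] = [
    FieldDef { offset: 0, size: 1 },
    FieldDef { offset: 2, size: 4 },
    FieldDef { offset: 12, size: 2 },
];

/// Evaluated at compile time.
pub const MIN_INPUT_SIZE: usize = compute_min_input_size(&DEFS);

/// The constant can feed a const generic, e.g. a stride-sized scratch buffer.
pub struct Scratch<const STRIDE: usize>(pub [u8; STRIDE]);
pub type Buf = Scratch<MIN_INPUT_SIZE>;
```

The same function remains callable at runtime, which is what lets a test assert that both paths agree.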
Introduces shared EAL test scaffolding for downstream crates that
need to exercise rte_acl, mempools, or any other DPDK runtime under
`#[test]`. Two pieces, both behind the new dpdk `test` feature so
production builds remain unchanged:
- dpdk-test-macros: proc-macro crate exposing #[with_eal], which
injects `let _eal = <dpdk_crate>::test_support::start_eal();` at
the top of a #[test] function. Resolves the dpdk crate's path via
proc-macro-crate so the macro works in-tree (`crate`), under the
workspace alias (`::dpdk`), or with the canonical name
(`::dataplane_dpdk`).
- dpdk::test_support: hosts a shared OnceLock<Eal> initialized with
--no-huge / --no-pci / --in-memory and the cpu-affinity-aware
--lcores derivation that was previously inline in acl/mod.rs.
Once-per-process by construction; safe under nextest's per-test
forking and single-process runners alike.
dpdk/src/acl/mod.rs picks up the new macro: every test in the module
loses its `let _eal = start_eal();` prologue and gains a #[with_eal]
attribute above #[test]. The inline start_eal helper and the
module-scoped OnceLock<Eal> static go away.
dpdk/Cargo.toml grows the `test` feature (turns on dpdk-test-macros
plus the id and nix runtime helpers that test_support needs) and a
self-referencing dev-dep `dataplane-dpdk = { path = ".", features =
["test"] }` so dpdk's own tests see the macro surface.
just fmt; cargo check --workspace --all-targets passes.
Introduces a no_std FixedSize trait carrying a known byte width plus a big-endian write. Lives in its own crate so value-providing crates (`net`) and downstream key-packing frameworks (a future `match-action` PR) can both depend on it without depending on each other -- `net` implements FixedSize on its own newtypes without an orphan-rule violation, and the framework doesn't need to know about any particular value supplier. fixed-size/src/lib.rs: - pub trait FixedSize: Copy, with const SIZE: usize and fn write_be(&self, out: &mut [u8]). - Blanket impls for the network-order primitives the framework cares about: u8 / u16 / u32 / u64 / u128 / Ipv4Addr / Ipv6Addr. net/src/fixed_size.rs is the consumer bridge: FixedSize impls for TcpPort, UdpPort, UnicastIpv4Addr, and Vni. Each delegates write_be to the underlying primitive's impl, so wire bytes match what writing the raw integer would produce. Vni is a 24-bit value written into a 4-byte field (high byte zero) because backends model fields in 1/2/4 widths, not 3. net/Cargo.toml grows the fixed-size workspace dep; net/src/lib.rs adds the module behind no cfg gates (it has no further dependencies of its own). just fmt; cargo check --workspace --all-targets passes.
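The trait shape and the `Vni` widening described above might look roughly like this. The trait signature follows the commit message; the `Vni` newtype here is a hypothetical stand-in for the one in `net` (its validity rule is an assumption), and the sketch uses `std::net` for brevity even though the real crate is `no_std`.

```rust
use std::net::Ipv4Addr;

/// A value with a known byte width and a big-endian serialization.
pub trait FixedSize: Copy {
    const SIZE: usize;
    /// Writes exactly `SIZE` bytes to the front of `out`.
    fn write_be(&self, out: &mut [u8]);
}

impl FixedSize for u16 {
    const SIZE: usize = 2;
    fn write_be(&self, out: &mut [u8]) {
        out[..2].copy_from_slice(&self.to_be_bytes());
    }
}

impl FixedSize for u32 {
    const SIZE: usize = 4;
    fn write_be(&self, out: &mut [u8]) {
        out[..4].copy_from_slice(&self.to_be_bytes());
    }
}

impl FixedSize for Ipv4Addr {
    const SIZE: usize = 4;
    fn write_be(&self, out: &mut [u8]) {
        out[..4].copy_from_slice(&self.octets());
    }
}

/// Hypothetical stand-in for `net`'s Vni: a 24-bit value held in a u32.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Vni(u32);

impl Vni {
    /// Assumed rule for this sketch: any value that fits in 24 bits.
    pub fn new(raw: u32) -> Option<Self> {
        (raw < (1 << 24)).then_some(Self(raw))
    }
}

impl FixedSize for Vni {
    // Written into a 4-byte field; the high byte is always zero.
    const SIZE: usize = 4;
    fn write_be(&self, out: &mut [u8]) {
        // Delegate to the primitive so the wire bytes match the raw integer.
        self.0.write_be(out);
    }
}
```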
Adds the read-only key/action lookup vocabulary downstream match-action backends will implement. Tiny crate by design (one trait pair, two collection impls, inline tests) so consumer crates depend on this without depending on each other. lookup/src/lib.rs: - pub trait Projection<T>: extracts a key of type T from self. Implemented on `&'a Source` so the lifetime threads into T when T borrows; for owned T the lifetime is unused. The same source can implement Projection<T> for many T -- the call site picks one by inference from a Lookup backend's key type. - impl<K> Projection<Option<K>> for Option<K>: identity, so classify_opt accepts a pre-computed Option<K> directly. Scoped to Option<K> (not a general impl<T> Projection<T> for T) so a backend that implements Lookup<K, A> for every K can't make classify ambiguous. - pub trait Lookup<K, A>: lookup(&K) -> Option<&A>, plus default methods classify (S: Projection<K>) and classify_opt (S: Projection<Option<K>>) that project and look up in one call. Two trait type parameters, so one backend can serve many (K, A) pairs. - Blanket Lookup impls for BTreeMap<K, V> and HashMap<K, V, S> so tests / simple consumers get a working backend out of the box. Inline tests exercise: projection-by-table-type inference at v4 2-tuple and 4-tuple widths, lifetime threading through borrowed projections, miss returning None, classify_opt short-circuit on None projection, the identity Projection<Option<K>> for Option<K>, and HashMap parity with BTreeMap. just fmt; cargo check --workspace --all-targets passes.
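As an illustration of the trait pair and projection-by-inference described above, here is a condensed sketch. The method name `project` and the `Packet` type are assumptions for the example; only the overall shape (`Projection<T>`, `Lookup<K, A>` with `classify` / `classify_opt`, a `BTreeMap` backend) comes from the commit message.

```rust
use std::collections::BTreeMap;
use std::net::Ipv4Addr;

/// Extracts a key of type `T` from `self` (method name is an assumption).
pub trait Projection<T> {
    fn project(self) -> T;
}

/// Read-only key -> action lookup with project-and-look-up helpers.
pub trait Lookup<K, A> {
    fn lookup(&self, key: &K) -> Option<&A>;

    fn classify<S: Projection<K>>(&self, source: S) -> Option<&A> {
        self.lookup(&source.project())
    }

    /// A `None` projection short-circuits to a miss.
    fn classify_opt<S: Projection<Option<K>>>(&self, source: S) -> Option<&A> {
        let key = source.project()?;
        self.lookup(&key)
    }
}

/// Identity projection so a pre-computed `Option<K>` can be passed directly.
impl<K> Projection<Option<K>> for Option<K> {
    fn project(self) -> Option<K> {
        self
    }
}

impl<K: Ord, V> Lookup<K, V> for BTreeMap<K, V> {
    fn lookup(&self, key: &K) -> Option<&V> {
        self.get(key)
    }
}

/// Hypothetical packet source that can project to two key widths.
pub struct Packet {
    pub src: Ipv4Addr,
    pub dst: Ipv4Addr,
    pub sport: u16,
    pub dport: u16,
}

impl<'a> Projection<(Ipv4Addr, Ipv4Addr)> for &'a Packet {
    fn project(self) -> (Ipv4Addr, Ipv4Addr) {
        (self.src, self.dst)
    }
}

impl<'a> Projection<(Ipv4Addr, Ipv4Addr, u16, u16)> for &'a Packet {
    fn project(self) -> (Ipv4Addr, Ipv4Addr, u16, u16) {
        (self.src, self.dst, self.sport, self.dport)
    }
}
```

With these definitions, `table.classify(&pkt)` picks the 2-tuple or 4-tuple projection purely from the table's key type; the call site never names the key.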
Lands the match-action key vocabulary plus its derive. Backends in downstream PRs consume the result via a structural FIELD_SPECS view and a byte-packing as_key_into(); the bolero property-test generators land next as their own feature gate. match-action/src/: - lib.rs publishes the FieldKind enum (Prefix / Mask / Range / Exact), the FieldSpec layout record (name / kind / size / offset), and the MatchKey trait (N, KEY_SIZE, field_specs(), as_key_into()). Trait methods take slices rather than Self::N-sized arrays because Self::N in a trait fn needs unstable generic_const_exprs; the derive emits sized inherent helpers on concrete types instead. - field.rs is a thin re-export of FixedSize from the fixed-size crate, so consumers can refer to it via match_action::FixedSize. - predicate.rs holds the type-erased FieldPredicate form (Prefix, Mask, Range, Exact variants over FieldBytes) plus the Erased backend marker used by the reference oracle. - rule.rs defines the four kind-typed *Spec wrappers (PrefixSpec, MaskSpec, RangeSpec, ExactSpec), the Backend / Accepts / IntoBackendField / IsUniversal traits, and the RuleField envelope carrying name / spec. match-action-derive (proc-macro crate) emits, from a struct annotated with #[prefix] / #[mask] / #[range] / #[exact]: - the MatchKey impl, - a parallel <Name>Rule struct holding the typed *Spec data, - an inherent FIELD_SPECS const for compile-time inspection by downstream backends, and - (for non-generic structs) an inherent as_key() -> [u8; KEY_SIZE] for ergonomic byte packing. tests/derive_roundtrip.rs exercises a 5-tuple-shaped key covering all four predicate kinds: asserts N, KEY_SIZE, the field_specs() content, and that as_key_into / as_key produce the expected big-endian byte layout (with declared field ordering preserved). just fmt; cargo check --workspace --all-targets passes.
Adds property-test generators over the rule wrappers, behind a new `bolero` feature gate so the bolero dep stays out of the default build graph. Downstream consumers (acl property tests, future cascade Upsert-laws harnesses) enable the feature in their dev-deps and draw random matching / non-matching packets for any single rule. match-action/src/generator.rs: - FieldHit / FieldMiss TypeGenerators yielding bytes that satisfy / violate a given field predicate. Cover all four predicate kinds: Prefix (any address sharing the rule's high-order bits / one with flipped bits in that prefix), Mask (value bits match / a flipped in-mask bit), Range (uniform draw in [min, max] / outside the bounds), Exact (the value / a different value). - Miss generators skip universal predicates (range covering all values, mask of all zeros, prefix length zero, etc.); the derive-emitted IsUniversal check on <Name>Rule lets callers detect whole rules that have an empty miss set. match-action/src/lib.rs picks up #[cfg(feature = "bolero")] gates on the generator mod + its FieldHit / FieldMiss re-exports. match-action/Cargo.toml grows the bolero optional dep and the `bolero` feature, which also turns on the matching forwarded feature on match-action-derive (a no-op today, kept for future emit changes in the derive). match-action-derive/Cargo.toml mirrors the `bolero` feature declaration so cargo can forward through. just fmt; cargo check --workspace --all-targets and cargo check -p dataplane-match-action --features bolero pass.
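The hit / miss idea for the Prefix kind can be sketched without bolero. The functions below are hypothetical and take an explicit `seed` where the real generators would draw from bolero; they only illustrate how a hit shares the rule's high-order bits, how a miss flips one bit inside the prefix, and why a zero-length (universal) prefix has no miss.

```rust
/// Mask covering the top `len` bits of a u32 (`len` is clamped to 32).
pub fn prefix_mask(len: u8) -> u32 {
    let len = len.min(32) as u32;
    if len == 0 { 0 } else { u32::MAX << (32 - len) }
}

/// True when `value` shares the rule's top `len` bits.
pub fn prefix_matches(rule: u32, len: u8, value: u32) -> bool {
    (rule ^ value) & prefix_mask(len) == 0
}

/// A hit: the rule's prefix bits, with the remaining bits taken from `seed`.
pub fn prefix_hit(rule: u32, len: u8, seed: u32) -> u32 {
    let mask = prefix_mask(len);
    (rule & mask) | (seed & !mask)
}

/// A miss: a hit with one bit flipped inside the prefix.
/// Returns `None` for the universal prefix (length 0), which has no misses.
pub fn prefix_miss(rule: u32, len: u8, seed: u32) -> Option<u32> {
    let len = len.min(32) as u32;
    if len == 0 {
        return None;
    }
    // Bits 31 down to (32 - len) belong to the prefix; pick one of them.
    let bit = 31 - (seed % len);
    Some(prefix_hit(rule, len as u8, seed) ^ (1 << bit))
}
```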
Introduces the dataplane-acl crate with the software reference classifier. The DPDK rte_acl backend lands behind a feature gate in a follow-up PR. The reference backend is a linear-scan software classifier built on the canonical FieldPredicate form from match-action (rule.into_backend_fields::<Erased>()), so it speaks the same four predicate kinds (Prefix / Mask / Range / Exact) as every other backend. Two roles: 1. Differential-testing oracle against rte_acl (a future PR's differential property tests pit both backends against the same random rule + packet draws). 2. Non-lossy substrate for a small-delta cascade front over a slow tail backend. Layout: - src/lib.rs declares the crate-level docs and re-exports the reference module. The dpdk feature gate and dpdk_table_alias! macro land alongside the rte_acl backend itself in the next PR. - src/reference/table.rs is the typed surface: ReferenceTable<K, A> parameterised by a MatchKey and an action; RefRule wraps the lowered Erased predicates plus an action. Inline unit tests cover positional precedence (first match wins) and the four predicate kinds. - src/reference/dyn_table.rs is the runtime-shape twin: DynReferenceTable carries its FieldSpec layout at runtime so property tests can fuzz the schema itself. Returns DynShapeError on shape mismatch. just fmt; cargo check --workspace --all-targets passes.
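A linear-scan classifier over the four predicate kinds can be sketched as follows. This is a simplified illustration, not the crate's code: it uses `Vec<u8>` where the source mentions `FieldBytes`, and the `ReferenceTable` here is untyped (no `MatchKey` parameter). Fields are big-endian byte strings, so byte-wise ordering equals numeric ordering.

```rust
/// Type-erased predicate over one big-endian field (simplified sketch).
#[derive(Clone, Debug)]
pub enum FieldPredicate {
    Prefix { value: Vec<u8>, len: u32 },
    Mask { value: Vec<u8>, mask: Vec<u8> },
    Range { min: Vec<u8>, max: Vec<u8> },
    Exact(Vec<u8>),
}

/// Compares the top `bits` bits of two equal-length byte strings.
fn prefix_eq(a: &[u8], b: &[u8], bits: u32) -> bool {
    if a.len() != b.len() || bits as usize > a.len() * 8 {
        return false;
    }
    let full = (bits / 8) as usize;
    let rem = bits % 8;
    if a[..full] != b[..full] {
        return false;
    }
    if rem == 0 {
        return true;
    }
    let mask = 0xFFu8 << (8 - rem);
    a[full] & mask == b[full] & mask
}

impl FieldPredicate {
    pub fn matches(&self, field: &[u8]) -> bool {
        match self {
            Self::Prefix { value, len } => prefix_eq(value, field, *len),
            Self::Mask { value, mask } => {
                field.len() == value.len()
                    && value.len() == mask.len()
                    && field.iter().zip(value).zip(mask).all(|((f, v), m)| f & m == v & m)
            }
            Self::Range { min, max } => {
                field.len() == min.len()
                    && field.len() == max.len()
                    && min.as_slice() <= field
                    && field <= max.as_slice()
            }
            Self::Exact(value) => value.as_slice() == field,
        }
    }
}

pub struct RefRule<A> {
    pub fields: Vec<FieldPredicate>,
    pub action: A,
}

/// Positional precedence: the first rule whose every field matches wins.
pub struct ReferenceTable<A> {
    pub rules: Vec<RefRule<A>>,
}

impl<A> ReferenceTable<A> {
    pub fn lookup(&self, key: &[&[u8]]) -> Option<&A> {
        self.rules
            .iter()
            .find(|rule| {
                rule.fields.len() == key.len()
                    && rule.fields.iter().zip(key).all(|(p, f)| p.matches(f))
            })
            .map(|rule| &rule.action)
    }
}
```

The scan is O(rules * fields), which is acceptable for an oracle whose job is to be obviously correct rather than fast.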
Lands the static type machinery for the DPDK `rte_acl` backend
behind the new `dpdk` feature gate: how a MatchKey's FIELD_SPECS maps
into rte_acl's per-field FieldDef array, and how the four
match-action *Spec predicate kinds lower into IntoBackendField for
the `Dpdk` backend marker. The runtime install / classify path
(install.rs, lookup.rs) and the dpdk_table_alias! macro land next.
acl/src/dpdk/:
- mod.rs declares the two submodules; carries a temporary
#![allow(dead_code)] because the layout's `stride` field and the
rule.rs RuleSpec fields are consumed only once install / lookup
arrive in the next PR. The allow goes away then.
- layout.rs has the rte_acl field planner: group fields by
input_index (rte_acl requires the first field to be one byte,
remaining fields grouped into <= 4-byte buckets), insert padding
for gaps, and yield a DpdkLayout { field_defs: [FieldDef; N],
stride, user_to_dpdk }. const_extents() is const fn so a const
alias can derive N / STRIDE from K::FIELD_SPECS without unstable
generic_const_exprs. Wide fields (Ipv6Addr, u128) decompose into
four u32 sub-fields the way l3fwd-acl does.
- rule.rs holds the Dpdk backend marker, the AclWord trait (blanket
impl over FixedSize via chunks()), the IntoBackendField impls
carrying each *Spec into a backend-typed AclField group, the
RuleSpec rule-field envelope, and splice_user_fields_to_dpdk for
reordering user-declared fields into rte_acl's layout-driven
ordering.
acl/src/lib.rs picks up the #[cfg(feature = "dpdk")] gate on
pub mod dpdk; (no macro yet -- the dpdk_table_alias! macro lands with
its lookup-side referent next PR).
acl/Cargo.toml grows the dpdk feature and the optional dpdk
workspace dep. No dev-deps yet.
just fmt; cargo check --workspace --all-targets and
cargo clippy -p dataplane-acl --features dpdk -- -D warnings pass.
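The wide-field decomposition mentioned for layout.rs can be illustrated with a small std-only sketch. The function names are hypothetical, and distributing the prefix length across the four words is an assumption about how the lowering handles a Prefix predicate (it mirrors the approach commonly used for IPv6 in rte_acl examples).

```rust
use std::net::Ipv6Addr;

/// Splits a 16-byte big-endian address into four u32 sub-fields,
/// most significant word first.
pub fn split_wide(addr: Ipv6Addr) -> [u32; 4] {
    let octets = addr.octets();
    let mut words = [0u32; 4];
    for (word, chunk) in words.iter_mut().zip(octets.chunks_exact(4)) {
        *word = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    words
}

/// Distributes a 0..=128 prefix length over the four 32-bit sub-fields:
/// leading words get 32, the boundary word gets the remainder, the rest get 0.
pub fn split_prefix_len(len: u8) -> [u8; 4] {
    let mut remaining = len.min(128);
    let mut out = [0u8; 4];
    for slot in out.iter_mut() {
        let take = remaining.min(32);
        *slot = take;
        remaining -= take;
    }
    out
}
```

For example, a /48 over `2001:db8::/48` becomes sub-prefix lengths 32, 16, 0, 0 over the four words.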
Wires the layout planner and rule lowering from the previous PR into a working DPDK backend: build an AclContext from a MatchKey plus its rules, wrap it in a DpdkAclLookup, and classify packets through it. First EAL-touching PR in the acl stack. src/dpdk/: - install.rs is the from-K-plus-rules constructor: take a MatchKey, call plan_layout to get the rte_acl FieldDefs, build an AclContext, splice each user RuleSpec through layout's user_to_dpdk map into rte_acl's column order, hand the rules to the context, build, and wrap the built context in a DpdkAclLookup<K, N, STRIDE, A>. - lookup.rs is DpdkAclLookup itself: stack-packed key bytes (MAX_USER_KEY_BYTES sentinel feeds the compile-time guard in dpdk_table_alias!), the impl Lookup<K, A> single-shot path, and a batched classify_batch over a slice of K returning aligned actions. - mod.rs picks up pub mod install / pub mod lookup and drops the temporary #![allow(dead_code)] from the previous PR -- RuleSpec fields and DpdkLayout.stride now have readers. src/lib.rs gains the dpdk_table_alias! macro: dpdk_table_alias!(pub type FiveTupleTable<Verdict> = FiveTuple); yields a DpdkAclLookup<K, N, STRIDE, A> with N / STRIDE derived from K::FIELD_SPECS via const_extents. A const _: () = assert!(KEY_SIZE <= MAX_USER_KEY_BYTES) guards against keys that wouldn't fit the stack scratch buffer. The hidden __match_action module re-exports MatchKey so the macro resolves without a caller-side import. tests/eal_install_classify.rs is the smoke: derive a MatchKey, install two rules with priority precedence, classify via the single-shot path and the batch path, assert userdata. acl/Cargo.toml grows a single dev-dep -- self-overriding dpdk with the `test` feature on so #[with_eal] from dpdk-test-macros works. just fmt; cargo check --workspace --all-targets and cargo clippy -p dataplane-acl --features dpdk -- -D warnings pass.
Adds the runtime-shape twin of DpdkAclLookup -- DynDpdkLookup carries its FieldSpec layout at runtime instead of in const generics -- and the shape-fuzz oracle that proves the byte-level pipeline agrees between the reference oracle and rte_acl over an unconstrained schema. src/dpdk/dyn_table.rs is DynDpdkLookup<A>: - new(name, max_rule_num, field_specs) plans the rte_acl layout from a Vec<FieldSpec> at runtime, builds an empty AclContext, and returns a typed lookup keyed by an Erased FieldPredicate vector. - add_rules takes Vec<DynRuleSpec> -- the runtime-shape rule carrier (priority, category_mask, lowered fields, action) -- and splices each rule's field bytes through the user_to_dpdk map into rte_acl's column order, then builds. - impl Lookup<Vec<FieldBytes>, A>: pack the probe bytes onto the stack scratch buffer in the layout's column order, hand them to rte_acl_classify, and translate the userdata hit back to &A. src/dpdk/mod.rs picks up pub mod dyn_table; alongside the typed path. tests/property_dyn_shape.rs is the schema fuzz: - bolero TypeGenerator yields a random Vec<FieldSpec>, a single rule matching that shape, and packet seeds. - For each shape: install the same rule into a DynReferenceTable (oracle) and a DynDpdkLookup, then probe both with both a hit-byte seed and a miss-byte seed. Assert agreement on every probe. - No MatchKey types involved -- exercises the byte-level pipeline end-to-end and catches drift in layout planning, the splice map, and rte_acl's per-predicate semantics simultaneously. acl/Cargo.toml gains bolero + match-action[bolero] dev-deps the test needs. just fmt; cargo check --workspace --all-targets and cargo clippy -p dataplane-acl --features dpdk -- -D warnings pass.
Pure test broadening; no src changes. Adds the single-rule v4/v6 differential against the reference oracle and three Headers / metadata projection demos that exercise classify / classify_opt against real net::HeadersView packets.
tests/property_predicate.rs is the differential. For a random 5-tuple rule + random hit/miss byte seeds drawn via match-action's FieldHit / FieldMiss generators, both the reference oracle and the DPDK backend must accept every hits() draw and reject every misses() draw. Parameterised over the address width via a sealed IpAddress trait so a single body covers v4 (Ipv4Addr) and v6 (Ipv6Addr) -- the DPDK wide-field split (one 16-byte address -> four 4-byte sub-fields) is exercised end-to-end by the v6 invocation. Single rule only; multi-rule differential is deferred (positional precedence vs numeric Priority).
tests/eal_classify_via_projection.rs is the end-to-end projection demo: a real packet -> HeadersView -> Projection<FiveTuple> -> DPDK Lookup<FiveTuple, _> -> action. Shows Lookup::classify runs the projection and the lookup as a single call -- the call site reads table.classify(&headers) and doesn't see the intermediate key construction.
tests/metadata_projection.rs is the partial-projection demo. Header fields live in Headers; VRF / VNI live in PacketMeta. A projection source bundles &HeadersView with &PacketMeta and projects to Option<K>: the header part is total (shape proves presence), the metadata part narrows from its Option with ?. Missing metadata projects to None and Lookup::classify_opt turns that into a table miss with no explicit branch in user code.
tests/net_field_types.rs uses net wire newtypes (TcpPort, UdpPort, Vni, UnicastIpv4Addr) directly as MatchKey fields with no acl-side AclWord impl, leaning on net's FixedSize impls (PR 2a) and the DPDK backend's blanket AclWord-over-FixedSize impl.
acl/Cargo.toml grows the net[test_buffer, builder] dev-dep these projection demos need.
just fmt; cargo check --workspace --all-targets and cargo clippy -p dataplane-acl --features dpdk -- -D warnings pass.
Adds criterion benchmarks for both backends at v4 and v6 widths, plus the nix / just plumbing to produce bench binaries from a sandboxed build. acl/benches/: - reference_five_tuple.rs sweeps a deep miss (full per-rule scan) and an early hit through the reference's O(rules * fields) linear scan. Both widths. - dpdk_five_tuple.rs is the rte_acl companion: trie walk cost (close to flat in rule count), miss vs hit, single-shot vs SIMD batch. v6 exercises the wide-field split (one 16-byte address -> four 4-byte sub-fields). Requires a live EAL. - table_build.rs measures construction cost vs rule count: reference (lower + Vec wrap) and DPDK (rte_acl_build, the update-latency cost). Both widths. iter_batched so teardown is excluded. acl/Cargo.toml gets the criterion dev-dep and three harness = false [[bench]] entries. Workspace Cargo.toml gets the criterion = 0.5.1 shared dep entry. default.nix adds a bench-builder derivation: cargo bench --no-run under the profile-appropriate DPDK sysroot, then copies each compiled benchmark into $out/bin (stripping cargo's -<hash> suffix). Linked against the optimized DPDK when profile = release. justfile adds a bench recipe that builds the benches package and runs every binary under results/benches/bin/ in turn. just fmt; cargo check --workspace --all-targets and cargo clippy -p dataplane-acl --features dpdk -- -D warnings pass.
Introduces dataplane-cascade. Models a small in-memory LSM: writes land in a concurrent multi-writer head, periodically frozen into immutable intermediate layers, eventually compacted into an immutable tail. Readers walk head -> frozen[] -> tail and stop at the first definitive answer. The same primitive serves three problems via one Upsert trait: match-action table updates (atomic publish under load), hardware offload programming (drain output feeds the HW backend), and active-active state replication (serialized drain output ships to peer dataplanes). freeze / fuse / compact / merge are all the same Upsert operation applied to different operands. Layout: - lib.rs publishes the surface: Cascade / DrainEvent / FrozenEntry / Snapshot, the Upsert / MergeInto / MutableHead traits, Generation, and re-exports Lookup / Projection from the lookup crate. - cascade.rs is the central type plus the rotate / drain / publish state machine. Cascade::subscribe is feature-gated under `subscribe` (tokio broadcast). - head.rs / merge.rs / upsert.rs are the trait definitions plus LastWriteWins as a stock blanket-friendly Upsert impl. - generation.rs carries the monotonic Generation counter that orders layers within a cascade. - property_tests.rs is a reusable bolero harness (under the bolero feature) consumer crates use to verify their own Upsert impls against the cascade's algebraic laws. Exercised by a self-test in the next PR. tests/smoke.rs is the trait-shape end-to-end with a trivial concrete implementation: head shadows sealed shadows tail, tombstones in the head suppress lower-layer hits, rotate seals and publishes. Uses the shared tests/common/mod.rs helpers that subsequent tests also share. Cascade is independent of every other PR in this stack: no acl dep, no match-action dep. Real-shape consumer pressure plus bolero / subscribe integration tests land in the next PR. just fmt; cargo check --workspace --all-targets passes.
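The head -> frozen[] -> tail read path with tombstones can be sketched in a few lines. This is a single-threaded toy with hypothetical method names (`put`, `delete`, `rotate`, `compact`, `get`) and a fixed `u32` key; the real crate has a concurrent head, an `Upsert` trait, and generations, none of which are modeled here.

```rust
use std::collections::HashMap;

/// A layer entry: a value, or a tombstone that hides lower layers.
#[derive(Clone, Debug, PartialEq)]
pub enum Entry<V> {
    Value(V),
    Tombstone,
}

pub struct Cascade<V> {
    head: HashMap<u32, Entry<V>>,
    /// Sealed layers, newest first.
    frozen: Vec<HashMap<u32, Entry<V>>>,
    tail: HashMap<u32, V>,
}

impl<V> Cascade<V> {
    pub fn new() -> Self {
        Self { head: HashMap::new(), frozen: Vec::new(), tail: HashMap::new() }
    }

    pub fn put(&mut self, key: u32, value: V) {
        self.head.insert(key, Entry::Value(value));
    }

    pub fn delete(&mut self, key: u32) {
        self.head.insert(key, Entry::Tombstone);
    }

    /// Seals the current head into the newest frozen layer.
    pub fn rotate(&mut self) {
        let sealed = std::mem::take(&mut self.head);
        self.frozen.insert(0, sealed);
    }

    /// Folds frozen layers into the tail, oldest first, so newer writes win.
    pub fn compact(&mut self) {
        while let Some(layer) = self.frozen.pop() {
            for (key, entry) in layer {
                match entry {
                    Entry::Value(v) => {
                        self.tail.insert(key, v);
                    }
                    Entry::Tombstone => {
                        self.tail.remove(&key);
                    }
                }
            }
        }
    }

    /// Walks head -> frozen -> tail and stops at the first definitive answer.
    pub fn get(&self, key: &u32) -> Option<&V> {
        for layer in std::iter::once(&self.head).chain(self.frozen.iter()) {
            match layer.get(key) {
                Some(Entry::Value(v)) => return Some(v),
                Some(Entry::Tombstone) => return None,
                None => {}
            }
        }
        self.tail.get(key)
    }
}
```

Note that rotate and compact are both "apply newer entries on top of older ones", which is the sense in which the commit message calls freeze / fuse / compact / merge the same operation on different operands.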
Three integration tests that round out the cascade's test surface -- each exercises a different part of the design pressure that shaped the trait surface in the previous PR. tests/acl_consumer.rs is the first real-shaped consumer: a minimal ACL classifier built on top of the cascade with no upfront ACL crate dep -- the toy ACL is built inline using only Cascade / MergeInto / MutableHead. Purpose: surface design pressure on the trait surface against a use case that is NOT exact-match-keyed. ACL classification looks up packets by header match expressions, not by a single key, and rules carry their own identity (priority) separate from the lookup input. If the cascade trait shape works for ACL it almost certainly works for anything simpler. Scope intentionally tight: rules install-only (no shadow-rule removal yet), match expressions src/dst IPv4 with optional single port, head Lookup always returns None (writes visible only after a rotation seals the head). Comment block at the end captures the open question about removal under cascade semantics. tests/upsert_properties.rs is the self-test of the property harness landed in the previous PR. Exercises check_upsert_order_independent against the provided LastWriteWins -- known correct by construction. If it fails, the harness itself is broken; if it passes, it's a useful black-box check for downstream Upsert impls. tests/subscribe.rs is the drain-subscription integration under tokio. Exercises Cascade::subscribe end-to-end -- gated by the cascade `subscribe` feature (enabled in dev-deps via the self-path override on Cargo.toml). just fmt; cargo check --workspace --all-targets passes.
eda6318 to
5394823
Compare
Absorb the `n-vm`, `n-it`, `n-vm-macros`, and `n-vm-protocol` crates from the external `testn` repository into the workspace and add the supporting infrastructure for running tests inside a QEMU/cloud-hypervisor guest.
This is a squashed revival of the `n-vm-again` branch, rebased onto the current ACL/dpdk line. The branch had forked before the ACL module existed; its stale fork of `dpdk/`, `dpdk-sys/`, and `dpdk/src/acl/*` has been dropped in favour of the authoritative versions on the base, keeping only the orthogonal VM test infrastructure.
Contents:
- `n-vm` / `n-vm-protocol`: QEMU and cloud-hypervisor backends, dynamic vsock allocation, hugepage and NIC-model configuration, scratch-only container mode with nix-store bind mounts.
- `n-vm-macros`: the `#[in_vm]` attribute and companion attributes (`#[network]`, guest/hypervisor config), compile-fail trybuild tests.
- `n-it`: in-guest init system (PID 1) with mount-table-driven teardown.
- `nix`: `linux-fancy` kernel built from config fragments (VFIO, IOMMU, virtio, e1000/e1000e), `testroot`/`vmroot` derivations, `merge-config.nix`.
- `hardware` / `dpdk-sys`: e1000/e1000e NIC binding and `rte_net_e1000` PMD linkage; `driver()` returns `Ok(None)` for unbound devices.
- `mgmt`: re-enable `test_sample_config` under `#[in_vm]`.
- `justfile`: export `N_VM_TEST_ROOT`/`N_VM_VM_ROOT` for `#[in_vm]` tests.
The `hardware/tests/dpdk_in_vm.rs` integration test (virtio-net/e1000/e1000e) is intentionally NOT carried forward: it was written against the pre-rewrite dpdk device API (`StartedDev` queue handles, `RxOffloadConfig`, public `Headers` fields) and needs a port to the current API. It is preserved on the `backup/n-vm-again` branch for a clean revival.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the cross-architecture `#[in_vm]` pipeline run end-to-end and remove the x86-centric framing of architecture-divergent VM config.
- container: run the foreign test binary under user-mode QEMU inside the container (mirroring `scripts/test-runner.sh`) rather than relying on a host `binfmt_misc` handler. The `qemu-<arch>` interpreter is resolved on PATH to its `/nix/store` path, reachable via the bind-mounted store.
- cloud-hypervisor: set `mergeable = false` in the memory config; current cloud-hypervisor rejects `mergeable` together with `shared`, and `shared` is required for virtiofs.
- Arch is now an explicit dimension threaded through the QEMU arg builders and `build_kernel_cmdline` (not `Arch::current()` buried in each), so the per-ISA lowering is unit-tested for both ISAs on any build host.
- vIOMMU is folded into `Arch::virtual_iommu_device()`; an `iommu = true` request on an ISA with no lowering now resolves to a graceful skip in the host tier (reusing the cloud-hypervisor-cross skip path) instead of a launch panic.
- boot spike: also validate virtio-net / e1000 / e1000e driver binding in the guest.
Validated: x86 `just test n-vm` 184/184; aarch64 `just platform=aarch64 libc=musl test n-vm` 185/186. The lone aarch64 failure is the `#[should_panic]` control test, whose deliberate panic comes out of the TCG guest as SIGSEGV -- a separate runtime issue, deferred.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`#[should_panic]` cannot compose with `#[in_vm]`. The generated function runs under libtest at all three dispatch tiers (host, container, guest), and a panic is absorbed at whichever tier produces it, so the semantics are incoherent: when a guest test fails, the `run_container_tier_for` "VM test failed" panic is caught by the *container-tier* libtest's own `#[should_panic]` (so the container reports "1 passed"), the host then sees a clean exit, and the host-tier `#[should_panic]` fails with "did not panic as expected". On aarch64 this is reliably hit because the guest's panic faults (SIGSEGV) before it can be caught in-guest. Reject the combination in the macro with a `compile_error!` that directs the author to assert the failure condition in the body instead, add a trybuild compile-fail case, and drop the broken `test_which_runs_in_vm_control` (the harness's failure-detection is covered by the verdict-decode unit tests). With this, `just platform=aarch64 libc=musl test n-vm` is fully green (185/185). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The vIOMMU realization was spread across three `Arch` methods -- the QEMU `-device` string, the `kernel-irqchip=split` machine option, and the guest kernel IOMMU params -- so adding a new ISA's vIOMMU meant remembering to touch all three, and they could drift apart. Replace them with a single `Arch::virtual_iommu() -> Option<VIommuLowering>` carrying `device` + `machine_opts` + `kernel_params` as one object; `supports_virtual_iommu()` derives from it. Adding aarch64 SMMUv3 (a follow-up) is now filling in one struct rather than editing several methods. Behavior-preserving: the emitted QEMU args and guest kernel command line are byte-identical on x86_64, and aarch64 still has no vIOMMU (`None`). Verified by the per-ISA builder and cmdline unit tests plus a new coherence property test (an ISA has a complete lowering or none -- the pieces can't half-apply). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
aarch64's vIOMMU is QEMU's `virt` SMMUv3, lowered as a `-machine iommu=smmuv3` option (not a `-device` like x86 intel-iommu) and auto-probed from the device tree (no kernel command-line opt-in). Add it as the aarch64 `VIommuLowering` (`device: None`, `machine_opts: "iommu=smmuv3"`), make `VIommuLowering.device` an `Option` so a machine-option-only IOMMU is expressible, and add `CONFIG_ARM_SMMU_V3` to the guest kernel. `supports_virtual_iommu()` is now true for aarch64, so the host-tier skip flips off and `#[hypervisor(iommu)]` tests run on the cross path instead of skipping. Validated under TCG (scripts/n-vm-aarch64-smmu-spike.sh): the SMMUv3 probes, PCI devices (incl. e1000) land in their own IOMMU groups so vfio-pci/DPDK is viable, and `iommu_platform=on,ats=on` is accepted without faults. The three `#[hypervisor(iommu)]` integration tests now run and pass on aarch64 -- the vhost-vsock verdict channel and vhost-user-fs root both work behind the SMMU -- with the suite at 186/186. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
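The single-object lowering described in the two commits above might be shaped roughly as follows. This is a sketch under assumptions: modelling `Arch` as an enum, the field types, the exact x86 `-device` string, and the x86 kernel parameters are all illustrative guesses; only the field names, `kernel-irqchip=split`, `iommu=smmuv3`, and `device: None` for aarch64 come from the commit text.

```rust
/// Everything needed to realize a virtual IOMMU for one ISA, as one object.
#[derive(Debug, Clone, PartialEq, Eq)]
struct VIommuLowering {
    /// QEMU `-device` argument; `None` for a machine-option-only IOMMU.
    device: Option<&'static str>,
    /// Extra `-machine` options.
    machine_opts: &'static str,
    /// Guest kernel command-line parameters (empty when auto-probed).
    kernel_params: &'static str,
}

/// Assumption: `Arch` is modelled as an enum for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Arch {
    X86_64,
    Aarch64,
}

impl Arch {
    fn virtual_iommu(self) -> Option<VIommuLowering> {
        match self {
            Arch::X86_64 => Some(VIommuLowering {
                // Assumed device string and kernel params (illustrative).
                device: Some("intel-iommu,intremap=on"),
                machine_opts: "kernel-irqchip=split",
                kernel_params: "intel_iommu=on iommu=pt",
            }),
            Arch::Aarch64 => Some(VIommuLowering {
                device: None,
                machine_opts: "iommu=smmuv3",
                // SMMUv3 is probed from the device tree: no opt-in needed.
                kernel_params: "",
            }),
        }
    }

    /// Derived from the lowering, so the two cannot drift apart.
    fn supports_virtual_iommu(self) -> bool {
        self.virtual_iommu().is_some()
    }
}
```

Deriving `supports_virtual_iommu()` from the same value is what makes a half-applied lowering unrepresentable: an ISA either returns a complete struct or `None`.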
The in_vm test VMs always run with `-nographic`, so QEMU's GUI display
backends (gtk/sdl/vnc/spice/...) are dead weight that dragged
gtk4/gtk3/cairo/pango/vte/libepoxy/SDL into every test/dev root. This
was invisible on the native path (the cached `qemu_kvm` simply fetched
that closure) but became loud on the cross path, where building
`qemu-system-aarch64` from source meant compiling all of GTK.
Select nixpkgs' headless `nixosTestRunner` QEMU profile instead:
- Native guest: the prebuilt, cache-hit `qemu_test`
(= `qemu_kvm` + `nixosTestRunner`) -- host-cpu-only, KVM, no GUI.
Drops GTK from the common devroot with no loss of the binary cache.
- Cross guest: the base `qemu`, headless and restricted to just the
build-host + guest `*-softmmu` targets (the build-host target keeps
QEMU's `qemu-kvm` compat symlink from dangling and tripping the
`noBrokenSymlinks` install check). A genuine-cross `pkgsBuildHost`
qemu is never in the binary cache regardless (Hydra does not build
that derivation), so trimming targets + GUI shrinks that unavoidable
build from a 2247 MB closure to 582 MB.
`nixosTestRunner`'s only non-GUI effect is a 9p uid0 patch we never
exercise (we mount the guest root via vhost-user-fs, not `-virtfs`).
Validated gtk-free, full suites green: x86 186/186 (`qemu_test`, KVM)
and aarch64 186/186 (slim cross qemu, TCG).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The `compile_fail` trybuild test shells out to `cargo` at run time to
compile each case for the build target. Under cross emulation
(`--cfg emulated`, set by nix/profiles.nix when the test arch differs
from the host) the test binary runs via qemu-user and the build target's
std/core is unavailable, so every case fails with E0463 ("can't find
crate for `core`") instead of the diagnostics it asserts -- which failed
`just libc=musl platform=bluefield3 test`.
Gate the test with `#[cfg_attr(emulated, ignore)]`, matching the repo's
existing convention (routing/nat/mgmt). The macro's compile-time
diagnostics are arch-independent, so the native run is full coverage.
n-vm-macros' only test is then skipped, so an isolated
`just test n-vm-macros` cross run would trip nextest's no-tests-run
error (exit 4); add `--no-tests pass` to the interactive `test` recipe to
match what `test-each` already does for per-package archives.
Validated: native runs + passes (1/1), cross skips (exit 0).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ContainerGuard::into_result disarmed both safety nets (the defused flag and the background CleanupThread) before calling collect_and_cleanup. That helper can return early on inspect failure or missing state, before remove_container runs -- and at that point Drop is a no-op, so the container leaks on exactly the error path the guard exists to cover. Collect and remove first; mark defused / defuse the thread only after the removal succeeds, so an early `?` propagates with the guard still armed and Drop triggers emergency removal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
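The ordering fix can be illustrated with a small stand-alone model. This is a sketch, not the crate's code: `Runtime`, its counters, and the `inspect_ok` flag are hypothetical stand-ins, and the background `CleanupThread` is left out.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for the container runtime: counts removals.
#[derive(Default)]
struct Runtime {
    normal_removals: Cell<u32>,
    emergency_removals: Cell<u32>,
}

struct ContainerGuard {
    runtime: Rc<Runtime>,
    defused: bool,
}

impl ContainerGuard {
    /// Stand-in for `collect_and_cleanup`: can fail before removal runs.
    fn collect_and_cleanup(&self, inspect_ok: bool) -> Result<i32, String> {
        if !inspect_ok {
            // Early return: the container has NOT been removed yet.
            return Err("inspect failed".to_string());
        }
        self.runtime
            .normal_removals
            .set(self.runtime.normal_removals.get() + 1);
        Ok(0)
    }

    fn into_result(mut self, inspect_ok: bool) -> Result<i32, String> {
        // The guard is still armed here, so an early `?` drops `self`
        // and `Drop` performs the emergency removal.
        let code = self.collect_and_cleanup(inspect_ok)?;
        // Removal succeeded; only now is it safe to disarm.
        self.defused = true;
        Ok(code)
    }
}

impl Drop for ContainerGuard {
    fn drop(&mut self) {
        if !self.defused {
            self.runtime
                .emergency_removals
                .set(self.runtime.emergency_removals.get() + 1);
        }
    }
}
```

With the original ordering (`defused = true` before the fallible call), the error path would leave both counters at zero, which is the leak the commit describes.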
waitpid(-1, WNOHANG) reports "no children remain" as Err(ECHILD), not WaitStatus::StillAlive. The reap loop fell through to the catch-all error arm and logged `unexpected errno from waitpid in init: ECHILD` on every shutdown round (and once per SIGTERM round in terminate_remaining_processes). Match ECHILD explicitly and break cleanly; keep the warning for genuinely unexpected errnos. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
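The shape of the corrected reap loop can be modelled without any OS calls. In the sketch below `Errno`, `WaitStatus`, and `reap_all` are local, hypothetical stand-ins (the real code calls `waitpid`), and the sequence of results is fed in as an iterator so the control flow can be exercised directly.

```rust
/// Local stand-in for the errno values of interest.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Errno {
    Echild,
    Einval,
}

/// Local stand-in for the wait status.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WaitStatus {
    Exited { pid: i32 },
    StillAlive,
}

/// Consumes simulated `waitpid(-1, WNOHANG)` results. Returns the reaped
/// pids and any errno that would have been logged as unexpected.
fn reap_all<I>(results: I) -> (Vec<i32>, Vec<Errno>)
where
    I: IntoIterator<Item = Result<WaitStatus, Errno>>,
{
    let mut reaped = Vec::new();
    let mut warnings = Vec::new();
    for result in results {
        match result {
            Ok(WaitStatus::Exited { pid }) => reaped.push(pid),
            // Children exist but none has exited yet: stop this round.
            Ok(WaitStatus::StillAlive) => break,
            // No children remain: the normal end state, not an error.
            Err(Errno::Echild) => break,
            // Anything else is genuinely unexpected and worth a warning.
            Err(other) => {
                warnings.push(other);
                break;
            }
        }
    }
    (reaped, warnings)
}
```

The explicit `Echild` arm has to sit before the catch-all; otherwise the normal "no children left" case is reported as a warning on every round.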
…efix TestResult::parse stripped WIRE_PREFIX with no boundary check, so a line like `n-it-resultpass` stripped to a `pass` verdict. For a pass/fail parser whose contract is "garbled verdict means failure", a spurious pass is the dangerous direction. Require whitespace immediately after the prefix (the separator the wire form always emits) and add a regression test. Also replaces the box-drawing section dividers in this file with ASCII per the repo's ASCII-only source convention. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
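A boundary-checked parse along these lines could look like the sketch below. The value of `WIRE_PREFIX` is inferred from the `n-it-resultpass` example and is an assumption, as are the two-variant `TestResult` and the choice to return `Option` (with the caller treating `None` as a failure).

```rust
/// Assumed value, inferred from the `n-it-resultpass` example above.
const WIRE_PREFIX: &str = "n-it-result";

/// Assumed two-variant verdict type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TestResult {
    Pass,
    Fail,
}

impl TestResult {
    /// Returns `None` for anything that is not a well-formed verdict line;
    /// the caller is expected to treat `None` as a failure.
    fn parse(line: &str) -> Option<TestResult> {
        let rest = line.strip_prefix(WIRE_PREFIX)?;
        // Boundary check: the prefix must be followed by whitespace, so
        // `n-it-resultpass` is rejected instead of decoding to `pass`.
        if !rest.starts_with(char::is_whitespace) {
            return None;
        }
        match rest.trim() {
            "pass" => Some(TestResult::Pass),
            "fail" => Some(TestResult::Fail),
            _ => None,
        }
    }
}
```

Rejecting the glued form keeps the parser biased toward failure, which matches the "garbled verdict means failure" contract.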
A bare `#[in_vm] fn foo() {}` with neither `#[test]` nor `#[tokio::test]`
compiled to an ordinary function that libtest never collects, so the test
silently never ran -- the worst failure mode for a test harness. Emit a
clear compile error instead, checked after the more specific diagnostics
(bad backend, params, return type, NIC/backend mismatch) so those still
take precedence. Adds a missing_test_attr compile-fail fixture.
Also makes the migrated-option and multiple-backends diagnostics ASCII
(em-dash and ellipsis -> `--` / `...`), re-blessing their .stderr.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The repo convention is ASCII-only source. Replaces box-drawing section dividers (U+2500) with `-`, the multiplication sign (U+00D7) in topology comments and the hugepage error message with `x`, and em-dashes (U+2014) in doc comments with `--`. No behavioral change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The npm-distributed ckb (`npx @tastehub/ckb`) is upstream's `build-fast` variant: CGO_ENABLED=0, no `-tags cartographer`. That compiles out the tree-sitter AST analyzers, so `ckb review` reports "Complexity analysis not available (tree-sitter not built)" -- the complexity/bug-pattern code is `//go:build cgo` with a `!cgo` stub. Build ckb from source with CGO on so complexity analysis works. It is Rust-aware (internal/complexity imports go-tree-sitter's rust grammar and dispatches LangRust); verified against n-vm/src/qemu/qmp.rs, which the npm binary refuses with "requires CGO". Pinned via npins at v9.2.0 to match the existing index.scip schema (v11). Bug-pattern detection stays Go-only (Go-specific node types) -- harmless on this Rust repo. The cartographer backend (layers/arch-health), which links a Rust static lib via `-tags cartographer`, is deferred: it is a much heavier build and we want to gauge the complexity analysis first. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Absorb the `n-vm`, `n-it`, `n-vm-macros`, and `n-vm-protocol` crates from the external `testn` repository into the workspace and add the supporting infrastructure for running tests inside a QEMU/cloud-hypervisor guest.

This is a squashed revival of the `n-vm-again` branch, rebased onto the current ACL/dpdk line. The branch had forked before the ACL module existed; its stale fork of `dpdk/`, `dpdk-sys/`, and `dpdk/src/acl/*` has been dropped in favour of the authoritative versions on the base, keeping only the orthogonal VM test infrastructure.

Contents:

- `n-vm` / `n-vm-protocol`: QEMU and cloud-hypervisor backends, dynamic vsock allocation, hugepage and NIC-model configuration, scratch-only container mode with nix-store bind mounts.
- `n-vm-macros`: the `#[in_vm]` attribute and companion attributes (`#[network]`, guest/hypervisor config), compile-fail trybuild tests.
- `n-it`: in-guest init system (PID 1) with mount-table-driven teardown.
- `nix`: `linux-fancy` kernel built from config fragments (VFIO, IOMMU, virtio, e1000/e1000e), `testroot` / `vmroot` derivations, `merge-config.nix`.
- `hardware` / `dpdk-sys`: e1000/e1000e NIC binding and `rte_net_e1000` PMD linkage; `driver()` returns `Ok(None)` for unbound devices.
- `mgmt`: re-enable `test_sample_config` under `#[in_vm]`.
- `justfile`: export `N_VM_TEST_ROOT` / `N_VM_VM_ROOT` for `#[in_vm]` tests.

The `hardware/tests/dpdk_in_vm.rs` integration test (virtio-net/e1000/e1000e) is intentionally NOT carried forward: it was written against the pre-rewrite dpdk device API (`StartedDev` queue handles, `RxOffloadConfig`, public `Headers` fields) and needs a port to the current API. It is preserved on the `backup/n-vm-again` branch for a clean revival.