Skip to content

Big, beautiful CAN/UDP/core cleanup: dedup constants, fix TX/RX defects, harden node lifecycle#388

Merged
pavel-kirienko merged 2 commits into
masterfrom
dev
Jun 19, 2026
Merged

Big, beautiful CAN/UDP/core cleanup: dedup constants, fix TX/RX defects, harden node lifecycle#388
pavel-kirienko merged 2 commits into
masterfrom
dev

Conversation

@pavel-kirienko

Copy link
Copy Markdown
Member

Folks, this is a BIG, BEAUTIFUL cleanup. Maybe the best cleanup this transport stack has ever seen.

We took the CAN, the UDP, and the core session layer — tremendous code, but it had problems, BAD problems — and we fixed them. Bigly. Nobody dedups constants like we do. Believe me.

The WINS (and there are so many wins)

Constants — we drained the swamp. The 29-bit CAN ID mask was copy-pasted all over town. Four times! Sad. Now it lives in ONE place (can/_interface.py), with its 11-bit friend, and everybody imports from there. And that socketcan "MTU" of 16? That was never the MTU, folks — that was the frame-struct size wearing a disguise. We gave it an honest name and derived it from struct.size. No more fake numbers.

Correctness — we fixed the rigged behavior:

  • Redundant sends now WIN when they win. UDP subject and unicast: if the transfer goes out on at least one interface, that's a SUCCESS — not a phony ExceptionGroup that makes you retry and send duplicates. Just like the CAN transport. Just like the reference. Tremendous.
  • python-can TX retries keep their place in line. They used to cut the line with the global counter — total chaos, out-of-order frames. Not anymore.
  • socketcan never gets wedged again. An oversized frame is dropped, honestly, with a ValueError — it does not silently kill the TX task. And the worker tasks now report their crashes instead of disappearing.
  • The receive loop is UNKILLABLE. A handler that raises? The loop keeps going. Strong borders.
  • The node knows when it's closed. No more using a closed node. No more late arrivals sneaking in. Detached sends are tracked. And a pending async for gets unblocked on close — nobody gets left hanging.
  • Gossip names get VETTED at the border. Garbage wire names — non-normalized, wildcards, homeful, pinned — do NOT get to become local topics. We check the papers.

Simplification — we cut the waste: dead code GONE (deadline_ns, data, _now_mono, send_scout, _arm_reorder_timeout, _Fragment.crc), duplicate deadline checks collapsed, the ACK-window helper de-duplicated, python-can's twin retry branches merged. Smaller. Cleaner. Faster to read at 3am.

The REVIEW. Very thorough. The best people.

We ran a world-class review operation — Codex (xhigh) and Claude (ultrathink), fresh context, three rounds. They found THREE real defects I'd have shipped (the incomplete unicast fix, a weak name guard, and one hollow test that passed on broken code — disgraceful, now fixed). Every behavior change ships with a regression test that FAILS on the old code. That's how you do it.

One reviewer kept insisting on a "fix" to on_gossip name handling. We checked. Twice. It was a FALSE POSITIVE — those paths key on the hash, not the name, and the fix would've broken the intentional monitor diagnostics. We refuted it with the code and a wire-path test. We don't fix things that aren't broken.

The numbers don't lie

Full nox is GREEN — Python 3.11, 3.12, 3.13, plus mypy, lint, and format. 657 tests passing. No public API changes. Zero.

20 files, +523 / −141. A clean, beautiful diff.

🤖 Generated with Claude Code

pavel-kirienko and others added 2 commits June 20, 2026 01:44
… lifecycle

Mask/MTU deduplication:
- Centralize CAN_EXT_ID_MASK and add CAN_STD_ID_MASK in can/_interface.py
  (the dependency-free leaf); import from there in _wire/pythoncan/_media_slcan.
- Replace socketcan's misnamed _CAN_CLASSIC_MTU/_CAN_FD_MTU (frame-struct sizes,
  not payload MTUs) with _CLASSIC_FRAME_SIZE/_FD_FRAME_SIZE derived from struct.size.

Correctness:
- UDP subject and unicast send: a transfer delivered on >=1 redundant interface is
  a success (warn), not an ExceptionGroup raise, matching the CAN transport and the
  reference cy_udp_posix push semantics.
- pythoncan TX retry re-enqueues with the original seq (was the global counter),
  preserving intra-transfer frame order.
- socketcan _encode raises ValueError (not ClosedError) for an oversized Classic
  frame; the TX loop drops it instead of letting it wedge the task; add TX/RX-task
  done-callbacks so unexpected crashes surface via _fail.
- UDP Interface.mtu_link validated in __post_init__ (not a -O-strippable assert).
- UDP RX subject/unicast handlers invoked inside a fault boundary so a raising
  handler cannot kill the receive loop.
- Node rejects use-after-close on advertise/subscribe/scout/monitor/remap and drops
  post-close arrivals; detached ACK/gossip sends tracked via _spawn_detached;
  node.close() closes subscribers so a pending `async for` is unblocked.
- Gossip names are validated (normalized, verbatim, printable, non-homeful, pin-free)
  before creating an implicit topic, so untrusted wire input cannot.

Simplification:
- Delete dead code (PublishTracker.deadline_ns/data, _now_mono, send_scout,
  _arm_reorder_timeout, _Fragment.crc); collapse duplicate deadline checks in the CAN
  TX loops; merge pythoncan's identical retry branches; de-duplicate the reliable
  ACK-window helper into _node (ack_window/ack_is_last_attempt).

Tests: add regression coverage for every behavior change (each fails on the unfixed
code); full nox green on Python 3.11/3.12/3.13 + mypy + lint + format.

Docs: add the multi-agent "Review team" process to CLAUDE.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pavel-kirienko pavel-kirienko merged commit 73718bb into master Jun 19, 2026
22 checks passed
@pavel-kirienko pavel-kirienko deleted the dev branch June 19, 2026 22:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant