Skip to content

feat(dronecan): on-demand GetNodeInfo, GetSet, ExecuteOpcode, RestartNode via async slot#11683

Draft
daijoubu wants to merge 95 commits into
iNavFlight:maintenance-10.xfrom
daijoubu:feature/dronecan-param-getset
Draft

feat(dronecan): on-demand GetNodeInfo, GetSet, ExecuteOpcode, RestartNode via async slot#11683
daijoubu wants to merge 95 commits into
iNavFlight:maintenance-10.xfrom
daijoubu:feature/dronecan-param-getset

Conversation

@daijoubu

@daijoubu daijoubu commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Adds an on-demand async service slot (dronecanAsyncSlot) that serialises GetNodeInfo, param GetSet, ExecuteOpcode, and RestartNode requests through a single IDLE→PENDING→READY/ERROR state machine, avoiding per-service response queues
  • Exposes all four services over MSP (MSP2_INAV_DRONECAN_ASYNC_REQUEST / MSP2_INAV_DRONECAN_ASYNC_RESULT) so the configurator can read/write node parameters and trigger node restarts without custom DroneCAN tooling
  • Extends the node table MSP response with elapsed_ms (time since last NodeStatus)
  • 29 unit tests covering node table management, shouldAcceptTransfer filter, all four guard-rejection paths, all five param.Value types (INT/FLOAT/BOOL/STRING/EMPTY) with min/max range fields, ExecuteOpcode ok/fail, RestartNode, and the re-entry guard

Dependencies

⚠️ Stacked on #11607 (DroneCAN H7/F7 driver fix) — please do not merge until that lands. The diff includes those commits; this PR's unique changes start at 42dabdc1a.

Configurator branch: daijoubu:feature/dronecan-configurator-tab covers the UI for GetNodeInfo and param GetSet.

Test plan

  • Build matrix: F4 (MATEKF405), F7 (KAKUTEF7), H7 (KAKUTEH7WING), AT32 (IFLIGHT_BLITZ_ATF435), SITL — all pass, no warnings
  • Unit tests: 29/29 pass
  • Hardware: KAKUTEH7WING — DNA server node allocation confirmed working; node params readable via configurator
  • Code review: 4 passes, all findings resolved

daijoubu added 30 commits July 1, 2026 08:02
SJW=8 was overly conservative (80% of bit time at 1Mbps with 10 quanta).
SJW=3 is the standard value also used by the F7 driver.

Tested with 6037 arm/disarm cycles at 500kbps: TEC=0, REC=0, zero errors.
…DCAN

PLL2Q was 3 (266 MHz, invalid for FDCAN ≤ 80 MHz). Fix to 10 (80 MHz).
Extend PLL2 guard from USE_SDCARD_SDIO to USE_SDCARD_SDIO || USE_DRONECAN
so H7 boards with CAN but no SD card get PLL2 configured. Adopt upstream
PLL2M/N formula (VCI=1.6 MHz, VCO=800 MHz) and error check on
HAL_RCCEx_PeriphCLKConfig.
…pport

Remove redundant PeriphClkInitStruct clock config from canardSTM32CAN1_Init.
system_stm32h7xx.c already configures FDCAN to use PLL2Q (80 MHz) when
USE_DRONECAN is defined; duplicating it in the driver overwrites with PLL1.

Also add CAN1 pin definitions and USE_DRONECAN to KAKUTEH7WING target
(PD0/PD1, CAN1_STANDBY PD3 disabled by default).
Use HAL_RCCEx_GetPeriphCLKFreq(RCC_PERIPHCLK_FDCAN) instead of
HAL_RCC_GetPCLK1Freq() for bit timing calculation. FDCAN is clocked
from PLL2Q (80 MHz) configured in system_stm32h7xx.c; using PCLK1
(100 MHz) produced a ~25% baud rate error causing immediate bus-off.

Restore SJW to 3 for better synchronisation tolerance.
Remove high-frequency LOG_DEBUG messages from GNSS Fix/Fix2/Auxiliary
handlers, onTransferReceived, dronecanInit, and gps_dronecan HDOP path
that fired at 25 Hz and flooded the log.

Fix PLL2 VCO input to target 1.6 MHz (PLL2M = HSE/1600000, PLL2N = 500)
rather than 2.0 MHz, keeping the operating point clearly within
VCIRANGE_0 (1-2 MHz) as the original SDCARD-only code did with PLL2M=5.
VCO output remains 800 MHz; FDCAN (80 MHz via PLL2Q=10) and SDMMC
(200 MHz via PLL2R=4) outputs are unchanged.
Drop high-frequency and verbose-but-low-value LOG_DEBUG(CAN messages:
- dronecan.c: Battery Info (x2), GetNodeInfo, NodeStatus, TX success,
  RX loop, commented-out debug blocks
- canard_stm32h7xx_driver.c: timing computation intermediates
  (Baudrate, Max Quanta, Prescaler BS, Prescaler, Timings summary)
- canard_stm32f7xx_driver.c: same timing intermediates, TX success,
  In CAN Init, commented-out clock and RX blocks

Retain error-path messages (decode failed, TX/RX error, init failures)
and the single-line Prescaler/SJW/BS summary logged at init.
… operation

Tested at 1 Mbps on KAKUTEH7WING hardware and confirmed bus operational.
…ivers

Remove CubeMX boilerplate markers, commented-out dead code, and
development-time question comments from both drivers.
Fix: DroneCAN GNSS messages were being applied to gpsSolDRV regardless
of the configured GPS provider. Guard added in gps_dronecan.c where
it belongs, keeping CAN transport layer unaware of GPS config.
…g difference

F7 bxCAN HAL writes SJW directly to BTR register where hardware adds 1,
so stored value 3 gives 4 tq. This wider SJW is needed for reliable bus
operation on F7 targets and is different from H7 where SJW=1 is actual tq.
Prevents state machine from continuing in INIT state when the CAN
peripheral fails to initialize.
Prevents out-of-bounds access when STATE_DRONECAN_FAILED is active.
Prevents stale pre-bus-off frames from storming the bus on recovery.
With AutoRetransmission=ENABLE, frames that fail on a degraded bus
occupy FIFO slots indefinitely. All 32 slots fill, HAL_FDCAN_AddMessage
returns HAL_ERROR, and all outgoing traffic stalls permanently with no
indication until full bus-off. DroneCAN reliability is handled at the
application layer via periodic republishing.
Matches the H7 driver pattern. Previously the return value was silently
discarded; if timing computation failed, uninitialized stack bytes were
passed to HAL_CAN_Init.
The H7 FDCAN 128x11 recessive-bit recovery sequence takes up to 11.264ms
at 125kbps. The 1ms delay was restarting the counter before it could
complete, preventing the node from ever exiting bus-off. 20ms gives safe
margin above worst-case and allows time to detect immediate re-entry.
Guard against non-DroneCAN GPS provider at the transport boundary
(handle_GNSS* functions) rather than in each leaf function in
gps_dronecan.c. Also adds the guard to handle_GNSSRCTMStream which
had none. Removes stale UNUSED(pgnssAux) and placeholder comment
from dronecanGPSReceiveGNSSAuxiliary.
canardSTM32GetProtocolStatus() was called on every dronecanUpdate()
invocation (~500Hz) to detect bus-off. Moved into the existing 1Hz
task block — bus-off detection latency of up to 1s is acceptable.
Adds LOG_DEBUG to report BusOff and ErrorPassive flags each second
for bench diagnostics.
AutoBusOff=ENABLE handles the 128x11 recovery sequence automatically,
but ESR.BOFF is a sticky read-only flag that is NOT cleared when hardware
recovery completes. GetProtocolStatus() reads this flag, so the state
machine was permanently stuck in STATE_DRONECAN_BUS_OFF after any
bus-off event on F7 targets. Stop/Start re-enters init mode which
clears ESR.BOFF, allowing recovery detection to work correctly.
Comment incorrectly stated '25MHz' as a supported HSE value — 25MHz
fails the assert. CMake always provides HSE_VALUE per-target via
-DHSE_VALUE=<n> so the stm32h7xx_hal_conf.h fallback of 25MHz is never
used. Current targets use 8MHz (default) or 16MHz (KAKUTEH7WING).
…adence

GetProtocolStatus() was called every dronecanUpdate() cycle (~500Hz) in
BUS_OFF state. Moved inside the 20ms recovery timer block so it runs at
the same cadence as RecoverFromBusOff() — still detects recovery within
20ms but reduces MMIO reads from ~500/sec to ~50/sec.
HAL_CAN_Stop/Start called from the scheduler context with CAN interrupts
active caused a full FC lockup. Reverted to empty stub pending investigation
of a safe mechanism to clear the sticky ESR.BOFF flag on F7.
Unconditional 1Hz LOG_DEBUG was flooding the bootlog with healthy status
messages. Now only logs when an error condition is actually present.
HAL_CAN_AddTxMessage returns non-OK when all mailboxes are busy — a
normal transient condition at startup. The log was noise. Matches the
H7 driver which already handles this path silently.
DroneCAN float16 optional fields encode NaN when unpopulated. Without a
guard, NaN * 100 converts to 0 on Cortex-M (ARM VCVT saturation),
permanently blocking the HDOP fallback path. Also passes values through
gpsConstrainHDOP() to prevent uint16_t overflow for extreme DOP values.
…ompatible

gpsSolDRV has hdop but no vdop field. VDOP and EPV are not interchangeable
(different units, conversion requires receiver UERE). lastVDOP was a dead
store with no valid consumer.
…ames size

Guards the CLI state name array against future enum additions — if a new
state is added without updating the array, the build fails immediately.
daijoubu added 29 commits July 1, 2026 08:02
The buffer size check for MSP2_INAV_DRONECAN_NODE_INFO now uses an
explicit per-field sum matching the writes that follow, so the check
stays in sync when fields are added or removed.
…FAILED

Previously busoffTimeUs was reset unconditionally, causing endless 20ms
recovery attempts on a permanently-faulted bus with no exit path.

After 50 attempts (~1 second at 20ms cadence) without clearing BusOff,
transition to STATE_DRONECAN_FAILED and log the failure. The retry counter
resets on successful recovery so a transient fault is handled correctly.
… lookups

Add a private findNodeByID() helper in dronecan.c and a public
dronecanGetNodeByID() accessor in the driver API. Use them to
consolidate three independent linear-search loops that all performed
the same nodeID lookup:

- handle_NodeStatus: replace loop with findNodeByID()
- handle_GetNodeInfoResponse: replace loop with findNodeByID()
- MSP2_INAV_DRONECAN_NODE_INFO handler: replace loop+found flag with
  dronecanGetNodeByID(), simplifying to a flat early-exit structure
…SIZE constant

Define the constant near the DroneCAN include with a breakdown comment
showing each field's contribution. The check site now reads clearly and
the single definition is the one place to update when the wire format changes.
…nav.h

The size constant belongs alongside the message code definition, not
inside the implementation file. Removed from fc_msp.c, added with
field breakdown comment next to MSP2_INAV_DRONECAN_NODE_INFO.
Replace eight individual zero-assignments with memset() on the whole
dronecanNodeInfo_t slot before setting non-zero fields. Future struct
additions are zeroed automatically.

Update coverage comment in dronecan_application_unittest.cc — GetNodeInfo
response acceptance has been implemented since Phase 3; remove the stale
'Phase 3 will change this' note.
- Extend dronecanNodeInfo_t name field from 32 to 80 bytes to match
  the full UAVCAN GetNodeInfo name length (spec allows up to 80 bytes)
- Use sizeof(node->name) in truncation logic so it tracks the struct
- Update MSP2_INAV_DRONECAN_NODE_INFO wire format: 71 -> 119 bytes
- Log when node table is full and a new node is silently dropped
…ase artifact

Orphaned blob (old non-static TX for-loop + stray closing brace + duplicate
non-static process1HzTasks) slipped through the 27-commit getnodeinfo rebase.
The correct versions already exist: processCanardTxQueue() for ISR-driven TX and
static process1HzTasks() at line 426 with NVIC masking.
…t symbols

Two fixes found by running unit tests after the getnodeinfo rebase:

1. shouldAcceptTransfer response block was empty — GetNodeInfo responses were
   silently dropped by canard before handle_GetNodeInfoResponse could run.
   Restores the UAVCAN_PROTOCOL_GETNODEINFO_RESPONSE_SIGNATURE case that was
   present on the pre-rebase branch but clobbered during conflict resolution.

2. UNIT_TEST build: expose handle_NodeStatus, shouldAcceptTransfer, and
   onTransferReceived as non-static so dronecan_application_unittest.cc can
   call them directly. Add txErrCount/busOffCount to UNIT_TEST block.
   Fix stub typo canardSTM32Recieve→canardSTM32Receive and add missing
   canardSTM32GetAndClearRxDropCount stub.

All 13 dronecan_application_unittest + 13 dronecan_getnodeinfo_unittest pass.
- Add transfer ID guard in handle_GetNodeInfoResponse to reject stale
  or replayed frames (uses per-node getNodeInfo_transfer_id counter)
- Use sizeof(node->hw_unique_id) instead of magic number 16 in memcpy
- Fix comment label: "Canard Senders" -> "Canard Handlers and Senders"
- Remove extra blank line before #endif at end of file
- Rename MSP field last_seen_ms -> elapsed_ms to match notes wording
- Fix TwoDistinctNodesStoredSeparately test: interleave encode+dispatch
  so node 10 is not inserted with node 20's payload data
Replace auto-fetch-on-discovery with a single shared async slot that
serves any DroneCAN service request (GetNodeInfo, param GetSet) via
two new MSP commands:

  MSP2_INAV_DRONECAN_ASYNC_REQUEST (0x2043) — fires any service request
  MSP2_INAV_DRONECAN_ASYNC_RESULT  (0x2044) — polls result with seq guard

Design changes:
- Strip node table to runtime fields only (nodeID, health, mode,
  uptime_sec, vendor_status_code, last_seen_ms); saves ~3.5 KB RAM
- Remove automatic GetNodeInfo fetch on node discovery
- Add dronecanAsyncRequest() as single entry point for all service calls
- Add handle_AsyncServiceResponse() dispatching on service_id
- Add 2-second timeout with seq generation counter to detect stale polls
- Add UAVCAN_PROTOCOL_PARAM_GETSET_ID to shouldAcceptTransfer response block
- Extend MSP2_INAV_DRONECAN_NODES wire format with uptime_sec and
  vendor_status_code (13 bytes/record, was 7)
…expiry

- Add ExecuteOpcode (service 10) and RestartNode (service 5) to async
  slot framework: request builder, response decoder, shouldAccept filter,
  MSP ASYNC_REQUEST/RESULT handlers
- Fix GetSet write: zero-initialise encode buffer so reserved bits are
  not contaminated by stack garbage
- Fix GetSet write: include parameter name in request; ArduPilot nodes
  require the name field to accept writes
- Prune node table entries silent for >10s so stale nodes (e.g. after
  a node ID change) are removed automatically
- Change service_id to uint8_t throughout (slot, function sig, fc_msp local)
- Replace obvious buffer zero-init comment with explanation of why (UAVCAN reserved bits)
- Fix extra indentation on timeout expiry assignment
- MSP reads service_id as u16 for protocol compat; casts to u8 on use
… comments

- GetNodeInfo name buffer: clamp to sizeof-1, add null terminator (matches
  param result pattern); grow name[80] -> name[81] to hold the terminator
- Add DRONECAN_STATE_NOT_READY 0xFF constant to header; use it in fc_msp.c
  instead of a bare magic number
- Update accepted/busy comment to note 1 also covers unrecognised service_id
…t, logging

- Add UAVCAN-spec transfer_id match in handle_AsyncServiceResponse to reject
  stale frames after bus-off recovery (in-flight id is (transfer_id-1) & 0x1F)
- Add comment clarifying buf_ptr=NULL is intentional for zero-length payloads
- Add LOG_DEBUG for ExecuteOpcodeResponse decode failure (matches GetNodeInfo
  and ParamGetSet pattern)
- PARAM_GETSET minimum-size guard: < 2 → < 3 (index+is_write required);
  remove fallback ternary for is_write
- handle_AsyncServiceResponse: update stale comment to name service_id
  specifically rather than vague "parameter"
Decode NumericValue min_value and max_value from UAVCAN param.GetSet
response and thread them to the configurator. The FC now serializes
min/max type and value bytes after the param value in the async result
MSP payload.
…e field zeroing, named constants, struct alignment

- dronecan.c: map NUMERICVALUE_* → DRONECAN_PARAM_TYPE_* explicitly in
  min/max decode switch; zero inactive fields before each switch to
  prevent stale values across consecutive GetSet calls
- dronecan.h: reorder min/max struct fields (int64_t first) to eliminate
  7-byte ARM alignment padding per min/max group
- fc_msp.c: replace bare 1/2 with DRONECAN_PARAM_TYPE_INT/FLOAT; add
  braces to else-if blocks in min/max serializer
…mResult_t

Reorder min/max fields to match existing value field convention (type,
int64, float). Fix comment that incorrectly claimed the prior ordering
eliminated ARM alignment padding — both orderings produce identical
struct size.
Replace the removed MSP2_INAV_DRONECAN_NODE_INFO (0x2043) entry with
documentation for the new async slot pattern introduced in the
dronecan-param-getset feature:

- MSP2_INAV_DRONECAN_NODES (0x2042): fix per-node record size 7→13 bytes
  (added uptime_sec u32 + vendor_status_code u16)
- MSP2_INAV_DRONECAN_ASYNC_REQUEST (0x2043): document all four services
  (GetNodeInfo, RestartNode, ExecuteOpcode, ParamGetSet read/write)
  including param type encoding and sequence number correlation
- MSP2_INAV_DRONECAN_ASYNC_RESULT (0x2044): document state machine,
  all service-specific result payloads, and slot-reset behaviour
Replace stale MSP2_INAV_DRONECAN_NODE_INFO entry and fix
MSP2_INAV_DRONECAN_NODES record size to match the current
dronecan-param-getset implementation:

- MSP2_INAV_DRONECAN_NODES (8258): expand per-node record from 7 to
  13 bytes — add uptime_sec(u32) and vendor_status_code(u16) fields
- MSP2_INAV_DRONECAN_ASYNC_REQUEST (8259): replace removed NODE_INFO
  entry; documents common header + service-specific request fields for
  all four services (GetNodeInfo, RestartNode, ExecuteOpcode, ParamGetSet)
- MSP2_INAV_DRONECAN_ASYNC_RESULT (8260): new entry documenting the
  5-byte common header and all service-specific result payloads
Re-run gen_msp_md.py after updating msp_messages.json to keep the
generated Markdown in sync with the JSON source of truth.
…rRespond

dronecanAsyncRequest() is called from the main loop (via MSP handler).
canardRequestOrRespond() mutates the libcanard TX queue, which is also
accessed by the ISR-driven processCanardTxQueue(). Without masking, a
TX-complete ISR could corrupt the queue mid-enqueue.
The struct dronecanNodeInfo_t was stripped of GetNodeInfo fields in
96f8a4b (shared async slot) to save ~3.5 KB RAM. GetNodeInfo results
now go to dronecanAsyncSlot.result.node_info (DRONECAN_ASYNC_READY)
instead of the node table.

Update the test to:
- Prime the async slot (PENDING + service_id + node_id) before dispatch,
  since handle_AsyncServiceResponse guards on all three
- Assert slot state transitions to DRONECAN_ASYNC_READY
- Assert result fields via dronecanAsyncSlot.result.node_info
- Reset dronecanAsyncSlot in SetUp() for test isolation
- Replace hardcoded 2000 with DRONECAN_ASYNC_TIMEOUT_MS in timeout check
  and use >= to align with dronecanAsyncRequest guard
- Add DRONECAN_NODE_STALE_TIMEOUT_MS constant; replace magic 10000
- Fix double-indented body in timeout check
- Fix brace alignment in shouldAcceptTransfer response cases
- Clear async slot to IDLE after reading ERROR state in MSP result handler
- Add comment to dronecanParamResult_t.name[93] explaining DSDL capacity
- Document 0xFF (bus not ready) sentinel in MSP async request JSON/README
…ART_NODE response decode

Extends dronecan_application_unittest.cc with 13 new tests covering:
- handle_AsyncServiceResponse guard rejections (IDLE state, wrong node_id,
  wrong transfer_id, wrong service_id)
- PARAM_GETSET response decode for all five Value types (INT, FLOAT, BOOL,
  STRING, EMPTY) including integer and float min/max range fields
- EXECUTE_OPCODE response decode (ok=true and ok=false)
- RESTART_NODE response decode (ok=true)
- dronecanAsyncRequest re-entry guard (slot PENDING → returns false)

26/26 tests pass.
- dronecan.c: clamp name and string-value lengths before memcpy in
  dronecanAsyncRequest() — safe today because MSP pre-clamps, but the
  function is part of a public API and must not rely on caller preconditions
- dronecan.c: update stale block comment on handle_AsyncServiceResponse
- fc_msp.c: replace manual 32-bit split/join of int64 wire values with
  memcpy through uint64_t, eliminating signed left-shift and
  implementation-defined right-shift on negative values
- fc_msp.c: remove trailing whitespace on USE_DRONECAN #endif
- unittest: narrow GAP-S7 re-entry guard test to RESTART_NODE so the
  null-payload check cannot mask a missing re-entry guard
- fc_msp.c: add missing sbufAdvance after sbufReadData in STRING write
  arm of PARAM_GETSET decoder — without it, req_name_len read the first
  byte of the string value, silently targeting the wrong param on the node
- fc_msp.c: add sbufAdvance after name read for consistency with INT path
- fc_msp.c: replace signed right-shift with memcpy/uint64_t for min_int
  and max_int range encoding, consistent with the value_int fix in pass 2
- dronecan.c, fc_msp.c: remove trailing whitespace on blank lines in
  new code
- dronecan.c: expand handle_AsyncServiceResponse comment to explain the
  single-slot design rationale
- dronecan.c: normalise shouldAcceptTransfer to consistent 4-space
  indentation (was mixed tabs/spaces from prior edits)
- unittest: add shouldAcceptTransfer tests for PARAM_GETSET, EXECUTE_OPCODE,
  and RESTART_NODE response acceptance (29 tests total)
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

Test firmware build ready — commit b2f2507

Download firmware for PR #11683

240 targets built. Find your board's .hex file by name on that page (e.g. MATEKF405SE.hex). Files are individually downloadable — no GitHub login required.

Development build for testing only. Use Full Chip Erase when flashing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant