fix(aggregation): hierarchical/leaf Component routing with provenance and warnings by bburda · Pull Request #374 · selfpatch/ros2_medkit

bburda · 2026-04-13T19:58:14Z

Summary

Make the aggregation layer treat Components with the same symmetry as Areas, so collision-merged peer Components work correctly for every deployment shape the project supports (single peer, multi-peer, hierarchical parent, daisy chain).

EntityMerger::merge_components now records a provisional routing entry on collision. Every Component ID refers to at most one ECU when it is a leaf, and the peer owns the runtime state.
A new pure classify_component_routing free function (aggregation/classification.hpp) runs after all peers have been merged. A Component is classified as a hierarchical parent if any other Component in the merged set references it via parent_component_id; those are removed from the routing table and served locally from the merged cache. Leaves stay routed to the owning peer.
Multi-peer leaf collisions are reported as structured LeafCollisionWarning objects. AggregationManager surfaces them on MergedPeerResult.leaf_warnings, writes an RCLCPP_WARN line listing every claimant, and exposes the list via a new /health.warnings array. Routing falls back to last-writer-wins because rejection does not fix a deployment anomaly.
Every merged entity (Area, Component, App, Function) now carries an optional x-medkit.contributors list populated during merge (local seeded before the peer loop, peer:<name> appended by EntityMerger with de-duplication). Clients can distinguish local-only, peer-only, and merged views without relying on the single-valued source field. In daisy-chain topologies each hop surfaces only its direct upstream by design.
Pre-existing doc comment referencing a third-party commercial product was stripped from sovd_service_interface.hpp.
Aggregation design doc now documents the symmetric classification, the multi-peer warning behaviour, and the contributors provenance, with an updated PlantUML merge diagram plus a new post-merge classification diagram.

SOVD API impact

No changes to SOVD-defined request paths, response schemas, or error codes.
Additive x-medkit extensions only: new optional x-medkit.contributors field on entity responses and a new optional warnings array on /health. Both are emitted only when non-empty / when aggregation is active, so existing clients ignoring unknown fields are unaffected.

Issue

closes [BUG] Collision-merged peer components return empty runtime data on primary #373

Type

Bug fix
New feature or tests (classification seam, /health.warnings, contributors, daisy-chain integration test)
Breaking change
Documentation only (design doc updates)

Testing

All local runs green:

./scripts/test.sh unit --packages-select ros2_medkit_gateway - 1976 tests, 0 failures. Includes:
- 6 new AggregationClassification tests covering leaf vs hierarchical, single/multi-peer, subcomponents, and warning emission.
- 4 new EntityMerger contributor tests (local/peer append, dedup, app-prefix-only-peer).
- Updated merged_components_route_to_peer plus the synthetic-collision test from the previous iteration.
./scripts/test.sh test_peer_aggregation --packages-select ros2_medkit_integration_tests - prior integration coverage still green, 82 tests.
./scripts/test.sh test_daisy_chain_aggregation --packages-select ros2_medkit_integration_tests - new integration test with 3 gateways in isolated DDS domains (primary -> peer_B -> peer_C). Verifies hierarchical parent served locally, 1-hop and 2-hop leaf forwarding, contributors, and empty /health.warnings.
./scripts/test.sh lint --packages-select ros2_medkit_gateway - clean.
Sphinx docs build with no new warnings.

Checklist

Breaking changes are clearly described (there are none; only additive x-medkit extensions)
Tests were added or updated if needed
Docs were updated if behavior or public API changed

Copilot

Pull request overview

Fixes aggregation routing so that when a peer Component collides (same ID) with a locally-present Component and gets merged, sub-resource requests are forwarded to the peer that owns the runtime state. This prevents empty local responses on endpoints like /logs, /hosts, /data, and /operations.

Changes:

Add routing table entry for collision-merged Components in EntityMerger::merge_components.
Update/extend EntityMerger unit tests to assert merged Components route to the peer (including a hybrid synthetic-collision scenario).
Refresh aggregation design documentation and remove an unrelated third-party product reference from a plugin doc comment.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File	Description
`src/ros2_medkit_gateway/src/aggregation/entity_merger.cpp`	Route collision-merged Component IDs to the peer in the routing table.
`src/ros2_medkit_gateway/test/test_entity_merger.cpp`	Update routing-table expectations and add a hybrid synthetic-collision test.
`src/ros2_medkit_gateway/design/aggregation.rst`	Align merge/routing documentation with ECU-ownership routing behavior for Components.
`src/ros2_medkit_plugins/ros2_medkit_sovd_service_interface/src/sovd_service_interface.hpp`	Remove unrelated third-party reference from the class doc comment.

…ification

Add test_aggregation_classification with 6 failing tests that codify the target semantics before the implementation exists (TDD red phase):

- leaf_collision_keeps_routing_to_peer
- hierarchical_parent_drops_routing_when_local_subcomponents_exist
- hierarchical_parent_drops_routing_when_remote_subcomponents_exist
- hierarchical_parent_drops_routing_across_multiple_peers
- subcomponents_of_hierarchical_parent_still_route_to_peer
- leaf_collision_across_multiple_peers_emits_warning

The tests call a pure classify_component_routing() free function that takes the fully merged Component vector plus per-peer entity claims and returns a filtered routing table (hierarchical parents removed) together with LeafCollisionWarning entries for leaf IDs claimed by >1 peer.

Build will fail on missing classification.hpp. This is intentional; the implementation commit introduces the header and source.

Part of PR #374.

Introduce a pure free function that takes the fully-merged Component vector plus per-peer Component claims and returns a routing table with hierarchical parents removed plus a list of multi-peer leaf-collision warnings. Hierarchical parent = any Component referenced as parent_component_id by another Component in the set. Leaves stay routed to their peer (last-writer-wins for routing on collision; warning enumerates every claiming peer). Fixes the 6 red tests added in the previous commit. Part of PR #374.
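The classification rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in classification.hpp: the struct layouts, the three-argument signature, and the choice of std::map for per-peer claims are all assumptions made for the example. Only the names classify_component_routing, LeafCollisionWarning, and parent_component_id come from the PR text.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the gateway's real types.
struct Component {
  std::string id;
  std::string parent_component_id;  // empty when the Component has no parent
};

struct LeafCollisionWarning {
  std::string entity_id;
  std::vector<std::string> peer_names;  // every peer that claims entity_id
};

struct ClassificationResult {
  std::unordered_map<std::string, std::string> routing_table;  // id -> peer
  std::vector<LeafCollisionWarning> leaf_warnings;
};

// Assumed signature: merged set, provisional routing (id -> peer), and
// per-peer claims (peer -> Component IDs announced by that peer).
ClassificationResult classify_component_routing(
    const std::vector<Component>& merged,
    const std::unordered_map<std::string, std::string>& provisional_routing,
    const std::map<std::string, std::set<std::string>>& peer_claims) {
  // A hierarchical parent is any ID named as parent by another Component.
  std::unordered_set<std::string> parents;
  for (const auto& c : merged) {
    if (!c.parent_component_id.empty() && c.parent_component_id != c.id) {
      parents.insert(c.parent_component_id);
    }
  }

  ClassificationResult result;
  // Parents lose their routing entry (served locally); leaves keep theirs.
  for (const auto& [id, peer] : provisional_routing) {
    assert(!peer.empty() && "a routing entry must name a peer");
    if (parents.count(id) == 0) {
      result.routing_table.emplace(id, peer);
    }
  }

  // Collect claimants per leaf ID. Ordered containers keep output stable.
  std::map<std::string, std::vector<std::string>> claimants;
  for (const auto& [peer, ids] : peer_claims) {
    for (const auto& id : ids) {
      if (parents.count(id) == 0) {
        claimants[id].push_back(peer);
      }
    }
  }
  for (auto& [id, peers] : claimants) {
    if (peers.size() > 1) {
      result.leaf_warnings.push_back({id, std::move(peers)});
    }
  }
  return result;
}
```

In this sketch the routing entry for a multi-claimed leaf is simply whatever the provisional table already holds, which matches the last-writer-wins behaviour the commit describes; the warning only reports the anomaly.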

…nager Track per-peer Component IDs during fetch_and_merge_peer_entities and run classify_component_routing once the merged Component set is final. Hierarchical-parent routing entries are stripped (served locally with the merged view, like Areas/Functions); leaves keep routing to their peer. Multi-peer leaf collisions surface as RCLCPP_WARN log entries listing every claiming peer, and the structured warnings are returned on MergedPeerResult.leaf_warnings for downstream consumers (e.g. the upcoming /health.warnings field). Existing unit (1976) and peer-aggregation integration (82) suites pass unchanged - the no-collision and apps-only paths still match the prior behaviour because classify_component_routing is a no-op for those. Part of PR #374.

…warnings When more than one peer announces the same leaf Component ID, AggregationManager stores the collision as a structured warning. The /health endpoint now surfaces these via a top-level warnings array (x-medkit extension on our own endpoint, not SOVD core). Empty array when no warnings are active. Operators can pick up deployment anomalies without tailing logs; routing still falls back to last-writer-wins. Each warning carries a machine-readable code, human message, the affected entity_ids (currently a single-element array for forward compatibility), and the full list of claiming peer_names. Part of PR #374.
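To make the four warning fields concrete, here is a hedged sketch of how a collision record might be turned into a /health.warnings entry. The HealthWarning struct, the to_health_warning helper, the code string, and the message wording are all hypothetical; the PR text names only the field set (code, message, entity_ids, peer_names).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the structured collision record.
struct LeafCollisionWarning {
  std::string entity_id;
  std::vector<std::string> peer_names;
};

// Hypothetical in-memory shape of one /health.warnings entry.
struct HealthWarning {
  std::string code;                     // machine-readable code
  std::string message;                  // human-readable message
  std::vector<std::string> entity_ids;  // single element today
  std::vector<std::string> peer_names;  // every claiming peer
};

// Placeholder value; the real code string is not shown in this PR text.
constexpr const char* kLeafCollisionCodePlaceholder = "leaf-id-collision";

HealthWarning to_health_warning(const LeafCollisionWarning& w) {
  assert(w.peer_names.size() > 1 && "a collision involves at least two peers");

  // Join peer names as "a, b, c" for the human-readable message.
  std::string joined;
  for (const auto& name : w.peer_names) {
    if (!joined.empty()) {
      joined += ", ";
    }
    joined += name;
  }

  HealthWarning out;
  out.code = kLeafCollisionCodePlaceholder;
  // Illustrative wording only.
  out.message = "Leaf Component '" + w.entity_id + "' is claimed by peers: " + joined;
  out.entity_ids = {w.entity_id};
  out.peer_names = w.peer_names;
  return out;
}
```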

Single-string source cannot express that an entity was contributed by both the local gateway and one or more peers. Add a contributors field on Area, Component, App, and Function models, serialised in each entity's x-medkit block (only when non-empty, so the field is a pure additive extension for existing clients). Single-origin entities get a one-element list for consistency.

Population:

- AggregationManager seeds every local entity with ["local"] before the per-peer merge loop.
- EntityMerger appends "peer:<name>" on collisions (Areas, Components, Functions) and on remote-only additions; deduplication protects against repeated merges with the same peer.
- App collision-prefixed entities receive only the peer contributor since the prefixed ID is a distinct entity.

Daisy-chain note: each hop surfaces only its direct upstream ("peer:<name>"). A primary aggregating peer_B sees peer_B's entities as contributed by peer_B regardless of whether peer_B itself merged them from peer_C. This is intentional and can be extended later.

Part of PR #374.
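The append-with-deduplication step can be pictured with a small helper like the one below. The function name append_peer_contributor is hypothetical and the real EntityMerger code may be organised differently; the sketch only assumes that contributors is a list of strings and that peer entries use the "peer:<name>" form stated above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: add "peer:<name>" unless it is already listed,
// so repeated merges with the same peer do not grow the list.
void append_peer_contributor(std::vector<std::string>& contributors,
                             const std::string& peer_name) {
  assert(!peer_name.empty() && "peer name must not be empty");
  const std::string tag = "peer:" + peer_name;
  if (std::find(contributors.begin(), contributors.end(), tag) ==
      contributors.end()) {
    contributors.push_back(tag);
  }
}
```

Starting from a locally seeded list of ["local"], two merges with a peer named peer_b leave the list at ["local", "peer:peer_b"].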

…enance

Refresh the aggregation design document for the extended merge model:

- Expand the Components merge-rules entry to distinguish hierarchical parents (served locally with merged view, like Areas) from leaf ECUs (routed to the owning peer on collision), with the classification happening post-merge so sub-components arriving from any peer unlock parent behaviour.
- Describe multi-peer leaf-collision warnings, last-writer-wins routing, and the /health.warnings surface.
- Update the routing-table description to match the new taxonomy.
- Add a new section describing x-medkit.contributors provenance, how it is populated, and how it behaves in daisy-chain topologies.
- Add a post-merge Component classification activity diagram alongside the per-peer merge rules diagram; the per-peer diagram now shows the "provisional routing entry" and contributor-append steps.

Part of PR #374.

…egation

Launch three gateways in separate DDS domains (primary -> peer_B -> peer_C), each with a manifest that declares a shared hierarchical parent 'robot-x' plus a single leaf ECU sub-component (ecu-a / ecu-b / ecu-c). Manifests are written to tmp files at import time so launch descriptions reference stable paths.

Validates end-to-end:

- Primary sees all three ECU sub-components under robot-x, including ecu-c which arrives via transitive aggregation through peer_B.
- GET /components/robot-x returns a locally-served response with x-medkit.contributors listing "local" and "peer:peer_b" (each hop surfaces only its direct upstream; peer_c's contribution is opaque to the primary).
- GET /components/ecu-b succeeds via 1-hop forwarding to peer_B.
- GET /components/ecu-c succeeds via 2-hop forwarding primary -> peer_B -> peer_C.
- GET /health on primary returns a peers list plus an empty warnings array (no leaf-id collisions in this topology).

Also surfaces Component contributors in the detail handler's x-medkit block, which was missing because handle_get_component builds its x-medkit via XMedkit rather than the Component::to_json() path used by list/reference endpoints.

Part of PR #374.

…nistic output

Review follow-ups for PR #374 addressing correctness and determinism gaps found in deep-review:

- Previously x-medkit.contributors was emitted only on GET /components/{id} because the Component detail handler builds x-medkit manually via the XMedkit builder. Area/App/Function detail handlers silently dropped the field despite the models carrying it. Introduce XMedkit::contributors() helper and call it from all four detail handlers so list and detail endpoints expose provenance consistently.
- Classification output is now deterministic: leaf_warnings are sorted by entity_id, each warning's peer_names list is sorted alphabetically, and per-peer claim iteration is driven by a sorted copy of the unordered_set. Operator logs and /health.warnings no longer flap between runs or after hashmap rehashes.
- Contributors are presented in stable order ("local" first when present, then peer:<name> entries alphabetically) via a sorted_contributors() helper in discovery/models/common.hpp. Called from every entity to_json() and from XMedkit::contributors(), so list and detail responses agree regardless of how peers were merged.
- Introduce warning_codes.hpp with WARN_LEAF_ID_COLLISION constant mirroring the error_codes.hpp pattern. Inline string literal in health_handlers.cpp replaced with the constant.
- Leaf-collision warning message now names the colliding peers and includes a concrete remediation hint (rename on one side, or model as hierarchical parent) instead of only describing the symptom.
- Add aggregation capability flag to handle_root so clients can feature-detect that /health.warnings and x-medkit.contributors may be present. /health.warnings is always an array when aggregation is enabled (empty when no anomalies), so clients avoid tri-state absent/empty/populated branching.
- OpenAPI health_schema() now declares top-level peers and warnings fields (with structured warning object schema for code/message/entity_ids/peer_names). Previously these were emitted but absent from the spec, breaking client generators.

Tests: 1980 unit + daisy-chain integration still green. Deeper integration coverage (leaf collision path, capability assertion, contributors on non-Component entities) follows in a separate commit.

Part of PR #374.
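The stable contributor ordering ("local" first, then peer entries alphabetically) could be implemented along these lines. This is a sketch under assumptions: the by-value signature and the sort-then-partition approach are illustrative choices, not necessarily what the helper in discovery/models/common.hpp does.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Illustrative version of a stable-order helper: returns a copy with
// "local" first (when present) and all other entries sorted alphabetically.
std::vector<std::string> sorted_contributors(std::vector<std::string> contributors) {
  // Alphabetical order for everything.
  std::sort(contributors.begin(), contributors.end());
  // Move "local" to the front without disturbing the order of the rest.
  auto first_peer = std::stable_partition(
      contributors.begin(), contributors.end(),
      [](const std::string& c) { return c == "local"; });
  // Postcondition: the non-local tail is still sorted.
  assert(std::is_sorted(first_peer, contributors.end()));
  return contributors;
}
```

Because the input is taken by value, callers keep their original merge-order list, and list and detail handlers that both call the helper produce identical output.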

… refresh docs

Integration test additions (daisy chain):

- capability flag assertions on primary (aggregation: true) and on peer_c (aggregation: false)
- contributors assertion on a leaf Component detail (ordering starts with 'local' per the new stable-order rule)
- contributors assertion on an App detail endpoint (regression guard for the handler-level fix that went into the previous commit)
- addClassCleanup registers os.unlink for each tmp manifest so CI no longer leaks three YAML files per run

Requirement traceability:

- @verifies REQ_INTEROP_003 on all 6 AggregationClassification tests, on the 4 new EntityMerger contributors tests, and on the 4 main daisy-chain integration assertions (docstring-based for Python so launch_testing introspection preserves them)

Design doc refresh:

- Clarify the Provenance section: EntityMerger appends contributors unconditionally on Component collisions; classification happens later and only rewrites the routing table
- Document contributors stable ordering (local first, then peer:*)
- Expand Health Monitoring subsection with /health.warnings and peers shapes, cross-link to classification, point operators at warning_codes.hpp
- Add classify_component_routing to the Key Classes section

Part of PR #374.

The EntityDetail OpenAPI component is shared by Area, Component, App, and Function detail endpoints. Previously it had no x-medkit field declared at all, which meant generated clients silently dropped the new x-medkit.contributors aggregation provenance array. Add an x-medkit object schema on EntityDetail with an explicit contributors property (typed array of strings, with a description covering the "local" vs "peer:<name>" semantics and the stable sort order). additionalProperties remains true so that existing x-medkit fields not yet typed at the schema level (entityType, namespace, source, etc.) continue to round-trip cleanly. Part of PR #374.

When a peer component collides with a local one (e.g. hybrid mode creates a synthetic component that matches a peer's real component), the merged entity was not added to the routing table. This caused sub-resource requests (logs, hosts, data) to be handled locally instead of forwarded to the peer, returning empty results. Add routing_table_[merged.id] = peer_name_ in the collision branch of merge_components so that validate_entity_for_route forwards requests to the owning peer.
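The effect of the one-line fix can be illustrated with a stripped-down merger. Everything here except the names merge_components, routing_table_, and peer_name_ is hypothetical, including the class name and the assumption that remote-only Components were already being routed to the peer before this change.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-in; the real Component model carries many more fields.
struct Component {
  std::string id;
};

// Hypothetical miniature of the merger, kept only to show the routing rule.
class EntityMergerSketch {
 public:
  explicit EntityMergerSketch(std::string peer_name)
      : peer_name_(std::move(peer_name)) {
    assert(!peer_name_.empty() && "merger needs a peer name");
  }

  std::vector<Component> merge_components(std::vector<Component> local,
                                          const std::vector<Component>& remote) {
    for (const auto& r : remote) {
      const bool collides =
          std::any_of(local.begin(), local.end(),
                      [&r](const Component& c) { return c.id == r.id; });
      if (!collides) {
        local.push_back(r);  // remote-only Component joins the merged set
      }
      // Collision or not, the peer owns the runtime state, so sub-resource
      // requests for this ID must be forwarded to it.
      routing_table_[r.id] = peer_name_;
    }
    return local;
  }

  const std::unordered_map<std::string, std::string>& routing_table() const {
    return routing_table_;
  }

 private:
  std::string peer_name_;
  std::unordered_map<std::string, std::string> routing_table_;
};
```

Before the fix, the collision case would have skipped the routing-table write, which is why requests for a merged Component's logs, hosts, or data were answered locally with empty results.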

Replace the test that asserted merged Components stay out of the routing table (which encoded the prior incorrect "shared ownership" model) with one that asserts the opposite: merged Components must route to the peer that owns the ECU's runtime state. Add a dedicated test for the hybrid-mode synthetic-collision scenario that motivates the fix. Also refine the inline comment in merge_components to spell out the ECU-ownership rationale instead of describing symptoms.

Update the merge-rules summary, the merge-logic PlantUML diagram, and the routing-table description to reflect that a Component ID identifies a single physical ECU. On collision the peer is always the authoritative owner, so every request for the merged Component - including the detail endpoint and all sub-resources - is forwarded to that peer. The primary's local view is limited to aggregating the Component's presence in discovery listings.

…e_interface doc comment