Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
178 changes: 178 additions & 0 deletions demos/mosaico_integration/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,178 @@
# Mosaico integration demo

End-to-end pipeline demonstrating that **medkit fault snapshots flow into Mosaico as queryable, structured forensic data** - with no robot hardware and no recording 24/7.

> Robot stack runs in Docker. Inject high noise on the simulated LiDAR. The medkit gateway detects it via the standard `/diagnostics` ROS topic, confirms the fault, and flushes its 15-second pre-fault + 10-second post-fault ring buffer to an `.mcap` file. A small Python bridge listens on the gateway's `/faults/stream` SSE endpoint, downloads the bag via REST, and ingests it into mosaicod via Apache Arrow Flight using Mosaico's own Python SDK. From `docker compose up` to a queryable `Sequence` in mosaicod takes about a minute.

There are two variants of the stack:

- **Single-robot** (`docker-compose.yml`): one sensor-demo + one bridge. Good for stepping through the pipeline the first time and for the notebook.
- **Fleet** (`docker-compose.fleet.yml`): three sensor-demos (warehouse-A, warehouse-B, outdoor-yard) each with its own bridge, all ingesting into one shared mosaicod. All three robots produce a `LIDAR_SIM` fault snapshot, but robot-02 is rotating (IMU `drift_rate = 0.3 rad/s`) during its fault window while robot-01 and robot-03 stay stationary. That way the metadata filter (Step 1 in the notebook) matches 3 of 3 and the compound IMU `.Q` stationarity query (Step 2) is the one doing the actual triage, excluding the rotating robot.

## Quick start

```bash
git clone https://github.com/selfpatch/selfpatch_demos.git
cd selfpatch_demos/demos/mosaico_integration

# Bring up postgres + mosaicod + sensor-demo + bridge
docker compose up -d

# Wait until everything is healthy (sensor-demo healthcheck takes ~30s on first run)
docker compose ps

# Inject the LiDAR HIGH_NOISE fault. The bridge picks up the SSE event,
# downloads the bag and ingests it into mosaicod within ~5s.
./scripts/trigger-fault.sh

# Watch the bridge do its thing
docker compose logs -f bridge

# Open the notebook (locally or via VS Code)
jupyter notebook notebooks/mosaico_demo.ipynb
```

The notebook connects to `localhost:16726` (mosaicod Arrow Flight) and runs four queries against your freshly-ingested fault snapshot, ending with a three-panel time-series plot showing the LiDAR noise spike alongside a stationary IMU - exactly the kind of cross-topic forensic correlation Mosaico is designed for.

### Fleet variant

```bash
# Three robots + three bridges + shared mosaicod. Ports 18081/2/3 for the
# robots, 16726 for mosaicod (same as single-robot).
docker compose -f docker-compose.fleet.yml up -d

# Wait ~25s after all containers are healthy so the ring buffer has a
# pre-fault baseline, then inject three LIDAR_SIM faults with different
# signatures and motion states:
# robot-01 warehouse-A : LiDAR noise_stddev = 0.5 , stationary (range std spike)
# robot-02 warehouse-B : LiDAR noise_stddev = 0.5 , rotating ω=0.3 (Step 2 excludes)
# robot-03 outdoor-yard: LiDAR drift_rate = 0.5 , stationary (range mean shift)
./scripts/trigger-fleet-faults.sh

# Each bridge independently ingests its robot's snapshot. After ~45s you
# will have three Sequences in mosaicod with distinct robot_id metadata.
docker compose -f docker-compose.fleet.yml logs -f bridge-01 bridge-02 bridge-03

docker compose -f docker-compose.fleet.yml down -v
```

## Architecture

```
docker compose stack (network: mosaico_net)

┌─────────────┐ ┌─────────────────────────┐
│ postgres │◄────────┤ mosaicod │
│ :5432 │ │ ghcr.io/mosaico-labs/ │
│ (internal) │ │ mosaicod:v0.3.0 │
└─────────────┘ │ --local-store /data │
│ port 6726 ──> host:16726
└─────────────────────────┘
│ Arrow Flight
│ (RosbagInjector via mosaicolabs SDK)
┌─────────────────────────┐ ┌────────┴────────┐
│ sensor-demo │ │ bridge │
│ built from sibling │◄─►│ python:3.11 │
│ ../sensor_diagnostics/ │ │ + mosaicolabs │
│ │ │ pinned b3867be
│ - ros2_medkit gateway │ │ + httpx │
│ - lidar/imu/gps/camera │ │ │
│ sim nodes │ │ Subscribes: │
│ - diagnostic_bridge │ │ GET /api/v1/ │
│ - fault_manager │ │ faults/ │
│ - rosbag ring buffer │ │ stream │
│ (15s pre + 10s post) │ │ (SSE) │
│ │ │ │
│ port 8080 ──> host:18080│ │ Downloads: │
└─────────────────────────┘ │ GET /api/v1/ │
│ apps/ │
│ diagnostic- │
│ bridge/ │
│ bulk-data/ │
│ rosbags/... │
└─────────────────┘
```

**License-safe**: mosaicod runs as the unmodified upstream Docker image. The bridge is a separate Python process that talks Apache Arrow Flight (a public Apache standard) to mosaicod via Mosaico's own Python SDK. We never link or modify mosaicod or its Rust crates.

## What lands in Mosaico (verified end to end on this stack)

| ROS topic | ROS message type | Mosaico ontology | Status |
|------------------------|----------------------------------------|---------------------------------|-------------------------------------|
| `/sensors/scan` | `sensor_msgs/msg/LaserScan` | `LaserScan` (`futures.laser`) | ✅ via [Mosaico PR #368][pr368] |
| `/sensors/imu` | `sensor_msgs/msg/Imu` | `IMU` | ✅ shipped adapter |
| `/sensors/fix` | `sensor_msgs/msg/NavSatFix` | `GPS` | ✅ shipped adapter |
| `/sensors/image_raw` | `sensor_msgs/msg/Image` | `Image` | ✅ adapter ships in SDK, not captured in this demo |
| `/diagnostics` | `diagnostic_msgs/msg/DiagnosticArray` | (none) | ⚠️ silently dropped, no adapter yet |

The `/diagnostics` drop is the only adapter gap. We use it on the medkit side to flag the fault (via diagnostic_bridge → fault_manager), but it does not reach Mosaico storage. A natural next step is to ship a `DiagnosticArray` adapter or define a dedicated `MedkitFault` ontology and write its adapter.

## Mosaico SDK pin: PR #368 merged on 2026-04-13

The `LaserScanAdapter` we need landed in [mosaico-labs/mosaico#368][pr368] on 2026-04-13 as commit `b3867be`. We validated end to end against that PR while it was in draft (the `issue/367` branch at commit `8e090cd`); post-merge the code path is identical, so no demo-side change was required.

The subsequent `mosaicolabs==0.3.2` PyPI wheel (2026-04-15) is missing the `futures` subpackage from the distributed wheel despite it being present in source on main, so `pip install mosaicolabs==0.3.2` cannot import `LaserScan`. Until a release with correct packaging ships, the bridge `Dockerfile` installs the SDK directly from the upstream repo pinned to `b3867be`. Once a fixed release is published this becomes a one-line change in the `Dockerfile` (swap the git install for `pip install mosaicolabs==<fixed_version>`).

On the **read** side the consumer still needs to `import mosaicolabs.models.futures.laser` to register the `laser_scan` ontology in the global registry. The notebook does this in its first cell.

[pr368]: https://github.com/mosaico-labs/mosaico/pull/368

## Smart snapshots, not 24/7 recording

This is the value prop: each entry in the Mosaico catalog is **only the 25 seconds around a confirmed fault** (15 s pre-fault baseline + 10 s post-fault), not hours of "nothing is happening" telemetry. A single fleet snapshot weighs roughly 2 MB of MCAP (LaserScan at 10 Hz, IMU at 100 Hz, GPS at 1 Hz, `/diagnostics`); the same four topics recorded 24/7 would produce approximately 6 GB per robot per day. At the demo's assumption of five confirmed faults per robot per day (so ~10 MB/day of smart snapshots), the catalog stays at ~600× less storage than naive always-on recording while keeping 100% of the frames that actually matter for forensics. `/sensors/image_raw` (30 Hz raw camera) is intentionally excluded from snapshot capture - that single topic would dominate the bag at ~27 MB/s; swap it for a `CompressedImage` topic if vision forensics is part of the workflow.

## Files in this directory

| Path | What |
|---|---|
| `docker-compose.yml` | Single-robot stack: postgres + mosaicod + sensor-demo + bridge |
| `docker-compose.fleet.yml` | Fleet stack: postgres + mosaicod + 3×(sensor-demo + bridge) |
| `bridge/bridge.py` | Subscribes SSE, downloads bag, calls `RosbagInjector`. Honors `POST_FAULT_WAIT_SEC` (default 12s) before download so the post-fault ring segment is finalized |
| `bridge/Dockerfile` | python:3.11-slim + Mosaico SDK pinned to PR #368 merge commit `b3867be` |
| `bridge/requirements.txt` | `httpx>=0.27,<0.30` |
| `medkit_overrides/medkit_params.yaml` | Sensor-demo medkit config: 15s pre + 10s post ring buffer, single 2 GB bag cap, `auto_cleanup: false` |
| `notebooks/mosaico_demo.ipynb` | Connect, list, query, plot - 7 sections covering single-robot and fleet variants |
| `scripts/trigger-fault.sh` | Single-robot: inject high noise on lidar-sim on `localhost:18080` |
| `scripts/trigger-fleet-faults.sh` | Fleet: inject three LIDAR_SIM faults with different signatures (noise/drift) and motion states (one robot rotating for the Step 2 filter) |

## Verified end-to-end

| What | Status |
|---|---|
| `docker compose build bridge` (LaserScan ontology sanity import passes) | ✅ |
| `docker compose up -d` brings four containers healthy | ✅ |
| medkit gateway responds at `localhost:18080/api/v1/health` | ✅ |
| `./scripts/trigger-fault.sh` injects fault, gateway returns CONFIRMED | ✅ |
| Bridge SSE connects, picks up `fault_confirmed` event | ✅ |
| Bridge resolves entity `apps/diagnostic-bridge` and downloads ~2 MB MCAP (25 s of LaserScan + IMU + GPS + /diagnostics) | ✅ |
| `RosbagInjector` finalizes 3 TopicWriters (`/sensors/{scan,imu,fix}`; `/diagnostics` silently dropped - no adapter yet) | ✅ |
| `MosaicoClient.list_sequences()` shows the new sequence within ~25 s of fault confirmation | ✅ |
| Notebook reads back `LaserScan` data with `range_min`, `range_max`, `ranges`, `intensities`, `frame_id` populated | ✅ |
| `IMU.Q.acceleration.z.between(...)` filter returns sequences | ✅ |

## Known surprises (we hit them so you don't have to)

1. **Medkit gateway path prefix**: SSE lives at `GET /api/v1/faults/stream`, not `GET /faults/stream`. Same for `/api/v1/components/...`. The bridge bakes the prefix into every URL.
2. **`reporting_sources` ≠ SOVD entity ID**: medkit reports the ROS publisher node name (`/bridge/diagnostic_bridge`), not the SOVD entity that owns the bag. The bridge enumerates apps + components via the gateway and HEAD-probes for `bulk-data/rosbags/{fault_code}` until one returns 200.
3. **Faults from the legacy diagnostic path land under `apps/diagnostic-bridge`**, not `apps/lidar-sim`. The diagnostic_bridge node is what owns the snapshot bag in this demo.
4. **Mosaico read-side registry**: even with PR #368 installed, you must `import mosaicolabs.models.futures.laser` before reading `LaserScan` data. Otherwise the topic reader raises `No ontology registered with tag 'laser_scan'`. The bridge does not need this (write side resolves adapters by ROS msg type) but the notebook does.
5. **Not all 5 listed topics actually land in Mosaico**: `/diagnostics` drops silently because no adapter is registered - the medkit ring buffer captures it, Mosaico just does not know what to do with it. `/sensors/image_raw` is a different kind of drop: the adapter ships in the SDK but the demo's `medkit_params.yaml` excludes that topic from snapshot capture on purpose (30 Hz uncompressed camera would dominate the bag). See the table and "Smart snapshots" section.
6. **Initial post-fault wait**: medkit holds the rosbag2 writer open for `duration_after_sec` (10s in this config) after `fault_confirmed`. The bridge waits `POST_FAULT_WAIT_SEC` seconds (default 12) before downloading so the trailing ring segment is finalized.
7. **Gateway port conflict on dev boxes**: the single-robot stack publishes on `18080` and `16726`; the fleet stack uses `18081/18082/18083` for the three gateways with `16726` shared by mosaicod. Adjust if you prefer defaults.
8. **`rosbag2` file splitting**: if `max_bag_size_mb` is hit mid-recording, `rosbag2` splits into `_0.mcap`, `_1.mcap`, ... and the medkit gateway serves only the first split. The 2 GB cap in `medkit_params.yaml` is there to prevent splitting for any realistic 25 s snapshot.

## Troubleshooting

- `docker compose build bridge` fails on the import sanity check → the pinned Mosaico commit `b3867be` is no longer fetchable, or the upstream source layout has changed. Update the `MOSAICO_PIN` build arg in `bridge/Dockerfile` to a current `main` commit that still contains `mosaicolabs/models/futures/laser.py`.
- `./scripts/trigger-fault.sh` returns curl error 22 → the gateway is up but needs `{"execution_type": "now"}` in the POST body. The script already does that; verify your gateway is actually `localhost:18080`.
- Bridge logs `Could not resolve entity for fault_code=X` → enumerate `/api/v1/apps` and `/api/v1/components` manually with `curl` and check whether any of them list the fault under `bulk-data/rosbags`. If none do, the gateway has not registered the bag yet (post-fault timer hasn't fired). Wait a few seconds and re-trigger.
- Notebook raises `No ontology registered with tag 'laser_scan'` → the `import mosaicolabs.models.futures.laser` cell did not run. Re-run it.
- `docker compose pull mosaicod` is slow on first run → the upstream image is ~110 MB, distroless. Expect 30-90s on a slow link.

## Cleanup

```bash
docker compose down -v # removes containers + named volumes
```
47 changes: 47 additions & 0 deletions demos/mosaico_integration/bridge/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
# Bridge container for the Mosaico ingestion demo.
#
# Subscribes to medkit /faults/stream SSE, downloads each fault snapshot
# bag from the gateway REST API, and ingests it into mosaicod via the
# mosaicolabs Python SDK over Apache Arrow Flight.
#
# License-safe pattern: bridge is a separate Python process that talks
# the public Apache Arrow Flight protocol to an unmodified mosaicod
# Docker image. We do not link or modify mosaicod or its Rust crates.
#
# SDK pinning: PR #368 (ROS adapters for futures ontology, including
# LaserScan) merged on 2026-04-13 as commit b3867be. The subsequent
# mosaicolabs==0.3.2 PyPI wheel (2026-04-15) ships with the
# `futures` subpackage missing from the wheel despite being in source
# on main, so `pip install mosaicolabs==0.3.2` cannot import LaserScan.
# We therefore install from the upstream repo at the PR #368 merge
# commit until a PyPI release with correct packaging ships.

FROM python:3.11-slim

ENV PYTHONUNBUFFERED=1
ENV PIP_NO_CACHE_DIR=1
ENV PIP_DISABLE_PIP_VERSION_CHECK=1

RUN apt-get update \
&& apt-get install -y --no-install-recommends git ca-certificates \
&& rm -rf /var/lib/apt/lists/*

ARG MOSAICO_REPO=https://github.com/mosaico-labs/mosaico.git
ARG MOSAICO_PIN=b3867be
RUN git clone "${MOSAICO_REPO}" /opt/mosaico \
&& cd /opt/mosaico \
&& git checkout "${MOSAICO_PIN}" \
&& git rev-parse HEAD > /opt/mosaico/.pinned_sha

COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt \
&& pip install /opt/mosaico/mosaico-sdk-py

# Sanity check at build time: import the SDK and the LaserScan ontology
# so we fail fast if a future pin regresses on what we need.
RUN python -c "from mosaicolabs import MosaicoClient; from mosaicolabs.ros_bridge import RosbagInjector, ROSInjectionConfig; from mosaicolabs.models.futures.laser import LaserScan; print('mosaicolabs SDK + LaserScan ontology import OK')"

WORKDIR /app
COPY bridge.py /app/bridge.py

CMD ["python", "-u", "/app/bridge.py"]
Loading
Loading