[kernel-1116] Add CDP Monitor#213
[kernel-1116] Add CDP Monitor#213archandatta wants to merge 4 commits intoarchand/kernel-1116/cdp-pipelinefrom
Conversation
|
Firetiger deploy monitoring skipped This PR didn't match the auto-monitor filter configured on your GitHub connection:
Reason: PR adds new CDP monitor functionality but does not modify API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal) as specified in the filter. To monitor this PR anyway, reply with |
| return true | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Suffix-based MIME check inconsistent between capture functions
Medium Severity
isCapturedMIME restricts the +json/+xml/+csv suffix check to only application/vnd.* types, while bodyCapFor applies the same suffix check to all MIME types. This means common structured content types like application/ld+json, application/hal+json, and application/problem+json will be rejected by the isCapturedMIME gatekeeper and never have their bodies captured — even though bodyCapFor was clearly designed to give them the full 8 KB cap. The bodyCapFor comment also incorrectly claims it matches isCapturedMIME's allow-list.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit d73793c. Configure here.
hiroTamada
left a comment
There was a problem hiding this comment.
solid foundation overall — the computed event state machine and timer invalidation logic are well designed. one blocking concern around sensitive data capture in interaction.js.
| })); | ||
| }, true); | ||
|
|
||
| document.addEventListener('keydown', function(e) { |
There was a problem hiding this comment.
the keydown listener captures every keystroke with no filtering — this is effectively a keylogger. since this gets injected into all browser VMs (including sessions where users type passwords via managed auth), sensitive input like passwords, credit card numbers, and SSNs will flow through the event pipeline and get persisted. at minimum, keystrokes in <input type="password"> fields should be redacted or skipped. worth also considering fields with autocomplete="cc-number", name="ssn", etc. the click listener's textContent capture has a similar (lesser) exposure. this has PCI/SOC2/GDPR implications if the data hits storage.
rgarcia
left a comment
There was a problem hiding this comment.
can you combine all the prs into one? I'm finding it hard to review this since most of the code is unused
| // on the root session (sessionID="") before navigation has been recorded. | ||
| const mainSessionUnset = "\x00unset" | ||
|
|
||
| // Event type constants for all events published by the cdpmonitor. |
There was a problem hiding this comment.
i think this constant block would be a lot easier to reason about with some cosmetic structure: group the names into direct-ish CDP-derived monitor events, computed/synthetic events from computed.go, injected page-side interaction events, and monitor lifecycle/internal events, and add short godoc comments for each group (or for the less obvious constants). right now they're all in one flat list, which makes it hard to tell which names are grounded in protocol counterparts vs introduced locally by the monitor.
| addedAt time.Time // for TTL eviction | ||
| } | ||
|
|
||
| // cdpConsoleArg is a single Runtime.consoleAPICalled argument. |
There was a problem hiding this comment.
i'd like for this first decode layer to be much closer to the CDP protocol / PDL types, and then have a second translation step into monitor-specific event shapes. right now things like cdpConsoleArg / cdpConsoleParams are custom projections of Runtime.RemoteObject / Runtime.consoleAPICalled, which makes it harder to audit against the source of truth and easier to accidentally drop fields we may want later. i think mirroring the protocol more directly at the boundary would make this foundation easier to reason about, with the domain-specific simplification happening after decode. the upstream source of truth here is the CDP spec / PDL, e.g. js_protocol.pdl.
Second layer of the CDP monitor split. Adds the full Monitor struct, all lifecycle machinery, and the test infrastructure. dispatchEvent is a no-op stub - event handlers land in #213 <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Adds substantial new WebSocket lifecycle/reconnect and command routing logic with multiple goroutines/locks; regressions could impact capture stability and leak resources. Tests mitigate but concurrency and reconnect edge cases remain moderately risky. > > **Overview** > Implements the core `cdpmonitor.Monitor` lifecycle: establishes a CDP WebSocket connection, routes command responses via `send()`, starts the init sequence (`Target.setAutoAttach` + attach to existing targets), and adds a `Health()` snapshot for operational visibility. > > Adds automatic reconnect on upstream URL changes with backoff, state reset/unblocking of in-flight commands, and lifecycle events (`monitor_disconnected`, `monitor_reconnected`, `monitor_reconnect_failed`, `monitor_init_failed`). > > Introduces screenshot capture/publishing with rate limiting and optional ffmpeg downscaling, plus a comprehensive test harness (fake WS server/upstream, event collector) and new unit tests covering lifecycle, reconnect, screenshot behavior, and pending-command unblocking. Domain enablement/script injection is factored into `domains.go`, and `dispatchEvent` is stubbed pending the follow-up handlers PR. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit bfb6cd7. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 348243a. Configure here.
| key: e.key, | ||
| selector: sel(t), tag: t.tagName || '' | ||
| })); | ||
| }, true); |
There was a problem hiding this comment.
Keydown listener captures sensitive input like passwords
High Severity
The keydown listener sends e.key for every keystroke with no filtering of sensitive contexts. When injected into browser sessions where users type passwords (e.g. <input type="password">), credit card numbers, or other PII, the actual key values flow through the event pipeline and get persisted. This effectively acts as a keylogger. At minimum, keystrokes targeting sensitive input types need to be redacted or skipped.
Reviewed by Cursor Bugbot for commit 348243a. Configure here.
| m.computed.onLoadingFinished() // keep netPending consistent | ||
| } | ||
| } | ||
| m.pendReqMu.Unlock() |
There was a problem hiding this comment.
Sweep calls onLoadingFinished while holding pendReqMu
Medium Severity
sweepPendingRequests calls m.computed.onLoadingFinished() inside a loop while holding pendReqMu. Each call acquires computed.mu, decrements netPending, and may restart the network idle timer. When multiple entries expire in one sweep, netPending can be decremented below the actual count (since the sweep-evicted requests were never tracked via onRequest), and the timer is needlessly restarted on each iteration. The onLoadingFinished calls here assume a 1:1 pairing with prior onRequest calls that may never have occurred for stale entries.
Reviewed by Cursor Bugbot for commit 348243a. Configure here.


Introduces the foundational layer of the CDP monitor as a standalone reviewablechunk. No Monitor struct wiring, just the primitives that everything else builds on.
types.go: CDP wire format (cdpMessage), all event type constants, internal state structs (networkReqState, targetInfo, CDP param shapes).
util.go: Console arg extraction, MIME allow-list (isCapturedMIME), resource type filter (isTextualResource), per-MIME body size caps (bodyCapFor), UTF-8-safe body truncation (truncateBody).
computed.go: State machine for the three derived events: network_idle (500ms debounce after all requests finish), layout_settled (1s after page_load with no layout shifts), navigation_settled (fires once all three flags converge). Timer invalidation via navSeq prevents stale AfterFunc callbacks from publishing for a previous navigation.
domains.go: isPageLikeTarget predicate (pages and iframes get Page.* / PerformanceTimeline.*; workers don't), bindingName constant, interaction.js embed.
interaction.js: Injected script tracking clicks, keydowns, and scroll-settled events via the __kernelEvent CDP binding.
Note
Medium Risk
Moderate risk because it introduces new concurrent websocket/reconnect logic and publishes potentially sensitive network payloads (headers/body), which could impact stability and data exposure if mishandled.
Overview
Adds a full
cdpmonitor.Monitorimplementation: websocket dial/read loop,send()request/response plumbing, auto-attach + attach-existing-targets init, periodic pending-request eviction, and robust Stop/clear-state behavior with pending-command unblocking.Expands event capture by dispatching CDP domains into typed events (console, exceptions, navigation/lifecycle, targets, network request/response with optional truncated bodies), injecting
interaction.jsvia a runtime binding (with per-session rate limiting), taking rate-limited screenshots, and emitting derived meta-events (network_idle,layout_settled,navigation_settled).Adds restart handling via upstream URL subscription with backoff reconnect and lifecycle events (
monitor_disconnected,monitor_reconnected,monitor_reconnect_failed,monitor_init_failed), updates API wiring to pass asloglogger intocdpmonitor.New, and introduces extensive unit tests covering lifecycle, reconnect, handlers, computed-state timing, redirects, and subframe edge cases.Reviewed by Cursor Bugbot for commit 348243a. Bugbot is set up for automated code reviews on this repo. Configure here.