Skip to content

[kernel-1116] Add CDP Monitor#213

Open
archandatta wants to merge 4 commits intoarchand/kernel-1116/cdp-pipelinefrom
archand/kernel-1116/cdp-foundation
Open

[kernel-1116] Add CDP Monitor#213
archandatta wants to merge 4 commits intoarchand/kernel-1116/cdp-pipelinefrom
archand/kernel-1116/cdp-foundation

Conversation

@archandatta
Copy link
Copy Markdown
Contributor

@archandatta archandatta commented Apr 13, 2026

Introduces the foundational layer of the CDP monitor as a standalone reviewablechunk. No Monitor struct wiring, just the primitives that everything else builds on.

  • types.go: CDP wire format (cdpMessage), all event type constants, internal state structs (networkReqState, targetInfo, CDP param shapes).

  • util.go: Console arg extraction, MIME allow-list (isCapturedMIME), resource type filter (isTextualResource), per-MIME body size caps (bodyCapFor), UTF-8-safe body truncation (truncateBody).

  • computed.go: State machine for the three derived events: network_idle (500ms debounce after all requests finish), layout_settled (1s after page_load with no layout shifts), navigation_settled (fires once all three flags converge). Timer invalidation via navSeq prevents stale AfterFunc callbacks from publishing for a previous navigation.

  • domains.go: isPageLikeTarget predicate (pages and iframes get Page.* / PerformanceTimeline.*; workers don't), bindingName constant, interaction.js embed.

  • interaction.js: Injected script tracking clicks, keydowns, and scroll-settled events via the __kernelEvent CDP binding.


Note

Medium Risk
Moderate risk because it introduces new concurrent websocket/reconnect logic and publishes potentially sensitive network payloads (headers/body), which could impact stability and data exposure if mishandled.

Overview
Adds a full cdpmonitor.Monitor implementation: websocket dial/read loop, send() request/response plumbing, auto-attach + attach-existing-targets init, periodic pending-request eviction, and robust Stop/clear-state behavior with pending-command unblocking.

Expands event capture by dispatching CDP domains into typed events (console, exceptions, navigation/lifecycle, targets, network request/response with optional truncated bodies), injecting interaction.js via a runtime binding (with per-session rate limiting), taking rate-limited screenshots, and emitting derived meta-events (network_idle, layout_settled, navigation_settled).

Adds restart handling via upstream URL subscription with backoff reconnect and lifecycle events (monitor_disconnected, monitor_reconnected, monitor_reconnect_failed, monitor_init_failed), updates API wiring to pass a slog logger into cdpmonitor.New, and introduces extensive unit tests covering lifecycle, reconnect, handlers, computed-state timing, redirects, and subframe edge cases.

Reviewed by Cursor Bugbot for commit 348243a. Bugbot is set up for automated code reviews on this repo. Configure here.

@archandatta archandatta marked this pull request as ready for review April 13, 2026 14:56
@firetiger-agent
Copy link
Copy Markdown

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: PR adds new CDP monitor functionality but does not modify API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal) as specified in the filter.

To monitor this PR anyway, reply with @firetiger monitor this.

return true
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suffix-based MIME check inconsistent between capture functions

Medium Severity

isCapturedMIME restricts the +json/+xml/+csv suffix check to only application/vnd.* types, while bodyCapFor applies the same suffix check to all MIME types. This means common structured content types like application/ld+json, application/hal+json, and application/problem+json will be rejected by the isCapturedMIME gatekeeper and never have their bodies captured — even though bodyCapFor was clearly designed to give them the full 8 KB cap. The bodyCapFor comment also incorrectly claims it matches isCapturedMIME's allow-list.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit d73793c. Configure here.

Copy link
Copy Markdown
Contributor

@hiroTamada hiroTamada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solid foundation overall — the computed event state machine and timer invalidation logic are well designed. one blocking concern around sensitive data capture in interaction.js.

}));
}, true);

document.addEventListener('keydown', function(e) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the keydown listener captures every keystroke with no filtering — this is effectively a keylogger. since this gets injected into all browser VMs (including sessions where users type passwords via managed auth), sensitive input like passwords, credit card numbers, and SSNs will flow through the event pipeline and get persisted. at minimum, keystrokes in <input type="password"> fields should be redacted or skipped. worth also considering fields with autocomplete="cc-number", name="ssn", etc. the click listener's textContent capture has a similar (lesser) exposure. this has PCI/SOC2/GDPR implications if the data hits storage.

Copy link
Copy Markdown
Contributor

@rgarcia rgarcia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you combine all the prs into one? I'm finding it hard to review this since most of the code is unused

// on the root session (sessionID="") before navigation has been recorded.
const mainSessionUnset = "\x00unset"

// Event type constants for all events published by the cdpmonitor.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this constant block would be a lot easier to reason about with some cosmetic structure: group the names into direct-ish CDP-derived monitor events, computed/synthetic events from computed.go, injected page-side interaction events, and monitor lifecycle/internal events, and add short godoc comments for each group (or for the less obvious constants). right now they're all in one flat list, which makes it hard to tell which names are grounded in protocol counterparts vs introduced locally by the monitor.

addedAt time.Time // for TTL eviction
}

// cdpConsoleArg is a single Runtime.consoleAPICalled argument.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd like for this first decode layer to be much closer to the CDP protocol / PDL types, and then have a second translation step into monitor-specific event shapes. right now things like cdpConsoleArg / cdpConsoleParams are custom projections of Runtime.RemoteObject / Runtime.consoleAPICalled, which makes it harder to audit against the source of truth and easier to accidentally drop fields we may want later. i think mirroring the protocol more directly at the boundary would make this foundation easier to reason about, with the domain-specific simplification happening after decode. the upstream source of truth here is the CDP spec / PDL, e.g. js_protocol.pdl.

Second layer of the CDP monitor split. Adds the full Monitor struct, all
lifecycle machinery, and the test infrastructure. dispatchEvent is a
no-op stub - event handlers land in #213

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds substantial new WebSocket lifecycle/reconnect and command routing
logic with multiple goroutines/locks; regressions could impact capture
stability and leak resources. Tests mitigate but concurrency and
reconnect edge cases remain moderately risky.
> 
> **Overview**
> Implements the core `cdpmonitor.Monitor` lifecycle: establishes a CDP
WebSocket connection, routes command responses via `send()`, starts the
init sequence (`Target.setAutoAttach` + attach to existing targets), and
adds a `Health()` snapshot for operational visibility.
> 
> Adds automatic reconnect on upstream URL changes with backoff, state
reset/unblocking of in-flight commands, and lifecycle events
(`monitor_disconnected`, `monitor_reconnected`,
`monitor_reconnect_failed`, `monitor_init_failed`).
> 
> Introduces screenshot capture/publishing with rate limiting and
optional ffmpeg downscaling, plus a comprehensive test harness (fake WS
server/upstream, event collector) and new unit tests covering lifecycle,
reconnect, screenshot behavior, and pending-command unblocking. Domain
enablement/script injection is factored into `domains.go`, and
`dispatchEvent` is stubbed pending the follow-up handlers PR.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
bfb6cd7. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 348243a. Configure here.

key: e.key,
selector: sel(t), tag: t.tagName || ''
}));
}, true);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keydown listener captures sensitive input like passwords

High Severity

The keydown listener sends e.key for every keystroke with no filtering of sensitive contexts. When injected into browser sessions where users type passwords (e.g. <input type="password">), credit card numbers, or other PII, the actual key values flow through the event pipeline and get persisted. This effectively acts as a keylogger. At minimum, keystrokes targeting sensitive input types need to be redacted or skipped.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 348243a. Configure here.

m.computed.onLoadingFinished() // keep netPending consistent
}
}
m.pendReqMu.Unlock()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweep calls onLoadingFinished while holding pendReqMu

Medium Severity

sweepPendingRequests calls m.computed.onLoadingFinished() inside a loop while holding pendReqMu. Each call acquires computed.mu, decrements netPending, and may restart the network idle timer. When multiple entries expire in one sweep, netPending can be decremented below the actual count (since the sweep-evicted requests were never tracked via onRequest), and the timer is needlessly restarted on each iteration. The onLoadingFinished calls here assume a 1:1 pairing with prior onRequest calls that may never have occurred for stale entries.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 348243a. Configure here.

@archandatta archandatta changed the title [kernel-1116] Setup CDP Monitor Foundation [kernel-1116] Add CDP Monitor Apr 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants