
Speculative context (checkpoint/rewind/summarize) and forked context for sub-agents #2720

@neerajsi-msft

Describe the feature or problem you'd like to solve

Systems programming (or really debugging and developing any large system) benefits from reading large logs or code traces verbatim, interpreted against prior context, to form new hypotheses. But these reads permanently pollute the context window.

Proposed solution

Problem Description

During long debugging sessions (e.g. implementing and debugging a binary wire protocol against an emulator), the agent needs to read large
diagnostic outputs — logs, hex dumps, protocol traces — to understand system state. This creates a fundamental tension:

  • Reading full logs is essential: You need broad context to notice unexpected states, ordering anomalies, and connection lifecycle issues
    that narrow grep/tail would miss. Filtering to your current hypothesis is confirmation bias in tool form.
  • Full logs permanently bloat context: Every diagnostic read stays in the context window for the rest of the session, degrading performance
    and eventually forcing a compaction that loses hard-won understanding.

A human debugger naturally handles this — they scan logs, extract observations, close the log viewer, and work from notes. The raw log
doesn't consume their working memory indefinitely. The agent currently lacks this ability.

Proposed Features

  1. Speculative Context (checkpoint-rollback-summarize)

Allow the agent to:

  1. Checkpoint the current context state before a diagnostic read
  2. Read large outputs with full context — see everything, form hypotheses
  3. Roll back to the checkpoint and append only structured findings

This gives the agent "scratch space" for investigation without permanent context cost. The existing checkpoint machinery could support this
with finer granularity and the ability to attach a summary on rollback.
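The checkpoint-rollback-summarize cycle can be sketched as a context object with a stack of checkpoint marks. This is a minimal illustration, assuming the context is modeled as an ordered list of messages; `Context`, `checkpoint`, `rollback`, and `size` are hypothetical names, not an existing API.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical stand-in for an agent's context window: an ordered message log."""

    messages: list[str] = field(default_factory=list)
    # Stack of message-count marks, so checkpoints can nest.
    _marks: list[int] = field(default_factory=list, init=False, repr=False)

    def append(self, message: str) -> None:
        self.messages.append(message)

    def checkpoint(self) -> None:
        # Everything appended after this mark is speculative scratch space.
        self._marks.append(len(self.messages))

    def rollback(self, summary: str) -> None:
        # Drop everything since the most recent checkpoint; keep only the distilled findings.
        mark = self._marks.pop()
        del self.messages[mark:]
        self.messages.append(f"[findings] {summary}")

    def size(self) -> int:
        # Crude token proxy: whitespace-separated words across all messages.
        return sum(len(m.split()) for m in self.messages)


if __name__ == "__main__":
    ctx = Context()
    ctx.append("user: handshake fails against the emulator")
    ctx.checkpoint()
    ctx.append("tool: " + "PDU trace line " * 2000)  # large diagnostic read (made-up content)
    ctx.append("assistant: the response header length looks wrong")
    print(ctx.size())  # large while the raw log is in context
    ctx.rollback("the response header length field was wrong")
    print(ctx.messages)  # only the original prompt plus the findings remain
```

Using a stack of marks rather than a single saved index lets one speculative read nest inside another, with each rollback discarding only its own scratch work.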

  2. Forked Context for Sub-Agents

Currently, sub-agents (explore, task) start with a blank context. They lack the latent understanding the main agent has accumulated —
protocol quirks, naming conventions, architectural decisions, failure modes discovered over hundreds of turns. This makes them ineffective
for tasks that require that accumulated knowledge.

Allow spawning a sub-agent that forks the main agent's current context, so it inherits all accumulated understanding. The sub-agent does its
work (e.g. reading and analyzing logs) in a disposable context copy, then returns only the distilled findings to the parent. This combines
full contextual understanding with disposable scratch space.

Use Case

Any long-running systems debugging session where the agent iteratively:

  • Reads diagnostic output to understand failures
  • Makes code changes based on findings
  • Re-runs tests and reads new diagnostics
  • Repeats over many cycles

Each diagnostic read is individually valuable but collectively consumes context that crowds out the understanding needed to interpret future
diagnostics. The two features above would break this tradeoff.

Example prompts or workflows

  1. Debugging a binary protocol implementation: The agent implements a network protocol handler, runs integration tests, and reads multi-KB
    log output showing connection lifecycle events, PDU exchanges, and error codes. It needs to correlate events across multiple connections to
    find a framing error. After fixing it, the agent reads new logs to verify, but the old logs are still in context, wasting tokens.
    Speculative context would let it read, conclude "the response header length field was wrong," roll back, and keep only that finding.

  2. Investigating a flaky test failure: The agent needs to compare logs from a passing run and a failing run to spot the difference. Reading
    both full logs doubles the context cost, but diffing or grepping them loses the surrounding state that reveals why the difference matters.
    Speculative context lets it read both, summarize "the failing run shows connection X closing before message Y is acked," and discard the
    raw logs.

  3. Sub-agent for build error triage: The main agent has spent 200+ turns understanding a codebase's architecture, type-system quirks, and
    build configuration. A build fails with 500 lines of compiler errors. A sub-agent could triage them, but a cold sub-agent doesn't know
    which errors are real and which are cascading, or which modules matter. A forked sub-agent inherits the main agent's understanding and can
    categorize effectively: "3 real errors in module X, the rest are cascading from a missing import."

  4. Performance profiling analysis: The agent reads a large perf/flamegraph/profiler output to identify hotspots in code it has been
    optimizing. The profiler output runs to thousands of lines, but the agent needs the full picture to distinguish expected hot paths from
    surprising ones. Speculative context lets it read the full profile, extract "function F is 40% of wall time due to an unnecessary
    allocation in the inner loop," and roll back.

  5. Multi-service distributed system debugging: The agent is debugging a request failure across 3 services. It needs to read logs from each
    service, correlate them by request ID, and understand the causal chain. A forked sub-agent per service could read that service's logs with
    full understanding of the system architecture; the parent then merges their findings: "Service A sent the request, Service B rejected it
    due to a schema mismatch, Service C never saw it."
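The per-service fan-out in the last example can be sketched with one forked worker per service and a merge step in the parent. This is an illustration under stated assumptions: the context is a list of messages, log lines use a made-up `"<request-id> <event>"` format, and `analyze_service` and `investigate` are hypothetical names. Threads stand in for concurrently running sub-agents.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def analyze_service(parent_ctx: list[str], service: str, logs: list[str], request_id: str) -> str:
    """Hypothetical forked sub-agent: reads one service's raw logs in a private context copy."""
    ctx = copy.deepcopy(parent_ctx)  # inherits the parent's architecture notes
    ctx.extend(logs)  # raw logs go into the fork only
    hits = [line for line in logs if line.startswith(request_id + " ")]
    if not hits:
        return f"{service} never saw {request_id}"
    # Report the last event recorded for this request (assumed format: "<id> <event>").
    return f"{service}: {hits[-1][len(request_id) + 1:]}"


def investigate(parent_ctx: list[str], service_logs: dict[str, list[str]], request_id: str) -> list[str]:
    # Fan out one forked sub-agent per service; exiting the pool waits for all of them.
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(analyze_service, parent_ctx, svc, logs, request_id)
            for svc, logs in service_logs.items()
        ]
    findings = [f.result() for f in futures]
    # Only the merged findings, never the raw logs, reach the parent context.
    parent_ctx.append("[merged findings] " + "; ".join(findings))
    return findings


if __name__ == "__main__":
    parent = ["note: Service A calls Service B, which forwards to Service C"]
    service_logs = {  # made-up logs for illustration
        "Service A": ["req-41 sent to Service B", "req-42 sent to Service B"],
        "Service B": ["req-41 forwarded to Service C", "req-42 rejected: schema mismatch"],
        "Service C": ["req-41 processed"],
    }
    print(investigate(parent, service_logs, "req-42"))
```

The parent is only mutated after every worker has finished, so the forks all copy the same, stable parent context.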

Additional context

You can contact me internally and I can give more information or try out prototypes.
