Describe the feature or problem you'd like to solve
Systems programming (or really debugging and developing any large system) benefits from reading large logs or code traces verbatim, with prior context in view, to form new hypotheses. But these reads pollute the context window permanently.
Proposed solution
Problem Description
During long debugging sessions (e.g. implementing and debugging a binary wire protocol against an emulator), the agent needs to read large
diagnostic outputs — logs, hex dumps, protocol traces — to understand system state. This creates a fundamental tension:
- Reading full logs is essential: You need broad context to notice unexpected states, ordering anomalies, and connection lifecycle issues
that narrow grep/tail would miss. Filtering to your current hypothesis is confirmation bias in tool form.
- Full logs permanently bloat context: Every diagnostic read stays in the context window for the rest of the session, degrading performance
and eventually forcing a compaction that loses hard-won understanding.
A human debugger naturally handles this — they scan logs, extract observations, close the log viewer, and work from notes. The raw log
doesn't consume their working memory indefinitely. The agent currently lacks this ability.
Proposed Features
- Speculative Context (checkpoint-rollback-summarize)
Allow the agent to:
- Checkpoint the current context state before a diagnostic read
- Read large outputs with full context — see everything, form hypotheses
- Roll back to the checkpoint and append only structured findings
This gives the agent "scratch space" for investigation without permanent context cost. The existing checkpoint machinery could support this
with finer granularity and the ability to attach a summary on rollback.
- Forked Context for Sub-Agents
Currently, sub-agents (explore, task) start with a blank context. They lack the latent understanding the main agent has accumulated —
protocol quirks, naming conventions, architectural decisions, failure modes discovered over hundreds of turns. This makes them ineffective
for tasks that require that accumulated knowledge.
Allow spawning a sub-agent that forks the main agent's current context, so it inherits all accumulated understanding. The sub-agent does its
work (e.g. reading and analyzing logs) in a disposable context copy, then returns only the distilled findings to the parent. This combines
full contextual understanding with disposable scratch space.
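The two features above can be sketched with a toy model. This is a minimal illustration, not an existing API: the `Context` class and its `checkpoint`, `rollback`, and `fork` methods are hypothetical names, and the context window is modeled as a plain list of strings, with character count as a rough stand-in for token usage.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical stand-in for an agent's context window."""
    messages: list = field(default_factory=list)

    def append(self, message: str) -> None:
        self.messages.append(message)

    def checkpoint(self) -> int:
        # A checkpoint is the current length; anything added later is scratch.
        return len(self.messages)

    def rollback(self, mark: int, summary: str) -> None:
        # Discard everything read since the checkpoint, keep only the findings.
        del self.messages[mark:]
        self.messages.append(summary)

    def fork(self) -> "Context":
        # A disposable copy that inherits everything accumulated so far.
        return Context(list(self.messages))

    def size(self) -> int:
        # Character count as a crude proxy for token usage.
        return sum(len(m) for m in self.messages)


ctx = Context(["protocol notes accumulated over earlier turns"])

# Feature 1: speculative context (checkpoint, read, roll back with a summary).
mark = ctx.checkpoint()
ctx.append("raw log line\n" * 1000)  # large diagnostic read
ctx.rollback(mark, "finding: response header length field was wrong")

# Feature 2: forked sub-agent (the child inherits the parent's context).
child = ctx.fork()
child.append("compiler error\n" * 500)  # the large read happens in the child only
ctx.append("finding: 3 real errors in module X; the rest cascade from a missing import")
```

In both cases the parent ends up holding its earlier notes plus the distilled findings, while the raw output lives only in scratch space that is thrown away.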
Use Case
Any long-running systems debugging session where the agent iteratively:
- Reads diagnostic output to understand failures
- Makes code changes based on findings
- Re-runs tests and reads new diagnostics
- Repeats over many cycles
Each diagnostic read is individually valuable but collectively consumes context that crowds out the understanding needed to interpret future
diagnostics. The two features above would break this tradeoff.
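To make the tradeoff concrete, here is a rough back-of-the-envelope sketch. All numbers are made up for illustration (a 20,000-token baseline, 8,000-token diagnostic reads, 200-token findings, 10 cycles); they are not measurements, and the function name is hypothetical.

```python
def context_after(cycles: int, base: int, read_size: int,
                  finding_size: int, rollback: bool) -> list:
    """Context size after each debug cycle, in hypothetical token units."""
    sizes = []
    size = base
    for _ in range(cycles):
        # With rollback only the finding is kept; otherwise the full read stays.
        size += finding_size if rollback else read_size
        sizes.append(size)
    return sizes


kept = context_after(cycles=10, base=20_000, read_size=8_000,
                     finding_size=200, rollback=False)
rolled = context_after(cycles=10, base=20_000, read_size=8_000,
                       finding_size=200, rollback=True)

print(kept[-1])    # 100000
print(rolled[-1])  # 22000
```

Under these assumed numbers, ten cycles without rollback grow the context fivefold, while keeping only the findings adds about 10%.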
Example prompts or workflows
- Debugging a binary protocol implementation: Agent implements a network protocol handler, runs integration tests, reads multi-KB log output
showing connection lifecycle + PDU exchanges + error codes. Needs to correlate events across multiple connections to find a framing error.
After fixing, reads new logs to verify, but the old logs are still in context, wasting tokens. Speculative context would let it read,
conclude "the response header length field was wrong," roll back, and keep only that finding.
- Investigating a flaky test failure: Agent needs to compare logs from a passing run vs. a failing run to spot the difference. Reading both
full logs doubles the context cost, but diffing them with grep loses the surrounding state that reveals why the difference matters.
Speculative context lets it read both, summarize "the failing run shows connection X closing before message Y is acked," and discard the raw
logs.
- Sub-agent for build error triage: Main agent has spent 200+ turns understanding a codebase's architecture, type system quirks, and build
configuration. A build fails with 500 lines of compiler errors. A sub-agent could triage them, but a cold sub-agent doesn't know which
errors are real vs. cascading, or which modules matter. A forked sub-agent inherits the main agent's understanding and can effectively
categorize: "3 real errors in module X, the rest are cascading from a missing import."
- Performance profiling analysis: Agent reads a large perf/flamegraph/profiler output to identify hotspots in code it's been optimizing. The
profiler output is thousands of lines, but the agent needs the full picture to distinguish expected hot paths from surprising ones.
Speculative context lets it read the full profile, extract "function F is 40% of wall time due to an unnecessary allocation in the inner
loop," and roll back.
- Multi-service distributed system debugging: Agent is debugging a request failure across 3 services. It needs to read logs from each
service, correlate by request ID, and understand the causal chain. Forked sub-agents, one per service, could each read that service's logs with
full understanding of the system architecture, then the parent merges their findings: "Service A sent the request, Service B rejected it due
to a schema mismatch, Service C never saw it."
Additional context
You can contact me internally and I can give more information or try out prototypes.