Managing Context Windows in Kindo Agents


Large Language Models (LLMs) are powerful, but their biggest practical constraint is the context window: the maximum number of tokens a model can accept in a single request. In Kindo, “context” includes your instructions, the conversation history, and (for agents) tool calls and tool outputs. As that context grows, you can hit hard context limits, see quality degrade, and incur higher inference cost because more tokens are processed on every turn.

This guide covers practical techniques you can use today in Kindo to keep agents reliable when working with large inputs.


How Kindo helps today

Kindo mitigates large tool outputs by writing them into the sandbox and passing the model a reference to the file rather than injecting the full payload into the prompt. That enables a critical pattern: instead of asking a model to read massive blobs, you can use the sandbox to search, filter, and transform data first—then only feed the model the high-signal subset.
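
For example, here is a minimal sketch in Python of that pattern: filter the sandbox file down to its high-signal lines before anything is summarized. The file paths and the "ERROR" marker are illustrative assumptions, not Kindo conventions.

from pathlib import Path

# Hypothetical location of a large raw tool output in the sandbox.
src = Path("/sandbox/tool_output.log")
dst = Path("/sandbox/tool_output.errors.log")

lines = src.read_text().splitlines()
high_signal = [line for line in lines if "ERROR" in line]   # keep only the lines we care about
dst.write_text("\n".join(high_signal))
print(f"kept {len(high_signal)} of {len(lines)} lines")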


Technique 1: Reduce data volume at the source (query smarter upstream)

The best way to prevent context blowups is to avoid retrieving unnecessary data in the first place. Most systems you integrate with provide query parameters to narrow results; using them effectively reduces tool output size, speeds up downstream processing, and improves agent reliability.

Common query levers (vary by system):

  • Time bounds: “last 15m,” “incident window,” “since alert fired”
  • Severity / level: errors only, warnings+errors, critical only
  • Scope: service, environment, host, region, tenant
  • Identifiers: alert ID, incident ID, correlation/trace ID
  • Pattern filters: keywords, error codes, regex-like filters (where supported)
  • Pagination / limits: cap result size per call; fetch more only if needed

The habit to develop is: start narrow, validate that you’re seeing the right signal, then widen intentionally, rather than starting broad and hoping the model can sift through it.
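
If the upstream system exposes an HTTP query API, these levers map directly onto request parameters. A minimal sketch, assuming a hypothetical logs endpoint and hypothetical parameter names (substitute whatever the real system actually supports):

import requests

# Hypothetical endpoint and parameter names; adapt to the actual upstream API.
params = {
    "service": "api-gateway",
    "environment": "prod",
    "level": "error",                    # errors only; widen to warnings+errors if this returns nothing
    "start": "2024-05-01T10:12:00Z",     # incident window only
    "end": "2024-05-01T10:47:00Z",
    "fields": "timestamp,host,message",  # project only the fields you need
    "limit": 200,                        # cap result size per call
}
resp = requests.get("https://logs.example.com/api/v1/query", params=params, timeout=30)
resp.raise_for_status()
events = resp.json()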

Example prompts

Pull logs for only the incident window (10:12–10:47 UTC) and only error-level entries for service=api-gateway in prod. If that returns nothing, expand to warnings+errors.

Query the alerting system for alert_id=12345 and fetch only the related events and the top 50 most recent associated log lines, not the entire day.

Fetch only authentication-related events for tenant=acme during the last 2 hours, and include only fields: timestamp, user, source_ip, action, result.


Technique 2: Use the sandbox to sift large payloads (Unix tools + scripts)

Even with upstream filtering, you’ll often end up with outputs that are still too large for a model to read comfortably. When the data is in the sandbox, you can lean on traditional computing to perform the “map” phase far faster and more cheaply than any model reading raw text. The model is then used where it excels: interpretation and synthesis over a compact, structured summary.

In Kindo, instruct the agent to process sandbox files using command-line tools and scripts such as:

  • grep / rg (ripgrep): find matching lines quickly (error signatures, IDs, keywords)
  • find: locate files by name/pattern (e.g., chunk outputs, log files)
  • sed / awk: extract ranges, clean up text, isolate fields
  • Bash scripts: chain tools together for repeatable extraction pipelines
  • Python scripts: parse JSON/CSV, group/aggregate, de-duplicate, compute top-N clusters

Common patterns that work well:

  • Search for a signature (error code / exception / correlation ID), then extract a small window around each match (sketched below)
  • Group events by correlation ID or error code and output a cluster summary (counts, first/last seen, representative examples)
  • Parse structured logs (JSON) to produce a compact report (top error types, top affected hosts, timeline)
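
A minimal sketch of the first pattern, assuming the raw output sits at a hypothetical /sandbox/tool_output.log and the signature is a plain substring:

from pathlib import Path

SIGNATURE = "ConnectionResetError"   # illustrative signature; use the real error code or ID
WINDOW = 5                           # lines of context to keep before and after each match

lines = Path("/sandbox/tool_output.log").read_text().splitlines()   # hypothetical path
for i, line in enumerate(lines):
    if SIGNATURE in line:
        start, end = max(0, i - WINDOW), min(len(lines), i + WINDOW + 1)
        print(f"--- match at line {i + 1} ---")
        print("\n".join(lines[start:end]))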

Example prompts

The tool output was written to a file in the sandbox. Do not read the full file into context.
Use grep/rg to find all lines containing "Exception" or "ERROR", group by the most common error message prefix,
and output:
- top 5 error clusters with counts
- first/last timestamp per cluster
- 2 representative log excerpts per cluster (5–10 lines each)

Find the sandbox file containing the raw response (use find). If it’s JSON, write a short Python script to:
- parse it
- group by correlation_id
- output the 10 largest groups with a one-line summary each and representative samples.
Then only summarize the script’s output.
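
The script that prompt asks for might look roughly like this, assuming the raw response is a JSON array of event objects that each carry a correlation_id field (the filename is illustrative):

import json
from collections import defaultdict
from pathlib import Path

events = json.loads(Path("/sandbox/raw_response.json").read_text())   # hypothetical filename

# Group events by correlation ID.
groups = defaultdict(list)
for event in events:
    groups[event.get("correlation_id", "unknown")].append(event)

# Ten largest groups: a one-line summary plus a couple of representative samples each.
largest = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)[:10]
for cid, items in largest:
    print(f"{cid}: {len(items)} events")
    for sample in items[:2]:
        print("  ", json.dumps(sample)[:200])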


Technique 3: MapReduce-style processing (often with model tiering)

When you truly need to process large bodies of content (multi-hour logs, many tickets, large document sets), a single-pass approach is usually brittle. A MapReduce-style workflow is more reliable: first produce smaller structured outputs per chunk (“map”), then synthesize (“reduce”).

How it typically works:

  • Map phase (chunk-level extraction):
    • Split input by natural boundaries (time slices, per file, per service, per ticket); a splitting sketch follows this list
    • For each chunk, extract minimal signal:
      • key events + timestamps
      • recurring patterns/clusters
      • entities (hosts/users/services/IDs)
      • a few representative examples
    • Write each chunk’s results to the sandbox as small artifacts (JSON/markdown)
  • Reduce phase (final synthesis):
    • Read only the chunk artifacts
    • Merge and de-duplicate
    • Produce the final narrative, RCA, recommended actions, etc.
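
The splitting step is usually plain scripting. A minimal sketch that slices one large log into hourly chunks, assuming each line starts with an ISO-8601 timestamp (file names and paths are illustrative):

from collections import defaultdict
from pathlib import Path

raw_lines = Path("/sandbox/raw.log").read_text().splitlines()   # hypothetical input file
out_dir = Path("/sandbox/log_chunks")
out_dir.mkdir(parents=True, exist_ok=True)

# Group lines by the hour prefix of an ISO-8601 timestamp, e.g. "2024-05-01T10".
chunks = defaultdict(list)
for line in raw_lines:
    hour = line[:13] if len(line) >= 13 else "unknown"
    chunks[hour].append(line)

for i, (_hour, lines) in enumerate(sorted(chunks.items()), start=1):
    (out_dir / f"chunk_{i:02d}.log").write_text("\n".join(lines))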

This pairs well with model tiering:

  • Use a smaller/cheaper model for the extraction/map phase when tasks are mostly pattern spotting and structuring
  • Use a larger model for the reduce phase where deep reasoning and synthesis matter most

Note that Technique 2 (sandbox scripting) can often serve as a “map” phase even more efficiently than an LLM. In other words, depending on the task, you can choose between:

  • Traditional compute map → LLM reduce, or
  • Small-model map → large-model reduce.

Example prompts

Agent step 1. Model: small model

We have 12 log chunks in the sandbox under /sandbox/log_chunks/.

For each file, produce a compact JSON summary with this schema:
{
  "file": "...",
  "time_range": "...",
  "top_errors": [
    {"signature": "...", "count": 0, "first_seen": "...", "last_seen": "...", "examples": ["...", "..."]}
  ],
  "notable_events": ["...", "..."],
  "correlation_ids": ["...", "..."]
}

Write one JSON file per chunk to /sandbox/log_summaries/ using the same base filename (e.g., chunk_01.json).
Keep examples short (5–10 lines max per example). Do not include the raw logs.

Agent step 2. Model: larger reasoning model

Read only the JSON summaries in /sandbox/log_summaries/ (do not read the raw logs).

Synthesize:
1) A concise incident timeline (with timestamps where available)
2) The most likely root cause (and 1–2 plausible alternatives if evidence is mixed)
3) The top 3 remediation steps
4) The key supporting evidence for each conclusion (cite the specific chunk file + field you used)

If you believe additional raw log lines are required, specify exactly what to grep for and why, rather than asking to read everything.
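
And when the map phase doesn’t need a model at all (the traditional-compute option above), a short script can emit the same chunk summaries for the reduce step. A rough sketch against the schema from step 1, using an intentionally naive error signature (the text after "ERROR" up to the first colon); the paths follow the example prompts, but everything else here is an illustrative assumption rather than a Kindo convention:

import json
import re
from pathlib import Path

chunks_dir = Path("/sandbox/log_chunks")
out_dir = Path("/sandbox/log_summaries")
out_dir.mkdir(parents=True, exist_ok=True)

TS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

for chunk in sorted(chunks_dir.glob("chunk_*.log")):
    errors, timestamps = {}, []
    for line in chunk.read_text().splitlines():
        match = TS.search(line)
        ts = match.group() if match else None
        if ts:
            timestamps.append(ts)
        if "ERROR" in line:
            # Naive signature: the text after "ERROR" up to the first colon.
            signature = line.split("ERROR", 1)[1].split(":", 1)[0].strip() or "unknown"
            entry = errors.setdefault(signature, {
                "signature": signature, "count": 0,
                "first_seen": ts, "last_seen": ts, "examples": [],
            })
            entry["count"] += 1
            entry["last_seen"] = ts or entry["last_seen"]
            if len(entry["examples"]) < 2:
                entry["examples"].append(line[:200])

    summary = {
        "file": chunk.name,
        "time_range": f"{timestamps[0]} to {timestamps[-1]}" if timestamps else "unknown",
        "top_errors": sorted(errors.values(), key=lambda e: e["count"], reverse=True)[:5],
        "notable_events": [],     # left empty in this sketch
        "correlation_ids": [],    # left empty in this sketch
    }
    (out_dir / f"{chunk.stem}.json").write_text(json.dumps(summary, indent=2))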


Future Kindo Platform Improvements

Over time, Kindo can make context management more automatic and seamless. Potential improvements include:

  • Skills / dynamic prompt loading so only relevant prompt sections are included per task
  • Deeper prefix caching optimizations to further reduce repeated-prefix cost
  • Automatic compaction of long conversations (summarize older turns and keep references to sandbox artifacts)
  • Memory mechanisms (file-based conventions or a first-class tool) for durable agent state
  • Self-compaction helpers to offload or summarize irrelevant context while preserving the full user-visible transcript