Working with Large Context

Large language models are powerful, but their biggest practical constraint is the context window: the maximum amount of text and tool state they can process in a single request. In Kindo, context includes your instructions, conversation history, and — for agents — tool calls and tool outputs.

As context grows, you can hit hard limits, output quality can degrade, and inference cost rises, because more tokens are processed on every turn.

This guide covers practical techniques you can use today in Kindo to keep large-context workflows reliable.

How Kindo Helps Today

Kindo mitigates large tool outputs by writing them into the sandbox and passing the model a reference to the file rather than injecting the full payload into the prompt. That enables an important pattern: instead of asking the model to read massive blobs, use the sandbox to search, filter, and transform data first — then feed the model only the high-signal subset.

Technique 1: Reduce Data Volume at the Source

The best way to prevent context blowups is to avoid retrieving unnecessary data in the first place. Most systems you integrate with provide filters that narrow results before they ever reach the model.

Common query levers include:

  • Time bounds: last 15 minutes, incident window, since alert fired
  • Severity: errors only, warnings + errors, critical only
  • Scope: service, environment, host, region, tenant
  • Identifiers: incident ID, trace ID, correlation ID
  • Pattern filters: keywords or error codes
  • Pagination and limits: fetch small batches first, then expand only if needed

The habit to build is: start narrow, validate the signal, then widen intentionally.

Example Prompts

Pull logs for only the incident window (10:12–10:47 UTC) and only error-level entries for service=api-gateway in prod. If that returns nothing, expand to warnings+errors.

Query the alerting system for alert_id=12345 and fetch only the related events and the top 50 associated log lines, not the entire day.

Fetch only authentication-related events for tenant=acme during the last 2 hours and include only timestamp, user, source_ip, action, and result.
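As a concrete sketch of these query levers, here is a minimal Python example. The function name, parameter names, and payload keys are hypothetical, standing in for whatever your log-search API actually accepts:

```python
from datetime import datetime, timezone

def build_log_query(service, environment, start, end,
                    min_severity="error", fields=None, limit=50):
    """Build a narrow first-pass query payload for a (hypothetical) log API.

    Start tight; widen only if the first pass returns nothing.
    """
    params = {
        "service": service,
        "environment": environment,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "severity": min_severity,  # errors only on the first pass
        "limit": limit,            # small batch first, expand if needed
    }
    if fields:
        # Return only high-signal columns, not every field in the record.
        params["fields"] = ",".join(fields)
    return params

# First pass: incident window only, errors only, 50 rows max.
query = build_log_query(
    service="api-gateway",
    environment="prod",
    start=datetime(2024, 5, 1, 10, 12, tzinfo=timezone.utc),
    end=datetime(2024, 5, 1, 10, 47, tzinfo=timezone.utc),
    fields=["timestamp", "level", "message", "trace_id"],
)
```

If the narrow query comes back empty, relax one lever at a time (severity first, then the time window) rather than dropping all filters at once.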

Technique 2: Use the Sandbox to Sift Large Payloads

Even with upstream filtering, outputs can still be too large for a model to read efficiently. When the data is in the sandbox, let traditional tools do the heavy lifting.

Useful tools include:

  • grep / rg for searching by signature, ID, or keyword
  • find for locating generated files
  • sed / awk for extracting ranges or fields
  • Bash scripts for repeatable extraction pipelines
  • Python scripts for grouping, aggregation, parsing JSON/CSV, and top-N summaries

Common patterns:

  • Search for an error signature and extract a small context window around each match
  • Group events by correlation ID or error code and output counts plus representative examples
  • Parse structured logs into a compact report with top errors, affected hosts, and a short timeline
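The first pattern, for instance, takes only a few lines of Python to run in the sandbox. This is an illustrative sketch, not tied to any particular log tool:

```python
def context_windows(lines, needle, before=2, after=2):
    """Collect a few lines of context around each match, so the model
    sees small excerpts rather than the whole file."""
    windows = []
    for i, line in enumerate(lines):
        if needle in line:
            windows.append(lines[max(0, i - before): i + after + 1])
    return windows

# Tiny demo: two matches, each returned with one line of context.
log = ["boot ok", "ERROR db timeout", "retrying", "recovered", "ERROR db timeout"]
excerpts = context_windows(log, "ERROR", before=1, after=1)
```

The same effect is available from the shell with grep's context flags (`-A`/`-B`); the point is that the extraction runs as ordinary compute, and only the excerpts reach the model.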

Example Prompts

The tool output was written to a file in the sandbox. Do not read the full file into context.
Use grep/rg to find all lines containing "Exception" or "ERROR", group by the most common error prefix, and output:
- top 5 error clusters with counts
- first/last timestamp per cluster
- 2 representative excerpts per cluster

Find the sandbox file containing the raw response. If it is JSON, write a short Python script to:
- parse it
- group by correlation_id
- output the 10 largest groups with one-line summaries
Then summarize only the script output.
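The grouping step those prompts describe might be sketched like this in Python, assuming plain-text logs. The regexes are placeholders to adapt to your log format:

```python
import re
from collections import defaultdict

def cluster_errors(lines, max_examples=2):
    """Group error lines by a coarse signature (message with numbers masked)
    and keep counts plus a few representative excerpts per cluster."""
    clusters = defaultdict(lambda: {"count": 0, "examples": []})
    for line in lines:
        m = re.search(r"(ERROR|Exception)[:\s]+(.*)", line)
        if not m:
            continue
        # Mask digits so "timeout after 31ms" and "after 57ms" cluster together.
        signature = re.sub(r"\d+", "<n>", m.group(2))[:80]
        c = clusters[signature]
        c["count"] += 1
        if len(c["examples"]) < max_examples:
            c["examples"].append(line.strip())
    # Largest clusters first: the model only needs the top few.
    return sorted(clusters.items(), key=lambda kv: -kv[1]["count"])

logs = [
    "10:12:03 ERROR upstream timeout after 31ms",
    "10:12:05 ERROR upstream timeout after 57ms",
    "10:13:40 ERROR auth token expired for user 9912",
]
top = cluster_errors(logs)
```

The model then reads the handful of (signature, count, examples) tuples instead of thousands of raw lines.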

Technique 3: Use MapReduce-Style Processing

When you truly need to process large bodies of content — such as multi-hour logs, many tickets, or large document sets — a single-pass prompt is often brittle. A MapReduce-style workflow is more reliable.

Map Phase

Split the input by a natural boundary such as:

  • time slice
  • file
  • service
  • ticket

For each chunk, extract only the minimum useful signal:

  • key events and timestamps
  • recurring patterns or clusters
  • important entities (services, users, IDs)
  • a few representative examples

Write those chunk summaries back to the sandbox as small JSON or Markdown artifacts.
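A minimal map-phase sketch in Python, with hypothetical in-memory chunks standing in for files in a sandbox directory (the field names in the summary are illustrative):

```python
import json
import os
import tempfile

def summarize_chunk(name, lines):
    """Map step: reduce one chunk to the minimum useful signal."""
    errors = [line for line in lines if "ERROR" in line]
    return {
        "chunk": name,
        "line_count": len(lines),
        "error_count": len(errors),
        "examples": errors[:3],              # a few representative lines only
        "first": lines[0] if lines else None,
        "last": lines[-1] if lines else None,
    }

chunks = {
    "10:00-10:15": ["10:01 INFO ok", "10:12 ERROR db timeout"],
    "10:15-10:30": ["10:16 ERROR db timeout", "10:20 INFO recovered"],
}

out_dir = tempfile.mkdtemp()
for name, lines in chunks.items():
    summary = summarize_chunk(name, lines)
    # Small JSON artifacts: the reduce step reads these, never the raw logs.
    path = os.path.join(out_dir, name.replace(":", "") + ".json")
    with open(path, "w") as f:
        json.dump(summary, f)
```

Each artifact is a few hundred bytes regardless of how large the raw chunk was, which is what keeps the reduce phase cheap.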

Reduce Phase

Then have a model read only the chunk summaries and synthesize:

  • timeline
  • root cause hypothesis
  • remediation steps
  • supporting evidence

This pattern works especially well when combined with model tiering:

  • use a smaller/cheaper model for extraction
  • use a stronger reasoning model for the final synthesis
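The reduce step then needs only the chunk summaries. Here is a sketch of assembling the final-synthesis prompt; the summaries and the model call itself are stand-ins, since the actual invocation depends on your setup:

```python
import json

def build_reduce_prompt(summaries):
    """Reduce step: feed the model only the compact chunk summaries."""
    body = "\n".join(json.dumps(s, sort_keys=True) for s in summaries)
    return (
        "Read the chunk summaries below (one JSON object per line).\n"
        "Produce: a timeline, a root-cause hypothesis, remediation steps, "
        "and the supporting evidence.\n\n" + body
    )

# Illustrative output of the map phase.
summaries = [
    {"chunk": "10:00-10:15", "error_count": 1, "top_error": "db timeout"},
    {"chunk": "10:15-10:30", "error_count": 4, "top_error": "db timeout"},
]

prompt = build_reduce_prompt(summaries)
# Tiering: a cheaper model produced the summaries; send this prompt
# to the stronger reasoning model for the final synthesis.
```

Because the prompt contains only summaries, its size grows with the number of chunks, not with the volume of raw data behind them.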

Example Two-Step Workflow

Step 1 — extraction / chunk summaries

We have 12 log chunks in /sandbox/log_chunks/.
For each file, produce a compact JSON summary with:
- file
- time_range
- top_errors (signature, count, first_seen, last_seen, short examples)
- notable_events
- correlation_ids
Write one JSON file per chunk to /sandbox/log_summaries/.
Do not include raw logs in the output.

Step 2 — final synthesis

Read only the JSON summaries in /sandbox/log_summaries/.
Produce:
1. A concise incident timeline
2. The most likely root cause
3. The top 3 remediation steps
4. The key evidence supporting each conclusion
If more raw data is required, specify exactly what to grep for and why instead of asking to read everything.

When working with large context in Kindo:

  1. Narrow the query upstream first
  2. Use the sandbox for search, filtering, and summarization
  3. Break large tasks into chunked phases
  4. Reserve the strongest model for synthesis, not raw parsing
  5. Avoid feeding large raw payloads into the model unless you have already reduced them substantially

Future Platform Improvements

Over time, Kindo can make context management more automatic. Areas that could improve this experience include:

  • more dynamic prompt loading so only relevant instructions are included per task
  • deeper prompt-prefix caching optimizations
  • automatic compaction for long conversations
  • better durable memory patterns for agents
  • first-class self-compaction helpers that preserve the user-visible transcript while reducing active working context

For now, the most reliable pattern is still: use traditional compute to reduce the data, then use the model to interpret the reduced result.