Working with Large Context

Large language models are powerful, but their biggest practical constraint is the context window: the maximum amount of text and tool state they can process in a single request. In Kindo, context includes your instructions, conversation history, and — for agents — tool calls and tool outputs.

As context grows, you can hit hard limits, output quality can degrade, and inference cost rises, because more tokens are processed on every turn.

This guide covers practical techniques you can use today in Kindo to keep large-context workflows reliable.

How Kindo Helps Today

Kindo mitigates large tool outputs by writing them into the sandbox and passing the model a reference to the file rather than injecting the full payload into the prompt. That enables an important pattern: instead of asking the model to read massive blobs, use the sandbox to search, filter, and transform data first — then feed the model only the high-signal subset.

Technique 1: Reduce Data Volume at the Source

The best way to prevent context blowups is to avoid retrieving unnecessary data in the first place. Most systems you integrate with provide filters that narrow results before they ever reach the model.

Common query levers include:

  • Time bounds: last 15 minutes, incident window, since alert fired
  • Severity: errors only, warnings + errors, critical only
  • Scope: service, environment, host, region, tenant
  • Identifiers: incident ID, trace ID, correlation ID
  • Pattern filters: keywords or error codes
  • Pagination and limits: fetch small batches first, then expand only if needed

The habit to build is: start narrow, validate the signal, then widen intentionally.

Example Prompts

Pull logs for only the incident window (10:12–10:47 UTC) and only error-level entries for service=api-gateway in prod. If that returns nothing, expand to warnings+errors.

Query the alerting system for alert_id=12345 and fetch only the related events and the top 50 associated log lines, not the entire day.

Fetch only authentication-related events for tenant=acme during the last 2 hours and include only timestamp, user, source_ip, action, and result.
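As a concrete sketch of these query levers, here is a minimal Python example. The function name, parameter names, and payload keys are hypothetical, standing in for whatever your log-search API actually accepts:

```python
from datetime import datetime, timezone

def build_log_query(service, environment, start, end,
                    min_severity="error", fields=None, limit=50):
    """Build a narrow first-pass query payload for a (hypothetical) log API.

    Start tight; widen only if the first pass returns nothing.
    """
    params = {
        "service": service,
        "environment": environment,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "severity": min_severity,  # errors only on the first pass
        "limit": limit,            # small batch first, expand if needed
    }
    if fields:
        # Return only high-signal columns, not every field in the record.
        params["fields"] = ",".join(fields)
    return params

# First pass: incident window only, errors only, 50 rows max.
query = build_log_query(
    service="api-gateway",
    environment="prod",
    start=datetime(2024, 5, 1, 10, 12, tzinfo=timezone.utc),
    end=datetime(2024, 5, 1, 10, 47, tzinfo=timezone.utc),
    fields=["timestamp", "level", "message", "trace_id"],
)
```

If the narrow query comes back empty, relax one lever at a time (severity first, then the time window) rather than dropping all filters at once.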

Technique 2: Use the Sandbox to Sift Large Payloads

Even with upstream filtering, outputs can still be too large for a model to read efficiently. When the data is in the sandbox, let traditional tools do the heavy lifting.

Useful tools include:

  • grep / rg for searching by signature, ID, or keyword
  • find for locating generated files
  • sed / awk for extracting ranges or fields
  • Bash scripts for repeatable extraction pipelines
  • Python scripts for grouping, aggregation, parsing JSON/CSV, and top-N summaries

Common patterns:

  • Search for an error signature and extract a small context window around each match
  • Group events by correlation ID or error code and output counts plus representative examples
  • Parse structured logs into a compact report with top errors, affected hosts, and a short timeline
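The first pattern, for instance, takes only a few lines of Python to run in the sandbox. This is an illustrative sketch, not tied to any particular log tool:

```python
def context_windows(lines, needle, before=2, after=2):
    """Collect a few lines of context around each match, so the model
    sees small excerpts rather than the whole file."""
    windows = []
    for i, line in enumerate(lines):
        if needle in line:
            windows.append(lines[max(0, i - before): i + after + 1])
    return windows

# Tiny demo: two matches, each returned with one line of context.
log = ["boot ok", "ERROR db timeout", "retrying", "recovered", "ERROR db timeout"]
excerpts = context_windows(log, "ERROR", before=1, after=1)
```

The same effect is available from the shell with grep's context flags (`-A`/`-B`); the point is that the extraction runs as ordinary compute, and only the excerpts reach the model.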

Example Prompts

The tool output was written to a file in the sandbox. Do not read the full file into context.
Use grep/rg to find all lines containing "Exception" or "ERROR", group by the most common error prefix, and output:
- top 5 error clusters with counts
- first/last timestamp per cluster
- 2 representative excerpts per cluster

Find the sandbox file containing the raw response. If it is JSON, write a short Python script to:
- parse it
- group by correlation_id
- output the 10 largest groups with one-line summaries
Then summarize only the script output.
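The grouping step those prompts describe might be sketched like this in Python, assuming plain-text logs. The regexes are placeholders to adapt to your log format:

```python
import re
from collections import defaultdict

def cluster_errors(lines, max_examples=2):
    """Group error lines by a coarse signature (message with numbers masked)
    and keep counts plus a few representative excerpts per cluster."""
    clusters = defaultdict(lambda: {"count": 0, "examples": []})
    for line in lines:
        m = re.search(r"(ERROR|Exception)[:\s]+(.*)", line)
        if not m:
            continue
        # Mask digits so "timeout after 31ms" and "after 57ms" cluster together.
        signature = re.sub(r"\d+", "<n>", m.group(2))[:80]
        c = clusters[signature]
        c["count"] += 1
        if len(c["examples"]) < max_examples:
            c["examples"].append(line.strip())
    # Largest clusters first: the model only needs the top few.
    return sorted(clusters.items(), key=lambda kv: -kv[1]["count"])

logs = [
    "10:12:03 ERROR upstream timeout after 31ms",
    "10:12:05 ERROR upstream timeout after 57ms",
    "10:13:40 ERROR auth token expired for user 9912",
]
top = cluster_errors(logs)
```

The model then reads the handful of (signature, count, examples) tuples instead of thousands of raw lines.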

Technique 3: Use MapReduce-Style Processing

When you truly need to process large bodies of content — such as multi-hour logs, many tickets, or large document sets — a single-pass prompt is often brittle. A MapReduce-style workflow is more reliable.

Map Phase

Split the input by a natural boundary such as:

  • time slice
  • file
  • service
  • ticket

For each chunk, extract only the minimum useful signal:

  • key events and timestamps
  • recurring patterns or clusters
  • important entities (services, users, IDs)
  • a few representative examples

Write those chunk summaries back to the sandbox as small JSON or Markdown artifacts.
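A minimal map-phase sketch in Python, with hypothetical in-memory chunks standing in for files in a sandbox directory (the field names in the summary are illustrative):

```python
import json
import os
import tempfile

def summarize_chunk(name, lines):
    """Map step: reduce one chunk to the minimum useful signal."""
    errors = [line for line in lines if "ERROR" in line]
    return {
        "chunk": name,
        "line_count": len(lines),
        "error_count": len(errors),
        "examples": errors[:3],              # a few representative lines only
        "first": lines[0] if lines else None,
        "last": lines[-1] if lines else None,
    }

chunks = {
    "10:00-10:15": ["10:01 INFO ok", "10:12 ERROR db timeout"],
    "10:15-10:30": ["10:16 ERROR db timeout", "10:20 INFO recovered"],
}

out_dir = tempfile.mkdtemp()
for name, lines in chunks.items():
    summary = summarize_chunk(name, lines)
    # Small JSON artifacts: the reduce step reads these, never the raw logs.
    path = os.path.join(out_dir, name.replace(":", "") + ".json")
    with open(path, "w") as f:
        json.dump(summary, f)
```

Each artifact is a few hundred bytes regardless of how large the raw chunk was, which is what keeps the reduce phase cheap.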

Reduce Phase

Then have a model read only the chunk summaries and synthesize:

  • timeline
  • root cause hypothesis
  • remediation steps
  • supporting evidence

This pattern works especially well when combined with model tiering:

  • use a smaller/cheaper model for extraction
  • use a stronger reasoning model for the final synthesis
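The reduce step then needs only the chunk summaries. Here is a sketch of assembling the final-synthesis prompt; the summaries and the model call itself are stand-ins, since the actual invocation depends on your setup:

```python
import json

def build_reduce_prompt(summaries):
    """Reduce step: feed the model only the compact chunk summaries."""
    body = "\n".join(json.dumps(s, sort_keys=True) for s in summaries)
    return (
        "Read the chunk summaries below (one JSON object per line).\n"
        "Produce: a timeline, a root-cause hypothesis, remediation steps, "
        "and the supporting evidence.\n\n" + body
    )

# Illustrative output of the map phase.
summaries = [
    {"chunk": "10:00-10:15", "error_count": 1, "top_error": "db timeout"},
    {"chunk": "10:15-10:30", "error_count": 4, "top_error": "db timeout"},
]

prompt = build_reduce_prompt(summaries)
# Tiering: a cheaper model produced the summaries; send this prompt
# to the stronger reasoning model for the final synthesis.
```

Because the prompt contains only summaries, its size grows with the number of chunks, not with the volume of raw data behind them.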

Example Two-Step Workflow

Step 1 — extraction / chunk summaries

We have 12 log chunks in /sandbox/log_chunks/.
For each file, produce a compact JSON summary with:
- file
- time_range
- top_errors (signature, count, first_seen, last_seen, short examples)
- notable_events
- correlation_ids
Write one JSON file per chunk to /sandbox/log_summaries/.
Do not include raw logs in the output.

Step 2 — final synthesis

Read only the JSON summaries in /sandbox/log_summaries/.
Produce:
1. A concise incident timeline
2. The most likely root cause
3. The top 3 remediation steps
4. The key evidence supporting each conclusion
If more raw data is required, specify exactly what to grep for and why instead of asking to read everything.

When working with large context in Kindo:

  1. Narrow the query upstream first
  2. Use the sandbox for search, filtering, and summarization
  3. Break large tasks into chunked phases
  4. Reserve the strongest model for synthesis, not raw parsing
  5. Avoid feeding large raw payloads into the model unless you have already reduced them substantially

Future Platform Improvements

Over time, Kindo can make context management more automatic. Areas that could improve this experience include:

  • more dynamic prompt loading so only relevant instructions are included per task
  • deeper prompt-prefix caching optimizations
  • automatic compaction for long conversations
  • better durable memory patterns for agents
  • first-class self-compaction helpers that preserve the user-visible transcript while reducing active working context

For now, the most reliable pattern is still: use traditional compute to reduce the data, then use the model to interpret the reduced result.