Messages API request shape

POST /v1/messages accepts the Anthropic Messages request body. Kindo validates the core fields and forwards the rest verbatim, which keeps the endpoint forward-compatible with new Anthropic fields.

| Field | Type | Notes |
| --- | --- | --- |
| model | string | A model ID from GET /v1/models. |
| messages | array | Roles must be user or assistant, must alternate, and the first message must be user. Use the top-level system field for system prompts. |
| max_tokens | integer | Positive integer. |

| Field | Type | Forwarded? | Notes |
| --- | --- | --- | --- |
| stream | boolean | | Consumed by the route handler to switch into SSE mode. |
| system | string or array | Yes | Optional system prompt. Use an array of content blocks to attach cache_control (see Prompt caching). |
| tools | array | Yes | Anthropic tool definitions (see Tool use). |
| tool_choice | object | Yes | Anthropic’s standard tool-choice format. |
| temperature | number | Yes | Sampling control. |
| top_p | number | Yes | Nucleus sampling. |
| top_k | integer | Yes | Top-K sampling. |
| stop_sequences | array | Yes | Stop sequences. |

Anything not listed above passes through Kindo’s schema verbatim because the validator uses passthrough(). Refer to Anthropic’s Messages spec for full field semantics.
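
The core-field rules listed above can also be checked on the client before a request is sent. The following Python sketch is only an illustration under stated assumptions: the function name validate_core_fields is hypothetical, the non-empty checks are assumptions, and the code mirrors the documented rules rather than Kindo's actual server-side validator.

```python
from typing import Any


def validate_core_fields(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the core fields (empty if none).

    Hypothetical client-side check; it mirrors the documented rules only
    and is not Kindo's server-side validator.
    """
    errors: list[str] = []

    model = body.get("model")
    if not isinstance(model, str) or not model:
        errors.append("model must be a non-empty string")

    max_tokens = body.get("max_tokens")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens <= 0:
        errors.append("max_tokens must be a positive integer")

    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        # Assumption: an empty array cannot satisfy "first message must be user".
        errors.append("messages must be a non-empty array")
        return errors

    expected = "user"  # the first message must come from the user
    for index, message in enumerate(messages):
        role = message.get("role") if isinstance(message, dict) else None
        if role not in ("user", "assistant"):
            errors.append(f"messages[{index}].role must be user or assistant")
            break
        if role != expected:
            errors.append(f"messages[{index}].role should be {expected}")
            break
        # Roles must alternate, so flip the expectation after each message.
        expected = "assistant" if role == "user" else "user"

    return errors
```

An empty list means the body satisfies the documented core-field rules. Fields outside the core set are deliberately ignored here, since they pass through unvalidated.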

Kindo forwards these headers to the upstream provider when present:

  • anthropic-version
  • anthropic-beta
  • x-claude-code-session-id

Anthropic SDKs and Claude Code set the required protocol headers for you. Raw HTTP clients should include anthropic-version.
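
For a raw HTTP client, the headers can be assembled by hand. The sketch below uses only the Python standard library and builds the request without sending it. It assumes the host (https://api.kindo.ai), the Bearer authorization scheme, and the version value 2023-06-01 shown in the count_tokens curl example on this page also apply to /v1/messages. The helper name build_messages_request is hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumption: same host and Bearer auth as the count_tokens curl example.
KINDO_BASE_URL = "https://api.kindo.ai"


def build_messages_request(
    api_key: str, body: dict, beta: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/messages request."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
        # Raw HTTP clients should include anthropic-version themselves.
        "anthropic-version": "2023-06-01",
    }
    if beta:
        # Optional; forwarded upstream by Kindo when present.
        headers["anthropic-beta"] = beta
    return urllib.request.Request(
        f"{KINDO_BASE_URL}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request.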

Kindo strips these fields from the outgoing upstream request:

  • metadata
  • litellm_metadata
  • proxy_server_request
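
To reason about what the upstream provider receives, the stripping rule can be modeled as a simple filter. This is an illustrative Python sketch of the documented effect, not Kindo's implementation; the names STRIPPED_FIELDS and upstream_view are hypothetical.

```python
# Top-level fields that Kindo is documented to remove before forwarding.
STRIPPED_FIELDS = ("metadata", "litellm_metadata", "proxy_server_request")


def upstream_view(body: dict) -> dict:
    """Return a copy of the request body without the stripped fields."""
    return {key: value for key, value in body.items() if key not in STRIPPED_FIELDS}
```

In practice this means a client should not rely on metadata reaching the upstream provider.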
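
Each object in the messages array has these fields:

| Field | Required | Notes |
| --- | --- | --- |
| role | Yes | user or assistant. Must alternate, must start with user. |
| content | Yes | String or array of content blocks. Blocks include text, tool_use, tool_result, image, etc. Forwarded verbatim. |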

System prompts go in the top-level system field, not in the messages array.

Block-level cache_control works through Kindo. Add cache_control to Anthropic content blocks and verify cache hits from the response usage object.

{
  "model": "claude-sonnet-4-5-20250929",
  "max_tokens": 512,
  "system": [
    {
      "type": "text",
      "text": "You are a release-notes assistant. Always answer in bullet points.",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Summarize the last three deploys for service api."
        }
      ]
    }
  ]
}

Look for these fields in usage:

  • cache_creation_input_tokens — tokens written to cache on this request.
  • cache_read_input_tokens — tokens read from cache on subsequent requests.
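
A small helper can turn those two counters into a readable verdict. The Python sketch below assumes the response JSON has already been parsed into a dict and that its usage object is passed in. The function name describe_cache_usage is hypothetical, and a missing or null counter is treated as zero.

```python
def describe_cache_usage(usage: dict) -> str:
    """Summarize cache activity from a Messages response usage object."""
    # Treat absent or null counters as zero.
    created = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    if read > 0:
        return f"cache hit: {read} tokens read"
    if created > 0:
        return f"cache write: {created} tokens written"
    return "no cache activity"
```

A typical pattern is a cache write on the first request and a cache hit on a later request that repeats the same cached prefix.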

Top-level cache_control is not currently forwarded by LiteLLM’s Anthropic TypedDict layer. Use block-level cache_control instead.
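
One way to move from a plain system string to block-level caching is to wrap the prompt in a text block and mark the final block. The Python sketch below is a hypothetical helper (with_block_cache is not part of any SDK). It relies on Anthropic's convention that a cache_control marker caches the prefix up to and including the marked block.

```python
from typing import Union


def with_block_cache(system: Union[str, list]) -> list:
    """Return system as content blocks with cache_control on the last block."""
    if isinstance(system, str):
        blocks = [{"type": "text", "text": system}]
    else:
        # Shallow-copy each block so the caller's list is not mutated.
        blocks = [dict(block) for block in system]
    if blocks:
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks
```

The result can be assigned directly to the system field of the request body.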

If your client requires an Anthropic beta header for prompt caching, use a header value supported by your target model. Internal end-to-end verification confirmed anthropic-beta: prompt-caching-2024-07-31 works with Kindo’s proxy.

POST /v1/messages/count_tokens counts tokens for a Messages request without generating a completion.

curl https://api.kindo.ai/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $KINDO_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "system": "You are a concise assistant.",
    "messages": [
      { "role": "user", "content": "Summarize the principle of least privilege." }
    ]
  }'

| Field | Required for count_tokens | Notes |
| --- | --- | --- |
| model | Yes | |
| messages | Yes | |
| max_tokens | Optional | |
| stream | Not supported | Omit it or set it to false. |

Response:

{ "input_tokens": 20 }
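
The same call can be made from Python with the standard library. This is a minimal sketch under stated assumptions: the helper names build_count_tokens_body and count_tokens are hypothetical, the host and headers are copied from the curl example, and error handling is omitted. Because stream is not supported on this endpoint, the sketch drops it when reusing a Messages request body.

```python
import json
import urllib.request


def build_count_tokens_body(messages_body: dict) -> dict:
    """Copy a Messages body for count_tokens, dropping the unsupported stream flag."""
    return {key: value for key, value in messages_body.items() if key != "stream"}


def count_tokens(
    api_key: str, messages_body: dict, base_url: str = "https://api.kindo.ai"
) -> int:
    """POST to /v1/messages/count_tokens and return input_tokens."""
    request = urllib.request.Request(
        f"{base_url}/v1/messages/count_tokens",
        data=json.dumps(build_count_tokens_body(messages_body)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )
    # Performs a real network call; raises urllib.error.HTTPError on non-2xx.
    with urllib.request.urlopen(request) as response:
        return json.load(response)["input_tokens"]
```

Counting before sending makes it possible to check a prompt's size without paying for a completion.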

/v1/messages is stock Anthropic — Kindo does not accept a kindo request block on this surface. Kindo’s opt-in extensions (curated system prompt, hosted tools, stateful conversations) are available on /v1/responses; see the Chat Actions guide.