
Messages API

The Kindo Messages API provides Anthropic-compatible /v1/messages and /v1/messages/count_tokens endpoints on api.kindo.ai. Use it when you want Claude Code or Anthropic SDKs to route through Kindo’s governance pipeline instead of calling Anthropic directly.

With the Messages API, requests still use Anthropic’s native request and response shapes, while Kindo adds API-key auth, model access control, audit logging, DLP enforcement, and credit metering.

Why use the Messages API

  • Anthropic-compatible — Works with Claude Code, the Anthropic Python SDK, the Anthropic TypeScript SDK, and raw HTTP clients.
  • Governed by Kindo — Requests pass through Kindo auth, model access checks, DLP, and usage metering.
  • Supports Anthropic features — Streaming, tool use, and block-level prompt caching work with Kindo’s proxy.
  • Uses your Kindo model registry — Use the same model IDs returned by GET /v1/models.

Base URL and authentication

Use the api.kindo.ai domain for the Messages API:

| Endpoint | Method | Purpose |
| --- | --- | --- |
| https://api.kindo.ai/v1/messages | POST | Create a message |
| https://api.kindo.ai/v1/messages/count_tokens | POST | Count tokens for a request |

For self-hosted installations, replace api.kindo.ai with your deployment’s API base URL.

Authentication

Both auth formats work:

| Header | Example | Notes |
| --- | --- | --- |
| Authorization: Bearer | Authorization: Bearer YOUR_API_KEY | Preferred for raw HTTP clients |
| x-api-key | x-api-key: YOUR_API_KEY | Common for Anthropic-compatible clients |

When both headers are present, Authorization: Bearer takes precedence. If the Authorization header is present but malformed, the request is rejected instead of falling back to x-api-key.
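The documented precedence can be sketched in Python (`resolve_api_key` is a hypothetical helper for illustration, not part of any Kindo or Anthropic library; what counts as "malformed" here is an assumption):

```python
def resolve_api_key(headers: dict) -> str:
    """Sketch of the documented precedence: Authorization: Bearer wins over
    x-api-key, and a malformed Authorization header is rejected outright
    rather than falling back."""
    auth = headers.get("Authorization")
    if auth is not None:
        token = auth[len("Bearer "):].strip() if auth.startswith("Bearer ") else ""
        if not token:
            # Present but malformed: reject, do not fall back to x-api-key.
            raise ValueError("malformed Authorization header")
        return token
    x_api_key = headers.get("x-api-key")
    if x_api_key:
        return x_api_key
    raise ValueError("no credentials provided")
```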

Anthropic headers

Kindo forwards these Anthropic-specific headers to the upstream provider when present:

  • anthropic-version
  • anthropic-beta
  • x-claude-code-session-id

For raw HTTP requests, include anthropic-version. Anthropic SDKs and Claude Code set the required protocol headers for you.

POST /v1/messages

Create a message using Anthropic’s native Messages API format.

```sh
curl https://api.kindo.ai/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 512,
    "messages": [
      {
        "role": "user",
        "content": "Explain Kubernetes pod security policies in plain English."
      }
    ]
  }'
```

Common request fields

Kindo validates the core Anthropic fields and passes through the rest, which keeps the endpoint forward-compatible with Anthropic request bodies.

| Field | Required | Notes |
| --- | --- | --- |
| model | Yes | Must match a model ID available through GET /v1/models |
| messages | Yes | Array of Anthropic messages. Roles must be user or assistant, must alternate, and the first message must be user. Use the top-level system field for system prompts |
| max_tokens | Yes | Positive integer |
| stream | No | Set true for Server-Sent Events |
| system | No | Optional system prompt. Accepts a string or an array of content blocks (use blocks to attach cache_control) |
| tools | No | Anthropic tool definitions are passed through |
| tool_choice | No | Use Anthropic's standard tool-choice format |
| metadata | No | Accepted, but stripped before proxying upstream |
| Other Anthropic fields | No | Passed through unless Kindo reserves the field for internal routing |

Kindo also strips internal routing fields such as litellm_metadata and proxy_server_request before proxying upstream so clients cannot spoof governance metadata.
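The message-shape rules in the table above can be sketched as a client-side pre-check (illustrative only; Kindo's server-side validation is authoritative and may differ in details):

```python
def validate_request(body: dict) -> list:
    """Sketch of the core rules: model, messages, and max_tokens are
    required; roles must be user/assistant, must alternate, and the first
    message must be user."""
    errors = []
    if "model" not in body:
        errors.append("model is required")
    max_tokens = body.get("max_tokens")
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        errors.append("max_tokens must be a positive integer")
    messages = body.get("messages")
    if not messages:
        errors.append("messages is required")
    else:
        if any(m["role"] not in ("user", "assistant") for m in messages):
            errors.append("roles must be user or assistant")
        if messages[0]["role"] != "user":
            errors.append("first message must be user")
        if any(a["role"] == b["role"] for a, b in zip(messages, messages[1:])):
            errors.append("roles must alternate")
    return errors
```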

Example response

```json
{
  "id": "msg_01Aq9w938a90dw8q",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-5-20250929",
  "content": [
    {
      "type": "text",
      "text": "Pod Security Policies were Kubernetes rules that controlled what a pod was allowed to do..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 24,
    "output_tokens": 134
  }
}
```

POST /v1/messages/count_tokens

Count tokens for a Messages API request without generating a completion.

```sh
curl https://api.kindo.ai/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "system": "You are a concise assistant.",
    "messages": [
      {
        "role": "user",
        "content": "Summarize the principle of least privilege."
      }
    ]
  }'
```

count_tokens behavior

  • model and messages are required.
  • max_tokens is optional for count_tokens.
  • stream is not supported; omit it or set it to false.

Example response:

```json
{
  "input_tokens": 20
}
```

Streaming

Set "stream": true on POST /v1/messages to receive a standard Anthropic Server-Sent Events stream.

```sh
curl -N https://api.kindo.ai/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 512,
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku about observability." }
    ]
  }'
```

Kindo returns Content-Type: text/event-stream and preserves Anthropic’s event structure:

| Event | Meaning |
| --- | --- |
| message_start | Initial message metadata |
| content_block_start | Start of a streamed content block |
| content_block_delta | Incremental content chunk |
| content_block_stop | End of the content block |
| message_delta | Updated usage or stop-reason data |
| message_stop | End of the stream |
| error | Stream interruption after the stream has started |
| ping | Keepalive heartbeat; can be safely ignored |

If the upstream request fails before streaming begins, Kindo returns a normal HTTP error response with Anthropic’s JSON error envelope instead of switching to SSE.
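The event flow above can be consumed with a minimal hand-rolled parser. The sketch below accumulates text from content_block_delta events and ignores ping keepalives; production clients should prefer the streaming helpers in the Anthropic SDKs:

```python
import json

def collect_text(sse_body: str) -> str:
    """Accumulate text_delta fragments from an Anthropic-style SSE body.
    Events are separated by blank lines; each has event: and data: lines."""
    text = []
    for chunk in sse_body.strip().split("\n\n"):
        event, data = None, None
        for line in chunk.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = line[len("data:"):].strip()
        if event == "content_block_delta" and data:
            delta = json.loads(data).get("delta", {})
            if delta.get("type") == "text_delta":
                text.append(delta.get("text", ""))
    return "".join(text)
```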

Prompt caching

Block-level prompt caching works through Kindo. Add cache_control to Anthropic content blocks, then verify caching behavior from the usage object in the response.

```json
{
  "model": "claude-sonnet-4-5-20250929",
  "max_tokens": 512,
  "system": [
    {
      "type": "text",
      "text": "You are a release-notes assistant. Always answer in bullet points.",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Summarize the last three deploys for service api."
        }
      ]
    }
  ]
}
```

Look for these fields in usage:

  • cache_creation_input_tokens — tokens written to cache on this request
  • cache_read_input_tokens — tokens read from cache on subsequent requests

Known limitation

Top-level cache_control is not currently forwarded by LiteLLM’s Anthropic TypedDict layer. Use block-level cache_control instead.

If your client requires an Anthropic beta header for prompt caching, use a header value supported by your target model. Internal end-to-end verification confirmed that anthropic-beta: prompt-caching-2024-07-31 works with Kindo’s proxy.

Tool use

Anthropic-format tools pass through to the upstream provider. Standard tool_use and tool_result content blocks work without any Kindo-specific wrapping.

```json
{
  "model": "claude-sonnet-4-5-20250929",
  "max_tokens": 512,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather for a city.",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": { "type": "string" }
        },
        "required": ["location"]
      }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": "What is the weather in San Francisco?"
    }
  ]
}
```

Typical flow:

  1. Send a request with tools.
  2. Receive an assistant message containing a tool_use block.
  3. Execute the tool client-side.
  4. Send the tool result back in a follow-up user message containing a tool_result block.
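Steps 3–4 can be sketched as a small helper that turns a tool_use response into the follow-up user message (`build_tool_result_message` and `run_tool` are illustrative names, not SDK functions):

```python
def build_tool_result_message(assistant_message: dict, run_tool) -> dict:
    """Execute each tool_use block client-side via run_tool(name, input),
    and wrap the outputs as tool_result blocks in a follow-up user message."""
    results = []
    for block in assistant_message["content"]:
        if block["type"] == "tool_use":
            output = run_tool(block["name"], block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],  # must echo the tool_use block's id
                "content": output,
            })
    return {"role": "user", "content": results}
```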

Claude Code setup

Point Claude Code at Kindo by setting two environment variables before launching the CLI.

  1. Export your Kindo base URL:

     ```sh
     export ANTHROPIC_BASE_URL=https://api.kindo.ai
     ```

  2. Export your Kindo API key:

     ```sh
     export ANTHROPIC_API_KEY=YOUR_KINDO_API_KEY
     ```

  3. Start Claude Code normally:

     ```sh
     claude
     ```

Claude Code will send Anthropic-compatible requests to https://api.kindo.ai/v1/messages using your Kindo API key. Kindo also forwards the x-claude-code-session-id header, which preserves Claude Code session context for upstream compatibility.

Why route Claude Code through Kindo

  • Centralized governance — Requests inherit the same audit logging, DLP, and access controls as other Kindo APIs.
  • One API key — Use the same Kindo API key you already use for other endpoints.
  • Model consistency — Claude Code uses the model IDs your organization exposes through Kindo.

If Claude Code returns a model-not-found error, call GET /v1/models first and use one of the IDs available to your organization.

Client examples

Use whichever Anthropic client fits your workflow.

```sh
curl https://api.kindo.ai/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 512,
    "messages": [
      {
        "role": "user",
        "content": "Give three practical code review tips."
      }
    ]
  }'
```
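The same request can be built with Python's standard library alone. The sketch below constructs the request without sending it (`build_messages_request` is an illustrative helper; pass the result to `urllib.request.urlopen` to execute it, or use the Anthropic Python SDK with a custom base URL instead):

```python
import json
import urllib.request

def build_messages_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST /v1/messages request mirroring the curl example,
    using x-api-key auth and the anthropic-version protocol header."""
    return urllib.request.Request(
        "https://api.kindo.ai/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )
```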

Differences from Anthropic’s direct API

| Topic | Kindo Messages API | Direct Anthropic API |
| --- | --- | --- |
| Base URL | https://api.kindo.ai | https://api.anthropic.com |
| API key | Kindo API key | Anthropic API key |
| Model names | Must match Kindo model IDs from GET /v1/models | Must match Anthropic-enabled models on your Anthropic account |
| Governance | Includes Kindo auth, access control, DLP, audit logging, and metering | Anthropic-only controls |
| Prompt caching | Block-level cache_control works; top-level cache_control currently does not | Depends on Anthropic's native support |

Error format

Errors use Anthropic’s standard envelope:

```json
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "max_tokens is required"
  }
}
```

Kindo maps common HTTP statuses to Anthropic-style error types:

| Status | Error type |
| --- | --- |
| 400 | invalid_request_error |
| 401 | authentication_error |
| 403 | permission_error |
| 404 | not_found_error |
| 429 | rate_limit_error |
| 5xx | api_error |
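The mapping can be sketched as a small lookup (the fallback for statuses outside the table is an assumption, not documented behavior):

```python
# Anthropic-style error types for the statuses listed in the table above.
ERROR_TYPES = {
    400: "invalid_request_error",
    401: "authentication_error",
    403: "permission_error",
    404: "not_found_error",
    429: "rate_limit_error",
}

def error_type_for_status(status: int) -> str:
    """Map an HTTP status to its Anthropic-style error type; any 5xx
    status maps to api_error, which also serves as the assumed fallback."""
    if 500 <= status <= 599:
        return "api_error"
    return ERROR_TYPES.get(status, "api_error")
```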

See also