
Messages API

The Kindo Messages API provides Anthropic-compatible /v1/messages and /v1/messages/count_tokens endpoints on api.kindo.ai. Use it when you want Claude Code or Anthropic SDKs to route through Kindo’s governance pipeline instead of calling Anthropic directly.

With the Messages API, requests still use Anthropic’s native request and response shapes, while Kindo adds API-key auth, model access control, audit logging, DLP enforcement, and credit metering.

Why use the Messages API

  • Anthropic-compatible — Works with Claude Code, the Anthropic Python SDK, the Anthropic TypeScript SDK, and raw HTTP clients.
  • Governed by Kindo — Requests pass through Kindo auth, model access checks, DLP, and usage metering.
  • Supports Anthropic features — Streaming, tool use, and block-level prompt caching work with Kindo’s proxy.
  • Uses your Kindo model registry — Use the same model IDs returned by GET /v1/models.

Base URL and authentication

Use the api.kindo.ai domain for the Messages API:

| Endpoint | Method | Purpose |
| --- | --- | --- |
| https://api.kindo.ai/v1/messages | POST | Create a message |
| https://api.kindo.ai/v1/messages/count_tokens | POST | Count tokens for a request |

For self-hosted installations, replace api.kindo.ai with your deployment’s API base URL.

Authentication

Both auth formats work:

| Header | Example | Notes |
| --- | --- | --- |
| Authorization: Bearer | Authorization: Bearer YOUR_API_KEY | Preferred for raw HTTP clients |
| x-api-key | x-api-key: YOUR_API_KEY | Common for Anthropic-compatible clients |

When both headers are present, Authorization: Bearer takes precedence. If the Authorization header is present but malformed, the request is rejected instead of falling back to x-api-key.
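The documented precedence can be sketched in Python (`resolve_api_key` is a hypothetical helper for illustration, not part of any Kindo or Anthropic library; what counts as "malformed" here is an assumption):

```python
def resolve_api_key(headers: dict) -> str:
    """Sketch of the documented precedence: Authorization: Bearer wins over
    x-api-key, and a malformed Authorization header is rejected outright
    rather than falling back."""
    auth = headers.get("Authorization")
    if auth is not None:
        token = auth[len("Bearer "):].strip() if auth.startswith("Bearer ") else ""
        if not token:
            # Present but malformed: reject, do not fall back to x-api-key.
            raise ValueError("malformed Authorization header")
        return token
    x_api_key = headers.get("x-api-key")
    if x_api_key:
        return x_api_key
    raise ValueError("no credentials provided")
```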

Anthropic headers

Kindo forwards these Anthropic-specific headers to the upstream provider when present:

  • anthropic-version
  • anthropic-beta
  • x-claude-code-session-id

For raw HTTP requests, include anthropic-version. Anthropic SDKs and Claude Code set the required protocol headers for you.

POST /v1/messages

Create a message using Anthropic’s native Messages API format.

```sh
curl https://api.kindo.ai/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 512,
    "messages": [
      {
        "role": "user",
        "content": "Explain Kubernetes pod security policies in plain English."
      }
    ]
  }'
```

Common request fields

Kindo validates the core Anthropic fields and passes through the rest, which keeps the endpoint forward-compatible with Anthropic request bodies.

| Field | Required | Notes |
| --- | --- | --- |
| model | Yes | Must match a model ID available through GET /v1/models |
| messages | Yes | Array of Anthropic messages. Roles must be user or assistant, must alternate, and the first message must be user. Use the top-level system field for system prompts |
| max_tokens | Yes | Positive integer |
| stream | No | Set true for Server-Sent Events |
| system | No | Optional system prompt. Accepts a string or an array of content blocks (use blocks to attach cache_control) |
| tools | No | Anthropic tool definitions are passed through |
| tool_choice | No | Use Anthropic's standard tool-choice format |
| metadata | No | Accepted, but stripped before proxying upstream |
| Other Anthropic fields | No | Passed through unless Kindo reserves the field for internal routing |

Kindo also strips internal routing fields such as litellm_metadata and proxy_server_request before proxying upstream so clients cannot spoof governance metadata.
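The message-shape rules in the table above can be sketched as a client-side pre-check (illustrative only; Kindo's server-side validation is authoritative and may differ in details):

```python
def validate_request(body: dict) -> list:
    """Sketch of the core rules: model, messages, and max_tokens are
    required; roles must be user/assistant, must alternate, and the first
    message must be user."""
    errors = []
    if "model" not in body:
        errors.append("model is required")
    max_tokens = body.get("max_tokens")
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        errors.append("max_tokens must be a positive integer")
    messages = body.get("messages")
    if not messages:
        errors.append("messages is required")
    else:
        if any(m["role"] not in ("user", "assistant") for m in messages):
            errors.append("roles must be user or assistant")
        if messages[0]["role"] != "user":
            errors.append("first message must be user")
        if any(a["role"] == b["role"] for a, b in zip(messages, messages[1:])):
            errors.append("roles must alternate")
    return errors
```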

Example response

```json
{
  "id": "msg_01Aq9w938a90dw8q",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-5-20250929",
  "content": [
    {
      "type": "text",
      "text": "Pod Security Policies were Kubernetes rules that controlled what a pod was allowed to do..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 24,
    "output_tokens": 134
  }
}
```

POST /v1/messages/count_tokens

Count tokens for a Messages API request without generating a completion.

```sh
curl https://api.kindo.ai/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "system": "You are a concise assistant.",
    "messages": [
      {
        "role": "user",
        "content": "Summarize the principle of least privilege."
      }
    ]
  }'
```

count_tokens behavior

  • model and messages are required.
  • max_tokens is optional for count_tokens.
  • stream is not supported; omit it or set it to false.

Example response:

```json
{
  "input_tokens": 20
}
```

Streaming

Set "stream": true on POST /v1/messages to receive a standard Anthropic Server-Sent Events stream.

```sh
curl -N https://api.kindo.ai/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 512,
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku about observability." }
    ]
  }'
```

Kindo returns Content-Type: text/event-stream and preserves Anthropic’s event structure:

| Event | Meaning |
| --- | --- |
| message_start | Initial message metadata |
| content_block_start | Start of a streamed content block |
| content_block_delta | Incremental content chunk |
| content_block_stop | End of the content block |
| message_delta | Updated usage or stop-reason data |
| message_stop | End of the stream |
| error | Stream interruption after the stream has started |
| ping | Keepalive heartbeat; can be safely ignored |

If the upstream request fails before streaming begins, Kindo returns a normal HTTP error response with Anthropic’s JSON error envelope instead of switching to SSE.
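The event flow above can be consumed with a minimal hand-rolled parser. The sketch below accumulates text from content_block_delta events and ignores ping keepalives; production clients should prefer the streaming helpers in the Anthropic SDKs:

```python
import json

def collect_text(sse_body: str) -> str:
    """Accumulate text_delta fragments from an Anthropic-style SSE body.
    Events are separated by blank lines; each has event: and data: lines."""
    text = []
    for chunk in sse_body.strip().split("\n\n"):
        event, data = None, None
        for line in chunk.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = line[len("data:"):].strip()
        if event == "content_block_delta" and data:
            delta = json.loads(data).get("delta", {})
            if delta.get("type") == "text_delta":
                text.append(delta.get("text", ""))
    return "".join(text)
```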

Prompt caching

Block-level prompt caching works through Kindo. Add cache_control to Anthropic content blocks, then verify caching behavior from the usage object in the response.

```json
{
  "model": "claude-sonnet-4-5-20250929",
  "max_tokens": 512,
  "system": [
    {
      "type": "text",
      "text": "You are a release-notes assistant. Always answer in bullet points.",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Summarize the last three deploys for service api."
        }
      ]
    }
  ]
}
```

Look for these fields in usage:

  • cache_creation_input_tokens — tokens written to cache on this request
  • cache_read_input_tokens — tokens read from cache on subsequent requests

Known limitation

Top-level cache_control is not currently forwarded by LiteLLM’s Anthropic TypedDict layer. Use block-level cache_control instead.

If your client requires an Anthropic beta header for prompt caching, use a header value supported by your target model. Internal end-to-end verification confirmed that anthropic-beta: prompt-caching-2024-07-31 works with Kindo’s proxy.

Tool use

Anthropic-format tools pass through to the upstream provider. Standard tool_use and tool_result content blocks work without any Kindo-specific wrapping.

```json
{
  "model": "claude-sonnet-4-5-20250929",
  "max_tokens": 512,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather for a city.",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": { "type": "string" }
        },
        "required": ["location"]
      }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": "What is the weather in San Francisco?"
    }
  ]
}
```

Typical flow:

  1. Send a request with tools.
  2. Receive an assistant message containing a tool_use block.
  3. Execute the tool client-side.
  4. Send the tool result back in a follow-up user message containing a tool_result block.
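Steps 3–4 can be sketched as a small helper that turns a tool_use response into the follow-up user message (`build_tool_result_message` and `run_tool` are illustrative names, not SDK functions):

```python
def build_tool_result_message(assistant_message: dict, run_tool) -> dict:
    """Execute each tool_use block client-side via run_tool(name, input),
    and wrap the outputs as tool_result blocks in a follow-up user message."""
    results = []
    for block in assistant_message["content"]:
        if block["type"] == "tool_use":
            output = run_tool(block["name"], block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],  # must echo the tool_use block's id
                "content": output,
            })
    return {"role": "user", "content": results}
```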

Claude Code setup

Point Claude Code at Kindo by setting two environment variables before launching the CLI.

  1. Export your Kindo base URL:

     ```sh
     export ANTHROPIC_BASE_URL=https://api.kindo.ai
     ```

  2. Export your Kindo API key:

     ```sh
     export ANTHROPIC_API_KEY=YOUR_KINDO_API_KEY
     ```

  3. Start Claude Code normally:

     ```sh
     claude
     ```

Claude Code will send Anthropic-compatible requests to https://api.kindo.ai/v1/messages using your Kindo API key. Kindo also forwards the x-claude-code-session-id header, which preserves Claude Code session context for upstream compatibility.

Why route Claude Code through Kindo

  • Centralized governance — Requests inherit the same audit logging, DLP, and access controls as other Kindo APIs.
  • One API key — Use the same Kindo API key you already use for other endpoints.
  • Model consistency — Claude Code uses the model IDs your organization exposes through Kindo.

If Claude Code returns a model-not-found error, call GET /v1/models first and use one of the IDs available to your organization.

Client examples

Use whichever Anthropic client fits your workflow.

```sh
curl https://api.kindo.ai/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 512,
    "messages": [
      {
        "role": "user",
        "content": "Give three practical code review tips."
      }
    ]
  }'
```
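The same request can be built with Python's standard library alone. The sketch below constructs the request without sending it (`build_messages_request` is an illustrative helper; pass the result to `urllib.request.urlopen` to execute it, or use the Anthropic Python SDK with a custom base URL instead):

```python
import json
import urllib.request

def build_messages_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST /v1/messages request mirroring the curl example,
    using x-api-key auth and the anthropic-version protocol header."""
    return urllib.request.Request(
        "https://api.kindo.ai/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )
```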

Differences from Anthropic’s direct API

| Topic | Kindo Messages API | Direct Anthropic API |
| --- | --- | --- |
| Base URL | https://api.kindo.ai | https://api.anthropic.com |
| API key | Kindo API key | Anthropic API key |
| Model names | Must match Kindo model IDs from GET /v1/models | Must match Anthropic-enabled models on your Anthropic account |
| Governance | Includes Kindo auth, access control, DLP, audit logging, and metering | Anthropic-only controls |
| Prompt caching | Block-level cache_control works; top-level cache_control currently does not | Depends on Anthropic's native support |

Error format

Errors use Anthropic’s standard envelope:

```json
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "max_tokens is required"
  }
}
```

Kindo maps common HTTP statuses to Anthropic-style error types:

| Status | Error type |
| --- | --- |
| 400 | invalid_request_error |
| 401 | authentication_error |
| 403 | permission_error |
| 404 | not_found_error |
| 429 | rate_limit_error |
| 5xx | api_error |
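The mapping can be sketched as a small lookup (the fallback for statuses outside the table is an assumption, not documented behavior):

```python
# Anthropic-style error types for the statuses listed in the table above.
ERROR_TYPES = {
    400: "invalid_request_error",
    401: "authentication_error",
    403: "permission_error",
    404: "not_found_error",
    429: "rate_limit_error",
}

def error_type_for_status(status: int) -> str:
    """Map an HTTP status to its Anthropic-style error type; any 5xx
    status maps to api_error, which also serves as the assumed fallback."""
    if 500 <= status <= 599:
        return "api_error"
    return ERROR_TYPES.get(status, "api_error")
```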

See also