# Chat Completions API
The Kindo Chat Completions API provides an OpenAI-compatible /v1/chat/completions endpoint on api.kindo.ai. Use it when you want to migrate existing OpenAI SDK code with minimal changes while routing through Kindo’s governance pipeline.
With the Chat Completions API, requests use OpenAI’s native request and response shapes, while Kindo adds API-key auth, model access control, audit logging, DLP enforcement, and credit metering.
## Why use the Chat Completions API
- OpenAI-compatible — Works with the OpenAI Python SDK, the OpenAI TypeScript SDK, and raw HTTP clients.
- Governed by Kindo — Requests pass through Kindo auth, model access checks, DLP, and usage metering.
- Supports streaming — Standard Server-Sent Events work with Kindo’s proxy.
- Supports tool use — OpenAI function-calling tools pass through verbatim.
- Uses your Kindo model registry — Use the same model IDs returned by `GET /v1/models`.
## Base URL and authentication
Use the api.kindo.ai domain for the Chat Completions API:
| Endpoint | Method | Purpose |
|---|---|---|
| `https://api.kindo.ai/v1/chat/completions` | POST | Create a chat completion |
For self-hosted installations, replace api.kindo.ai with your deployment’s API base URL.
### Authentication
Both auth formats work:
| Header | Example | Notes |
|---|---|---|
| `Authorization: Bearer` | `Authorization: Bearer YOUR_API_KEY` | Preferred for raw HTTP clients |
| `x-api-key` | `x-api-key: YOUR_API_KEY` | Common for Anthropic-style clients |
When both headers are present, Authorization: Bearer takes precedence. If the Authorization header is present but malformed, the request is rejected instead of falling back to x-api-key.
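That precedence rule can be sketched as a small helper (the function name is illustrative; the real check happens inside Kindo's auth layer):

```python
def select_credential(headers: dict) -> str:
    """Apply the documented precedence: a present Authorization header wins,
    and a malformed one is rejected rather than falling back to x-api-key."""
    auth = headers.get("Authorization")
    if auth is not None:
        scheme, _, token = auth.partition(" ")
        if scheme != "Bearer" or not token:
            raise ValueError("malformed Authorization header")
        return token
    if headers.get("x-api-key"):
        return headers["x-api-key"]
    raise ValueError("no credentials supplied")
```

For example, `select_credential({"Authorization": "Bearer k1", "x-api-key": "k2"})` selects `"k1"`, while a malformed `Authorization` value raises instead of falling back to `x-api-key`.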
## POST /v1/chat/completions
Create a chat completion using OpenAI’s native Chat Completions format.
```shell
curl https://api.kindo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "messages": [
      { "role": "user", "content": "Explain Kubernetes pod security policies in plain English." }
    ]
  }'
```

### Common request fields
Kindo validates the core OpenAI fields and passes through the rest, which keeps the endpoint forward-compatible with OpenAI request bodies.
| Field | Required | Notes |
|---|---|---|
| `model` | Yes | Must match a model ID available through `GET /v1/models` |
| `messages` | Yes | Array of messages comprising the conversation. Must contain at least one entry. |
| `stream` | No | Set `true` for Server-Sent Events. Default: `false` |
| `temperature` | No | Sampling temperature |
| `max_tokens` | No | Maximum tokens to generate |
| `top_p` | No | Nucleus sampling parameter |
| `frequency_penalty` | No | Frequency penalty parameter |
| `presence_penalty` | No | Presence penalty parameter |
| `stop` | No | Stop sequence(s). String or array of strings |
| `tools` | No | OpenAI tools array. Forwarded verbatim to the upstream provider |
| `tool_choice` | No | Controls tool use. String or object with shape `{ type: "function", function: { name: string } }` |
| `response_format` | No | Response format object with a `type` field |
| `user` | No | End-user identifier |
Extra fields not listed above are forwarded verbatim to LiteLLM because the schema uses `passthrough()`. See OpenAI’s API reference for full field semantics.
Kindo strips `metadata`, `litellm_metadata`, and `proxy_server_request` from the request body before proxying upstream so clients cannot spoof governance metadata.
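A minimal sketch of that stripping step, at the dict level only (the actual proxy code is not shown in this doc):

```python
# Governance-reserved keys named in the doc; clients cannot set these.
RESERVED_FIELDS = {"metadata", "litellm_metadata", "proxy_server_request"}

def sanitize_body(body: dict) -> dict:
    """Drop reserved keys; every other field passes through verbatim."""
    return {k: v for k, v in body.items() if k not in RESERVED_FIELDS}
```

Unknown fields such as a provider-specific `top_k` survive this filter, which is what keeps the endpoint forward-compatible.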
### Messages array
Each message in the messages array has the following shape:
| Field | Required | Notes |
|---|---|---|
| `role` | Yes | One of `system`, `developer`, `user`, `assistant`, `tool` |
| `content` | No | String, array of content blocks, or `null` (e.g. assistant messages with only `tool_calls`). Arrays are forwarded verbatim and support OpenAI multimodal blocks |
| `name` | No | Name of the author of this message |
| `tool_calls` | No | Tool calls generated by the model, present on assistant messages |
| `tool_call_id` | No | ID of the tool call this message is a response to; required on `role: "tool"` messages |
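The role and `tool_call_id` rules above can be checked client-side before sending; a sketch (the server performs its own validation, so this is purely a fail-fast convenience):

```python
VALID_ROLES = {"system", "developer", "user", "assistant", "tool"}

def validate_message(msg: dict) -> None:
    """Raise ValueError if a message violates the documented shape."""
    if msg.get("role") not in VALID_ROLES:
        raise ValueError(f"invalid role: {msg.get('role')!r}")
    # tool messages must reference the tool call they answer
    if msg["role"] == "tool" and not msg.get("tool_call_id"):
        raise ValueError('role "tool" messages require tool_call_id')
```

`validate_message({"role": "user", "content": "hi"})` passes; a `tool` message without `tool_call_id` raises.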
### Example response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "claude-sonnet-4-5-20250929",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Pod Security Policies were Kubernetes rules that controlled what a pod was allowed to do..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 134,
    "total_tokens": 158
  }
}
```

### Response fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique completion ID |
| `object` | string | Always `"chat.completion"` |
| `created` | number | Unix timestamp |
| `model` | string | Model used for generation |
| `choices` | array | Ordered list of completion choices. Each item has `index`, `message`, and `finish_reason` |
| `usage` | object (optional) | Token usage statistics: `prompt_tokens`, `completion_tokens`, `total_tokens`. Omitted from intermediate streaming chunks. |
Additional fields (for example `system_fingerprint`, `service_tier`, `logprobs`, and `refusal`) may be present and are forwarded verbatim from the upstream provider.
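Reading those fields from a parsed response is plain dict access; a sketch using the field names from the table above (the helper name is illustrative):

```python
def summarize_completion(resp: dict):
    """Return (first choice text, total_tokens); usage may be absent,
    e.g. on intermediate streaming chunks."""
    text = resp["choices"][0]["message"]["content"]
    usage = resp.get("usage") or {}
    return text, usage.get("total_tokens")
```

Using `.get("usage")` rather than direct indexing keeps the helper safe for payloads where `usage` is omitted.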
## Streaming
Set `"stream": true` on `POST /v1/chat/completions` to receive a standard OpenAI Server-Sent Events stream.

```shell
curl -N https://api.kindo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku about observability." }
    ]
  }'
```

Kindo returns `Content-Type: text/event-stream` and preserves OpenAI’s SSE structure. Each event is a JSON payload prefixed with `data: `:

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
```

The stream ends with:

```
data: [DONE]
```

If the upstream request fails before streaming begins, Kindo returns a normal HTTP error response with an OpenAI-compatible JSON error envelope instead of switching to SSE.
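Client-side, a stream like this can be consumed by splitting on `data: ` lines and stopping at the sentinel; a minimal sketch without any SDK (raw line handling only, not a full SSE parser):

```python
import json

def iter_content_deltas(lines):
    """Yield delta.content strings from SSE lines until data: [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines and event: fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]
```

Joining the yielded deltas reconstructs the full completion text.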
## Tool use
OpenAI-format tool definitions pass through to the upstream provider. Standard `tools` and `tool_choice` fields work without any Kindo-specific wrapping.

```json
{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [
    { "role": "user", "content": "What is the weather in San Francisco?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```

Typical flow:

- Send a request with `tools`.
- Receive an assistant message containing `tool_calls`.
- Execute the tool client-side.
- Send the tool result back in a follow-up message with `role: "tool"`, `tool_call_id`, and the result as `content`.
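The last two steps reduce to message assembly for the follow-up request; a sketch (the results mapping stands in for whatever your client-side tool executor returns):

```python
import json

def tool_followup(messages, assistant_msg, results: dict):
    """Echo the assistant's tool_calls turn, then answer each call with a
    role:"tool" message keyed by the same tool call id."""
    followup = messages + [assistant_msg]
    for call in assistant_msg.get("tool_calls", []):
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(results[call["id"]]),
        })
    return followup
```

The assistant turn must be echoed back verbatim so the provider can match each `tool_call_id` to its result.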
## Supported models
The Chat Completions API uses the same model registry as the Responses API. Call `GET /v1/models` to enumerate the models available to your organization.
If your organization does not have access to the requested model, the API returns `403` with `type: "permission_error"` and `code: "model_access_denied"`.
## Error format
Errors emitted by the Kindo handler follow the OpenAI-compatible envelope:
```json
{
  "error": {
    "message": "Model 'unknown-model' not found.",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
```

Kindo maps common HTTP statuses to OpenAI-style error types:
| Status | Type | Code |
|---|---|---|
| 400 | `invalid_request_error` | `invalid_body` or `model_not_found` |
| 403 | `permission_error` | `model_access_denied` |
| 429 | `rate_limit_error` | — |
| 500 | `server_error` | `internal_error` |
| 502 | `server_error` | `upstream_empty_body` |
Two exceptions:

- `401 Unauthorized` — responses from API-key authentication use a plain-string envelope instead of the OpenAI shape:

  ```json
  { "error": "Unauthorized" }
  ```

  There is no `error.type` or `error.code` field. This is verified by the handler test at `routes/chat/postChatCompletions.test.ts:89`.
- 4xx/5xx responses originating from upstream LiteLLM are forwarded verbatim with the upstream body and content-type. The shape (other than the Kindo-emitted `upstream_empty_body` code) depends on the upstream provider.
Mid-stream failures do not produce an HTTP 500. Once the SSE response has started, the outer HTTP status is already committed at 200. Stream errors arrive as an inline `event: error` SSE frame followed by `data: [DONE]`:

```
event: error
data: {"error":{"message":"...","type":"server_error","code":"stream_error"}}

data: [DONE]
```

SDK consumers should handle `event: error` payloads with `code: "stream_error"` separately from HTTP-level error envelopes.
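One way to separate those mid-stream errors from normal chunks is to track the SSE `event:` field while scanning lines. A simplified sketch, not a full SSE-spec parser (the event name resets after each `data:` line rather than on blank-line dispatch):

```python
import json

def scan_stream(lines):
    """Return (deltas, error): content deltas plus the first event: error
    payload, if any."""
    deltas, error, event = [], None, "message"
    for line in lines:
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: "):
            payload = line[len("data: "):]
            if payload == "[DONE]":
                break
            if event == "error":
                error = json.loads(payload)["error"]
            else:
                delta = json.loads(payload)["choices"][0].get("delta", {})
                if delta.get("content"):
                    deltas.append(delta["content"])
            event = "message"  # reset for the next frame
    return deltas, error
```

A caller can then surface `error["code"] == "stream_error"` through the same error path it uses for HTTP-level envelopes.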
## Differences from /v1/responses
| Topic | Chat Completions API | Responses API |
|---|---|---|
| Request shape | `messages` array | `input` string + `instructions` |
| Conversation state | Caller-managed | Kindo-managed via conversation ID |
| Tool model | OpenAI function-calling tools forwarded verbatim | Kindo MCP integrations + OpenAI-style function tools (Responses-format) |
| Output | `choices[].message` | Typed output items + `mcp_call` / `function_call` |
When to pick which:
- Chat Completions — Migrating from OpenAI or `llm.kindo.ai` and you want minimal code change.
- Responses — Native MCP and OpenAI-style function tool integrations, plus managed conversations.
## Migration from llm.kindo.ai
If you are currently calling `https://llm.kindo.ai/v1/chat/completions`, migrate to the unified `api.kindo.ai` surface:
| | Old | New |
|---|---|---|
| URL | `https://llm.kindo.ai/v1/chat/completions` | `https://api.kindo.ai/v1/chat/completions` |
| Auth header | `Authorization: Bearer` or `api-key` | `Authorization: Bearer` or `x-api-key` |
| Request/response shape | Unchanged | Unchanged |
The `Authorization: Bearer` form works on both surfaces and requires no change. If you used the bare `api-key:` header on llm.kindo.ai, rename it to `x-api-key:` for api.kindo.ai (see API overview for the full auth header matrix).
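The header rename can be sketched as a one-line migration helper (the function name is illustrative):

```python
def migrate_headers(headers: dict) -> dict:
    """Rename the legacy bare api-key header to x-api-key; an existing
    Authorization header needs no change on either surface."""
    out = dict(headers)
    if "api-key" in out and "x-api-key" not in out:
        out["x-api-key"] = out.pop("api-key")
    return out
```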
For the OpenAI SDK, change one line:
```python
# Old
client = OpenAI(base_url="https://llm.kindo.ai/v1", api_key="YOUR_API_KEY")

# New
client = OpenAI(base_url="https://api.kindo.ai/v1", api_key="YOUR_API_KEY")
```

The underlying governance pipeline is the same (it was always LiteLLM). The api.kindo.ai path adds the unified `/v1` surface alongside `/v1/messages` and `/v1/responses`.
## Client examples
Use whichever OpenAI client fits your workflow.
### curl

Non-streaming:

```shell
curl https://api.kindo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "messages": [
      { "role": "user", "content": "Give three practical code review tips." }
    ]
  }'
```

Streaming:

```shell
curl -N https://api.kindo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Give three practical code review tips." }
    ]
  }'
```

### Python

Non-streaming:

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["KINDO_API_KEY"],
    base_url="https://api.kindo.ai/v1",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[
        {"role": "user", "content": "Explain RBAC in one paragraph."}
    ],
)

print(response.choices[0].message.content)
```

Streaming:

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["KINDO_API_KEY"],
    base_url="https://api.kindo.ai/v1",
)

stream = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[
        {"role": "user", "content": "Explain RBAC in one paragraph."}
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

### TypeScript

Non-streaming:

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.KINDO_API_KEY,
  baseURL: 'https://api.kindo.ai/v1'
});

const response = await client.chat.completions.create({
  model: 'claude-sonnet-4-5-20250929',
  messages: [
    { role: 'user', content: 'Summarize defense in depth in two sentences.' }
  ]
});

console.log(response.choices[0].message.content);
```

Streaming:

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.KINDO_API_KEY,
  baseURL: 'https://api.kindo.ai/v1'
});

const stream = await client.chat.completions.create({
  model: 'claude-sonnet-4-5-20250929',
  messages: [
    { role: 'user', content: 'Summarize defense in depth in two sentences.' }
  ],
  stream: true
});

for await (const chunk of stream) {
  if (chunk.choices[0].delta.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
```