# Chat Completions API
The Kindo Chat Completions API provides an OpenAI-compatible /v1/chat/completions endpoint on api.kindo.ai. Use it when you want to migrate existing OpenAI SDK code with minimal changes while routing through Kindo’s governance pipeline.
With the Chat Completions API, requests use OpenAI’s native request and response shapes, while Kindo adds API-key auth, model access control, audit logging, DLP enforcement, and credit metering.
## Why use the Chat Completions API
- OpenAI-compatible — Works with the OpenAI Python SDK, the OpenAI TypeScript SDK, and raw HTTP clients.
- Governed by Kindo — Requests pass through Kindo auth, model access checks, DLP, and usage metering.
- Supports streaming — Standard Server-Sent Events work with Kindo’s proxy.
- Supports tool use — OpenAI function-calling tools pass through verbatim.
- Uses your Kindo model registry — Use the same model IDs returned by `GET /v1/models`.
## Base URL and authentication
Use the api.kindo.ai domain for the Chat Completions API:
| Endpoint | Method | Purpose |
|---|---|---|
| `https://api.kindo.ai/v1/chat/completions` | POST | Create a chat completion |
For self-hosted installations, replace api.kindo.ai with your deployment’s API base URL.
### Authentication
Both auth formats work:
| Header | Example | Notes |
|---|---|---|
| `Authorization: Bearer` | `Authorization: Bearer YOUR_API_KEY` | Preferred for raw HTTP clients |
| `x-api-key` | `x-api-key: YOUR_API_KEY` | Common for Anthropic-style clients |
When both headers are present, Authorization: Bearer takes precedence. If the Authorization header is present but malformed, the request is rejected instead of falling back to x-api-key.
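That precedence rule can be sketched as a small helper (the function name is illustrative; the real check happens inside Kindo's auth layer):

```python
def select_credential(headers: dict) -> str:
    """Apply the documented precedence: a present Authorization header wins,
    and a malformed one is rejected rather than falling back to x-api-key."""
    auth = headers.get("Authorization")
    if auth is not None:
        scheme, _, token = auth.partition(" ")
        if scheme != "Bearer" or not token:
            raise ValueError("malformed Authorization header")
        return token
    if headers.get("x-api-key"):
        return headers["x-api-key"]
    raise ValueError("no credentials supplied")
```

For example, `select_credential({"Authorization": "Bearer k1", "x-api-key": "k2"})` selects `"k1"`, while a malformed `Authorization` value raises instead of falling back to `x-api-key`.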
## POST /v1/chat/completions
Create a chat completion using OpenAI’s native Chat Completions format.
```shell
curl https://api.kindo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "messages": [
      { "role": "user", "content": "Explain Kubernetes pod security policies in plain English." }
    ]
  }'
```

### Common request fields
Kindo validates the core OpenAI fields and passes through the rest, which keeps the endpoint forward-compatible with OpenAI request bodies.
| Field | Required | Notes |
|---|---|---|
| `model` | Yes | Must match a model ID available through `GET /v1/models` |
| `messages` | Yes | Array of messages comprising the conversation. Must contain at least one entry. |
| `stream` | No | Set `true` for Server-Sent Events. Default: `false` |
| `temperature` | No | Sampling temperature |
| `max_tokens` | No | Maximum tokens to generate |
| `top_p` | No | Nucleus sampling parameter |
| `frequency_penalty` | No | Frequency penalty parameter |
| `presence_penalty` | No | Presence penalty parameter |
| `stop` | No | Stop sequence(s). String or array of strings |
| `tools` | No | OpenAI tools array. Forwarded verbatim to the upstream provider |
| `tool_choice` | No | Controls tool use. String or object with shape `{ type: "function", function: { name: string } }` |
| `response_format` | No | Response format object with a `type` field |
| `user` | No | End-user identifier |
Extra fields not listed above are forwarded verbatim to LiteLLM because the schema uses `passthrough()`. See OpenAI’s API reference for full field semantics.
Kindo strips `metadata`, `litellm_metadata`, and `proxy_server_request` from the request body before proxying upstream so clients cannot spoof governance metadata.
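A minimal sketch of that stripping step, at the dict level only (the actual proxy code is not shown in this doc):

```python
# Governance-reserved keys named in the doc; clients cannot set these.
RESERVED_FIELDS = {"metadata", "litellm_metadata", "proxy_server_request"}

def sanitize_body(body: dict) -> dict:
    """Drop reserved keys; every other field passes through verbatim."""
    return {k: v for k, v in body.items() if k not in RESERVED_FIELDS}
```

Unknown fields such as a provider-specific `top_k` survive this filter, which is what keeps the endpoint forward-compatible.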
### Messages array
Each message in the messages array has the following shape:
| Field | Required | Notes |
|---|---|---|
| `role` | Yes | One of `system`, `developer`, `user`, `assistant`, `tool` |
| `content` | No | String, array of content blocks, or `null` (e.g. assistant messages with only `tool_calls`). Arrays are forwarded verbatim and support OpenAI multimodal blocks |
| `name` | No | Name of the author of this message |
| `tool_calls` | No | Tool calls generated by the model, present on assistant messages |
| `tool_call_id` | No | ID of the tool call this message is a response to; required on `role: "tool"` messages |
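The role and `tool_call_id` rules above can be checked client-side before sending; a sketch (the server performs its own validation, so this is purely a fail-fast convenience):

```python
VALID_ROLES = {"system", "developer", "user", "assistant", "tool"}

def validate_message(msg: dict) -> None:
    """Raise ValueError if a message violates the documented shape."""
    if msg.get("role") not in VALID_ROLES:
        raise ValueError(f"invalid role: {msg.get('role')!r}")
    # tool messages must reference the tool call they answer
    if msg["role"] == "tool" and not msg.get("tool_call_id"):
        raise ValueError('role "tool" messages require tool_call_id')
```

`validate_message({"role": "user", "content": "hi"})` passes; a `tool` message without `tool_call_id` raises.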
### Example response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "claude-sonnet-4-5-20250929",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Pod Security Policies were Kubernetes rules that controlled what a pod was allowed to do..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 134,
    "total_tokens": 158
  }
}
```

### Response fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique completion ID |
| `object` | string | Always `"chat.completion"` |
| `created` | number | Unix timestamp |
| `model` | string | Model used for generation |
| `choices` | array | Ordered list of completion choices. Each item has `index`, `message`, and `finish_reason` |
| `usage` | object (optional) | Token usage statistics: `prompt_tokens`, `completion_tokens`, `total_tokens`. Omitted from intermediate streaming chunks. |
Additional fields (for example `system_fingerprint`, `service_tier`, `logprobs`, and `refusal`) may be present and are forwarded verbatim from the upstream provider.
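Reading those fields from a parsed response is plain dict access; a sketch using the field names from the table above (the helper name is illustrative):

```python
def summarize_completion(resp: dict):
    """Return (first choice text, total_tokens); usage may be absent,
    e.g. on intermediate streaming chunks."""
    text = resp["choices"][0]["message"]["content"]
    usage = resp.get("usage") or {}
    return text, usage.get("total_tokens")
```

Using `.get("usage")` rather than direct indexing keeps the helper safe for payloads where `usage` is omitted.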
## Streaming
Set `"stream": true` on `POST /v1/chat/completions` to receive a standard OpenAI Server-Sent Events stream.

```shell
curl -N https://api.kindo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku about observability." }
    ]
  }'
```

Kindo returns `Content-Type: text/event-stream` and preserves OpenAI’s SSE structure. Each event is a JSON payload prefixed with `data: `:

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
```

The stream ends with:

```
data: [DONE]
```

If the upstream request fails before streaming begins, Kindo returns a normal HTTP error response with an OpenAI-compatible JSON error envelope instead of switching to SSE.
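Client-side, a stream like this can be consumed by splitting on `data: ` lines and stopping at the sentinel; a minimal sketch without any SDK (raw line handling only, not a full SSE parser):

```python
import json

def iter_content_deltas(lines):
    """Yield delta.content strings from SSE lines until data: [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines and event: fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]
```

Joining the yielded deltas reconstructs the full completion text.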
## Tool use
OpenAI-format tool definitions pass through to the upstream provider. Standard `tools` and `tool_choice` fields work without any Kindo-specific wrapping.

```json
{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [
    { "role": "user", "content": "What is the weather in San Francisco?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```

Typical flow:

- Send a request with `tools`.
- Receive an assistant message containing `tool_calls`.
- Execute the tool client-side.
- Send the tool result back in a follow-up message with `role: "tool"`, `tool_call_id`, and the result as `content`.
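The last two steps reduce to message assembly for the follow-up request; a sketch (the results mapping stands in for whatever your client-side tool executor returns):

```python
import json

def tool_followup(messages, assistant_msg, results: dict):
    """Echo the assistant's tool_calls turn, then answer each call with a
    role:"tool" message keyed by the same tool call id."""
    followup = messages + [assistant_msg]
    for call in assistant_msg.get("tool_calls", []):
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(results[call["id"]]),
        })
    return followup
```

The assistant turn must be echoed back verbatim so the provider can match each `tool_call_id` to its result.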
## Supported models
The Chat Completions API uses the same model registry as the Responses API. Call `GET /v1/models` to enumerate the models available to your organization.
If your organization does not have access to the requested model, the API returns `403` with `type: "permission_error"` and `code: "model_access_denied"`.
## Error format
Errors emitted by the Kindo handler follow the OpenAI-compatible envelope:
```json
{
  "error": {
    "message": "Model 'unknown-model' not found.",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
```

Kindo maps common HTTP statuses to OpenAI-style error types:
| Status | Type | Code |
|---|---|---|
| 400 | `invalid_request_error` | `invalid_body` or `model_not_found` |
| 403 | `permission_error` | `model_access_denied` |
| 429 | `rate_limit_error` | — |
| 500 | `server_error` | `internal_error` |
| 502 | `server_error` | `upstream_empty_body` |
Two exceptions:

- `401 Unauthorized` — responses from API-key authentication use a plain-string envelope instead of the OpenAI shape:

  ```json
  { "error": "Unauthorized" }
  ```

  There is no `error.type` or `error.code` field. This is verified by the handler test at `routes/chat/postChatCompletions.test.ts:89`.
- 4xx/5xx responses originating from upstream LiteLLM are forwarded verbatim with the upstream body and content-type. The shape (other than the Kindo-emitted `upstream_empty_body` code) depends on the upstream provider.
Mid-stream failures do not produce an HTTP 500. Once the SSE response has started, the outer HTTP status is already committed at 200. Stream errors arrive as an inline `event: error` SSE frame followed by `data: [DONE]`:

```
event: error
data: {"error":{"message":"...","type":"server_error","code":"stream_error"}}

data: [DONE]
```

SDK consumers should handle `event: error` payloads with `code: "stream_error"` separately from HTTP-level error envelopes.
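One way to separate those mid-stream errors from normal chunks is to track the SSE `event:` field while scanning lines. A simplified sketch, not a full SSE-spec parser (the event name resets after each `data:` line rather than on blank-line dispatch):

```python
import json

def scan_stream(lines):
    """Return (deltas, error): content deltas plus the first event: error
    payload, if any."""
    deltas, error, event = [], None, "message"
    for line in lines:
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: "):
            payload = line[len("data: "):]
            if payload == "[DONE]":
                break
            if event == "error":
                error = json.loads(payload)["error"]
            else:
                delta = json.loads(payload)["choices"][0].get("delta", {})
                if delta.get("content"):
                    deltas.append(delta["content"])
            event = "message"  # reset for the next frame
    return deltas, error
```

A caller can then surface `error["code"] == "stream_error"` through the same error path it uses for HTTP-level envelopes.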
## Differences from /v1/responses
| Topic | Chat Completions API | Responses API |
|---|---|---|
| Request shape | `messages` array | `input` string + `instructions` |
| Conversation state | Caller-managed | Kindo-managed via conversation ID |
| Tool model | OpenAI function-calling tools forwarded verbatim | Kindo MCP integrations + OpenAI-style function tools (Responses-format) |
| Output | `choices[].message` | Typed output items + `mcp_call` / `function_call` |
When to pick which:
- Chat Completions — Migrating from OpenAI or `llm.kindo.ai` and you want minimal code change.
- Responses — Native MCP and OpenAI-style function tool integrations, plus managed conversations.
## Migration from llm.kindo.ai
If you are currently calling `https://llm.kindo.ai/v1/chat/completions`, migrate to the unified `api.kindo.ai` surface:
| | Old | New |
|---|---|---|
| URL | `https://llm.kindo.ai/v1/chat/completions` | `https://api.kindo.ai/v1/chat/completions` |
| Auth header | `Authorization: Bearer` or `api-key` | `Authorization: Bearer` or `x-api-key` |
| Request/response shape | Unchanged | Unchanged |
The `Authorization: Bearer` form works on both surfaces and requires no change. If you used the bare `api-key:` header on llm.kindo.ai, rename it to `x-api-key:` for api.kindo.ai (see API overview for the full auth header matrix).
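The header rename can be sketched as a one-line migration helper (the function name is illustrative):

```python
def migrate_headers(headers: dict) -> dict:
    """Rename the legacy bare api-key header to x-api-key; an existing
    Authorization header needs no change on either surface."""
    out = dict(headers)
    if "api-key" in out and "x-api-key" not in out:
        out["x-api-key"] = out.pop("api-key")
    return out
```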
For the OpenAI SDK, change one line:
```python
# Old
client = OpenAI(base_url="https://llm.kindo.ai/v1", api_key="YOUR_API_KEY")

# New
client = OpenAI(base_url="https://api.kindo.ai/v1", api_key="YOUR_API_KEY")
```

The underlying governance pipeline is the same (it was always LiteLLM). The api.kindo.ai path adds the unified `/v1` surface alongside `/v1/messages` and `/v1/responses`.
## Client examples
Use whichever OpenAI client fits your workflow.
### curl

Non-streaming:

```shell
curl https://api.kindo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "messages": [
      { "role": "user", "content": "Give three practical code review tips." }
    ]
  }'
```

Streaming:

```shell
curl -N https://api.kindo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Give three practical code review tips." }
    ]
  }'
```

### Python

Non-streaming:

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["KINDO_API_KEY"],
    base_url="https://api.kindo.ai/v1",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[
        {"role": "user", "content": "Explain RBAC in one paragraph."}
    ],
)

print(response.choices[0].message.content)
```

Streaming:

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["KINDO_API_KEY"],
    base_url="https://api.kindo.ai/v1",
)

stream = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[
        {"role": "user", "content": "Explain RBAC in one paragraph."}
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

### TypeScript

Non-streaming:

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.KINDO_API_KEY,
  baseURL: 'https://api.kindo.ai/v1'
});

const response = await client.chat.completions.create({
  model: 'claude-sonnet-4-5-20250929',
  messages: [
    { role: 'user', content: 'Summarize defense in depth in two sentences.' }
  ]
});

console.log(response.choices[0].message.content);
```

Streaming:

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.KINDO_API_KEY,
  baseURL: 'https://api.kindo.ai/v1'
});

const stream = await client.chat.completions.create({
  model: 'claude-sonnet-4-5-20250929',
  messages: [
    { role: 'user', content: 'Summarize defense in depth in two sentences.' }
  ],
  stream: true
});

for await (const chunk of stream) {
  if (chunk.choices[0].delta.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
```