
Chat Completions API

The Kindo Chat Completions API provides an OpenAI-compatible /v1/chat/completions endpoint on api.kindo.ai. Use it when you want to migrate existing OpenAI SDK code with minimal changes while routing through Kindo’s governance pipeline.

With the Chat Completions API, requests use OpenAI’s native request and response shapes, while Kindo adds API-key auth, model access control, audit logging, DLP enforcement, and credit metering.

Why use the Chat Completions API

  • OpenAI-compatible — Works with the OpenAI Python SDK, the OpenAI TypeScript SDK, and raw HTTP clients.
  • Governed by Kindo — Requests pass through Kindo auth, model access checks, DLP, and usage metering.
  • Supports streaming — Standard Server-Sent Events work with Kindo’s proxy.
  • Supports tool use — OpenAI function-calling tools pass through verbatim.
  • Uses your Kindo model registry — Use the same model IDs returned by GET /v1/models.

Base URL and authentication

Use the api.kindo.ai domain for the Chat Completions API:

| Endpoint | Method | Purpose |
| --- | --- | --- |
| https://api.kindo.ai/v1/chat/completions | POST | Create a chat completion |

For self-hosted installations, replace api.kindo.ai with your deployment’s API base URL.

Authentication

Both auth formats work:

| Header | Example | Notes |
| --- | --- | --- |
| Authorization: Bearer | Authorization: Bearer YOUR_API_KEY | Preferred for raw HTTP clients |
| x-api-key | x-api-key: YOUR_API_KEY | Common for Anthropic-style clients |

When both headers are present, Authorization: Bearer takes precedence. If the Authorization header is present but malformed, the request is rejected instead of falling back to x-api-key.
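As a client-side sketch (the helper name is ours, not part of any SDK), it is simplest to attach exactly one of the two headers, so server-side precedence never comes into play:

```python
def kindo_headers(api_key: str, use_bearer: bool = True) -> dict:
    """Build request headers for api.kindo.ai (hypothetical helper).

    Bearer is preferred for raw HTTP clients; x-api-key suits
    Anthropic-style clients. Send one header, not both.
    """
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        headers["x-api-key"] = api_key
    return headers
```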

POST /v1/chat/completions

Create a chat completion using OpenAI’s native Chat Completions format.

curl https://api.kindo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "user",
        "content": "Explain Kubernetes pod security policies in plain English."
      }
    ]
  }'

Common request fields

Kindo validates the core OpenAI fields and passes through the rest, which keeps the endpoint forward-compatible with OpenAI request bodies.

| Field | Required | Notes |
| --- | --- | --- |
| model | Yes | Must match a model ID available through GET /v1/models |
| messages | Yes | Array of messages comprising the conversation. Must contain at least one entry. |
| stream | No | Set true for Server-Sent Events. Default: false |
| temperature | No | Sampling temperature |
| max_tokens | No | Maximum tokens to generate |
| top_p | No | Nucleus sampling parameter |
| frequency_penalty | No | Frequency penalty parameter |
| presence_penalty | No | Presence penalty parameter |
| stop | No | Stop sequence(s). String or array of strings |
| tools | No | OpenAI tools array. Forwarded verbatim to the upstream provider |
| tool_choice | No | Controls tool use. String or object with shape { type: "function", function: { name: string } } |
| response_format | No | Response format object with a type field |
| user | No | End-user identifier |

Extra fields not listed above are forwarded verbatim to LiteLLM because the schema uses passthrough(). See OpenAI’s API reference for full field semantics.

Kindo strips metadata, litellm_metadata, and proxy_server_request from the request body before proxying upstream so clients cannot spoof governance metadata.
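A minimal sketch of assembling a request body (the helper name is ours): only model and messages are required, and any extra OpenAI fields ride along unchanged.

```python
def build_chat_request(model: str, user_content: str, **extra) -> dict:
    """Assemble a minimal /v1/chat/completions body (illustrative sketch).

    Only model and messages are required; optional OpenAI fields
    (temperature, stream, tools, ...) are forwarded upstream by Kindo.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    body.update(extra)  # e.g. temperature=0.2, stream=True
    return body
```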

Messages array

Each message in the messages array has the following shape:

| Field | Required | Notes |
| --- | --- | --- |
| role | Yes | One of system, developer, user, assistant, tool |
| content | No | String, array of content blocks, or null (e.g. assistant messages with only tool_calls). Arrays are forwarded verbatim and support OpenAI multimodal blocks |
| name | No | Name of the author of this message |
| tool_calls | No | Tool calls generated by the model, present on assistant messages |
| tool_call_id | No | ID of the tool call this message is a response to, required on role: "tool" messages |

Example response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "claude-sonnet-4-5-20250929",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Pod Security Policies were Kubernetes rules that controlled what a pod was allowed to do..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 134,
    "total_tokens": 158
  }
}

Response fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique completion ID |
| object | string | Always "chat.completion" |
| created | number | Unix timestamp |
| model | string | Model used for generation |
| choices | array | Ordered list of completion choices. Each item has index, message, and finish_reason |
| usage | object (optional) | Token usage statistics: prompt_tokens, completion_tokens, total_tokens. Omitted from intermediate streaming chunks. |

Additional fields (for example system_fingerprint, service_tier, logprobs, and refusal) may be present and are forwarded verbatim from the upstream provider.
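A minimal sketch of pulling the assistant text out of a non-streaming response body, using the field names documented above (the helper name is ours):

```python
def extract_reply(response: dict) -> str:
    """Return the first choice's assistant text from a chat.completion
    response body (sketch; non-streaming responses only).

    content can be null, e.g. on tool-call-only assistant messages,
    so fall back to an empty string.
    """
    choice = response["choices"][0]
    return choice["message"]["content"] or ""
```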

Streaming

Set "stream": true on POST /v1/chat/completions to receive a standard OpenAI Server-Sent Events stream.

curl -N https://api.kindo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku about observability." }
    ]
  }'

Kindo returns Content-Type: text/event-stream and preserves OpenAI’s SSE structure. Each event is a JSON payload on a data: line:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}

The stream ends with:

data: [DONE]

If the upstream request fails before streaming begins, Kindo returns a normal HTTP error response with an OpenAI-compatible JSON error envelope instead of switching to SSE.
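The stream shape above can be consumed with a small parser. This is a sketch over an iterable of raw SSE lines (the function name is ours); it stops at the data: [DONE] sentinel, and a real client should also watch for the event: error frames described under Error format.

```python
import json

def iter_stream_content(lines):
    """Yield delta text from OpenAI-style SSE data lines (sketch).

    Skips non-data lines, stops at the data: [DONE] sentinel, and
    pulls choices[].delta.content out of each chunk.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```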

Tool use

OpenAI-format tool definitions pass through to the upstream provider. Standard tools and tool_choice fields work without any Kindo-specific wrapping.

{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [
    { "role": "user", "content": "What is the weather in San Francisco?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

Typical flow:

  1. Send a request with tools.
  2. Receive an assistant message containing tool_calls.
  3. Execute the tool client-side.
  4. Send the tool result back in a follow-up message with role: "tool", tool_call_id, and the result as content.
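Step 4 can be sketched as a small helper (the name is ours) that builds the role "tool" follow-up message, which you then append to the messages array for the next request:

```python
def tool_result_message(tool_call_id: str, result: str) -> dict:
    """Build the follow-up message for step 4 of the tool-use flow
    (sketch): a role "tool" message echoing the model's tool_call_id
    with the client-side execution result as content."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": result,
    }
```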

Supported models

The Chat Completions API uses the same model registry as the Responses API. Call GET /v1/models to enumerate the models available to your organization.

If your organization does not have access to the requested model, the API returns 403 with type: "permission_error" and code: "model_access_denied".

Error format

Errors emitted by the Kindo handler follow the OpenAI-compatible envelope:

{
  "error": {
    "message": "Model 'unknown-model' not found.",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}

Kindo maps common HTTP statuses to OpenAI-style error types:

| Status | Type | Code |
| --- | --- | --- |
| 400 | invalid_request_error | invalid_body or model_not_found |
| 403 | permission_error | model_access_denied |
| 429 | rate_limit_error | — |
| 502 | server_error | upstream_empty_body |
| 500 | server_error | internal_error |

Two exceptions:

  1. 401 Unauthorized — responses from API-key authentication use a plain-string envelope instead of the OpenAI shape:

    { "error": "Unauthorized" }

    There is no error.type or error.code field. This is verified by the handler test at routes/chat/postChatCompletions.test.ts:89.

  2. 4xx/5xx responses originating from upstream LiteLLM are forwarded verbatim with the upstream body and content-type. The shape (other than the Kindo-emitted upstream_empty_body code) depends on the upstream provider.

Mid-stream failures do not produce an HTTP 500. Once the SSE response has started, the outer HTTP status is already committed at 200. Stream errors arrive as an inline event: error SSE frame followed by data: [DONE]:

event: error
data: {"error":{"message":"...","type":"server_error","code":"stream_error"}}
data: [DONE]

SDK consumers should handle event: error payloads with code: "stream_error" separately from HTTP-level error envelopes.
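Because of the 401 exception, error-handling code cannot assume error is always an object. A sketch of distinguishing the two envelope shapes (the helper name is ours):

```python
import json

def classify_error(body: str) -> str:
    """Classify a Kindo error body (sketch).

    401 responses carry a plain-string envelope ({"error": "Unauthorized"});
    other Kindo-emitted errors use the OpenAI shape with error.type
    and error.code.
    """
    err = json.loads(body)["error"]
    if isinstance(err, str):           # 401 plain-string envelope
        return "unauthorized"
    return err.get("code", "unknown")  # OpenAI-style envelope
```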

Differences from /v1/responses

| Topic | Chat Completions API | Responses API |
| --- | --- | --- |
| Request shape | messages array | input string + instructions |
| Conversation state | Caller-managed | Kindo-managed via conversation ID |
| Tool model | OpenAI function-calling tools forwarded verbatim | Kindo MCP integrations + OpenAI-style function tools (Responses-format) |
| Output | choices[].message | Typed output items + mcp_call / function_call |

When to pick which:

  • Chat Completions — Migrating from OpenAI or llm.kindo.ai and you want minimal code change.
  • Responses — Native MCP and OpenAI-style function tool integrations, plus managed conversations.

Migration from llm.kindo.ai

If you are currently calling https://llm.kindo.ai/v1/chat/completions, migrate to the unified api.kindo.ai surface:

| | Old | New |
| --- | --- | --- |
| URL | https://llm.kindo.ai/v1/chat/completions | https://api.kindo.ai/v1/chat/completions |
| Auth header | Authorization: Bearer or api-key | Authorization: Bearer or x-api-key |
| Request/response shape | Unchanged | Unchanged |

The Authorization: Bearer form works on both surfaces and requires no change. If you used the bare api-key: header on llm.kindo.ai, rename it to x-api-key: for api.kindo.ai (see API overview for the full auth header matrix).

For the OpenAI SDK, change one line:

from openai import OpenAI

# Old
client = OpenAI(base_url="https://llm.kindo.ai", api_key="YOUR_API_KEY")
# New
client = OpenAI(base_url="https://api.kindo.ai/v1", api_key="YOUR_API_KEY")

The underlying governance pipeline is the same (it was always LiteLLM). The api.kindo.ai path adds the unified /v1 surface alongside /v1/messages and /v1/responses.

Client examples

Use whichever OpenAI client fits your workflow.

Non-streaming:

curl https://api.kindo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "messages": [
      {
        "role": "user",
        "content": "Give three practical code review tips."
      }
    ]
  }'

Streaming:

curl -N https://api.kindo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "stream": true,
    "messages": [
      {
        "role": "user",
        "content": "Give three practical code review tips."
      }
    ]
  }'

See also