
Chat Completions API streaming

Set "stream": true on POST /v1/chat/completions to receive a standard OpenAI Server-Sent Events stream. Your existing parser works unchanged.

Request

curl -N https://api.kindo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $KINDO_API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku about observability." }
    ]
  }'

Kindo returns Content-Type: text/event-stream and emits OpenAI’s chunk format. Each event is a JSON payload prefixed with data::

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":""}}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":"Logs flow through"}}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":" the gate;"}}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]

The terminal data: [DONE] line is the canonical end-of-stream marker.
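
If you consume the stream without an SDK, a minimal parser only needs to pick out data: lines and stop at [DONE]. The sketch below uses the httpx library (an assumption; any streaming HTTP client works) and ignores named events and multi-line data fields for brevity:

import json
import os

import httpx

# Minimal sketch: stream the same request as the curl example and print deltas.
payload = {
    "model": "claude-sonnet-4-5-20250929",
    "stream": True,
    "messages": [{"role": "user", "content": "Write a haiku about observability."}],
}
headers = {"Authorization": f"Bearer {os.environ['KINDO_API_KEY']}"}

with httpx.stream("POST", "https://api.kindo.ai/v1/chat/completions",
                  json=payload, headers=headers, timeout=None) as response:
    for line in response.iter_lines():
        if not line.startswith("data: "):
            continue  # skip blank separators and named-event lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # canonical end-of-stream marker
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)

A production consumer should follow the full SSE framing rules; this loop is only meant to show the data:/[DONE] contract.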

Event types

The stream emits unnamed data: lines. Each line is either an incremental chunk or the terminal marker:

Event      Meaning
data:      Incremental chat.completion.chunk JSON payload.
[DONE]     End-of-stream marker.

Example stream

Each data: JSON object follows OpenAI’s chat.completion.chunk schema:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1710000000,
  "model": "claude-sonnet-4-5-20250929",
  "choices": [
    {
      "index": 0,
      "delta": { "content": " the gate;" },
      "finish_reason": null
    }
  ]
}
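
Because each chunk carries only a delta, clients typically concatenate the delta.content values and watch finish_reason to know when the message is complete. A minimal accumulator, assuming the chunks have already been parsed into dictionaries shaped like the example above:

# Sketch: fold a sequence of parsed chunk dicts into the final message text.
def accumulate(chunks):
    parts = []
    finish_reason = None
    for chunk in chunks:
        choice = chunk["choices"][0]
        delta = choice.get("delta", {})
        if delta.get("content"):
            parts.append(delta["content"])
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]  # e.g. "stop"
    return "".join(parts), finish_reason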

usage is not included in intermediate chunks. In the OpenAI spec, stream_options.include_usage requests usage data in the final chunk; Kindo currently does not populate this field. If you need usage stats, make a non-streaming request or rely on Kindo’s metering.
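
If you do need token counts for a single request, one option is a plain non-streaming call, which returns a usage object on the completion. A sketch with the OpenAI Python SDK (the client setup repeats the SDK consumption example below for completeness):

import os
from openai import OpenAI

client = OpenAI(base_url="https://api.kindo.ai/v1", api_key=os.environ["KINDO_API_KEY"])

# Sketch: same prompt, but non-streaming, so the response carries usage.
completion = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=False,
)
print(completion.usage)  # prompt_tokens, completion_tokens, total_tokens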

SDK consumption

Python

import os
from openai import OpenAI

client = OpenAI(base_url="https://api.kindo.ai/v1", api_key=os.environ["KINDO_API_KEY"])

stream = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

TypeScript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.kindo.ai/v1',
  apiKey: process.env.KINDO_API_KEY
});

const stream = await client.chat.completions.create({
  model: 'claude-sonnet-4-5-20250929',
  messages: [{ role: 'user', content: 'Write a haiku.' }],
  stream: true
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0].delta.content ?? '');
}

Pre-stream errors

If the upstream request fails before streaming begins, Kindo returns a normal HTTP error response with the OpenAI-compatible JSON envelope — no SSE switch. See Errors.
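
With the OpenAI Python SDK, a pre-stream failure surfaces as an exception raised by the create call itself, before any chunk is yielded. A sketch, assuming the client from the SDK consumption example (openai.APIStatusError is the SDK's exception for non-2xx responses):

import openai

# Sketch: a 4xx/5xx before streaming begins raises from the create() call.
try:
    stream = client.chat.completions.create(
        model="claude-sonnet-4-5-20250929",
        messages=[{"role": "user", "content": "Write a haiku."}],
        stream=True,
    )
except openai.APIStatusError as exc:
    # Normal HTTP error with the OpenAI-compatible JSON envelope; no SSE was started.
    print(exc.status_code, exc.response.json())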

Mid-stream errors

Once SSE has started, the outer HTTP status is already committed at 200. Errors that arise after that point are delivered in-band as an event: error SSE frame followed by data: [DONE]:

event: error
data: {"error":{"message":"upstream stream interrupted","type":"server_error","code":"stream_error"}}
data: [DONE]

Treat any event: error payload with code: "stream_error" as terminal — the response will not resume.
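
A hand-rolled parser therefore has to track the optional event: field so an error frame is not parsed as a content chunk. A sketch of a line handler, assuming an iterator of decoded SSE lines such as response.iter_lines() from the earlier httpx example:

import json

# Sketch: yield parsed chunks, stop at [DONE], and raise on event: error frames.
def iter_chunks(lines):
    event_name = None
    for line in lines:
        if line.startswith("event: "):
            event_name = line[len("event: "):].strip()
            continue
        if not line.startswith("data: "):
            event_name = None  # a blank line terminates the current SSE event
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            return  # end-of-stream marker
        payload = json.loads(data)
        if event_name == "error":
            err = payload["error"]
            raise RuntimeError(f"stream error ({err.get('code')}): {err.get('message')}")
        yield payload

Used as for chunk in iter_chunks(response.iter_lines()): ..., the loop ends cleanly on [DONE] and raises as soon as an error frame arrives, since the stream will not resume.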

See also