
# Global Model Management (Self-Managed Kindo)

This page explains how self-managed Kindo installations define and maintain global models. Global models are the system-wide LLMs (and related models like embeddings or transcription) that power workflows, chat, ingestion, and other core experiences. In a self-managed environment, you add these models to Kindo, map them to Unleash feature variants, and keep those mappings up to date when models are replaced. This guide walks through the required feature flag keys, when they are needed, and the steps to add or delete models safely.

## Unleash Features and Strategy Variants

Each key below is an Unleash feature whose variant payload contains one or more global model IDs. The backend and frontend read these variants to decide which models to use. Each feature is marked required, optional, or conditional depending on your deployment configuration.

  • API_STEP_GENERATION (optional)
    • Description: Models used for generating API steps in workflows.
    • Example models (Kindo SaaS): Claude 3.7 Sonnet, O4 Mini.
  • INTERNAL_AUTO_GENERATION (required)
    • Description: Internal model for automatic content generation.
    • Example models (Kindo SaaS): Internal Auto Generation.
  • DYNAMIC_API_REQUEST_PARSER (optional)
    • Description: Parses and processes dynamic API requests.
    • Example models (Kindo SaaS): Internal Dynamic API Parser.
  • INTERNAL_LARGE_WORKER (required)
    • Description: High-capacity internal worker for complex tasks.
    • Example models (Kindo SaaS): Internal Large Worker.
  • INTERNAL_SMALL_WORKER (required)
    • Description: Lightweight internal worker for simple tasks.
    • Example models (Kindo SaaS): Internal Small Worker.
  • INGESTION_WORKERS (required)
    • Description: Models for data ingestion and extraction.
    • Example models (Kindo SaaS): Llama 3.1 8B (primary), Internal Ingestion Extraction (fallback).
  • AUDIO_TRANSCRIPTION (required)
    • Description: Converts audio to text.
    • Example models (Kindo SaaS): Internal Transcription.
  • DEFAULT_WORKFLOW_STEP_MODEL (required)
    • Description: Default model for workflow step execution.
    • Example models (Kindo SaaS): Claude Sonnet 4.
  • SLACK_MESSAGE_GENERATION (optional)
    • Description: Generates Slack messages and responses.
    • Notes: Only required if Slack integration is enabled (feature-flagged).
    • Example models (Kindo SaaS): Claude Sonnet 4.
  • LOCAL_STORAGE_CHAT_MODELS (required)
    • Description: Default chat model list for the chat model dropdown.
    • Example models (Kindo SaaS): Claude Sonnet 4.
  • AGENT_MODEL (optional)
    • Description: Models for agent-based interactions (ReAct-Agent).
    • Example models (Kindo SaaS): GPT-4o (primary), Claude Sonnet 4 (fallback).
  • EMBEDDING_MODELS (required)
    • Description: Models for generating text embeddings.
    • Example models (Kindo SaaS): Embedding Model.
  • STREAMING_UNSUPPORTED_MODELS (optional)
    • Description: Models that do not support streaming responses.
    • Notes: Used for conditional UI rendering of loading states.
    • Example models (Kindo SaaS): O1, O3, O4 Mini.
  • GENERAL_PURPOSE_MODELS (required)
    • Description: Versatile models for model selection categories and the “Recommend Model” feature.
    • Example models (Kindo SaaS): Claude 4 Sonnet, Claude 4 Opus, Claude Sonnet 4, Claude 3.7 Sonnet, GPT-4o, GPT 4.1, Llama 3.3 70B.
  • LONG_CONTEXT_MODELS (required)
    • Description: Models optimized for long context windows (also used for selection categories).
    • Example models (Kindo SaaS): Gemini 2.5 Flash, Gemini 2.5 Pro, GPT 4.1 Mini.
  • REASONING_MODELS (conditional)
    • Description: Models specialized in complex reasoning.
    • Notes: Required only if OpenAI o-series models (o1, o3, o4, etc.) exist in the system.
    • Example models (Kindo SaaS): DeepSeek R1, O1, O3, O4 Mini.
  • CYBERSECURITY_MODELS (conditional)
    • Description: Models specialized for cybersecurity analysis.
    • Notes: Required only if DeepHat is available in the system.
    • Example models (Kindo SaaS): DeepHat.
  • GEMINI_CHAT_MODELS (optional)
    • Description: Google Gemini models for chat interactions.
    • Notes: Only needed if multimodal chat is enabled; not used in self-managed Kindo today.
    • Example models (Kindo SaaS): Gemini 2.5 Flash, Gemini 2.5 Pro.
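As a concrete illustration, a feature's strategy variant carries model IDs in its JSON payload. The sketch below uses the standard Unleash variant shape (`name`, `weight`, `payload`); the inner `modelIds` field is an assumption for illustration, so check an existing variant in your Unleash instance for the authoritative schema your Kindo deployment expects.

```json
{
  "name": "default",
  "weight": 1000,
  "payload": {
    "type": "json",
    "value": "{\"modelIds\": [\"b4c5d6e7-f8a9-0123-4567-890123abcdef\"]}"
  }
}
```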

## Model Management

### Minimum Model Requirements

Self-managed Kindo needs a few baseline global models to function correctly. These models cover foundational capabilities that the platform expects to be available regardless of optional features. Without them, core flows like chat, workflow execution, ingestion, or indexing will fail.

### Absolute Minimum Model Set

  • Embedding model
    • Why: Required for semantic search, retrieval, and indexing across the platform.
    • Current Kindo SaaS: BAAI/bge-m3 (hardware needs: 1xH200).
    • Alternative: Choose a top-performing embedding model from the MTEB leaderboard (currently Gemini embedding). https://huggingface.co/spaces/mteb/leaderboard
  • Strong/large model
    • Why: Handles complex reasoning, workflow execution, and long-form generation.
    • Open source recommendation: gpt-oss 120B (hardware needs: 1xH200).
    • Hosted recommendations: Gemini 2.5 Pro, Claude 4, GPT-5.
  • Audio transcription
    • Why: Required for any audio inputs (file uploads, voice notes, or transcription-enabled workflows).
    • Open source recommendation: https://speaches.ai/ (Faster Whisper) or Whisper (evaluate against hardware needs).
    • Hosted recommendation: Deepgram.

### Highly Recommended Additional Models

  • Small model (simple tasks like title generation, quick summaries, or low-latency steps)
    • Why: Offloads lightweight tasks from large models to reduce cost and latency.
    • Open source recommendation: Gemma 3 or Llama 3.2 7B.

### Other Recommended Models

  • Security-focused: DeepHat 32B.
  • Multimodal: Gemini 2.5 Pro (image + text).

## Global Model Management Endpoints

  • Production API endpoint: https://api.your_kindo_domain

## Overview of Steps

  1. Add the new model.
  2. Update Unleash variants to reference the new model IDs.
  3. Delete the old model with a replacement.
     • If the model you are deleting is still referenced by Unleash, the response lists the feature flags that need updates. Update those variants first, then retry the delete-with-replacement call.
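Step 2 can be performed through the Unleash Admin API, which overwrites a feature's variants in one call. The sketch below is illustrative only: `<UNLEASH_URL>`, `<UNLEASH_ADMIN_TOKEN>`, the `default` project name, and the `modelIds` payload shape are placeholders/assumptions, so verify them against your Unleash version and an existing variant before running.

```shell
# Sketch: overwrite the variants of one feature via the Unleash Admin API.
# The payload schema (here a hypothetical "modelIds" array) must match what
# your Kindo deployment reads from the variant.
curl -X PUT <UNLEASH_URL>/api/admin/projects/default/features/DEFAULT_WORKFLOW_STEP_MODEL/variants \
  -H 'Content-Type: application/json' \
  -H 'Authorization: <UNLEASH_ADMIN_TOKEN>' \
  -d '[
    {
      "name": "default",
      "weight": 1000,
      "weightType": "variable",
      "payload": {
        "type": "json",
        "value": "{\"modelIds\": [\"<NEW_MODEL_ID>\"]}"
      }
    }
  ]'
```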

## Add a Global Model

🚨 Note: Ensure the modelProviderDisplayName value matches the display name in the ModelProvider table. Otherwise, a new ModelProvider record will be created. Also ensure LiteLLM params are correct for the specific model you are adding (defaults vary by provider and can change behavior, such as max output tokens).

```shell
curl -X POST <API_URL>/internal/openapi/admin/model/new \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <UM_INTERNAL_API_KEY>' \
  -d '{
    "orgId": "<YOUR_ORG_ID>",
    "userId": "<YOUR_USER_ID>",
    "displayName": "Claude 3.5 Haiku Test",
    "modelProviderDisplayName": "Anthropic",
    "type": "CHAT",
    "contextWindow": 32768,
    "metadata": {"link": "https://www.anthropic.com/news/3-5-models-and-computer-use", "type": "Text Generation", "costTier": "MEDIUM", "usageTag": "Chat + Agents", "description": "Fast model optimized for quick responses while maintaining high quality output", "modelCreator": "Anthropic"},
    "litellmModelName": "claude-3-5-haiku-test",
    "litellmParams": {
      "model": "anthropic/claude-3-5-haiku-latest",
      "api_key": "<ANTHROPIC_API_KEY>",
      "max_output_tokens": 8192
    },
    "model_info": {
      "base_model": "claude-3-5-haiku-20241022"
    }
  }'
```

## Delete with Replacement

```shell
curl -X POST <API_URL>/internal/openapi/admin/model/delete-with-replacement \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <UM_INTERNAL_API_KEY>' \
  -d '{
    "deletingModelId": "ff5afc3b-2a2d-4d78-8bae-61f0217a38e5",
    "replacementModelId": "b4c5d6e7-f8a9-0123-4567-890123abcdef",
    "orgId": "<YOUR_ORG_ID>",
    "userId": "<YOUR_USER_ID>"
  }'
```
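When the model being deleted is still referenced by Unleash, the call fails and the response enumerates the affected feature flags. The exact response shape below is illustrative only (the field names are assumptions, not the documented schema); the important part is that each listed flag's variants must be updated before retrying.

```json
{
  "error": "Model is still referenced by Unleash feature variants",
  "featureFlags": ["DEFAULT_WORKFLOW_STEP_MODEL", "GENERAL_PURPOSE_MODELS"]
}
```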

## Add Model Examples

### Claude 3.5 Haiku

```shell
curl -X POST <API_URL>/internal/openapi/admin/model/new \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <UM_INTERNAL_API_KEY>' \
  -d '{
    "orgId": "<YOUR_ORG_ID>",
    "userId": "<YOUR_USER_ID>",
    "displayName": "Claude 3.5 Haiku",
    "modelProviderDisplayName": "Anthropic",
    "type": "CHAT",
    "contextWindow": 32768,
    "metadata": {"link": "https://www.anthropic.com/news/3-5-models-and-computer-use", "type": "Text Generation", "costTier": "MEDIUM", "usageTag": "Chat + Agents", "description": "Fast model optimized for quick responses while maintaining high quality output", "modelCreator": "Anthropic"},
    "litellmModelName": "claude-3-5-haiku",
    "litellmParams": {
      "model": "anthropic/claude-3-5-haiku-latest",
      "api_key": "<ANTHROPIC_API_KEY>"
    },
    "model_info": {
      "base_model": "claude-3-5-haiku-20241022"
    }
  }'
```

### Claude Sonnet 4

```shell
curl -X POST <API_URL>/internal/openapi/admin/model/new \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <UM_INTERNAL_API_KEY>' \
  -d '{
    "orgId": "<YOUR_ORG_ID>",
    "userId": "<YOUR_USER_ID>",
    "displayName": "Claude Sonnet 4",
    "modelProviderDisplayName": "Anthropic",
    "type": "CHAT",
    "contextWindow": 200000,
    "metadata": {"link": "https://www.anthropic.com/news/claude-4", "type": "Text Generation", "costTier": "HIGH", "usageTag": "Chat + Agents", "description": "Claude Sonnet 4 is a powerful model with sustained performance on complex, long-running tasks and agent workflows.", "modelCreator": "Anthropic"},
    "litellmModelName": "claude-sonnet-4-20250514",
    "litellmParams": {
      "model": "anthropic/claude-sonnet-4-20250514",
      "api_key": "<ANTHROPIC_API_KEY>",
      "max_output_tokens": 64000
    }
  }'
```

### Transcription Generator

```shell
curl -X POST <API_URL>/internal/openapi/admin/model/new \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <UM_INTERNAL_API_KEY>' \
  -d '{
    "orgId": "<YOUR_ORG_ID>",
    "userId": "<YOUR_USER_ID>",
    "displayName": "Transcription Generator",
    "modelProviderDisplayName": "Deepgram",
    "type": "INTERNAL",
    "litellmModelName": "transcription-generator",
    "litellmParams": {
      "api_base": "https://api.deepgram.com/v1/",
      "model": "deepgram/nova-2",
      "api_key": "<DEEPGRAM_API_KEY>"
    }
  }'
```

## Add Parameters to an Existing Model (LiteLLM DB)

```sql
UPDATE
  "public"."LiteLLM_ProxyModelTable"
SET
  "litellm_params" = jsonb_set(
    "litellm_params",
    '{max_tokens}',
    '64000'::jsonb
  )
WHERE
  "model_name" = 'claude-haiku-4-5-20251001';
```

Regional quota note: When adding models to dev or prod, be mindful of which region you are using so quota is spent efficiently. Ideally, keep dev and prod separate so local tests do not cause outages due to rate limits.