Global Model Management (Self-Managed Kindo)
This page explains how self-managed Kindo installations define and maintain global models. Global models are the system-wide LLMs (and related models like embeddings or transcription) that power workflows, chat, ingestion, and other core experiences. In a self-managed environment, you add these models to Kindo, map them to Unleash feature variants, and keep those mappings up to date when models are replaced. This guide walks through the required feature flag keys, when they are needed, and the steps to add or delete models safely.
Unleash Features and Strategy Variants
Each key below is an Unleash feature whose strategy variant payload contains one or more global model IDs. The backend and frontend read these variants to decide which models to use. Each category is marked required or optional; whether you need the optional ones depends on your deployment configuration.
- API_STEP_GENERATION (optional)
- Description: Models used for generating API steps in workflows.
- Example models (Kindo SaaS): Claude 3.7 Sonnet, GPT-5.
- AUDIO_TRANSCRIPTION (required)
- Description: Converts audio to text.
- Example models (Kindo SaaS): Deepgram Nova 2.
- CRON_EXPRESSION_GENERATION (required)
- Description: Models used for generating cron expressions from natural language schedule descriptions in workflow triggers.
- Example models (Kindo SaaS): Claude 3.7 Sonnet, GPT-5.
- DEFAULT_WORKFLOW_STEP_MODEL (required)
- Description: Default model for workflow step execution.
- Example models (Kindo SaaS): Claude Sonnet 4.5.
- DYNAMIC_API_REQUEST_PARSER (optional)
- Description: Parses and processes dynamic API requests.
- Example models (Kindo SaaS): Llama 4 Maverick (Vertex AI).
- EMBEDDING_MODELS (required)
- Description: Models for generating text embeddings.
- Example models (Kindo SaaS): BGE-M3.
- INGESTION_WORKERS (required)
- Description: Models for data ingestion and extraction.
- Example models (Kindo SaaS): Llama 4 Maverick (Vertex AI).
- INTERNAL_AUTO_GENERATION (required)
- Description: Internal model for automatic content generation.
- Example models (Kindo SaaS): Llama 4 Maverick (Vertex AI).
- INTERNAL_LARGE_WORKER (required)
- Description: High-capacity internal worker for complex tasks.
- Example models (Kindo SaaS): Llama 4 Maverick (Vertex AI).
- INTERNAL_SMALL_WORKER (required)
- Description: Lightweight internal worker for simple tasks.
- Example models (Kindo SaaS): Llama 4 Maverick (Vertex AI).
- LOCAL_STORAGE_CHAT_ACTIONS_MODELS (required)
- Description: A single fallback model used when the user's currently selected chat model (saved in local storage) becomes unavailable (e.g., the model is disabled or access is revoked).
- Example models (Kindo SaaS): Claude Sonnet 4.5.
- TOOL_CALLING_MODELS (required)
- Description: Models that support tool/function calling, used in the model selection dropdown for chat actions.
- Example models (Kindo SaaS): GPT-5.2, Gemini 3 Pro Preview, Claude Opus 4.5, etc.
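As an illustration, the variant payload for one of these features would carry the global model IDs the backend should read. The exact field names below (e.g., `modelIds`) are an assumption for this sketch, not a documented contract; check your Unleash setup for the real payload shape. The model ID shown is a placeholder UUID:

```json
{
  "modelIds": [
    "ff5afc3b-2a2d-4d78-8bae-61f0217a38e5"
  ]
}
```

In Unleash, a payload like this would be attached as the strategy variant payload (payload type `json`) on a feature such as DEFAULT_WORKFLOW_STEP_MODEL.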
Model Management
Minimum Model Requirements
Self-managed Kindo needs a few baseline global models to function correctly. These models cover foundational capabilities that the platform expects to be available regardless of optional features. Without them, core flows like chat, workflow execution, ingestion, or indexing will fail.
Absolute Minimum Model Set
- Embedding model
- Why: Required for semantic search, retrieval, and indexing across the platform.
- Current Kindo SaaS: BAAI/bge-m3 (hardware needs: 1xH200).
- Alternative: Choose a top-performing embedding model from the MTEB leaderboard (currently Gemini embedding). https://huggingface.co/spaces/mteb/leaderboard
- Strong/large model
- Why: Handles complex reasoning, workflow execution, and long-form generation.
- Open source recommendation: gpt-oss 120B (hardware needs: 1xH200).
- Hosted recommendations: Gemini 2.5 Pro, Claude 4.5, GPT-5.2.
- Audio transcription
- Why: Required for any audio inputs (file uploads, voice notes, or transcription-enabled workflows).
- Open source recommendation: https://speaches.ai/ (Faster Whisper) or Whisper (evaluate against hardware needs).
- Hosted recommendation: Deepgram.
Highly Recommended Additional Models
- Small model (simple tasks like title generation, quick summaries, or low-latency steps)
- Why: Offloads lightweight tasks from large models to reduce cost and latency.
- Open source recommendation: Gemma 3 or Llama 3.2 7B.
Other Recommended Models
- Security-focused: DeepHat 32B.
- Multimodal: Gemini 2.5 Pro (image + text).
Global Model Management Endpoints
- Production API endpoint: https://api.your_kindo_domain
Overview of Steps
- Add the new model.
- Update Unleash variants to reference the new model IDs.
- Delete the old model with replacement.
- If the model you are deleting is still referenced by Unleash, the response will list the feature flags that need updates. Update those variants first, then retry the delete-with-replacement call.
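The sequence above can be sketched as a shell script. This is a dry-run outline, not a complete implementation: the URL and model IDs are placeholders, the request bodies are omitted (the full payloads appear in the sections that follow), and step 2 happens in Unleash rather than via these endpoints.

```shell
#!/bin/sh
# Dry-run sketch of the model rollover sequence. All values are placeholders;
# substitute your deployment's API URL and real model IDs before use.
API_URL="https://api.your_kindo_domain"
OLD_ID="<old-model-uuid>"
NEW_ID="<new-model-uuid>"
DRY_RUN=1   # set to 0 to actually issue the requests

run() {
  # Echo the command instead of executing it while DRY_RUN is on.
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "DRY RUN: $*"
  else
    "$@"
  fi
}

# Step 1: add the new model (full JSON body shown in "Add a Global Model").
run curl -X POST "$API_URL/internal/openapi/admin/model/new"

# Step 2: update every Unleash variant that references $OLD_ID to use $NEW_ID.
echo "Update Unleash variants: $OLD_ID -> $NEW_ID"

# Step 3: delete the old model with replacement. If the response lists feature
# flags still pointing at $OLD_ID, fix those variants and retry this call.
run curl -X POST "$API_URL/internal/openapi/admin/model/delete-with-replacement"
```

With `DRY_RUN=1` the script only prints the planned requests, which makes the ordering easy to review before touching a live deployment.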
Add a Global Model
🚨 Note: Ensure the modelProviderDisplayName value matches the display name in the ModelProvider table; otherwise a new ModelProvider record will be created. Also verify that the LiteLLM params are correct for the specific model you are adding, since defaults vary by provider and can change behavior (for example, max output tokens).
curl -X POST <API_URL>/internal/openapi/admin/model/new \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <UM_INTERNAL_API_KEY>' \
  -d '{
    "orgId": "<YOUR_ORG_ID>",
    "userId": "<YOUR_USER_ID>",
    "displayName": "Claude 3.5 Haiku Test",
    "modelProviderDisplayName": "Anthropic",
    "type": "CHAT",
    "contextWindow": 32768,
    "metadata": {
      "link": "https://www.anthropic.com/news/3-5-models-and-computer-use",
      "type": "Text Generation",
      "costTier": "MEDIUM",
      "usageTag": "Chat + Agents",
      "description": "Fast model optimized for quick responses while maintaining high quality output",
      "modelCreator": "Anthropic"
    },
    "litellmModelName": "claude-3-5-haiku-test",
    "litellmParams": {
      "model": "anthropic/claude-3-5-haiku-latest",
      "api_key": "<ANTHROPIC_API_KEY>",
      "max_output_tokens": 64000
    },
    "model_info": {
      "base_model": "claude-3-5-haiku-20241022"
    }
  }'
Delete with Replacement
curl -X POST <API_URL>/internal/openapi/admin/model/delete-with-replacement \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <UM_INTERNAL_API_KEY>' \
  -d '{
    "deletingModelId": "ff5afc3b-2a2d-4d78-8bae-61f0217a38e5",
    "replacementModelId": "b4c5d6e7-f8a9-0123-4567-890123abcdef",
    "orgId": "<YOUR_ORG_ID>",
    "userId": "<YOUR_USER_ID>"
  }'
Add Model Examples
Claude 3.5 Haiku
curl -X POST <API_URL>/internal/openapi/admin/model/new \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <UM_INTERNAL_API_KEY>' \
  -d '{
    "orgId": "<YOUR_ORG_ID>",
    "userId": "<YOUR_USER_ID>",
    "displayName": "Claude 3.5 Haiku",
    "modelProviderDisplayName": "Anthropic",
    "type": "CHAT",
    "contextWindow": 32768,
    "metadata": {
      "link": "https://www.anthropic.com/news/3-5-models-and-computer-use",
      "type": "Text Generation",
      "costTier": "MEDIUM",
      "usageTag": "Chat + Agents",
      "description": "Fast model optimized for quick responses while maintaining high quality output",
      "modelCreator": "Anthropic"
    },
    "litellmModelName": "claude-3-5-haiku",
    "litellmParams": {
      "model": "anthropic/claude-3-5-haiku-latest",
      "api_key": "<ANTHROPIC_API_KEY>"
    },
    "model_info": {
      "base_model": "claude-3-5-haiku-20241022"
    }
  }'
Claude Sonnet 4
curl -X POST <API_URL>/internal/openapi/admin/model/new \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <UM_INTERNAL_API_KEY>' \
  -d '{
    "orgId": "<YOUR_ORG_ID>",
    "userId": "<YOUR_USER_ID>",
    "displayName": "Claude Sonnet 4",
    "modelProviderDisplayName": "Anthropic",
    "type": "CHAT",
    "contextWindow": 200000,
    "metadata": {
      "link": "https://www.anthropic.com/news/claude-4",
      "type": "Text Generation",
      "costTier": "HIGH",
      "usageTag": "Chat + Agents",
      "description": "Claude Sonnet 4 is a powerful model with sustained performance on complex, long-running tasks and agent workflows.",
      "modelCreator": "Anthropic"
    },
    "litellmModelName": "claude-sonnet-4-20250514",
    "litellmParams": {
      "model": "anthropic/claude-sonnet-4-20250514",
      "api_key": "<ANTHROPIC_API_KEY>",
      "max_output_tokens": 64000
    }
  }'
Transcription Generator
curl -X POST <API_URL>/internal/openapi/admin/model/new \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <UM_INTERNAL_API_KEY>' \
  -d '{
    "orgId": "<YOUR_ORG_ID>",
    "userId": "<YOUR_USER_ID>",
    "displayName": "Transcription Generator",
    "modelProviderDisplayName": "Deepgram",
    "type": "INTERNAL",
    "litellmModelName": "transcription-generator",
    "litellmParams": {
      "api_base": "https://api.deepgram.com/v1/",
      "model": "deepgram/nova-2",
      "api_key": "<DEEPGRAM_API_KEY>"
    }
  }'
Add Parameters to an Existing Model (LiteLLM DB)
UPDATE "public"."LiteLLM_ProxyModelTable"
SET "litellm_params" = jsonb_set(
  "litellm_params",
  '{max_tokens}',
  '64000'::jsonb
)
WHERE "model_name" = 'claude-haiku-4-5-20251001';
Regional quota note: When adding models to dev or prod, be mindful of which region you are using so quota is spent efficiently. Ideally, keep dev and prod separate so local testing cannot trigger production outages through shared rate limits.