4: Deploying Kindo

Applications Deployment Guide

This guide provides detailed instructions for deploying Kindo application services using the kindo-applications module as a separate Terraform stack.

Table of Contents

  1. Overview

  2. Pre-Deployment Requirements

  3. Directory Setup

  4. Configuration Setup

  5. Application Configuration

  6. Deployment Process

  7. Post-Deployment Verification

  8. Troubleshooting

Overview

The kindo-applications module deploys the core Kindo services as a separate Terraform stack:

  • API Service: Backend REST API (Node.js)

  • Next.js Frontend: Web application UI

  • LiteLLM: AI model proxy and router

  • Llama Indexer: Document indexing service

  • External Poller: Background job processor

  • External Sync: Data synchronization service

  • Credits Service: Usage tracking and billing

  • Audit Log Exporter: Compliance and logging

  • Cerbos: Authorization policy engine

Each application:

  • Runs in its own Kubernetes namespace

  • Uses External Secrets for configuration

  • Includes health checks and monitoring

  • Supports horizontal scaling

Pre-Deployment Requirements

Required Infrastructure

Before deploying applications, ensure you have:

  1. ✅ Base stack deployed (infrastructure + secrets + peripheries)

    • EKS cluster must be running

    • External Secrets Operator must be configured

    • ALB Ingress Controller must be deployed

    • Secrets must exist in AWS Secrets Manager

  2. ✅ DNS properly configured

    • Base domain delegation completed

    • Wildcard certificate created

  3. ✅ Access to infrastructure outputs

    • Note the cluster name, region, and other outputs from base stack
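
If the base stack exposes these values as Terraform outputs (output names vary by setup), you can read them directly from the base stack's directory:

# From the base stack directory, list every recorded output
terraform output

# Or read a single value; "cluster_name" here is an assumed output name
terraform output -raw cluster_name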

Verify Prerequisites

# Update kubeconfig
aws eks update-kubeconfig --name <cluster-name> --region <region> --profile <profile>

# Check External Secrets Operator
kubectl get clustersecretstore
# Should show: aws-secrets-manager Ready

# Check secrets in AWS
aws secretsmanager list-secrets --query "SecretList[?contains(Name, 'kindo-prod')].[Name]" --output table --profile <profile>

# Check ingress controller
kubectl get ingressclass
# Should show: alb

# Test Unleash connectivity
curl -s https://unleash.yourdomain.com/api/client/features | jq .

Directory Setup

Create a separate directory for the applications deployment:

# Create applications deployment directory
mkdir -p my-kindo-deployment/kindo-applications
cd my-kindo-deployment/kindo-applications

# Copy example values files (adjust path as needed)
cp -r ../../application-values ./values

# Directory structure should look like:
# kindo-applications/
# ├── main.tf
# ├── provider.tf
# ├── variables.tf
# ├── outputs.tf
# ├── terraform.tfvars
# ├── registry_secrets.tf
# └── values/
#     ├── api.yaml
#     ├── next.yaml
#     ├── litellm.yaml
#     └── ...

Configuration Setup

1. Create main.tf

terraform {
 required_version = ">= 1.11.0"
 
 required_providers {
   aws = {
     source  = "hashicorp/aws"
     version = ">= 5.0"
   }
   helm = {
     source  = "hashicorp/helm"
     version = "2.17.0"
   }
   kubectl = {
     source  = "gavinbunney/kubectl"
     version = ">= 1.14.0"
   }
   time = {
     source  = "hashicorp/time"
     version = ">= 0.9.0"
   }
 }

 # Configure your state backend (optional)
 # For production, uncomment and configure the S3 backend:
 # backend "s3" {
 #   bucket = "my-terraform-state-bucket"
 #   key    = "kindo/applications/terraform.tfstate"
 #   region = "us-east-1"
 #  
 #   dynamodb_table = "terraform-state-lock"
 #   encrypt        = true
 # }
}

locals {
 # Core application settings
 project     = var.project_name
 environment = var.environment_name
 domain_name = var.domain_name

 # Secret naming pattern (for External Secrets Operator)
 secret_pattern = "%s-%s/%s-app-config"
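 # For example, with project "kindo" and environment "prod" (illustrative values),
 # the "api" secret resolves to "kindo-prod/api-app-config"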
}

# Deploy Kindo Applications
module "kindo_applications" {
 source = "../../modules/kindo-applications"  # Adjust path as needed

 # Don't wait for resources to be healthy
 helm_wait = false
 helm_atomic = false

 # Helm registry credentials (from shared.tfvars)
 registry_url      = var.registry_url
 registry_username = var.registry_username
 registry_password = var.registry_password

 # Application configurations
 applications_config = {
   # --- API Service --- #
   api = {
     install            = var.enable_api
     helm_chart_version = var.api_chart_version
     namespace          = "api"  # Use dedicated namespace for each application
     create_namespace   = true   # Create namespace automatically
     values_content     = var.api_values_content != "" ? var.api_values_content : templatefile("${path.module}/values/api.yaml", {
       domain_name      = local.domain_name
       replica_count    = var.api_replica_count
       environment_name = local.environment
       project_name     = local.project
     })
     dynamic_helm_sets  = merge({
       "replicaCount" = tostring(var.api_replica_count)
     }, var.api_helm_sets)
     sensitive_helm_sets = merge({
       "secretRef.name" = format(local.secret_pattern, local.project, local.environment, "api")
     }, var.api_sensitive_helm_sets)
   },
   
   # --- Next.js Frontend --- #
   next = {
     install            = var.enable_next
     helm_chart_version = var.next_chart_version
     namespace          = "next"  # Use dedicated namespace
     create_namespace   = true   # Create namespace automatically
     values_content     = var.next_values_content != "" ? var.next_values_content : templatefile("${path.module}/values/next.yaml", {
       domain_name      = local.domain_name
       replica_count    = var.next_replica_count
       environment_name = local.environment
       project_name     = local.project
     })
     dynamic_helm_sets  = merge({
       "replicaCount" = tostring(var.next_replica_count)
     }, var.next_helm_sets)
     sensitive_helm_sets = merge({
       "secretRef.name" = format(local.secret_pattern, local.project, local.environment, "next")
     }, var.next_sensitive_helm_sets)
   },
   
   # --- LiteLLM --- #
   litellm = {
     install            = var.enable_litellm
     helm_chart_version = var.litellm_chart_version
     namespace          = "litellm"
     create_namespace   = true   # Create namespace automatically
     values_content     = var.litellm_values_content != "" ? var.litellm_values_content : templatefile("${path.module}/values/litellm.yaml", {
       domain_name      = local.domain_name
       replica_count    = var.litellm_replica_count
       environment_name = local.environment
       project_name     = local.project
     })
     dynamic_helm_sets  = merge({
       "replicaCount" = tostring(var.litellm_replica_count)
     }, var.litellm_helm_sets)
     sensitive_helm_sets = merge({
       "secretRef.name" = format(local.secret_pattern, local.project, local.environment, "litellm")
     }, var.litellm_sensitive_helm_sets)
   },
   
   # --- Llama Indexer --- #
   llama_indexer = {
     install            = var.enable_llama_indexer
     helm_chart_version = var.llama_indexer_chart_version
     namespace          = "llama-indexer"
     create_namespace   = true   # Create namespace automatically
     values_content     = var.llama_indexer_values_content != "" ? var.llama_indexer_values_content : templatefile("${path.module}/values/llama-indexer.yaml", {
       domain_name      = local.domain_name
       replica_count    = var.llama_indexer_replica_count
       environment_name = local.environment
       project_name     = local.project
     })
     dynamic_helm_sets  = merge({
       "replicaCount" = tostring(var.llama_indexer_replica_count)
     }, var.llama_indexer_helm_sets)
     sensitive_helm_sets = merge({
       "secretRef.name" = format(local.secret_pattern, local.project, local.environment, "llama-indexer")
     }, var.llama_indexer_sensitive_helm_sets)
   },
   
   # --- Credits --- #
   credits = {
     install            = var.enable_credits
     helm_chart_version = var.credits_chart_version
     namespace          = "credits"
     create_namespace   = true   # Create namespace automatically
     values_content     = var.credits_values_content != "" ? var.credits_values_content : templatefile("${path.module}/values/credits.yaml", {
       domain_name      = local.domain_name
       replica_count    = var.credits_replica_count
       environment_name = local.environment
       project_name     = local.project
     })
     dynamic_helm_sets  = merge({
       "replicaCount" = tostring(var.credits_replica_count)
     }, var.credits_helm_sets)
     sensitive_helm_sets = merge({
       "secretRef.name" = format(local.secret_pattern, local.project, local.environment, "credits")
     }, var.credits_sensitive_helm_sets)
   },
   
   # --- External Sync --- #
   external_sync = {
     install            = var.enable_external_sync
     helm_chart_version = var.external_sync_chart_version
     namespace          = "external-sync"
     create_namespace   = true   # Create namespace automatically
     values_content     = var.external_sync_values_content != "" ? var.external_sync_values_content : templatefile("${path.module}/values/external-sync.yaml", {
       domain_name      = local.domain_name
       replica_count    = var.external_sync_replica_count
       environment_name = local.environment
       project_name     = local.project
     })
     dynamic_helm_sets  = merge({
       "replicaCount" = tostring(var.external_sync_replica_count)
     }, var.external_sync_helm_sets)
     sensitive_helm_sets = merge({
       "secretRef.name" = format(local.secret_pattern, local.project, local.environment, "external-sync")
     }, var.external_sync_sensitive_helm_sets)
   },
   
   # --- External Poller --- #
   external_poller = {
     install            = var.enable_external_poller
     helm_chart_version = var.external_poller_chart_version
     namespace          = "external-poller"
     create_namespace   = true   # Create namespace automatically
     values_content     = var.external_poller_values_content != "" ? var.external_poller_values_content : templatefile("${path.module}/values/external-poller.yaml", {
       domain_name      = local.domain_name
       replica_count    = var.external_poller_replica_count
       environment_name = local.environment
       project_name     = local.project
     })
     dynamic_helm_sets  = merge({
       "replicaCount" = tostring(var.external_poller_replica_count)
     }, var.external_poller_helm_sets)
     sensitive_helm_sets = merge({
       "secretRef.name" = format(local.secret_pattern, local.project, local.environment, "external-poller")
     }, var.external_poller_sensitive_helm_sets)
   },
   
   # --- Audit Log Exporter --- #
   audit_log_exporter = {
     install            = var.enable_audit_log_exporter
     helm_chart_version = var.audit_log_exporter_chart_version
     namespace          = "audit-log-exporter"
     create_namespace   = true   # Create namespace automatically
     values_content     = var.audit_log_exporter_values_content != "" ? var.audit_log_exporter_values_content : templatefile("${path.module}/values/audit-log-exporter.yaml", {
       domain_name      = local.domain_name
       replica_count    = var.audit_log_exporter_replica_count
       environment_name = local.environment
       project_name     = local.project
     })
     dynamic_helm_sets  = merge({
       "replicaCount" = tostring(var.audit_log_exporter_replica_count)
     }, var.audit_log_exporter_helm_sets)
     sensitive_helm_sets = merge({
       "secretRef.name" = format(local.secret_pattern, local.project, local.environment, "audit-log-exporter")
     }, var.audit_log_exporter_sensitive_helm_sets)
   },
   
   # --- Cerbos --- #
   cerbos = {
     install            = var.enable_cerbos
     helm_chart_version = var.cerbos_chart_version
     namespace          = "cerbos"
     create_namespace   = true   # Create namespace automatically
     values_content     = var.cerbos_values_content != "" ? var.cerbos_values_content : templatefile("${path.module}/values/cerbos.yaml", {
       domain_name      = local.domain_name
       replica_count    = var.cerbos_replica_count
       environment_name = local.environment
       project_name     = local.project
     })
     dynamic_helm_sets  = merge({
       "replicaCount" = tostring(var.cerbos_replica_count)
     }, var.cerbos_helm_sets)
     sensitive_helm_sets = merge({
       "secretRef.name" = format(local.secret_pattern, local.project, local.environment, "cerbos")
     }, var.cerbos_sensitive_helm_sets)
   }
 }
}

2. Create provider.tf

# provider.tf - Provider configuration for applications deployment

provider "aws" {
 region  = var.region
 profile = var.aws_profile
}

# Configure Helm provider with EKS authentication
provider "helm" {
 kubernetes {
   host                   = data.aws_eks_cluster.cluster.endpoint
   cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

   exec {
     api_version = "client.authentication.k8s.io/v1beta1"
     command     = "aws"
     args = [
       "eks",
       "get-token",
       "--cluster-name",
       var.cluster_name,
       "--region",
       var.region
     ]
     env = {
       AWS_PROFILE = var.aws_profile
     }
   }
 }
}

# Configure kubectl provider with EKS authentication
provider "kubectl" {
 host                   = data.aws_eks_cluster.cluster.endpoint
 cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
 load_config_file       = false

 exec {
   api_version = "client.authentication.k8s.io/v1beta1"
   command     = "aws"
   args = [
     "eks",
     "get-token",
     "--cluster-name",
     var.cluster_name,
     "--region",
     var.region
   ]
   env = {
     AWS_PROFILE = var.aws_profile
   }
 }
}

# Data sources to get EKS cluster details
data "aws_eks_cluster" "cluster" {
 name = var.cluster_name
}

data "aws_eks_cluster_auth" "cluster" {
 name = var.cluster_name
}
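
Both providers authenticate through aws eks get-token. To confirm this exec-based auth works before running Terraform, you can invoke it directly (a quick sanity check, not part of the deployment):

# Should print a future expiry timestamp from the ExecCredential response
aws eks get-token --cluster-name <cluster-name> --region <region> --profile <profile> | jq '.status.expirationTimestamp'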

3. Create registry_secrets.tf

# Registry credentials secrets for each application namespace
# These secrets allow pulling images from the Kindo registry

locals {
 # Get application namespaces that need to be created
 application_namespaces = {
   "api"                = var.enable_api ? "api" : null
   "next"               = var.enable_next ? "next" : null
   "litellm"            = var.enable_litellm ? "litellm" : null
   "llama-indexer"      = var.enable_llama_indexer ? "llama-indexer" : null
   "credits"            = var.enable_credits ? "credits" : null
   "external-sync"      = var.enable_external_sync ? "external-sync" : null
   "external-poller"    = var.enable_external_poller ? "external-poller" : null
   "audit-log-exporter" = var.enable_audit_log_exporter ? "audit-log-exporter" : null
   "cerbos"             = var.enable_cerbos ? "cerbos" : null
 }
 
 # Filter out null values
 enabled_namespaces = {
   for name, ns in local.application_namespaces : name => ns
   if ns != null
 }
 
 # Create the docker config JSON once to reuse
 # Use a fixed registry domain for Docker authentication
 dockerconfig_json = jsonencode({
   auths = {
     "registry.kindo.ai" = {
       username = var.registry_username
       password = var.registry_password
       auth     = base64encode("${var.registry_username}:${var.registry_password}")
     }
   }
 })
}

# Add a small delay to ensure namespaces are fully propagated in Kubernetes
resource "time_sleep" "wait_for_namespaces" {
 depends_on = [module.kindo_applications]
 create_duration = "5s"
}

# Create registry credential secrets in each namespace
resource "kubectl_manifest" "registry_credentials" {
 for_each = local.enabled_namespaces
 
 yaml_body = <<YAML
apiVersion: v1
kind: Secret
metadata:
 name: registry-credentials
 namespace: ${each.value}
type: kubernetes.io/dockerconfigjson
data:
 .dockerconfigjson: ${base64encode(local.dockerconfig_json)}
YAML

 # Force delete and recreate of the secret whenever the manifest changes
 force_new = true
 server_side_apply = true
 
 # Make sure the namespaces are ready before creating secrets
 depends_on = [
   time_sleep.wait_for_namespaces
 ]
}
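
After an apply, you can confirm the rendered dockerconfigjson is well-formed in any enabled namespace (shown here for api):

# Decode the secret and list the registry hosts it authenticates against
kubectl get secret registry-credentials -n api -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq '.auths | keys'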

4. Create variables.tf

# variables.tf - Variable definitions for applications deployment

# --- Core Variables --- #
variable "project_name" {
 description = "Project name used for resource naming and tagging (set via shared.tfvars)"
 type        = string
}

variable "environment_name" {
 description = "Environment name (e.g., dev, staging, prod) (set via shared.tfvars)"
 type        = string
}

variable "domain_name" {
 description = "Base domain name for application endpoints"
 type        = string
}

# --- AWS Configuration --- #
variable "region" {
 description = "AWS region"
 type        = string
}

variable "aws_profile" {
 description = "AWS profile for authentication"
 type        = string
}

variable "cluster_name" {
 description = "EKS cluster name"
 type        = string
}

# --- Registry Configuration --- #
variable "registry_url" {
 description = "OCI registry URL for Helm charts (set via shared.tfvars)"
 type        = string
 default     = "oci://registry.kindo.ai/kindo-helm"
}

variable "registry_username" {
 description = "Username for the Helm OCI registry (SENSITIVE - set via shared.tfvars)"
 type        = string
 sensitive   = true
}

variable "registry_password" {
 description = "Password for the Helm OCI registry (SENSITIVE - set via shared.tfvars)"
 type        = string
 sensitive   = true
}

# --- API Service --- #
variable "enable_api" {
 description = "Whether to install the API service"
 type        = bool
 default     = true
}

variable "api_chart_version" {
 description = "Version of the API Helm chart"
 type        = string
}

variable "api_replica_count" {
 description = "Number of API service replicas"
 type        = number
 default     = 2
}

variable "api_values_content" {
 description = "Custom values content for API service"
 type        = string
 default     = ""
}

variable "api_helm_sets" {
 description = "Additional Helm values to set for API service"
 type        = map(string)
 default     = {}
}

variable "api_sensitive_helm_sets" {
 description = "Additional sensitive Helm values to set for API service"
 type        = map(string)
 default     = {}
}

# --- Next.js Frontend --- #
variable "enable_next" {
 description = "Whether to install the Next.js frontend"
 type        = bool
 default     = true
}

variable "next_chart_version" {
 description = "Version of the Next.js Helm chart"
 type        = string
}

variable "next_replica_count" {
 description = "Number of Next.js frontend replicas"
 type        = number
 default     = 2
}

variable "next_values_content" {
 description = "Custom values content for Next.js frontend"
 type        = string
 default     = ""
}

variable "next_helm_sets" {
 description = "Additional Helm values to set for Next.js frontend"
 type        = map(string)
 default     = {}
}

variable "next_sensitive_helm_sets" {
 description = "Additional sensitive Helm values to set for Next.js frontend"
 type        = map(string)
 default     = {}
}

# --- LiteLLM --- #
variable "enable_litellm" {
 description = "Whether to install the LiteLLM service"
 type        = bool
 default     = true
}

variable "litellm_chart_version" {
 description = "Version of the LiteLLM Helm chart"
 type        = string
}

variable "litellm_replica_count" {
 description = "Number of LiteLLM service replicas"
 type        = number
 default     = 2
}

variable "litellm_values_content" {
 description = "Custom values content for LiteLLM service"
 type        = string
 default     = ""
}

variable "litellm_helm_sets" {
 description = "Additional Helm values to set for LiteLLM service"
 type        = map(string)
 default     = {}
}

variable "litellm_sensitive_helm_sets" {
 description = "Additional sensitive Helm values to set for LiteLLM service"
 type        = map(string)
 default     = {}
}

# --- Llama Indexer --- #
variable "enable_llama_indexer" {
 description = "Whether to install the Llama Indexer service"
 type        = bool
 default     = true
}

variable "llama_indexer_chart_version" {
 description = "Version of the Llama Indexer Helm chart"
 type        = string
}

variable "llama_indexer_replica_count" {
 description = "Number of Llama Indexer service replicas"
 type        = number
 default     = 1
}

variable "llama_indexer_values_content" {
 description = "Custom values content for Llama Indexer service"
 type        = string
 default     = ""
}

variable "llama_indexer_helm_sets" {
 description = "Additional Helm values to set for Llama Indexer service"
 type        = map(string)
 default     = {}
}

variable "llama_indexer_sensitive_helm_sets" {
 description = "Additional sensitive Helm values to set for Llama Indexer service"
 type        = map(string)
 default     = {}
}

# --- Credits --- #
variable "enable_credits" {
 description = "Whether to install the Credits service"
 type        = bool
 default     = true
}

variable "credits_chart_version" {
 description = "Version of the Credits Helm chart"
 type        = string
}

variable "credits_replica_count" {
 description = "Number of Credits service replicas"
 type        = number
 default     = 1
}

variable "credits_values_content" {
 description = "Custom values content for Credits service"
 type        = string
 default     = ""
}

variable "credits_helm_sets" {
 description = "Additional Helm values to set for Credits service"
 type        = map(string)
 default     = {}
}

variable "credits_sensitive_helm_sets" {
 description = "Additional sensitive Helm values to set for Credits service"
 type        = map(string)
 default     = {}
}

# --- External Sync --- #
variable "enable_external_sync" {
 description = "Whether to install the External Sync service"
 type        = bool
 default     = true
}

variable "external_sync_chart_version" {
 description = "Version of the External Sync Helm chart"
 type        = string
}

variable "external_sync_replica_count" {
 description = "Number of External Sync service replicas"
 type        = number
 default     = 1
}

variable "external_sync_values_content" {
 description = "Custom values content for External Sync service"
 type        = string
 default     = ""
}

variable "external_sync_helm_sets" {
 description = "Additional Helm values to set for External Sync service"
 type        = map(string)
 default     = {}
}

variable "external_sync_sensitive_helm_sets" {
 description = "Additional sensitive Helm values to set for External Sync service"
 type        = map(string)
 default     = {}
}

# --- External Poller --- #
variable "enable_external_poller" {
 description = "Whether to install the External Poller service"
 type        = bool
 default     = true
}

variable "external_poller_chart_version" {
 description = "Version of the External Poller Helm chart"
 type        = string
}

variable "external_poller_replica_count" {
 description = "Number of External Poller service replicas"
 type        = number
 default     = 1
}

variable "external_poller_values_content" {
 description = "Custom values content for External Poller service"
 type        = string
 default     = ""
}

variable "external_poller_helm_sets" {
 description = "Additional Helm values to set for External Poller service"
 type        = map(string)
 default     = {}
}

variable "external_poller_sensitive_helm_sets" {
 description = "Additional sensitive Helm values to set for External Poller service"
 type        = map(string)
 default     = {}
}

# --- Audit Log Exporter --- #
variable "enable_audit_log_exporter" {
 description = "Whether to install the Audit Log Exporter service"
 type        = bool
 default     = true
}

variable "audit_log_exporter_chart_version" {
 description = "Version of the Audit Log Exporter Helm chart"
 type        = string
}

variable "audit_log_exporter_replica_count" {
 description = "Number of Audit Log Exporter service replicas"
 type        = number
 default     = 1
}

variable "audit_log_exporter_values_content" {
 description = "Custom values content for Audit Log Exporter service"
 type        = string
 default     = ""
}

variable "audit_log_exporter_helm_sets" {
 description = "Additional Helm values to set for Audit Log Exporter service"
 type        = map(string)
 default     = {}
}

variable "audit_log_exporter_sensitive_helm_sets" {
 description = "Additional sensitive Helm values to set for Audit Log Exporter service"
 type        = map(string)
 default     = {}
}

# --- Cerbos --- #
variable "enable_cerbos" {
 description = "Whether to install the Cerbos service"
 type        = bool
 default     = true
}

variable "cerbos_chart_version" {
 description = "Version of the Cerbos Helm chart"
 type        = string
}

variable "cerbos_replica_count" {
 description = "Number of Cerbos service replicas"
 type        = number
 default     = 2
}

variable "cerbos_values_content" {
 description = "Custom values content for Cerbos service"
 type        = string
 default     = ""
}

variable "cerbos_helm_sets" {
 description = "Additional Helm values to set for Cerbos service"
 type        = map(string)
 default     = {}
}

variable "cerbos_sensitive_helm_sets" {
 description = "Additional sensitive Helm values to set for Cerbos service"
 type        = map(string)
 default     = {}
}

5. Create terraform.tfvars

# terraform.tfvars - Application-specific configuration

# Application replica counts (override defaults as needed)
api_replica_count = 1
next_replica_count = 2
litellm_replica_count = 1

# Chart Versions
api_chart_version                = "0.0.15"
next_chart_version               = "0.0.15"
litellm_chart_version            = "0.0.15"
llama_indexer_chart_version      = "0.0.15"
credits_chart_version            = "0.0.15"
external_sync_chart_version      = "0.0.15"
external_poller_chart_version    = "0.0.15"
audit_log_exporter_chart_version = "0.0.15"
cerbos_chart_version             = "0.0.15"

# Application-specific settings can be configured in values/ files
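
If you want to confirm a chart version exists before applying, Helm can inspect OCI charts directly. This assumes each chart is published under the registry path from shared.tfvars (api shown here):

# Log in once with the same registry credentials used by Terraform
helm registry login registry.kindo.ai -u <registry-username>

# Inspect chart metadata for a specific version
helm show chart oci://registry.kindo.ai/kindo-helm/api --version 0.0.15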

6. Create outputs.tf

# outputs.tf - Output values from applications deployment

output "app_endpoint" {
 description = "API service endpoint"
 value       = "https://app.${var.domain_name}"
}

output "api_endpoint" {
 description = "API service endpoint"
 value       = "https://api.${var.domain_name}"
}

output "deployment_summary" {
 description = "Summary of deployed applications"
 value = {
   api                = var.enable_api ? "Deployed" : "Skipped"
   next               = var.enable_next ? "Deployed" : "Skipped"
   litellm            = var.enable_litellm ? "Deployed" : "Skipped"
   llama_indexer      = var.enable_llama_indexer ? "Deployed" : "Skipped"
   credits            = var.enable_credits ? "Deployed" : "Skipped"
   external_sync      = var.enable_external_sync ? "Deployed" : "Skipped"
   external_poller    = var.enable_external_poller ? "Deployed" : "Skipped"
   audit_log_exporter = var.enable_audit_log_exporter ? "Deployed" : "Skipped"
   cerbos             = var.enable_cerbos ? "Deployed" : "Skipped"
 }
}
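
After an apply, these outputs give a quick summary without touching the cluster:

terraform output deployment_summary
terraform output app_endpoint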

Application Configuration

Understanding Values Files

Each application has a values file in the values/ directory that configures:

  1. Resource Requirements

  2. Scaling Parameters

  3. Environment-Specific Settings

  4. Health Checks

  5. Ingress Rules
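
To sanity-check a template before applying, you can render it with terraform console from the initialized stack directory (the input values below are illustrative):

echo 'templatefile("values/api.yaml", { domain_name = "example.com", replica_count = 2, environment_name = "prod", project_name = "kindo" })' | terraform console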

API Service Configuration (values/api.yaml)

replicaCount: ${replica_count}

image:
 repository: registry.kindo.ai/kindo-docker/api
 pullPolicy: IfNotPresent
 pullSecret: registry-credentials

secretName: api-env

service:
 applicationPort: 8000
 port: 80
 type: ClusterIP
 
ingress:
 defaults:
   tls: true
   tlsSecretName: ""
   annotations:
     kubernetes.io/ingress.class: "alb"
     alb.ingress.kubernetes.io/target-type: "ip"
     alb.ingress.kubernetes.io/healthcheck-path: /healthcheck
     alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
     alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01

 main:
   hosts:
     - api.${domain_name}
   paths: ["/"]

resources:
 requests:
   memory: "512Mi"
   cpu: "500m"
 limits:
   memory: "2Gi"
   cpu: "2000m"

autoscaling:
 enabled: true
 minReplicas: 3
 maxReplicas: 10
 targetCPUUtilizationPercentage: 70
 targetMemoryUtilizationPercentage: 80

livenessProbe:
 httpGet:
   path: /healthcheck
   port: 8000
 initialDelaySeconds: 30
 periodSeconds: 10

readinessProbe:
 httpGet:
   path: /healthcheck
   port: 8000
 initialDelaySeconds: 5
 periodSeconds: 5

# Node affinity for specific workloads
affinity:
 nodeAffinity:
   preferredDuringSchedulingIgnoredDuringExecution:
   - weight: 100
     preference:
       matchExpressions:
       - key: node.kubernetes.io/instance-type
         operator: In
         values:
         - m5.large
         - m5.xlarge

Next.js Frontend Configuration (values/next.yaml)

replicaCount: ${replica_count}

image:
 repository: registry.kindo.ai/kindo-docker/next
 pullPolicy: IfNotPresent
 pullSecret: registry-credentials

secretName: next-env

service:
 applicationPort: 3000
 port: 80
 type: ClusterIP

ingress:
 defaults:
   tls: true
   tlsSecretName: ""
   annotations:
     kubernetes.io/ingress.class: "alb"
     alb.ingress.kubernetes.io/target-type: "ip"
     alb.ingress.kubernetes.io/healthcheck-path: /
     alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
     alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01
     # Frontend gets priority routing
     alb.ingress.kubernetes.io/group.name: kindo-apps
     alb.ingress.kubernetes.io/group.order: "100"

 main:
   hosts:
     - app.${domain_name}
   paths: ["/"]

resources:
 requests:
   memory: "256Mi"
   cpu: "250m"
 limits:
   memory: "1Gi"
   cpu: "1000m"

# Environment variables (non-sensitive)
env:
 - name: NEXT_PUBLIC_API_URL
   value: "https://api.${domain_name}"
 - name: NODE_ENV
   value: "production"

LiteLLM Configuration (values/litellm.yaml)

replicaCount: ${replica_count}

image:
 repository: registry.kindo.ai/kindo-docker/litellm
 pullPolicy: IfNotPresent
 pullSecret: registry-credentials

secretName: litellm-env

service:
 applicationPort: 4000
 port: 80
 type: ClusterIP

ingress:
 defaults:
   tls: true
   tlsSecretName: ""
   annotations:
     kubernetes.io/ingress.class: "alb"
     alb.ingress.kubernetes.io/target-type: "ip"
     alb.ingress.kubernetes.io/healthcheck-path: /health
     alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
     alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01

 main:
   hosts:
     - litellm.${domain_name}
   paths: ["/"]

resources:
 requests:
   memory: "1Gi"
   cpu: "1000m"
 limits:
   memory: "4Gi"
   cpu: "4000m"

# LiteLLM specific settings
persistence:
 enabled: true
 size: 10Gi
 storageClass: gp3

# Model configuration will come from External Secrets
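
Once deployed, you can confirm the model configuration arrived via External Secrets (the secret name litellm-env matches secretName above):

# The ExternalSecret should report SecretSynced
kubectl get externalsecret -n litellm

# List the keys delivered into the application secret without exposing values
kubectl get secret litellm-env -n litellm -o jsonpath='{.data}' | jq 'keys'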

Worker Services Configuration

For background workers (external-poller, external-sync):

replicaCount: ${replica_count}

image:
 repository: registry.kindo.ai/kindo-docker/external-poller  # or external-sync
 pullPolicy: IfNotPresent
 pullSecret: registry-credentials

secretName: external-poller-env  # or external-sync-env

service:
 applicationPort: 8000
 port: 80
 type: ClusterIP

resources:
 requests:
   memory: "512Mi"
   cpu: "500m"
 limits:
   memory: "2Gi"
   cpu: "2000m"

# No ingress for workers
ingress:
 enabled: false

# Health checks for workers
livenessProbe:
 httpGet:
   path: /health
   port: 8000
 initialDelaySeconds: 30
 periodSeconds: 30

readinessProbe:
 httpGet:
   path: /health
   port: 8000
 initialDelaySeconds: 5
 periodSeconds: 10

# Job-specific configuration via environment
env:
 - name: WORKER_CONCURRENCY
   value: "5"
 - name: POLL_INTERVAL
   value: "60"

Deployment Process

1. Initialize Terraform

# Initialize providers
terraform init

# Verify configuration
terraform validate

2. Plan Deployment

# Plan with all configuration files
terraform plan -var-file="../../shared.tfvars" -var-file="terraform.tfvars"

# Review the plan carefully:
# - Check namespace creation
# - Verify secret references
# - Confirm ingress configuration
# - Review resource allocations

3. Deploy Applications

# Apply configuration
terraform apply -var-file="../../shared.tfvars" -var-file="terraform.tfvars"

# Deployment takes 5-10 minutes depending on image sizes

4. Run initial migrations

kubectl exec -n api deployment/api -- npx prisma migrate deploy --schema /app/backend/api/node_modules/.prisma/client/schema.prisma
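
To confirm the migrations applied cleanly, prisma migrate status reports applied and pending migrations against the same schema path:

kubectl exec -n api deployment/api -- npx prisma migrate status --schema /app/backend/api/node_modules/.prisma/client/schema.prisma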

5. Monitor Deployment

# Watch namespace creation
kubectl get namespaces -w

# Monitor pod rollout by namespace
for ns in api next litellm llama-indexer credits external-sync external-poller audit-log-exporter cerbos; do
 echo "Checking namespace: $ns"
 kubectl get pods -n $ns
done

# Check deployment status
for ns in api next litellm; do
 echo "=== $ns ==="
 kubectl rollout status deployment -n $ns --timeout=300s
done

Post-Deployment Verification

1. Verify All Pods Running

# Check pod status across application namespaces
for ns in api next litellm llama-indexer credits external-sync external-poller audit-log-exporter cerbos; do
 echo "=== Namespace: $ns ==="
 kubectl get pods -n $ns
done

# Should see all pods in Running state with correct READY count

2. Verify External Secrets

# Check if secrets were created
for ns in api next litellm; do
 echo "=== Secrets in $ns ==="
 kubectl get secrets -n $ns
done

# Verify External Secrets are synced
for ns in api next litellm; do
 echo "=== ExternalSecret status in $ns ==="
 kubectl get externalsecrets -n $ns
done

# Verify secret content (without exposing sensitive data)
kubectl get secret -n api api-env -o jsonpath='{.data}' | jq 'keys'

3. Test Application Endpoints

# Get ingress endpoints
kubectl get ingress -A

# Test API health
curl -s https://api.yourdomain.com/healthcheck | jq .

# Test frontend
curl -I https://app.yourdomain.com

# Test LiteLLM (exposed at its own subdomain per values/litellm.yaml)
curl -s https://litellm.yourdomain.com/health | jq .

4. Check Application Logs

# API logs
kubectl logs -n api -l app.kubernetes.io/name=api --tail=50

# Frontend logs
kubectl logs -n next -l app.kubernetes.io/name=next --tail=50

# Check for errors
kubectl logs -n api -l app.kubernetes.io/name=api --tail=100 | grep -i error

5. Verify Integrations

# Check database connectivity (placeholder: replace with a real check for your database client)
kubectl exec -n api deploy/api -- sh -c 'echo "Database connection test"'

# Check Redis connectivity (placeholder: replace with a real check for your Redis endpoint)
kubectl exec -n api deploy/api -- sh -c 'echo "Redis connection test"'

# Check Unleash integration
kubectl exec -n api deploy/api -- curl -s http://unleash-edge.unleash-edge:3063/api/client/features

Troubleshooting

Common Issues

  1. Pods Stuck in Pending

  • 0/3 nodes are available: 3 Insufficient cpu

  • Solution: Check node resources, adjust resource requests, or scale node group:

  • kubectl describe nodes
    kubectl top nodes

  2. ImagePullBackOff

  • Failed to pull image: unauthorized

  • Solution: Verify registry secret:

  • kubectl get secret -n api registry-credentials -o json | jq '.data.".dockerconfigjson"' | base64 -d | jq .

  3. External Secret Not Found

  • SecretStore default/aws-secrets-manager, resource not found

  • Solution: Ensure External Secrets Operator is deployed and ClusterSecretStore exists:

  • kubectl get clustersecretstore

  4. Ingress Not Creating ALB

  • No LoadBalancer found for ingress

  • Solution: Check ALB controller logs and annotations:

  • kubectl logs -n kube-system deployment/aws-load-balancer-controller
    kubectl describe ingress -n api

Health Check Failures

# Debug liveness probe failures
kubectl describe pod -n api <pod-name>

# Test health endpoint from inside pod
kubectl exec -n api deploy/api -- curl -s localhost:8000/healthcheck

# Check environment variables
kubectl exec -n api deploy/api -- env | sort

Performance Issues

# Check resource usage by namespace
for ns in api next litellm; do
 echo "=== Resource usage in $ns ==="
 kubectl top pods -n $ns
done

# Review HPA status
kubectl get hpa -A

# Check for throttling
kubectl describe pod -n api <pod-name> | grep -A5 "Conditions:"

Best Practices

1. Production Configuration

  • Set appropriate resource requests and limits

  • Enable autoscaling for variable workloads

  • Configure pod disruption budgets (see the sketch after this list)

  • Use anti-affinity for high availability
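
As a minimal sketch for the pod disruption budget item above, assuming API pods carry the app.kubernetes.io/name=api label used elsewhere in this guide:

# Keep at least one API pod available during voluntary disruptions
kubectl create poddisruptionbudget api-pdb -n api \
 --selector=app.kubernetes.io/name=api \
 --min-available=1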

2. Security

  • Regularly update container images

  • Scan images for vulnerabilities

  • Use network policies to restrict traffic

  • Enable audit logging

3. Monitoring

  • Export metrics to Prometheus

  • Set up alerts for key metrics

  • Monitor application logs

  • Track error rates and latencies

4. Cost Optimization

  • Right-size resource allocations based on actual usage

  • Use spot instances for non-critical workloads

  • Implement aggressive autoscaling policies

  • Clean up unused resources

Next Steps

After successful application deployment:

  1. Configure Monitoring: Set up observability stack

  2. Load Testing: Validate performance under load

  3. Backup Procedures: Implement data backup strategies

  4. Documentation: Document your specific configurations

  5. Runbooks: Create operational procedures