Deploying Kindo-Infra


This guide provides detailed step-by-step instructions for deploying the Kindo infrastructure using the kindo-infra module, following the pattern demonstrated in the example/smk-base configuration.

Table of Contents

  1. Overview

  2. Pre-Deployment Setup

  3. Configuration Setup

  4. Deployment Process

  5. Post-Deployment Verification

  6. Troubleshooting

Overview

The kindo-infra module deploys the following AWS resources:

  • Networking: VPC with public/private/database subnets across 3 AZs

  • EKS Cluster: Managed Kubernetes cluster with worker node groups

  • Databases: RDS PostgreSQL instances for Unleash and applications

  • Caching: ElastiCache Redis cluster

  • Storage: S3 buckets for application data

  • Security: IAM roles, security groups, and KMS keys

  • DNS: Route53 hosted zone (optional)

  • Email: SES configuration (optional)

Pre-Deployment Setup

1. Directory Structure

Create your deployment directory structure following the example pattern:

# Create your deployment directory
mkdir -p my-kindo-deployment/kindo-base
cd my-kindo-deployment/kindo-base

# Create the shared configuration file (git-ignored)
touch ../shared.tfvars

# Add shared.tfvars to .gitignore
echo "shared.tfvars" >> ../.gitignore
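Once the files described in the following sections are in place, the layout should look roughly like this (a sketch following the example pattern; filenames match the sections below):

```
my-kindo-deployment/
├── .gitignore
├── shared.tfvars          # git-ignored; shared, sensitive values
└── kindo-base/
    ├── main.tf            # infrastructure module wiring
    ├── provider.tf        # Kubernetes/Helm provider configuration
    ├── variables.tf       # variable definitions
    ├── outputs.tf         # exposed infrastructure outputs
    └── terraform.tfvars   # infrastructure-specific overrides
```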

2. Shared Configuration

The example uses a shared configuration approach. Create your ../shared.tfvars file with common settings:

# shared.tfvars - Common configuration across all modules
# This file should be git-ignored as it contains sensitive information

# Core Configuration
region = "us-east-1"

# Project settings
project_name     = "kindo"
environment_name = "prod"

# AWS configuration
aws_profile  = "default"  # Your AWS profile name
cluster_name = "kindo-prod-cluster"

# Registry Configuration
registry_url      = "oci://registry.kindo.ai/kindo-helm"
registry_username = "YOUR_REGISTRY_USERNAME" # Provided by kindo in payload
registry_password = "YOUR_REGISTRY_PASSWORD" # Provided by kindo in payload file



# DNS settings
create_public_zone = true
base_domain        = "kindo.example.com"  # Your domain

# VPC Configuration
vpc_cidr_base = "10.50.0.0/16"  # Adjust based on your network planning

# Feature flags
ses_enabled                    = true
syslog_enabled                 = false
enable_adot_addon              = false
create_otel_collector_iam_role = false
enable_otel_collector_cr       = false
enable_external_dns            = true

# API Keys and Secrets (Replace with your values)
merge_api_key          = ""
merge_webhook_security = ""
pinecone_api_key       = ""
workos_api_key         = ""
workos_client_id       = ""

# LLM Provider API Keys
azure_openai_api_key        = ""
anthropic_api_key           = ""
cohere_api_key              = ""
deepgram_api_key            = ""
deepseek_api_key            = ""
groq_api_key                = ""
huggingface_api_key         = ""
nvidia_nim_api_key          = ""
openai_api_key              = ""
embedding_generator_api_key = ""
together_ai_api_key         = ""
watsonx_api_key             = ""

# Google Cloud credentials (if needed)
google_credentials_json = ""

# Additional API keys
firecrawl_api_key = ""
tavily_api_key    = ""

Security Note: The shared.tfvars file contains sensitive information. Never commit it to version control.
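As an alternative to keeping secrets in shared.tfvars at all, Terraform reads any environment variable named TF_VAR_<name> as the value of the variable <name>. A minimal sketch (the values shown are placeholders, not real credentials):

```shell
# Terraform maps TF_VAR_<name> environment variables onto variables,
# so secrets never need to touch a file on disk.
# Placeholder values for illustration only.
export TF_VAR_registry_username="example-user"
export TF_VAR_registry_password="example-password"
export TF_VAR_openai_api_key="sk-placeholder"

# Subsequent plan/apply runs pick these up automatically:
# terraform plan -var-file="../shared.tfvars" -var-file="terraform.tfvars"
echo "registry user: $TF_VAR_registry_username"
```

This keeps credentials out of files entirely, at the cost of having to source them into every shell that runs Terraform (e.g. from a secrets manager).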

3. Infrastructure Configuration

Create your infrastructure-specific configuration file:

# Create main.tf for infrastructure
cat > main.tf << 'EOF'
terraform {
 required_version = ">= 1.11.0"
 
 required_providers {
   aws = {
     source  = "hashicorp/aws"
     version = ">= 5.0"
   }
   random = {
     source  = "hashicorp/random"
     version = ">= 3.0"
   }
 }

 # Configure your state backend
 # For production, uncomment and configure the S3 backend:
 # backend "s3" {
 #   bucket = "my-terraform-state-bucket"
 #   key    = "kindo/infrastructure/terraform.tfstate"
 #   region = "us-east-1"
 #  
 #   # Enable state locking
 #   dynamodb_table = "terraform-state-lock"
 #   encrypt        = true
 # }
}

provider "aws" {
 region  = var.region
 profile = var.aws_profile
 skip_metadata_api_check = true  # Add this if running from outside AWS
}

locals {
 # Core identifiers
 project     = var.project_name
 environment = var.environment_name
 region      = var.region
 aws_profile = var.aws_profile
 cluster_name = var.cluster_name

 # Generate random suffix for globally unique names
 random_suffix = random_string.suffix.result
}

# Random suffix for S3 bucket names
resource "random_string" "suffix" {
 length  = 8
 special = false
 upper   = false
}

# Deploy infrastructure
module "kindo_infra" {
 source = "../../modules/kindo-infra"  # Adjust path as needed

 # Core configuration
 project      = local.project
 environment  = local.environment
 region       = local.region
 
 # EKS Configuration
 cluster_name    = local.cluster_name
 cluster_version = "1.31"
 
 # Enable public access for initial setup
 cluster_endpoint_public_access  = true
 cluster_endpoint_private_access = true

 # Worker node configuration
 # Note: These are example values. Adjust based on your workload requirements
 general_purpose_workers_min_size     = 1
 general_purpose_workers_max_size     = 10
 general_purpose_workers_desired_size = 3
 
 memory_optimized_workers_min_size     = 1
 memory_optimized_workers_max_size     = 5
 memory_optimized_workers_desired_size = 2
 
 compute_optimized_workers_min_size     = 1
 compute_optimized_workers_max_size     = 5
 compute_optimized_workers_desired_size = 2

 # Network configuration
 vpc_cidr_base           = var.vpc_cidr_base
 availability_zone_names = ["${local.region}a", "${local.region}b", "${local.region}c"]

 # DNS and certificates
 create_public_zone = var.create_public_zone
 base_domain        = var.base_domain
 
 # Services configuration
 enable_ses    = var.ses_enabled
 syslog_enabled = var.syslog_enabled
 
 # S3 Bucket names (globally unique)
 s3_uploads_bucket_name    = "${local.project}-${local.environment}-uploads-${local.random_suffix}"
 s3_audit_logs_bucket_name = "${local.project}-${local.environment}-audit-${local.random_suffix}"

 # Database configuration
 postgres_instance_class = var.postgres_instance_class
 postgres_multi_az       = true
 
 # Redis configuration
 redis_transit_encryption_enabled = true
 redis_node_type                  = var.redis_node_type

 # VPC Peering configuration (if needed)
 peering_cidr_blocks = []  # Add CIDR blocks for peered VPCs

 # Syslog configuration (when syslog_enabled = true)
 syslog_cloudwatch_log_group_name = var.syslog_enabled ? "/aws/ec2/${local.project}-${local.environment}-syslog-forwarder" : null
 syslog_kinesis_delivery_stream = var.syslog_enabled ? "${local.project}-${local.environment}-syslog" : null

 aws_profile_for_ssm = var.aws_profile
}
EOF

4. Provider Configuration for Next Steps

Create a provider.tf file in the same directory to configure the Kubernetes and Helm providers:

# provider.tf - Provider configuration for Kubernetes and Helm

# These providers are required once the base stack adds secrets and peripheries
# They depend on outputs from the kindo_infra module above

provider "kubernetes" {
 host                   = module.kindo_infra.eks_cluster_endpoint
 cluster_ca_certificate = base64decode(module.kindo_infra.eks_cluster_certificate_authority_data)

 exec {
   api_version = "client.authentication.k8s.io/v1beta1"
   command     = "aws"
   args        = ["eks", "get-token", "--cluster-name", module.kindo_infra.eks_cluster_name, "--region", var.region]
   env = {
     AWS_PROFILE = var.aws_profile
   }
 }
}

provider "helm" {
 kubernetes {
   host                   = module.kindo_infra.eks_cluster_endpoint
   cluster_ca_certificate = base64decode(module.kindo_infra.eks_cluster_certificate_authority_data)

   exec {
     api_version = "client.authentication.k8s.io/v1beta1"
     command     = "aws"
     args        = ["eks", "get-token", "--cluster-name", module.kindo_infra.eks_cluster_name, "--region", var.region]
     env = {
       AWS_PROFILE = var.aws_profile
     }
   }
 }
}
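Both exec blocks shell out to the AWS CLI for short-lived cluster credentials, so it is worth sanity-checking that command by hand before Terraform relies on it. A sketch, assuming the cluster name and profile from shared.tfvars (substitute your own values):

```shell
# Run the same token command the provider exec blocks use.
# CLUSTER_NAME, region, and profile are placeholders.
CLUSTER_NAME="kindo-prod-cluster"
if command -v aws >/dev/null 2>&1; then
  # A successful response is an ExecCredential object.
  TOKEN_KIND=$(aws eks get-token --cluster-name "$CLUSTER_NAME" \
    --region us-east-1 --profile default \
    --query 'kind' --output text 2>/dev/null || echo "error")
else
  TOKEN_KIND="aws CLI not available"
fi
echo "token response kind: $TOKEN_KIND"
```

If this prints "error", fix your AWS credentials before running terraform plan, since the providers will fail with the same underlying problem.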

5. Outputs Configuration

Create an outputs.tf file to expose important values:

# outputs.tf - Output values from infrastructure

output "cluster_name" {
 description = "EKS cluster name"
 value       = module.kindo_infra.eks_cluster_name
}

output "aws_region" {
 description = "AWS region"
 value       = var.region
}

output "aws_profile" {
 description = "AWS profile used"
 value       = var.aws_profile
}

output "project" {
 description = "Project name"
 value       = local.project
}

output "environment" {
 description = "Environment name"
 value       = local.environment
}

output "postgres_endpoint" {
 description = "PostgreSQL endpoint"
 value       = module.kindo_infra.postgres_endpoint
}

output "redis_connection_string" {
 description = "Redis connection string"
 value       = module.kindo_infra.redis_connection_string
 sensitive   = true
}

output "rabbitmq_connection_string" {
 description = "RabbitMQ connection string"
 value       = module.kindo_infra.rabbitmq_connection_string
 sensitive   = true
}

output "storage_bucket_name" {
 description = "S3 bucket for uploads"
 value       = module.kindo_infra.storage_bucket_name
}

output "storage_access_key" {
 description = "S3 storage access key"
 value       = module.kindo_infra.storage_access_key
 sensitive   = true
}

output "storage_secret_key" {
 description = "S3 storage secret key"
 value       = module.kindo_infra.storage_secret_key
 sensitive   = true
}

output "storage_region" {
 description = "S3 storage region"
 value       = module.kindo_infra.storage_region
}

output "base_domain" {
 description = "Base domain"
 value       = module.kindo_infra.base_domain
}

output "external_secrets_role_arn" {
 description = "IAM role ARN for external secrets"
 value       = module.kindo_infra.external_secrets_role_arn
}

output "alb_controller_role_arn" {
 description = "IAM role ARN for ALB controller"
 value       = module.kindo_infra.alb_controller_role_arn
}

output "external_dns_iam_role_arn" {
 description = "IAM role ARN for ExternalDNS"
 value       = module.kindo_infra.external_dns_iam_role_arn
}

output "otel_collector_iam_role_arn" {
 description = "IAM role ARN for OTel collector"
 value       = module.kindo_infra.otel_collector_iam_role_arn
}

# Database connection strings
output "kindo_db_connection_string" {
 description = "Kindo database connection string"
 value       = module.kindo_infra.kindo_db_connection_string
 sensitive   = true
}

output "litellm_db_connection_string" {
 description = "LiteLLM database connection string"
 value       = module.kindo_infra.litellm_db_connection_string
 sensitive   = true
}

output "unleash_db_connection_string" {
 description = "Unleash database connection string"
 value       = module.kindo_infra.unleash_db_connection_string
 sensitive   = true
}

# SMTP outputs (if SES enabled)
output "smtp_host" {
 description = "SMTP host"
 value       = module.kindo_infra.smtp_host
}

output "smtp_user" {
 description = "SMTP user"
 value       = module.kindo_infra.smtp_user
 sensitive   = true
}

output "smtp_password" {
 description = "SMTP password"
 value       = module.kindo_infra.smtp_password
 sensitive   = true
}

output "smtp_fromemail" {
 description = "SMTP from email"
 value       = module.kindo_infra.smtp_fromemail
}

# Syslog output (if enabled)
output "syslog_nlb_dns_name" {
 description = "Syslog NLB DNS name"
 value       = module.kindo_infra.syslog_nlb_dns_name
}

# EKS cluster certificate for provider configuration
output "eks_cluster_certificate_authority_data" {
 description = "Base64 encoded certificate data for EKS cluster"
 value       = module.kindo_infra.eks_cluster_certificate_authority_data
 sensitive   = true
}

output "infrastructure_outputs" {
 description = "Key infrastructure outputs"
 value = {
   region                  = module.kindo_infra.region
   eks_cluster_name        = module.kindo_infra.eks_cluster_name
   eks_cluster_endpoint    = module.kindo_infra.eks_cluster_endpoint
   base_domain             = module.kindo_infra.base_domain
   delegation_set_id       = module.kindo_infra.delegation_set_id
   wildcard_certificate_arn = module.kindo_infra.wildcard_certificate_arn
   wildcard_certificate_domain = module.kindo_infra.wildcard_certificate_domain
 }
}

output "name_servers" {
 description = "Name servers for the delegation set"
 value       = module.kindo_infra.delegation_set_name_servers
}
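Downstream stacks (secrets, peripheries) can read these outputs with a terraform_remote_state data source instead of copy-pasting values. A sketch assuming local state in this directory; adjust the backend type and path to match your actual state configuration:

```hcl
# Hypothetical consumer stack reading the infrastructure outputs.
data "terraform_remote_state" "infra" {
  backend = "local"
  config = {
    path = "../kindo-base/terraform.tfstate"
  }
}

locals {
  cluster_name = data.terraform_remote_state.infra.outputs.cluster_name
  base_domain  = data.terraform_remote_state.infra.outputs.base_domain
}
```

Note that sensitive outputs (connection strings, access keys) remain sensitive when read this way, and that an S3 backend would use `backend = "s3"` with the bucket/key/region from your backend block instead.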

6. Variables Configuration

Create your terraform.tfvars file with infrastructure-specific overrides:


# terraform.tfvars - Infrastructure-specific configuration

# You can override any values from shared.tfvars here if needed
# Most values should be in shared.tfvars for consistency


# For production, you might want to:
# - Use larger instance types
# - Enable multi-AZ for databases
# - Restrict public access

# Example:
postgres_instance_class = "db.t3.small"
redis_node_type = "cache.t3.small"

7. Variables Definition

Create a variables.tf file to define all variables:

# variables.tf - Variable definitions

# Core Configuration Variables (from shared.tfvars)
variable "project_name" {
 description = "Project name"
 type        = string
}

variable "environment_name" {
 description = "Environment name"
 type        = string
}

variable "region" {
 description = "AWS region"
 type        = string
}

variable "aws_profile" {
 description = "AWS profile for authentication"
 type        = string
}

variable "cluster_name" {
 description = "EKS cluster name"
 type        = string
}

variable "vpc_cidr_base" {
 description = "Base CIDR block for VPC"
 type        = string
}

# DNS and Certificate Configuration
variable "base_domain" {
 description = "Base domain for Route53 hosted zone"
 type        = string
 default     = null
}

variable "create_public_zone" {
 description = "Whether to create a Route53 public hosted zone"
 type        = bool
 default     = true
}

# Feature Flags
variable "syslog_enabled" {
 description = "Whether to enable syslog"
 type        = bool
 default     = false
}

variable "ses_enabled" {
 description = "Whether to enable SES"
 type        = bool
 default     = false
}

variable "enable_adot_addon" {
 description = "Whether to enable the EKS ADOT Addon"
 type        = bool
 default     = false
}

variable "create_otel_collector_iam_role" {
 description = "Whether to create IAM Role for OTel Collector"
 type        = bool
 default     = false
}

variable "enable_external_dns" {
 description = "Whether to install ExternalDNS"
 type        = bool
 default     = false
}

variable "external_dns_hosted_zone_arns" {
 description = "List of hosted zone ARNs for ExternalDNS"
 type        = list(string)
 default     = []
}

# Registry Configuration (from shared.tfvars)
variable "registry_url" {
 description = "Helm OCI registry URL"
 type        = string
}

variable "registry_username" {
 description = "Username for the Helm OCI registry"
 type        = string
 sensitive   = true
}

variable "registry_password" {
 description = "Password for the Helm OCI registry"
 type        = string
 sensitive   = true
}

# Database and cache configuration
variable "postgres_instance_class" {
 description = "PostgreSQL instance class"
 type        = string
 default     = "db.t3.small"
}

variable "redis_node_type" {
 description = "Redis node type"
 type        = string
 default     = "cache.t3.small"
}

Deployment Process

1. Initialize Terraform

# Initialize Terraform
terraform init

# Verify providers
terraform version
terraform providers

2. Create Workspace (Optional)

For managing multiple environments:

# Create environment-specific workspace
terraform workspace new prod
# or
terraform workspace select prod

Note: When using workspaces with the local backend, Terraform stores state files under the terraform.tfstate.d/<workspace>/ directory.
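The note above can be sketched as a quick path check, which is handy when scripting around local state (assumes the local backend; "prod" is a placeholder workspace name):

```shell
# Local-backend state location depends on the selected workspace:
# "default" lives at terraform.tfstate, all others under terraform.tfstate.d/.
WORKSPACE="prod"
if [ "$WORKSPACE" = "default" ]; then
  STATE_PATH="terraform.tfstate"
else
  STATE_PATH="terraform.tfstate.d/${WORKSPACE}/terraform.tfstate"
fi
echo "$STATE_PATH"
```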

3. Plan Deployment

# Run plan with both configuration files
terraform plan -var-file="../shared.tfvars" -var-file="terraform.tfvars"

# Save plan for review
terraform plan -var-file="../shared.tfvars" -var-file="terraform.tfvars" -out=infra.plan

4. Review Critical Resources

Before applying, carefully review:

  1. VPC CIDR: Ensure no conflicts

  2. Security Groups: Check ingress rules

  3. RDS Settings: Multi-AZ, backup settings

  4. Node Groups: Instance types and scaling

  5. S3 Buckets: Names are globally unique
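The checklist above is easier to work through from a machine-readable plan. One approach is piping terraform show -json over the saved plan through jq; the filter is sketched below against a trimmed sample of that JSON (the resource address is illustrative):

```shell
# One-line-per-resource summary of planned actions.
SAMPLE='{"resource_changes":[{"address":"module.kindo_infra.aws_db_instance.main","change":{"actions":["create"]}}]}'
if command -v jq >/dev/null 2>&1; then
  SUMMARY=$(echo "$SAMPLE" | jq -r \
    '.resource_changes[] | "\(.change.actions | join(",")) \(.address)"')
else
  SUMMARY="jq not available"
fi
echo "$SUMMARY"

# Against the real saved plan:
# terraform show -json infra.plan | jq -r '.resource_changes[] | "\(.change.actions | join(",")) \(.address)"'
```

Scanning that summary for unexpected "delete" or "replace" actions is a quick guard before apply.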

5. Deploy Infrastructure

# Apply the configuration
terraform apply -var-file="../shared.tfvars" -var-file="terraform.tfvars"

# Or use saved plan
terraform apply infra.plan

Note: Initial deployment takes 15-30 minutes due to EKS cluster creation.

6. Delegate DNS

In your parent DNS zone, delegate the subdomain you selected for Kindo by creating NS records that point to the name servers from the name_servers output of the infrastructure deployment.

Example output:

name_servers = tolist([
  "ns-865.awsdns-44.net",
  "ns-1083.awsdns-07.org",
  "ns-2002.awsdns-58.co.uk",
  "ns-99.awsdns-12.com",
])
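Delegation can be confirmed by comparing the live NS records against the Route53 values. DNS tools report names with a trailing dot, so normalize both sides first; a sketch (kindo.example.com is a placeholder domain):

```shell
# Strip trailing dots and sort so the two lists compare cleanly.
normalize() { sed 's/\.$//' | sort; }

# DNS-style sample output, normalized:
printf 'ns-99.awsdns-12.com.\nns-865.awsdns-44.net.\n' | normalize

# Against the live zone (requires dig):
# diff <(terraform output -json name_servers | jq -r '.[]' | sort) \
#      <(dig +short NS kindo.example.com | normalize)
```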

7. Request SES production account

When an SES account is created, AWS places it in sandbox status, which means it can send outgoing email only to verified email addresses. This is not suitable for production use, so you will need to open a request with AWS to upgrade the account.

  • Log in to the AWS Management Console and navigate to the SES dashboard.

  • Check for the “Your Amazon SES account is in the sandbox” warning.

  • Click “Request production access” or “Get set up” to initiate the request.

  • Select the type of email you’ll be sending (Marketing or Transactional).

  • Provide a brief description of your use case, including the type of emails you’ll be sending and who your recipients are.

  • Submit your request.

AWS will review your request and may ask for more information if needed. Once approved, you’ll receive an email and your account will be moved out of sandbox mode.
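Sandbox status can also be checked from the CLI rather than the console; a sketch using the SESv2 API (requires AWS CLI v2 and valid credentials):

```shell
# ProductionAccessEnabled is false while the account is sandboxed.
if command -v aws >/dev/null 2>&1; then
  STATUS=$(aws sesv2 get-account \
    --query 'ProductionAccessEnabled' --output text 2>/dev/null || echo "unknown")
else
  STATUS="aws CLI not available"
fi
echo "SES production access: $STATUS"
```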

Post-Deployment Verification

1. Verify EKS Cluster

# Update kubeconfig
aws eks update-kubeconfig --name $(terraform output -raw cluster_name) --region $(terraform output -raw aws_region) --profile $(terraform output -raw aws_profile)

# Verify cluster access
kubectl get nodes
kubectl get namespaces

2. Verify RDS Instances

# Check RDS instances
aws rds describe-db-instances --region $(terraform output -raw aws_region) --profile $(terraform output -raw aws_profile) | jq '.DBInstances[] | {DBInstanceIdentifier, DBInstanceStatus, Endpoint}'

# Test connectivity (from within VPC)
psql -h $(terraform output -raw postgres_endpoint | cut -d: -f1) -U postgres_admin -d postgresdb

3. Verify S3 Buckets

# List created buckets
aws s3 ls --profile $(terraform output -raw aws_profile) | grep $(terraform output -raw project)

# Verify encryption
aws s3api get-bucket-encryption --bucket $(terraform output -raw storage_bucket_name) --profile $(terraform output -raw aws_profile)

4. Verify Security Groups

# Check security group rules
aws ec2 describe-security-groups --filters "Name=vpc-id,Values=$(terraform output -raw vpc_id)" --region $(terraform output -raw aws_region) --profile $(terraform output -raw aws_profile) --query 'SecurityGroups[*].[GroupName,IpPermissions[0]]'

5. DNS Verification (if using Route53)

# Check hosted zone
aws route53 list-hosted-zones --profile $(terraform output -raw aws_profile) --query "HostedZones[?Name=='$(terraform output -raw base_domain).']"

# Verify DNS resolution
dig +short $(terraform output -raw base_domain)

Troubleshooting

Common Issues

  1. VPC CIDR Conflicts

  • Error: Invalid CIDR block

  • Solution: Choose a different CIDR range that doesn’t conflict with existing VPCs.

  2. Insufficient IAM Permissions

  • Error: UnauthorizedOperation

  • Solution: Ensure your IAM user/role has all required permissions.

  3. EKS Cluster Creation Timeout

  • Error: timeout while waiting for state

  • Solution: Check the CloudFormation stack events for detailed errors.

  4. S3 Bucket Name Conflicts

  • Error: BucketAlreadyExists

  • Solution: S3 bucket names must be globally unique. The module adds random suffixes, but conflicts can still occur.

Validation Commands

# Check all resources
terraform state list

# Inspect specific resource
terraform state show module.kindo_infra.aws_eks_cluster.this

# Check for drift
terraform plan -var-file="../shared.tfvars" -var-file="terraform.tfvars"

Rollback Procedure

If deployment fails:

  1. Partial Failure: Fix the issue and re-run terraform apply

  2. Complete Rollback:

  • terraform destroy -var-file="../shared.tfvars" -var-file="terraform.tfvars"

Warning: Destroy will delete all resources including data. Ensure backups exist.

Next Steps

After successful infrastructure deployment:

  1. Secrets Management - Generate and store application secrets

  2. Note down all outputs - they’ll be needed for subsequent steps

  3. Verify all resources are healthy before proceeding

Important Outputs Reference

Save these outputs as they’re needed for next steps:

Output                                 | Used In        | Purpose
---------------------------------------|----------------|-------------------------------
cluster_name                           | All modules    | Kubernetes cluster connection
eks_cluster_endpoint                   | Providers      | Kubernetes/Helm provider config
eks_cluster_certificate_authority_data | Providers      | Kubernetes/Helm authentication
postgres_endpoint                      | Secrets module | Database endpoint
kindo_db_connection_string             | Secrets module | Kindo database connection
litellm_db_connection_string           | Secrets module | LiteLLM database connection
unleash_db_connection_string           | Secrets module | Unleash database connection
redis_connection_string                | Secrets module | Cache configuration
rabbitmq_connection_string             | Secrets module | Message queue configuration
storage_bucket_name                    | Secrets module | S3 storage bucket
storage_access_key                     | Secrets module | S3 access credentials
storage_secret_key                     | Secrets module | S3 secret credentials
smtp_* outputs                         | Secrets module | Email configuration
base_domain                            | All modules    | Domain configuration
external_secrets_role_arn              | Peripheries    | Secrets management
alb_controller_role_arn                | Peripheries    | ALB ingress controller
external_dns_iam_role_arn              | Peripheries    | External DNS
name_servers                           | DNS Delegation | Kindo app endpoint management

Note: The eks_cluster_certificate_authority_data output might not be available in the current version of the kindo-infra module. If this output is missing, you’ll need to retrieve it manually using:

aws eks describe-cluster --name $(terraform output -raw cluster_name) --query "cluster.certificateAuthority.data" --output text