This guide provides detailed step-by-step instructions for deploying the Kindo infrastructure using the kindo-infra module, following the pattern demonstrated in the example/smk-base configuration.
Overview
The kindo-infra module deploys the following AWS resources:
Networking: VPC with public/private/database subnets across 3 AZs
EKS Cluster: Managed Kubernetes cluster with worker node groups
Databases: RDS PostgreSQL instances for Unleash and applications
Caching: ElastiCache Redis cluster
Storage: S3 buckets for application data
Security: IAM roles, security groups, and KMS keys
DNS: Route53 hosted zone (optional)
Email: SES configuration (optional)
Pre-Deployment Setup
1. Directory Structure
Create your deployment directory structure following the example pattern:
# Create your deployment directory
mkdir -p my-kindo-deployment/kindo-base
cd my-kindo-deployment/kindo-base
# Create the shared configuration file (git-ignored)
touch ../shared.tfvars
# Add shared.tfvars to .gitignore
echo "shared.tfvars" >> ../.gitignore
2. Shared Configuration
The example uses a shared configuration approach. Create your ../shared.tfvars file with common settings:
# shared.tfvars - Common configuration across all modules
# This file should be git-ignored as it contains sensitive information
# Core Configuration
region = "us-east-1"
# Project settings
project_name = "kindo"
environment_name = "prod"
# AWS configuration
aws_profile = "default" # Your AWS profile name
cluster_name = "kindo-prod-cluster"
# Registry Configuration
registry_url = "oci://registry.kindo.ai/kindo-helm"
registry_username = "YOUR_REGISTRY_USERNAME" # Provided by kindo in payload
registry_password = "YOUR_REGISTRY_PASSWORD" # Provided by kindo in payload file
# DNS settings
create_public_zone = true
base_domain = "kindo.example.com" # Your domain
# VPC Configuration
vpc_cidr_base = "10.50.0.0/16" # Adjust based on your network planning
# Feature flags
ses_enabled = true
syslog_enabled = false
enable_adot_addon = false
create_otel_collector_iam_role = false
enable_otel_collector_cr = false
enable_external_dns = true
# API Keys and Secrets (Replace with your values)
merge_api_key = ""
merge_webhook_security = ""
pinecone_api_key = ""
workos_api_key = ""
workos_client_id = ""
# LLM Provider API Keys
azure_openai_api_key = ""
anthropic_api_key = ""
cohere_api_key = ""
deepgram_api_key = ""
deepseek_api_key = ""
groq_api_key = ""
huggingface_api_key = ""
nvidia_nim_api_key = ""
openai_api_key = ""
embedding_generator_api_key = ""
together_ai_api_key = ""
watsonx_api_key = ""
# Google Cloud credentials (if needed)
google_credentials_json = ""
# Additional API keys
firecrawl_api_key = ""
tavily_api_key = ""
Security Note: The shared.tfvars file contains sensitive information. Never commit it to version control.
3. Infrastructure Configuration
Create your infrastructure-specific configuration file:
# Create main.tf for infrastructure
cat > main.tf << 'EOF'
terraform {
required_version = ">= 1.11.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 5.0"
}
random = {
source = "hashicorp/random"
version = ">= 3.0"
}
}
# Configure your state backend
# For production, uncomment and configure the S3 backend:
# backend "s3" {
# bucket = "my-terraform-state-bucket"
# key = "kindo/infrastructure/terraform.tfstate"
# region = "us-east-1"
#
# # Enable state locking
# dynamodb_table = "terraform-state-lock"
# encrypt = true
# }
}
provider "aws" {
region = var.region
profile = var.aws_profile
skip_metadata_api_check = true # Add this if running from outside AWS
}
locals {
# Core identifiers
project = var.project_name
environment = var.environment_name
region = var.region
aws_profile = var.aws_profile
cluster_name = var.cluster_name
# Generate random suffix for globally unique names
random_suffix = random_string.suffix.result
}
# Random suffix for S3 bucket names
resource "random_string" "suffix" {
length = 8
special = false
upper = false
}
# Deploy infrastructure
module "kindo_infra" {
source = "../../modules/kindo-infra" # Adjust path as needed
# Core configuration
project = local.project
environment = local.environment
region = local.region
# EKS Configuration
cluster_name = local.cluster_name
cluster_version = "1.31"
# Enable public access for initial setup
cluster_endpoint_public_access = true
cluster_endpoint_private_access = true
# Worker node configuration
# Note: These are example values. Adjust based on your workload requirements
general_purpose_workers_min_size = 1
general_purpose_workers_max_size = 10
general_purpose_workers_desired_size = 3
memory_optimized_workers_min_size = 1
memory_optimized_workers_max_size = 5
memory_optimized_workers_desired_size = 2
compute_optimized_workers_min_size = 1
compute_optimized_workers_max_size = 5
compute_optimized_workers_desired_size = 2
# Network configuration
vpc_cidr_base = var.vpc_cidr_base
availability_zone_names = ["${local.region}a", "${local.region}b", "${local.region}c"]
# DNS and certificates
create_public_zone = var.create_public_zone
base_domain = var.base_domain
# Services configuration
enable_ses = var.ses_enabled
syslog_enabled = var.syslog_enabled
# S3 Bucket names (globally unique)
s3_uploads_bucket_name = "${local.project}-${local.environment}-uploads-${local.random_suffix}"
s3_audit_logs_bucket_name = "${local.project}-${local.environment}-audit-${local.random_suffix}"
# Database configuration
postgres_instance_class = var.postgres_instance_class
postgres_multi_az = true
# Redis configuration
redis_transit_encryption_enabled = true
redis_node_type = var.redis_node_type
# VPC Peering configuration (if needed)
peering_cidr_blocks = [] # Add CIDR blocks for peered VPCs
# Syslog configuration (when syslog_enabled = true)
syslog_cloudwatch_log_group_name = var.syslog_enabled ? "/aws/ec2/${local.project}-${local.environment}-syslog-forwarder" : null
syslog_kinesis_delivery_stream = var.syslog_enabled ? "${local.project}-${local.environment}-syslog" : null
aws_profile_for_ssm = var.aws_profile
}
EOF
4. Provider Configuration for Next Steps
Create a provider.tf file in the same directory to configure the Kubernetes and Helm providers:
# provider.tf - Provider configuration for Kubernetes and Helm
# These providers are needed once the base stack includes secrets and peripheries.
# They reference outputs from the kindo_infra module, so they only become usable after the infrastructure has been applied.
provider "kubernetes" {
host = module.kindo_infra.eks_cluster_endpoint
cluster_ca_certificate = base64decode(module.kindo_infra.eks_cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", module.kindo_infra.eks_cluster_name, "--region", var.region]
env = {
AWS_PROFILE = var.aws_profile
}
}
}
provider "helm" {
kubernetes {
host = module.kindo_infra.eks_cluster_endpoint
cluster_ca_certificate = base64decode(module.kindo_infra.eks_cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", module.kindo_infra.eks_cluster_name, "--region", var.region]
env = {
AWS_PROFILE = var.aws_profile
}
}
}
}
5. Outputs Configuration
Create an outputs.tf file to expose important values:
# outputs.tf - Output values from infrastructure
output "cluster_name" {
description = "EKS cluster name"
value = module.kindo_infra.eks_cluster_name
}
output "aws_region" {
description = "AWS region"
value = var.region
}
output "aws_profile" {
description = "AWS profile used"
value = var.aws_profile
}
output "project" {
description = "Project name"
value = local.project
}
output "environment" {
description = "Environment name"
value = local.environment
}
output "postgres_endpoint" {
description = "PostgreSQL endpoint"
value = module.kindo_infra.postgres_endpoint
}
output "redis_connection_string" {
description = "Redis connection string"
value = module.kindo_infra.redis_connection_string
sensitive = true
}
output "rabbitmq_connection_string" {
description = "RabbitMQ connection string"
value = module.kindo_infra.rabbitmq_connection_string
sensitive = true
}
output "storage_bucket_name" {
description = "S3 bucket for uploads"
value = module.kindo_infra.storage_bucket_name
}
output "storage_access_key" {
description = "S3 storage access key"
value = module.kindo_infra.storage_access_key
sensitive = true
}
output "storage_secret_key" {
description = "S3 storage secret key"
value = module.kindo_infra.storage_secret_key
sensitive = true
}
output "storage_region" {
description = "S3 storage region"
value = module.kindo_infra.storage_region
}
output "base_domain" {
description = "Base domain"
value = module.kindo_infra.base_domain
}
output "external_secrets_role_arn" {
description = "IAM role ARN for external secrets"
value = module.kindo_infra.external_secrets_role_arn
}
output "alb_controller_role_arn" {
description = "IAM role ARN for ALB controller"
value = module.kindo_infra.alb_controller_role_arn
}
output "external_dns_iam_role_arn" {
description = "IAM role ARN for ExternalDNS"
value = module.kindo_infra.external_dns_iam_role_arn
}
output "otel_collector_iam_role_arn" {
description = "IAM role ARN for OTel collector"
value = module.kindo_infra.otel_collector_iam_role_arn
}
# Database connection strings
output "kindo_db_connection_string" {
description = "Kindo database connection string"
value = module.kindo_infra.kindo_db_connection_string
sensitive = true
}
output "litellm_db_connection_string" {
description = "LiteLLM database connection string"
value = module.kindo_infra.litellm_db_connection_string
sensitive = true
}
output "unleash_db_connection_string" {
description = "Unleash database connection string"
value = module.kindo_infra.unleash_db_connection_string
sensitive = true
}
# SMTP outputs (if SES enabled)
output "smtp_host" {
description = "SMTP host"
value = module.kindo_infra.smtp_host
}
output "smtp_user" {
description = "SMTP user"
value = module.kindo_infra.smtp_user
sensitive = true
}
output "smtp_password" {
description = "SMTP password"
value = module.kindo_infra.smtp_password
sensitive = true
}
output "smtp_fromemail" {
description = "SMTP from email"
value = module.kindo_infra.smtp_fromemail
}
# Syslog output (if enabled)
output "syslog_nlb_dns_name" {
description = "Syslog NLB DNS name"
value = module.kindo_infra.syslog_nlb_dns_name
}
# EKS cluster certificate for provider configuration
output "eks_cluster_certificate_authority_data" {
description = "Base64 encoded certificate data for EKS cluster"
value = module.kindo_infra.eks_cluster_certificate_authority_data
sensitive = true
}
output "infrastructure_outputs" {
description = "Key infrastructure outputs"
value = {
region = module.kindo_infra.region
eks_cluster_name = module.kindo_infra.eks_cluster_name
eks_cluster_endpoint = module.kindo_infra.eks_cluster_endpoint
base_domain = module.kindo_infra.base_domain
delegation_set_id = module.kindo_infra.delegation_set_id
wildcard_certificate_arn = module.kindo_infra.wildcard_certificate_arn
wildcard_certificate_domain = module.kindo_infra.wildcard_certificate_domain
}
}
output "name_servers" {
description = "Name servers for the delegation set"
value = module.kindo_infra.delegation_set_name_servers
}
6. Variables Configuration
Create your terraform.tfvars file with infrastructure-specific overrides:
# terraform.tfvars - Infrastructure-specific configuration
# You can override any values from shared.tfvars here if needed
# Most values should be in shared.tfvars for consistency
# For production, you might want to:
# - Use larger instance types
# - Enable multi-AZ for databases
# - Restrict public access
# Example:
postgres_instance_class = "db.t3.small"
redis_node_type = "cache.t3.small"
7. Variables Definition
Create a variables.tf file to define all variables:
# variables.tf - Variable definitions
# Core Configuration Variables (from shared.tfvars)
variable "project_name" {
description = "Project name"
type = string
}
variable "environment_name" {
description = "Environment name"
type = string
}
variable "region" {
description = "AWS region"
type = string
}
variable "aws_profile" {
description = "AWS profile for authentication"
type = string
}
variable "cluster_name" {
description = "EKS cluster name"
type = string
}
variable "vpc_cidr_base" {
description = "Base CIDR block for VPC"
type = string
}
# DNS and Certificate Configuration
variable "base_domain" {
description = "Base domain for Route53 hosted zone"
type = string
default = null
}
variable "create_public_zone" {
description = "Whether to create a Route53 public hosted zone"
type = bool
default = true
}
# Feature Flags
variable "syslog_enabled" {
description = "Whether to enable syslog"
type = bool
default = false
}
variable "ses_enabled" {
description = "Whether to enable SES"
type = bool
default = false
}
variable "enable_adot_addon" {
description = "Whether to enable the EKS ADOT Addon"
type = bool
default = false
}
variable "create_otel_collector_iam_role" {
description = "Whether to create IAM Role for OTel Collector"
type = bool
default = false
}
variable "enable_external_dns" {
description = "Whether to install ExternalDNS"
type = bool
default = false
}
variable "external_dns_hosted_zone_arns" {
description = "List of hosted zone ARNs for ExternalDNS"
type = list(string)
default = []
}
# Registry Configuration (from shared.tfvars)
variable "registry_url" {
description = "Helm OCI registry URL"
type = string
}
variable "registry_username" {
description = "Username for the Helm OCI registry"
type = string
sensitive = true
}
variable "registry_password" {
description = "Password for the Helm OCI registry"
type = string
sensitive = true
}
# Database and cache configuration
variable "postgres_instance_class" {
description = "PostgreSQL instance class"
type = string
default = "db.t3.small"
}
variable "redis_node_type" {
description = "Redis node type"
type = string
default = "cache.t3.small"
}
Deployment Process
1. Initialize Terraform
# Initialize Terraform
terraform init
# Verify providers
terraform version
terraform providers
2. Create Workspace (Optional)
For managing multiple environments:
# Create environment-specific workspace
terraform workspace new prod
# or
terraform workspace select prod
Note: When using workspaces, Terraform stores state files in the terraform.tfstate.d/<workspace>/ directory.
3. Plan Deployment
# Run plan with both configuration files
terraform plan -var-file="../shared.tfvars" -var-file="terraform.tfvars"
# Save plan for review
terraform plan -var-file="../shared.tfvars" -var-file="terraform.tfvars" -out=infra.plan
4. Review Critical Resources
Before applying, carefully review:
VPC CIDR: Ensure no conflicts
Security Groups: Check ingress rules
RDS Settings: Multi-AZ, backup settings
Node Groups: Instance types and scaling
S3 Buckets: Names are globally unique
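The saved plan can be summarized mechanically before review. The jq filter below lists each planned action alongside its resource address; it is demonstrated against a minimal inline plan-JSON fragment (the addresses in the sample are illustrative), and the commented line shows how to run the same filter against the real infra.plan.

```shell
# One line per resource change: "<actions> <address>".
FILTER='.resource_changes[] | "\(.change.actions | join(",")) \(.address)"'

# With the real saved plan (run in the kindo-base directory):
#   terraform show -json infra.plan | jq -r "$FILTER"

# Demo against a minimal plan-JSON fragment:
echo '{"resource_changes":[
  {"address":"module.kindo_infra.aws_eks_cluster.this","change":{"actions":["create"]}},
  {"address":"module.kindo_infra.aws_db_instance.kindo","change":{"actions":["create"]}}]}' \
  | jq -r "$FILTER"
```

Anything reporting delete or replace actions against a database or bucket deserves particular scrutiny before apply.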
5. Deploy Infrastructure
# Apply the configuration
terraform apply -var-file="../shared.tfvars" -var-file="terraform.tfvars"
# Or use saved plan
terraform apply infra.plan
Note: Initial deployment takes 15-30 minutes due to EKS cluster creation.
6. Delegate DNS
In your parent DNS zone, delegate the subdomain you selected for Kindo by creating NS records that point at the name servers from the infrastructure's name_servers output.
Example output:
name_servers = tolist([
  "ns-865.awsdns-44.net",
  "ns-1083.awsdns-07.org",
  "ns-2002.awsdns-58.co.uk",
  "ns-99.awsdns-12.com",
])
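If the parent zone is also hosted in Route53, the delegation can be scripted. The change batch below is a sketch: the domain and name-server values are placeholders taken from the example output above, and the final aws command is commented out because it requires your parent zone's hosted zone ID and credentials.

```shell
# Build a change batch creating the NS delegation records in the parent zone.
# Substitute your own subdomain and the values from your name_servers output.
cat > delegate-ns.json <<'EOF'
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "kindo.example.com",
      "Type": "NS",
      "TTL": 300,
      "ResourceRecords": [
        {"Value": "ns-865.awsdns-44.net"},
        {"Value": "ns-1083.awsdns-07.org"},
        {"Value": "ns-2002.awsdns-58.co.uk"},
        {"Value": "ns-99.awsdns-12.com"}
      ]
    }
  }]
}
EOF

# Apply against the PARENT zone (replace ZXXXXXXXXXXXXX with its hosted zone ID):
# aws route53 change-resource-record-sets --hosted-zone-id ZXXXXXXXXXXXXX --change-batch file://delegate-ns.json
```

After applying, dig +short NS kindo.example.com should return the delegated name servers once the parent zone's TTL expires.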
7. Request SES production account
When an SES account is created, AWS places it in sandbox mode, which restricts outgoing email to a list of verified addresses. This is not suitable for production use, so you will need to open a request with AWS to upgrade the account.
Log in to the AWS Management Console and navigate to the SES dashboard.
Check for the “Your Amazon SES account is in the sandbox” warning.
Click “Request production access” or “Get set up” to initiate the request.
Select the type of email you’ll be sending (Marketing or Transactional).
Provide a brief description of your use case, including the type of emails you’ll be sending and who your recipients are.
Submit your request.
AWS will review your request and may ask for more information if needed. Once approved, you’ll receive an email and your account will be moved out of sandbox mode.
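As an alternative to the console flow, the same request can be submitted with the SES v2 CLI. This is a sketch only: the website URL and use-case description are placeholders for your deployment, and the command is prefixed with echo as a dry run since submitting requires valid AWS credentials.

```shell
# Remove the leading `echo` to actually submit the production-access request.
echo aws sesv2 put-account-details \
  --production-access-enabled \
  --mail-type TRANSACTIONAL \
  --website-url "https://kindo.example.com" \
  --use-case-description "Transactional notifications for the Kindo application"
```

The review outcome is the same as the console path: AWS may follow up for more detail before lifting the sandbox restriction.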
Post-Deployment Verification
1. Verify EKS Cluster
# Update kubeconfig
aws eks update-kubeconfig --name $(terraform output -raw cluster_name) --region $(terraform output -raw aws_region) --profile $(terraform output -raw aws_profile)
# Verify cluster access
kubectl get nodes
kubectl get namespaces
2. Verify RDS Instances
# Check RDS instances
aws rds describe-db-instances --region $(terraform output -raw aws_region) --profile $(terraform output -raw aws_profile) | jq '.DBInstances[] | {DBInstanceIdentifier, DBInstanceStatus, Endpoint}'
# Test connectivity (from within VPC)
psql -h $(terraform output -raw postgres_endpoint | cut -d: -f1) -U postgres_admin -d postgresdb
3. Verify S3 Buckets
# List created buckets
aws s3 ls --profile $(terraform output -raw aws_profile) | grep $(terraform output -raw project)
# Verify encryption
aws s3api get-bucket-encryption --bucket $(terraform output -raw storage_bucket_name) --profile $(terraform output -raw aws_profile)
4. Verify Security Groups
# Check security group rules
aws ec2 describe-security-groups --filters "Name=vpc-id,Values=$(terraform output -raw vpc_id)" --region $(terraform output -raw aws_region) --profile $(terraform output -raw aws_profile) --query 'SecurityGroups[*].[GroupName,IpPermissions[0]]'
5. DNS Verification (if using Route53)
# Check hosted zone
aws route53 list-hosted-zones --profile $(terraform output -raw aws_profile) --query "HostedZones[?Name=='$(terraform output -raw base_domain).']"
# Verify DNS resolution
dig +short $(terraform output -raw base_domain)
Troubleshooting
Common Issues
VPC CIDR Conflicts
Error: Invalid CIDR block
Solution: Choose a different CIDR range that doesn’t conflict with existing VPCs.
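A candidate CIDR can be checked for overlap before re-planning. The quick check below uses Python's standard ipaddress module; both CIDRs are example values, so substitute your proposed vpc_cidr_base and the CIDR of the conflicting VPC.

```shell
# Prints True if the candidate VPC CIDR overlaps an existing one (example values).
python3 -c "
import ipaddress
candidate = ipaddress.ip_network('10.50.0.0/16')    # proposed vpc_cidr_base
existing  = ipaddress.ip_network('10.50.128.0/17')  # CIDR already in use
print(candidate.overlaps(existing))
"
# -> True (10.50.128.0/17 sits inside 10.50.0.0/16)
```

Existing VPC CIDRs can be listed with aws ec2 describe-vpcs --query 'Vpcs[].CidrBlock'.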
Insufficient IAM Permissions
Error: UnauthorizedOperation
Solution: Ensure your IAM user/role has all required permissions.
EKS Cluster Creation Timeout
Error: timeout while waiting for state
Solution: Check CloudFormation stack events for detailed errors.
S3 Bucket Name Conflicts
Error: BucketAlreadyExists
Solution: S3 bucket names must be globally unique across all AWS accounts. The module appends a random suffix, but collisions are still possible; if one occurs, choose a different bucket name prefix and re-apply.
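A fresh candidate name can be generated and checked up front. The suffix generation below mirrors the module's 8-character lowercase random_string (the name prefix is an example), and the availability check is commented out because it needs AWS credentials.

```shell
# Generate a bucket name with an 8-char lowercase alphanumeric suffix,
# mirroring the module's random_string resource (prefix here is an example).
suffix=$(head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'a-z0-9' | cut -c1-8)
bucket="kindo-prod-uploads-${suffix}"
echo "$bucket"

# Availability check: a 404 "Not Found" response means the name is free.
# aws s3api head-bucket --bucket "$bucket" --profile default 2>&1
```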
Validation Commands
# Check all resources
terraform state list
# Inspect specific resource
terraform state show module.kindo_infra.aws_eks_cluster.this
# Check for drift
terraform plan -var-file="../shared.tfvars" -var-file="terraform.tfvars"
Rollback Procedure
If deployment fails:
Partial Failure: Fix the issue and re-run terraform apply
Complete Rollback:
terraform destroy -var-file="../shared.tfvars" -var-file="terraform.tfvars"
Warning: Destroy will delete all resources including data. Ensure backups exist.
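Before a full destroy, a final RDS snapshot is a cheap safeguard. This is a sketch: the instance identifier is a placeholder, and the command is echoed as a dry run since the real call needs AWS credentials.

```shell
# Build a dated snapshot identifier; <db-instance-id> is a placeholder
# for the RDS instance listed by `aws rds describe-db-instances`.
snap_id="kindo-prod-final-$(date +%Y%m%d-%H%M%S)"

# Remove the leading `echo` to take the snapshot before running terraform destroy.
echo aws rds create-db-snapshot \
  --db-instance-identifier "<db-instance-id>" \
  --db-snapshot-identifier "$snap_id"
```

Repeat for each RDS instance, and wait for the snapshots to reach "available" status before destroying.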
Next Steps
After successful infrastructure deployment:
Secrets Management - Generate and store application secrets
Note down all outputs - they’ll be needed for subsequent steps
Verify all resources are healthy before proceeding
Important Outputs Reference
Save these outputs as they’re needed for next steps:
Output | Used In | Purpose |
---|---|---|
cluster_name | All modules | Kubernetes cluster connection |
eks_cluster_endpoint | Providers | Kubernetes/Helm provider config |
eks_cluster_certificate_authority_data | Providers | Kubernetes/Helm authentication |
postgres_endpoint | Secrets module | Database endpoint |
kindo_db_connection_string | Secrets module | Kindo database connection |
litellm_db_connection_string | Secrets module | LiteLLM database connection |
unleash_db_connection_string | Secrets module | Unleash database connection |
redis_connection_string | Secrets module | Cache configuration |
rabbitmq_connection_string | Secrets module | Message queue configuration |
storage_bucket_name | Secrets module | S3 storage bucket |
storage_access_key | Secrets module | S3 access credentials |
storage_secret_key | Secrets module | S3 secret credentials |
smtp_* outputs | Secrets module | Email configuration |
base_domain | All modules | Domain configuration |
external_secrets_role_arn | Peripheries | Secrets management |
alb_controller_role_arn | Peripheries | ALB ingress controller |
external_dns_iam_role_arn | Peripheries | External DNS |
name_servers | DNS Delegation | Kindo app endpoint management |
Note: The eks_cluster_certificate_authority_data output might not be available in the current version of the kindo-infra module. If this output is missing, you’ll need to retrieve it manually using:
aws eks describe-cluster --name $(terraform output -raw cluster_name) --query "cluster.certificateAuthority.data" --output text