This guide walks you through deploying the Kindo infrastructure using the kindo-infra module.
Quick Start
1. Set Up Your Deployment Directory
# Clone the Kindo modules repository
git clone https://github.com/your-org/kindo-modules.git
cd kindo-modules/examples/kindo-infra-aws-example
# Copy the example configuration
cp terraform.tfvars.example terraform.tfvars
# Edit your configuration
vi terraform.tfvars
2. Minimal Configuration
Here’s the absolute minimum you need to configure in terraform.tfvars:
# Core configuration (REQUIRED)
project_name = "mycompany" # Your project identifier
environment = "production" # Environment name (dev/staging/production)
aws_region = "us-west-2" # AWS region for deployment
# Deployment sizing (REQUIRED)
deployment_size = "small" # Choose: dev, small, medium, large, xlarge
# Network configuration (REQUIRED)
vpc_cidr = "10.0.0.0/16" # Must not conflict with existing networks
# DNS configuration (REQUIRED)
domain_name = "kindo.mycompany.com" # Your domain for the application3. Deploy
# Set your AWS profile
export AWS_PROFILE=your-aws-profile
# Initialize Terraform
terraform init
# Review the plan
terraform plan
# Deploy
terraform apply
Understanding terraform.tfvars
The terraform.tfvars file is your primary configuration interface. Each parameter controls a specific aspect of your infrastructure deployment. Let's explore each section in detail. Unlike variables.tf, which declares the variables and their types, terraform.tfvars simply assigns values to those variables and is loaded automatically by Terraform.
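For illustration, here is how Terraform resolves variable values from the command line; terraform.tfvars (and any *.auto.tfvars) are loaded automatically, while other file names must be passed explicitly (production.tfvars below is a hypothetical example):
# terraform.tfvars and *.auto.tfvars are loaded automatically
terraform plan
# any other variable file must be passed explicitly
terraform plan -var-file="production.tfvars"
# a single variable can also be overridden inline
terraform plan -var="deployment_size=medium"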
Core Configuration Parameters
Project Identification
project_name = "mycompany"
environment = "production"What these do:
project_name: Creates a namespace for all resources. Used in resource naming (e.g., mycompany-production-eks-cluster)
environment: Distinguishes between deployments (dev/staging/production). Affects resource naming and default configurations
Best practices:
Use lowercase, no spaces or special characters
Keep project_name consistent across all environments
Use standard environment names: dev, staging, production
AWS Region
aws_region = "us-west-2"What this does:
Determines where all resources are deployed
Affects latency for your users
Influences service availability and pricing
How to choose:
Pick a region close to your users
Ensure all required services are available (EKS, RDS, ElastiCache, MQ)
Consider data residency requirements
Check AWS pricing for your chosen region
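As a quick sanity check before settling on a region, you can confirm it exposes at least three Availability Zones, which the module needs for its subnet layout (the region name below is an example):
# Count usable AZs in the candidate region
aws ec2 describe-availability-zones \
  --region us-west-2 \
  --filters Name=state,Values=available \
  --query 'length(AvailabilityZones)'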
Deployment Sizing
T-Shirt Sizing Model
deployment_size = "small"
production_mode = false
Available sizes and what they configure:
| Size | Best For | Resources | Monthly Cost |
|---|---|---|---|
| dev | Development/Testing | 1-3 t3.medium nodes, db.t3.micro RDS, no HA | ~$150-200 |
| small | Small Production | 2-5 t3.large nodes, db.t3.small RDS, optional HA | ~$400-500 |
| medium | Standard Production | 3-8 t3.xlarge nodes, db.t3.medium RDS, HA enabled | ~$800-1000 |
| large | High-Traffic | 5-15 m5.xlarge nodes, db.m5.large RDS, full HA | ~$2000-3000 |
| xlarge | Enterprise | 10-30 m5.2xlarge nodes, db.m5.xlarge RDS, full HA + backup | ~$5000+ |
What production_mode does:
true: Enables deletion protection on RDS, prevents accidental terraform destroy
false: Allows easy cleanup for development environments
Overriding Size Defaults
If the t-shirt sizes don’t fit perfectly, you can override specific settings:
deployment_size = "small"
# Override specific node group settings
general_min_size = 3 # Instead of default 2 for small
general_max_size = 10 # Instead of default 5 for small
general_instance_types = ["t3.xlarge"] # Instead of t3.large
# Override database settings
postgres_main_instance_class = "db.t3.medium" # Instead of db.t3.small
postgres_main_allocated_storage = 50 # Instead of 20 GB
Network Configuration
VPC CIDR
vpc_cidr = "10.0.0.0/16"What this does:
Creates a VPC with the specified IP range
Automatically creates subnets across 3 availability zones
Must not conflict with:
Other VPCs in your account
On-premises networks (if using VPN/Direct Connect)
Peered VPCs
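To check for overlaps before deploying, you can list the CIDR blocks already in use in the target account and region, for example:
# List CIDR blocks of existing VPCs in the target region
aws ec2 describe-vpcs \
  --region us-west-2 \
  --query 'Vpcs[*].[VpcId,CidrBlock]' \
  --output table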
Subnet allocation (automatic):
10.0.0.0/16 creates:
Public: 10.0.0.0/20, 10.0.16.0/20, 10.0.32.0/20 (3 AZs)
Private: 10.0.48.0/20, 10.0.64.0/20, 10.0.80.0/20 (3 AZs)
Database: Created within private subnets
Availability Zones
# Usually not needed - auto-detected
# availability_zones = ["us-west-2a", "us-west-2b", "us-west-2c"]
What this does:
Specifies which AZs to use for high availability
Default: Automatically uses first 3 AZs in your region
Only override if you have specific AZ requirements
Feature Toggles
DNS and SSL Configuration
domain_name = "kindo.mycompany.com"
create_public_zone = true
What these do:
domain_name: Base domain for your application
Creates subdomains: app.kindo.mycompany.com, api.kindo.mycompany.com
Used for SSL certificate generation
create_public_zone:
true: Creates Route53 hosted zone (you’ll need to update NS records)
false: Use existing Route53 zone or external DNS
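If you intend to reuse an existing zone (create_public_zone = false), a quick lookup confirms the zone exists and gives you its ID:
# Look up an existing Route53 hosted zone for your domain
aws route53 list-hosted-zones-by-name \
  --dns-name kindo.mycompany.com \
  --query 'HostedZones[0].[Id,Name]'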
Audit Logging
enable_syslog = true
What this does:
enable_syslog:
true: Deploys managed audit log infrastructure
EC2 instances with rsyslog for audit log collection
Kinesis Firehose for secure audit log streaming
S3 storage with lifecycle policies for compliance
Network Load Balancer providing syslog endpoint for applications
Used by Kindo applications to store audit logs (user actions, security events)
false: You must provide external syslog endpoint for audit logging
Important: This is a critical compliance feature. Audit logs track all user actions and security events within the Kindo platform.
Monitoring and Observability
enable_monitoring = true
What this does:
enable_monitoring:
true: Deploys ADOT (AWS Distro for OpenTelemetry)
Collects application metrics and traces
Sends to AWS X-Ray and CloudWatch
Configures Container Insights for cluster monitoring
Provides performance and operational visibility
false: Use your existing monitoring solution (Datadog, New Relic, etc.)
Email Service
enable_ses = true
# If false, configure SMTP settings in kindo-secrets module
What this does:
true: Configures Amazon SES for email
Creates SES domain identity
Sets up DKIM verification
Configures IAM permissions
false: You’ll provide SMTP configuration later
VPN Access
enable_vpn = false
# vpn_client_cidr_block = "10.100.0.0/16" # If enabled
What this does:
true: Creates AWS Client VPN endpoint
Secure access to private resources
Requires certificate generation
Additional cost (~$0.10/hour + connection fees)
false: Access via public endpoints only
Additional Node Groups
memory_node_group_enabled = true
compute_node_group_enabled = false
What these do:
memory_node_group_enabled: Adds memory-optimized nodes (r6i family)
For memory-intensive workloads
Uses ONDEMAND instances by default
compute_node_group_enabled: Adds compute-optimized nodes (c6i family)
For CPU-intensive workloads
Uses ONDEMAND instances by default
Advanced Configuration
Kubernetes Version
kubernetes_version = "1.32"Considerations:
Use latest stable version for new deployments
Check compatibility with your applications
Plan for regular upgrades (every 3-4 months)
PostgreSQL Configuration
# Database versions
postgres_engine_version = "17.4"
# User management
manage_postgres_users = true # Auto-provision database users
# skip_postgres_resource_destroy = true # Helps with clean destroy
# Performance tuning
postgres_main_max_connections = 200
postgres_auxiliary_max_connections = 100
# Storage encryption
postgres_storage_encrypted = true
# Backup settings
postgres_backup_retention_period = 7
postgres_backup_window = "03:00-04:00"
postgres_maintenance_window = "sun:04:00-sun:05:00"
# High Availability
postgres_main_multi_az = true # For production
postgres_auxiliary_multi_az = false # Can be false for cost savings
Database User Provisioning: When manage_postgres_users = true, the module automatically:
Creates dedicated databases and users for each application:
Main RDS: kindo database and user for the main application
Auxiliary RDS:
unleash database and user for feature flags
litellm database and user for LLM proxy
nango database and user for integrations
Generates secure passwords stored in AWS Secrets Manager
Sets appropriate permissions for each user
Uses ephemeral SSM tunnels for secure provisioning
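Because the generated credentials are stored in AWS Secrets Manager, you can retrieve one after deployment; the secret name below is purely illustrative, so substitute the names the module actually creates in your account:
# List Postgres-related secrets, then fetch one (secret name is an example)
aws secretsmanager list-secrets \
  --query 'SecretList[?contains(Name, `postgres`)].Name' \
  --region us-west-2
aws secretsmanager get-secret-value \
  --secret-id kindo-production-postgres-litellm \
  --query SecretString \
  --region us-west-2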
Redis Configuration
# Redis version
redis_engine_version = "7.0"
# Node type (affects memory and performance)
redis_node_type = "cache.t3.small"
# Cluster mode
redis_number_cache_clusters = 2 # For HA
redis_automatic_failover_enabled = true
# Encryption
redis_transit_encryption_enabled = true
redis_at_rest_encryption_enabled = true
RabbitMQ Configuration
# Version
rabbitmq_engine_version = "3.13"
# Instance type
rabbitmq_instance_type = "mq.t3.micro"
# Deployment mode
rabbitmq_deployment_mode = "SINGLE_INSTANCE" # or "CLUSTER_MULTI_AZ" for HA
Storage Configuration
# S3 bucket names (auto-generated if not specified)
# s3_uploads_bucket_name = "mycompany-kindo-uploads"
# s3_audit_logs_bucket_name = "mycompany-kindo-audit"
# Audit log retention (compliance requirement)
syslog_log_retention_days = 365 # Minimum 1 year for compliance
# syslog_s3_transition_ia_days = 30 # Move to Infrequent Access after 30 days
# syslog_s3_transition_glacier_days = 90 # Move to Glacier after 90 days
Security and Compliance
# Deletion protection
postgres_deletion_protection = true # Prevent accidental deletion
postgres_skip_final_snapshot = false # Take snapshot before deletion
# Network security
cluster_endpoint_public_access = true # For development
cluster_endpoint_private_access = true # Always recommended
# Encryption
enable_s3_encryption = true
kms_key_deletion_window_in_days = 30
Resource Tags
tags = {
Owner = "platform-team"
CostCenter = "engineering"
Environment = "production"
Compliance = "sox"
}
Best practices for tags:
Use for cost allocation
Include compliance markers
Add contact information
Keep consistent across resources
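Once resources exist, you can spot-check that tags were applied consistently via the Resource Groups Tagging API, for example:
# List resources carrying the Owner tag
aws resourcegroupstaggingapi get-resources \
  --tag-filters Key=Owner,Values=platform-team \
  --query 'ResourceTagMappingList[*].ResourceARN' \
  --region us-west-2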
Terraform Deployment Process
1. Pre-Deployment Checklist
Before running Terraform:
[x] AWS credentials configured (aws sts get-caller-identity)
[x] VPC CIDR doesn’t conflict with existing networks
[x] Domain name is available
[x] Required service quotas are available
[x] terraform.tfvars is complete
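Several of these checks can be scripted; a rough pre-flight sketch (adjust the region to match your terraform.tfvars):
# Confirm AWS credentials and identity
aws sts get-caller-identity
# List existing VPC CIDRs to compare against vpc_cidr
aws ec2 describe-vpcs --region us-west-2 --query 'Vpcs[*].CidrBlock'
# Browse current EC2 and RDS quotas in the target region
aws service-quotas list-service-quotas --service-code ec2 --region us-west-2
aws service-quotas list-service-quotas --service-code rds --region us-west-2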
2. Initialize Terraform
cd examples/kindo-infra-aws-example
# Initialize Terraform
terraform init
# Format configuration files
terraform fmt
# Validate configuration
terraform validate
3. Review the Terraform Plan
# Generate and review the plan
terraform plan -out=tfplan
# For a specific deployment size
terraform plan -var="deployment_size=medium" -out=tfplan
# Save plan for review
terraform show -no-color tfplan > plan.txt
4. Deploy Infrastructure
# Apply the configuration
terraform apply tfplan
# Or directly (will prompt for confirmation)
terraform apply
# For automated deployments (CI/CD)
terraform apply -auto-approve
5. Monitor Deployment
The deployment typically takes:
VPC and networking: 2-3 minutes
EKS cluster: 10-15 minutes
RDS instances: 10-15 minutes
Total: 25-35 minutes
Watch for:
EKS cluster becoming ACTIVE
RDS instances becoming available
Load balancers becoming active
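You can watch progress from a second terminal while terraform apply runs; a rough sketch using the AWS CLI (the cluster name depends on your project_name and environment, so the value below is an example):
# EKS cluster status (expect CREATING, then ACTIVE)
aws eks describe-cluster \
  --name mycompany-production-eks-cluster \
  --query 'cluster.status' \
  --region us-west-2
# RDS instance status (expect creating, then available)
aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceStatus]' \
  --output table \
  --region us-west-2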
Post-Deployment Verification
1. Review Critical Outputs
# Display all outputs
terraform output
# Save outputs for next steps
terraform output -json > infrastructure-outputs.json
Key Outputs and Their Purpose:
| Output | Purpose | Action Required |
|---|---|---|
| route53_name_servers | NS records for your domain | ⚠️ ACTION: Update your domain registrar |
| ses_verification_token | SES domain verification | ⚠️ ACTION: May need to verify domain |
| syslog_endpoint | Audit log endpoint | Note for application configuration |
| eks_cluster_name | Cluster identifier | Used for kubectl configuration |
| postgres_*_connection_string | Database connections | Automatically used by secrets module |
| redis_connection_string | Cache connection | Automatically used by secrets module |
| rabbitmq_connection_string | Message queue | Automatically used by secrets module |
| vpc_id | Network identifier | May need for additional resources |
| alb_dns_name | Load balancer endpoint | Will be mapped to your domain |
2. Required Post-Deployment Actions
DNS Delegation (if create_public_zone = true)
⚠️ IMMEDIATE ACTION REQUIRED:
# Get the Route53 name servers
terraform output route53_name_servers
# Example output:
# [
# "ns-123.awsdns-12.com",
# "ns-456.awsdns-34.net",
# "ns-789.awsdns-56.org",
# "ns-012.awsdns-78.co.uk"
# ]
What to do:
Log into your domain registrar (GoDaddy, Namecheap, etc.)
Find DNS/Nameserver settings for your domain
Replace existing nameservers with the AWS Route53 nameservers above
Wait for DNS propagation (5-30 minutes typically)
Verify DNS delegation:
# Check if DNS is properly delegated
dig +short NS $(terraform output -raw domain_name)
# Should return the Route53 name servers
Amazon SES Setup (if enable_ses = true)
⚠️ ACTION MAY BE REQUIRED:
# Check SES verification status
aws ses get-identity-verification-attributes \
--identities $(terraform output -raw domain_name) \
--region $(terraform output -raw aws_region)
# Check if account is in sandbox mode
aws ses describe-configuration-set \
--configuration-set-name default \
--region $(terraform output -raw aws_region) 2>/dev/null || \
echo "⚠️ SES is likely in sandbox mode"
If SES is in sandbox mode:
For testing: Add verified email addresses
aws ses verify-email-identity \
--email-address [email protected] \
--region $(terraform output -raw aws_region)
For production: Request production access
Open AWS Support Center
Create case: “Service limit increase”
Service: “Simple Email Service (SES)”
Limit type: “Sending Quota”
Provide use case and expected volume
Typical approval time: 24-48 hours
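Once the request is approved, you can confirm production access from the CLI using the SES v2 API:
# Check whether the account has left the SES sandbox
aws sesv2 get-account \
  --query 'ProductionAccessEnabled' \
  --region $(terraform output -raw aws_region)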
3. Configure kubectl Access
# Update kubeconfig
aws eks update-kubeconfig \
--region $(terraform output -raw aws_region) \
--name $(terraform output -raw eks_cluster_name) \
--profile ${AWS_PROFILE}
# Verify cluster access
kubectl get nodes
kubectl get pods -A
Expected output:
2-5 nodes in Ready state (based on deployment_size)
Core system pods running in kube-system namespace
4. Verify Database Provisioning
If manage_postgres_users = true:
# Check created databases
echo "Main RDS databases:"
terraform output postgres_main_databases
echo "Auxiliary RDS databases:"
terraform output postgres_auxiliary_databases
# Verify secrets were created
aws secretsmanager list-secrets \
--filters "Key=name,Values=kindo-${ENVIRONMENT}-postgres" \
--query 'SecretList[*].Name' \
--region $(terraform output -raw aws_region)
Expected databases:
Main RDS: kindo (main application database)
Auxiliary RDS: unleash, litellm, nango
5. Verify Audit Logging Setup
If enable_syslog = true:
# Get syslog endpoint
SYSLOG_ENDPOINT=$(terraform output -raw syslog_endpoint)
echo "Syslog endpoint: $SYSLOG_ENDPOINT"
# Test syslog connectivity (from a machine with network access)
echo "<14>$(date '+%b %d %H:%M:%S') test-host kindo-test: Test audit log" | \\
nc -u -w1 $SYSLOG_ENDPOINT 514
# Check if logs are being stored in S3
aws s3 ls s3://$(terraform output -raw audit_logs_bucket_name)/syslog/ \
--region $(terraform output -raw aws_region)
6. Verify Monitoring Setup
If enable_monitoring = true:
# Check ADOT addon status
kubectl get pods -n opentelemetry-operator-system
# Verify X-Ray service map (after applications are deployed)
aws xray get-service-graph \
--start-time $(date -u -d '5 minutes ago' +%s) \
--end-time $(date +%s) \
--region $(terraform output -raw aws_region)
7. Security Verification
# Verify encryption is enabled
echo "RDS Encryption Status:"
aws rds describe-db-instances \
--query 'DBInstances[*].[DBInstanceIdentifier,StorageEncrypted]' \
--output table \
--region $(terraform output -raw aws_region)
echo "Redis Encryption Status:"
aws elasticache describe-replication-groups \
--query 'ReplicationGroups[*].[ReplicationGroupId,TransitEncryptionEnabled,AtRestEncryptionEnabled]' \
--output table \
--region $(terraform output -raw aws_region)
8. Document Important Endpoints
Create a deployment summary for your team:
cat > deployment-summary.md << EOF
# Kindo Infrastructure Deployment Summary

## Deployment Information
- **Date**: $(date)
- **Environment**: $(terraform output -raw environment)
- **Region**: $(terraform output -raw aws_region)
- **Deployment Size**: $(terraform output -raw deployment_size)

## Critical Endpoints
- **EKS Cluster**: $(terraform output -raw eks_cluster_name)
- **Domain**: $(terraform output -raw domain_name)
- **Syslog Endpoint**: $(terraform output -raw syslog_endpoint)

## Database Information
- **Main RDS**: $(terraform output -raw postgres_main_endpoint)
- **Auxiliary RDS**: $(terraform output -raw postgres_auxiliary_endpoint)

## Action Items
- [ ] DNS nameservers updated at registrar
- [ ] SES production access requested (if needed)
- [ ] Team members granted kubectl access
- [ ] Monitoring dashboards configured

## Next Steps
1. Deploy secrets using kindo-secrets module
2. Deploy applications using kindo-applications module
3. Configure peripherals using kindo-peripheries module
EOF

echo "✅ Deployment summary saved to deployment-summary.md"
Troubleshooting
Common Issues and Solutions
1. VPC CIDR Conflicts
Error: “The CIDR ‘10.0.0.0/16’ conflicts with a CIDR of another VPC”
Solution:
# Use a different CIDR range
vpc_cidr = "10.1.0.0/16" # or "172.16.0.0/16"2. Service Quotas Exceeded
Error: “You have exceeded the maximum number of DBInstances”
Solution:
# Check current quotas
aws service-quotas get-service-quota \
--service-code rds \
--quota-code L-952B80B9
# Request increase via AWS Console or CLI
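If an increase is needed, the same quota can be raised from the CLI; the desired value below is only an example:
# Request a higher limit for the same quota
aws service-quotas request-service-quota-increase \
  --service-code rds \
  --quota-code L-952B80B9 \
  --desired-value 40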
3. EKS Addon Version Incompatibility
Error: “Addon version is not compatible with cluster version”
Solution:
# List compatible addon versions
aws eks describe-addon-versions \
--addon-name vpc-cni \
--kubernetes-version 1.32 \
--query 'addons[0].addonVersions[*].addonVersion'
4. PostgreSQL Connection Issues
Error: “PostgreSQL tunnel failed to start”
Solution:
# Option 1: Skip user management during destroy
skip_postgres_resource_destroy = true
# Option 2: Ensure VPC endpoints are created
create_vpc_endpoints = true
5. Insufficient Node Capacity
Error: “Pod failed to schedule: Insufficient CPU/memory”
Solution:
# Increase node group size
general_min_size = 3
general_max_size = 10
# Or add specialized node groups
memory_node_group_enabled = true
Debugging Commands
# Check Terraform state
terraform state list
terraform state show <resource>
# Force resource recreation
terraform taint <resource>
terraform apply
# Clean up failed resources
terraform destroy -target=<resource>
# Enable debug logging
export TF_LOG=DEBUG
terraform apply
Getting Help
If you encounter issues:
Check the troubleshooting guide
Review CloudWatch logs
Check AWS CloudTrail for API errors
Contact support with:
Terraform version (terraform version)
Error messages
terraform.tfvars (sanitized)
Output of terraform state list
Next Steps
After successful infrastructure deployment:
Deploy Secrets: Follow the Secrets Configuration Guide to generate and deploy application secrets
Deploy Applications: Use the Application Deployment Guide to deploy Kindo applications
Configure Peripherals: Set up additional services using the Peripherals Configuration Guide
Set Up Monitoring: Configure observability using the Monitoring Setup Guide
Appendix: Complete terraform.tfvars Reference
For reference, here’s a complete terraform.tfvars with all available options:
# ===============================================================
# Complete terraform.tfvars Reference
# ===============================================================
# Core Configuration (REQUIRED)
project_name = "mycompany"
environment = "production"
aws_region = "us-west-2"
# Deployment Sizing (REQUIRED)
deployment_size = "medium" # dev, small, medium, large, xlarge
production_mode = true # Enable deletion protection
# Network Configuration (REQUIRED)
vpc_cidr = "10.0.0.0/16"
# availability_zones = ["us-west-2a", "us-west-2b", "us-west-2c"] # Auto-detected
# DNS Configuration (REQUIRED)
domain_name = "kindo.mycompany.com"
create_public_zone = true
# Feature Toggles
enable_syslog = true
enable_monitoring = true
enable_ses = true
enable_vpn = false
# Additional Node Groups
memory_node_group_enabled = true
compute_node_group_enabled = false
# Kubernetes Configuration
kubernetes_version = "1.32"
cluster_endpoint_public_access = true
cluster_endpoint_private_access = true
# Database Configuration
postgres_engine_version = "17.4"
postgres_deletion_protection = true
postgres_skip_final_snapshot = false
postgres_backup_retention_period = 7
postgres_backup_window = "03:00-04:00"
postgres_maintenance_window = "sun:04:00-sun:05:00"
# Override size defaults (optional)
# general_min_size = 3
# general_max_size = 10
# general_instance_types = ["t3.xlarge"]
# postgres_main_instance_class = "db.t3.medium"
# postgres_main_allocated_storage = 50
# redis_node_type = "cache.t3.medium"
# Advanced Options
create_vpc_endpoints = true
skip_postgres_resource_destroy = true
# Resource Tags
tags = {
Owner = "platform-team"
CostCenter = "engineering"
Environment = "production"
}
Remember: Start with the minimal configuration and add parameters as needed. The t-shirt sizing handles most defaults appropriately for your deployment size.