
Plan & Provision Infrastructure

This guide covers everything you need to plan, source, and provision before kindo-cli can install Kindo on top. Work through the five sections in order: pick a size, confirm each component meets spec, plan DNS and firewall, provision via the AWS Terraform helper or bring your own on other providers, then tick every box on the team-scoped checklist before handing off to the CLI.

1. Pick a deployment size

Sizing drives everything downstream: node count, database class, RAM and vCPU targets for managed services, and the service-quota requests you need to file with your cloud provider. Pick the smallest tier that matches your expected active users — you can scale up later, but it is cheaper to start right.

Node counts are shown as the recommended max per node group. Two groups (primary + agents) run at this sizing, so totals are 2× per-group.

| Size | Use case | Kubernetes nodes (per group) | PostgreSQL (main / auxiliary) | Redis | RabbitMQ | Total cluster capacity |
| --- | --- | --- | --- | --- | --- | --- |
| Medium | 50 to 200 active users | 6 nodes × 8 vCPU, 16 GB RAM (c6i.2xlarge) | main: 2 vCPU, 8 GB (db.m6i.large) / aux: 2 vCPU, 8 GB (db.t4g.large) | 3.09 GB (cache.t3.medium) | 2 vCPU, 8 GB RAM (mq.m5.large) | 96 vCPU, 192 GB RAM |
| Large | 200 to 1,000 active users | 10 nodes × 16 vCPU, 32 GB RAM (c6i.4xlarge) | main: 4 vCPU, 16 GB (db.m6i.xlarge) / aux: 2 vCPU, 8 GB (db.m6i.large) | 6.38 GB (cache.m6g.large) | 4 vCPU, 16 GB RAM (mq.m5.xlarge) | 320 vCPU, 640 GB RAM |
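
The "Total cluster capacity" column follows directly from the per-group figures. A quick shell check (illustrative only) confirms the arithmetic:

```shell
# Two node groups (primary + agents) each run the per-group node count,
# so cluster totals are 2x the per-group vCPU/RAM numbers in the table.
groups=2
medium_vcpu=$(( groups * 6 * 8 ))    # 6 nodes x 8 vCPU per group
medium_ram=$(( groups * 6 * 16 ))    # 6 nodes x 16 GB per group
large_vcpu=$(( groups * 10 * 16 ))   # 10 nodes x 16 vCPU per group
large_ram=$(( groups * 10 * 32 ))    # 10 nodes x 32 GB per group
echo "Medium: ${medium_vcpu} vCPU, ${medium_ram} GB RAM"
echo "Large: ${large_vcpu} vCPU, ${large_ram} GB RAM"
# Medium: 96 vCPU, 192 GB RAM
# Large: 320 vCPU, 640 GB RAM
```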

2. Component requirements

Kindo is a Helm-installed application. As long as the components below meet spec and are reachable from the cluster, the CLI install is cloud-agnostic. Every row is a hard requirement unless otherwise noted — if one is missing or misconfigured, the install will fail.

Kubernetes cluster

| Attribute | Requirement |
| --- | --- |
| Version | 1.32 or higher |
| Networking | CNI with NetworkPolicy support (e.g., Calico or Cilium) |
| Storage | Dynamic volume provisioning, default StorageClass, ReadWriteOnce volumes |
| Ingress | An ingress controller installed (NGINX, Traefik, ALB Controller, or equivalent). Gateway API is also supported. |
| TLS | cert-manager installed, or certificates provisioned out-of-band |
| RBAC | Enabled (required) |
| Metrics | metrics-server installed (required for HPA and kubectl top) |
| Sizing | Medium: 6 nodes × 8 vCPU, 16 GB RAM (c6i.2xlarge); Large: 10 nodes × 16 vCPU, 32 GB RAM (c6i.4xlarge). Node counts are the recommended max per group; two groups (primary + agents) run at this sizing. |

GPU nodes are only required if you plan to self-host models. Labels nvidia.com/gpu=true and accelerator=<model> let Helm schedule model workloads onto them.

PostgreSQL

| Attribute | Requirement |
| --- | --- |
| Version | 17.0 or higher (17.4 recommended) |
| Storage | 100 GB SSD minimum; scale with data volume |
| Concurrent connections | 200+ |
| Encryption | At-rest and in-transit required for production |
| HA (production) | Streaming replication or managed service with automatic failover |
| Backups | Daily automated backups, 7+ day retention |
| Sizing (main) | Medium: 2 vCPU, 8 GB RAM (db.m6i.large); Large: 4 vCPU, 16 GB RAM (db.m6i.xlarge) |
| Sizing (auxiliary) | Medium: 2 vCPU, 8 GB RAM (db.t4g.large); Large: 2 vCPU, 8 GB RAM (db.m6i.large) |

Redis — standalone only

| Attribute | Requirement |
| --- | --- |
| Version | 7.0 or higher (7.2 recommended) |
| Concurrent connections | 100+ |
| Mode | Standalone (single primary, no shards) |
| Sizing | Medium: 3.09 GB (cache.t3.medium); Large: 6.38 GB (cache.m6g.large) |

Cloud provider compatibility:

  • AWS ElastiCache — Cluster Mode Disabled with num_node_groups = 1, replicas_per_node_group = 0. Engine version 7.0+. Cluster Mode Enabled with 2+ shards is not supported.
  • Azure Cache for Redis — Basic or Standard tier (single-node, non-clustered). Premium/Enterprise clustered tiers with 2+ shards are not supported.
  • Google Cloud Memorystore — Basic tier (standalone instance). Redis Cluster mode is not supported.
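
On AWS, a minimal Terraform fragment for a compliant standalone instance might look like the following. This is a sketch, not part of kindo-infra; the resource label, replication group ID, and subnet group name are placeholders:

```hcl
resource "aws_elasticache_replication_group" "kindo" {
  replication_group_id = "kindo-redis"
  description          = "Kindo Redis (Cluster Mode Disabled)"
  engine               = "redis"
  engine_version       = "7.1"
  node_type            = "cache.t3.medium" # Medium tier; cache.m6g.large for Large

  # Cluster Mode Disabled: exactly one node group, no replicas.
  num_node_groups         = 1
  replicas_per_node_group = 0

  subnet_group_name = "your-cache-subnet-group" # placeholder
}
```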

Verify your Redis mode before proceeding:

```shell
redis-cli INFO server | grep redis_mode
# redis_mode:standalone -> compatible
# redis_mode:sentinel -> not supported
# redis_mode:cluster -> not supported
```

RabbitMQ

| Attribute | Requirement |
| --- | --- |
| Version | 3.13 or higher |
| Disk | 20 GB minimum across all sizes |
| Management plugin | Enabled |
| HA (production) | 3+ node cluster with quorum queues, or managed service (e.g., Amazon MQ) |
| Sizing | Medium: 2 vCPU, 8 GB RAM (mq.m5.large); Large: 4 vCPU, 16 GB RAM (mq.m5.xlarge) |

S3-compatible object storage

Any S3 API-compatible store: AWS S3, Google Cloud Storage (S3 compatibility), Azure Blob (S3 compatibility), MinIO, or Ceph.

| Bucket | Purpose | Access |
| --- | --- | --- |
| kindo-uploads | User file uploads | Private |

Provision the bucket (or equivalent naming) with server-side encryption and block-public-access enabled.
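
As a sketch of what "encrypted, non-public" means in practice on AWS, a Terraform fragment for the uploads bucket could look like this (the bucket name and resource labels are placeholders; the AWS kindo-infra module below handles this for you):

```hcl
resource "aws_s3_bucket" "uploads" {
  bucket = "mycompany-kindo-uploads" # must be globally unique
}

# Server-side encryption on every object by default.
resource "aws_s3_bucket_server_side_encryption_configuration" "uploads" {
  bucket = aws_s3_bucket.uploads.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Block all forms of public access.
resource "aws_s3_bucket_public_access_block" "uploads" {
  bucket                  = aws_s3_bucket.uploads.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```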

Vector database

Pick one:

  • Qdrant (self-hosted or managed) — deploy into Kubernetes, or point at a managed Qdrant endpoint. Kindo creates the collection automatically on first call with the correct vector size for your embedding model — no manual setup required.
  • Pinecone (managed) — pod-based index, cosine metric. Serverless Pinecone is not supported. The index must be pre-created and its dimension must match the embedding model you configure in LiteLLM (e.g., 1536 for text-embedding-3-small, 3072 for text-embedding-3-large). Kindo does not create or resize the index for you.
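
If you go the Pinecone route, double-check the dimension before creating the index. A tiny shell helper (hypothetical, covering only the two models and dimensions quoted above) illustrates the mapping:

```shell
# Map a LiteLLM embedding model name to its required Pinecone index dimension.
# Hypothetical helper; only the two models mentioned in this guide are covered.
embedding_dim() {
  case "$1" in
    text-embedding-3-small) echo 1536 ;;
    text-embedding-3-large) echo 3072 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

embedding_dim text-embedding-3-small
# -> 1536
```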

TLS / certificates

Either (a) cert-manager configured against an ACME issuer (Let’s Encrypt) or a private CA, or (b) pre-issued wildcard certificate for *.kindo.example.com loaded as a Kubernetes Secret. kindo-cli references the certificate by name — it does not issue or rotate it for you.
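
For option (b), the pre-issued certificate ends up as a standard kubernetes.io/tls Secret. A minimal manifest sketch follows; the Secret name and namespace here are placeholders, since you choose them and pass the name to kindo-cli:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: kindo-wildcard-tls   # placeholder; this is the name kindo-cli references
  namespace: kindo           # placeholder namespace
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded PEM certificate chain>
  tls.key: <base64-encoded PEM private key>
```

Equivalently, kubectl create secret tls kindo-wildcard-tls --cert=tls.crt --key=tls.key builds the same object from PEM files on disk.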

Ingress

A running ingress controller in the cluster with:

  • Class name known to you (e.g., nginx, alb, traefik) — you’ll pass it to kindo-cli.
  • Inbound 443 (HTTPS) open; 80 allowed for HTTP→HTTPS redirect.
  • Outbound 443 to the public internet (or your private AI endpoints), plus access to PostgreSQL (5432), Redis (6379), RabbitMQ (5672), and syslog (514) as applicable.

Audit log sink

Syslog server supporting RFC 3164 on TCP/UDP 514, reachable from the cluster. 1+ year retention recommended. Required for production compliance postures.

Email

Any email service that supports the SMTP protocol — a self-hosted SMTP server, or managed providers like Amazon SES, SendGrid, or Mailgun that expose SMTP credentials. Used for transactional email (invites, password reset, alerts).

3. Plan DNS and firewall

DNS subdomains

Kindo provisions 12 ingress hostnames behind a single load balancer. Pick a parent domain you control (examples use kindo.company.com) and create an A or CNAME record for each subdomain pointing at your ingress controller's load balancer. A wildcard TLS certificate covering *.kindo.company.com is the simplest option; with AWS Route 53 + external-dns, records are created automatically from ingress annotations.

| Subdomain | Component | Example |
| --- | --- | --- |
| app. | Next.js frontend | app.kindo.company.com |
| api. | Backend API (also receives webhooks via path rules) | api.kindo.company.com |
| integrations-api. | Nango integration API | integrations-api.kindo.company.com |
| integrations-connect. | Nango OAuth / Connect UI | integrations-connect.kindo.company.com |
| sso. | SSOReady admin | sso.kindo.company.com |
| sso-auth. | SSOReady auth endpoint | sso-auth.kindo.company.com |
| sso-api. | SSOReady API | sso-api.kindo.company.com |
| sso-app. | SSOReady user-facing app | sso-app.kindo.company.com |
| hatchet. | Hatchet API (/api) and dashboard (/) on one host | hatchet.kindo.company.com |
| litellm. | LiteLLM model proxy | litellm.kindo.company.com |
| unleash. | Unleash feature flag service | unleash.kindo.company.com |
| unleash-edge. | Unleash Edge (client-facing read cache) | unleash-edge.kindo.company.com |
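
To avoid typos across the 12 records, you can generate the full hostname list from the table above (illustrative; substitute your parent domain for the example base):

```shell
# All 12 Kindo ingress subdomains, prefixed onto the parent domain.
base="kindo.company.com"   # replace with your parent domain
subs="app api integrations-api integrations-connect sso sso-auth sso-api sso-app hatchet litellm unleash unleash-edge"
for sub in $subs; do
  echo "${sub}.${base}"
done
```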

Inbound

Open only the ingress entry points to the public internet (or to your corporate network, if Kindo is internal-only).

| Port | Protocol | Purpose |
| --- | --- | --- |
| 443 | TCP | HTTPS — all user traffic |
| 80 | TCP | HTTP redirect to 443 (optional, recommended) |

Outbound

The cluster needs egress to reach managed infrastructure and external AI providers. If any of these paths are blocked, the install or runtime behaviour will break — check with your network team before you start provisioning.

| Port | Protocol | Destination | Purpose |
| --- | --- | --- | --- |
| 443 | TCP | AI providers (OpenAI, Anthropic, Azure OpenAI, Groq), Pinecone, container registries | Model inference, embeddings, image pulls |
| 5432 | TCP | PostgreSQL endpoint | Database traffic |
| 6379 | TCP | Redis endpoint | Cache and streaming |
| 5672 | TCP | RabbitMQ endpoint | Message queue |
| 25 / 587 / 465 | TCP | Email provider | SMTP |
| 514 | TCP + UDP | Syslog server | Audit log forwarding |

4. Provider paths

Kindo ships a Terraform module, kindo-infra, that provisions the entire cloud-agnostic spec above on AWS. For other providers (GCP, Azure, on-prem, OpenShift, Rancher), follow the component requirements and wire the outputs into kindo-cli at install time — there is no pre-built Terraform module for those targets today.

AWS

Minimum configuration

Create a root Terraform module that consumes kindo-infra:

```hcl
module "kindo_infra" {
  source = "./kindo-infra"

  # --- Required inputs ---
  project                   = "mycompany"
  environment               = "production"
  region                    = "us-west-2"
  availability_zone_names   = ["us-west-2a", "us-west-2b", "us-west-2c"]
  vpc_cidr_base             = "10.0.0.0/16"
  cluster_name              = "mycompany-kindo"
  s3_uploads_bucket_name    = "mycompany-kindo-uploads"    # globally unique
  s3_audit_logs_bucket_name = "mycompany-kindo-audit-logs" # globally unique

  # --- Optional: t-shirt sizing profile ---
  # Valid values: "custom" (default), "medium", "large".
  tshirt_size = "medium"

  # --- Optional: DNS / TLS ---
  create_public_zone          = true
  base_domain                 = "kindo.mycompany.com"
  create_wildcard_certificate = true

  # --- Optional: production safety ---
  termination_protection_enabled = true
  postgres_deletion_protection   = true
  postgres_multi_az              = true

  # --- Database bootstrap ---
  # Leave disabled (default); `kindo-cli` creates the per-service
  # databases (`kindo`, `unleash`, `litellm`, `ssoready`, `hatchet`,
  # `nango`) during install.
  manage_postgres_dbs = false
}
```

T-shirt sizing

tshirt_size sets the EKS node group sizes and PostgreSQL instance class. custom gives you full control via the kindo_workers_* and postgres_* variables.

| tshirt_size | Use case |
| --- | --- |
| custom | You specify every node group and DB override manually |
| medium | Standard production (up to ~200 users) |
| large | High-traffic production (200+ users) |

What gets created

  • VPC with public, private, and database subnets across 3 AZs, NAT gateways, VPC endpoints for AWS services.
  • EKS cluster (cluster_version = 1.32+) with managed node groups (SPOT by default), optional sandbox node group, cluster autoscaler IAM, cert-manager and metrics-server EKS addons.
  • RDS PostgreSQL main instance (17.x) plus an auxiliary instance. Admin credentials written to AWS Secrets Manager.
  • ElastiCache Redis in standalone single-AZ mode (module defaults — Kindo requires no replicas, no Multi-AZ, no Cluster).
  • Amazon MQ (RabbitMQ 3.13) single-instance broker by default; set rabbitmq_deployment_mode = "CLUSTER_MULTI_AZ" for HA.
  • S3 buckets for uploads and audit logs, with SSE and strict access policies.
  • Route 53 public hosted zone and ACM wildcard certificate for *.base_domain.
  • KMS keys for encryption at rest.
  • Optional: Client VPN, syslog forwarder (EC2 ASG → CloudWatch/S3/Kinesis Firehose), SES domain, LiteLLM Bedrock IRSA role.

Provision:

```shell
export AWS_PROFILE=your-aws-profile
terraform init
terraform plan
terraform apply
```

Outputs you will need later

kindo-cli reads infrastructure outputs from the terraform output JSON during install. Capture them:

```shell
terraform output -json > infra-outputs.json
```

Key outputs include postgres_connection_string, postgres_auxiliary_connection_string, redis_connection_string, rabbitmq_connection_string, S3 bucket names, EKS cluster name, VPC ID, and subnet IDs. You do not consume these directly — the CLI does.
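
The outputs file is plain terraform output -json, so each key wraps its value in a value field. A quick illustration with jq (using a stand-in file with a fabricated connection string, since the real one comes from your apply):

```shell
# Stand-in for the real infra-outputs.json captured above; the connection
# string here is fabricated for illustration.
cat > infra-outputs.json <<'EOF'
{"postgres_connection_string": {"value": "postgresql://kindo:****@db.example:5432/kindo"}}
EOF

# Each terraform output object nests its value under "value".
jq -r '.postgres_connection_string.value' infra-outputs.json
```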

Verify before handing off to the CLI

```shell
# Cluster reachable
aws eks update-kubeconfig --name mycompany-kindo --region us-west-2
kubectl get nodes

# Data services provisioned
aws rds describe-db-instances --query 'DBInstances[*].DBInstanceIdentifier'
aws elasticache describe-replication-groups --query 'ReplicationGroups[*].ReplicationGroupId'
aws mq list-brokers --query 'BrokerSummaries[*].BrokerName'
aws s3 ls | grep mycompany-kindo
```

At this point you have infrastructure ready for kindo-cli — no Kindo application pods are running yet.

5. Consolidated prerequisites checklist

Work through the tables below and tick items off as you gather them. Every row is tagged with the team that typically owns it so the list can be split and distributed across departments — ownership is a suggestion, not a mandate. You are ready to proceed to the next section only once every box is checked.

Infrastructure

| Owner | Item |
| --- | --- |
| Platform | Deployment size chosen (Medium / Large) |
| Cloud Ops | Kubernetes cluster target identified (1.32+, 3+ nodes, RBAC enabled, default StorageClass, LoadBalancer support) |
| DBA / Cloud Ops | PostgreSQL plan in place (17.0+, managed service recommended, capacity for six databases: kindo, unleash, litellm, ssoready, hatchet, nango) |
| DBA / Cloud Ops | Redis plan in place (7.0+, standalone mode only, sized per deployment tier) |
| DBA / Cloud Ops | RabbitMQ plan in place (3.13+, management plugin enabled) |
| Cloud Ops | S3-compatible object storage selected (AWS S3, GCS, Azure Blob, MinIO, or Ceph) with the kindo-uploads bucket planned |
| Platform | Vector database selected — Qdrant (self-hosted or managed) or Pinecone (pod-based). If Pinecone, index dimension is pre-created to match your embedding model |
| Networking | Ingress controller chosen (NGINX, Traefik, or cloud provider equivalent) |
| Security / Networking | TLS strategy decided (cert-manager, wildcard cert, or per-subdomain certs) |
| Security | (Optional) Syslog server endpoint identified (TCP/UDP 514, RFC 3164) |
| Platform | Email provider selected — must be SMTP-protocol-compatible (self-hosted SMTP, SES, SendGrid, Mailgun, or any other provider exposing SMTP credentials) |
| Cloud Ops | GPU node plan written down if self-hosting AI models (see Prepare AI Models) |

Tools to download (to the operator workstation — no cluster interaction yet)

| Owner | Item |
| --- | --- |
| Platform | python 3.11+ (kindo-cli runtime) |
| Platform | kindo-cli — latest release, provided by Kindo |
| Platform | helm 3.8.0+ |
| Platform | helmfile 0.162.0+ (CLI wraps it) |
| Platform | kubectl 1.32+ |
| Platform | yq — latest |
| Platform | jq — latest |
| Platform | psql — latest (optional, needed for kindo db tunnel / prompt / reset after install) |

Credentials and accounts

| Owner | Item |
| --- | --- |
| Security | Kindo container registry credentials received from Kindo |
| Security / AppDev | At least one AI provider API key obtained (OpenAI, Anthropic, Azure OpenAI, Groq, or self-hosted vLLM) |
| Security | Vector DB credentials obtained — Qdrant endpoint + API key, or Pinecone API key |
| Security | Email provider credentials obtained |
| Networking | (Optional) Syslog server credentials or allowlist entry confirmed |
| Security | (Optional) Secrets management strategy decided (External Secrets Operator with AWS Secrets Manager, HashiCorp Vault, Google Secret Manager, or Azure Key Vault) |

Network and DNS

| Owner | Item |
| --- | --- |
| Networking | Parent domain confirmed (e.g. kindo.company.com) |
| Networking | A or CNAME records planned for all 12 subdomains listed in Plan DNS and firewall (app., api., integrations-api., integrations-connect., sso., sso-auth., sso-api., sso-app., hatchet., litellm., unleash., unleash-edge.) |
| Networking / Security | Inbound firewall rule for 443/TCP (and 80/TCP redirect) approved |
| Networking / Security | Outbound firewall rules approved for 443 (external APIs), 5432 (PostgreSQL), 6379 (Redis), 5672 (RabbitMQ), SMTP ports (25/587/465), and 514 (syslog) |
| Networking | CNI type confirmed — flag to Kindo before the install call if it is not AWS VPC CNI (Calico, Cilium, custom — see Component requirements) |
| Security | Network posture decided (open / private / restricted) |

Identity & access

| Owner | Item |
| --- | --- |
| Cloud Ops | Operator IAM / service account identified — permissions to create Kubernetes Secrets, Deployments, Services, Ingresses, and ConfigMaps across all Kindo namespaces |
| Security | Organization-level guardrails reviewed — SCPs (AWS), org policies (GCP), management group policies (Azure) that might block IAM role creation, resource provisioning, or required tagging |
| Cloud Ops | Workload identity pattern chosen — IRSA (AWS), Workload Identity (GCP), or Azure Pod Identity — with trust relationships wired for services that need cloud access (e.g. LiteLLM → AWS Bedrock) |
| Platform | Cluster RBAC verified — the operator has permission to create service accounts, role bindings, and cluster-scoped resources needed by the install |

Security & compliance

| Owner | Item |
| --- | --- |
| Security | Registry allowlist updated to include registry.kindo.ai (or the internal mirror, if air-gapped) — verify image pulls are not blocked by admission controllers |
| Security | Image scanning requirements clarified — if your environment blocks unscanned images, coordinate with Kindo for SBOMs or a mirror-and-scan workflow |
| Security | Encryption at rest confirmed on PostgreSQL, Redis, S3, and Kubernetes Secrets storage |
| Security | Secrets handling policy agreed (rotation cadence, who has read access to kindo-secrets-config) |

Install host

| Owner | Item |
| --- | --- |
| Platform | Install host identified — bastion, jump box, developer laptop, or CI runner — with network reachability to the cluster's API server |
| Platform | Install host has python 3.11+, kindo-cli, helm 3.8+, helmfile 0.162+, kubectl 1.32+, yq, and jq installed and on PATH (plus optional psql for kindo db) |
| Platform | kubectl context for the target cluster is configured and tested (kubectl cluster-info, kubectl auth can-i create namespace --all-namespaces) |
| Security | Session recording / audit requirements on the install host confirmed — many regulated environments require this for any privileged shell session |

Approvals

| Owner | Item |
| --- | --- |
| Platform owner | Capacity plan approved (sizing, GPU nodes, autoscaling limits) |
| Networking owner | DNS, TLS, firewall, and CNI/ingress plan approved |
| Security owner | Secrets storage, image source, IAM, and compliance plan approved |
| Compliance owner | Evidence collection and audit plan approved (if a compliance regime applies) |

What comes next

You’ve finished the infrastructure layer. From here on, no more Terraform — kindo-cli takes over. The CLI reads your Terraform outputs (or hand-gathered connection strings), generates the centralized secrets, installs peripheries (External Secrets Operator, Unleash, OTel), and deploys the Kindo application stack (API, workers, LiteLLM, frontends) via Helm.

Two steps remain before you run kindo install:

  • Prepare your AI models. Decide whether you’re using cloud providers (OpenAI, Anthropic, Bedrock, Azure, Gemini), self-hosted vLLM, or a mix. The model endpoints must be reachable from the cluster before the CLI’s post-install step registers them. This can happen in parallel with the CLI setup.
  • Install the CLI and generate the install contract. kindo config init walks you through a 10-section wizard that writes install-contract.yaml and environment-bindings.yaml. kindo config validate --preflight then deploys a short-lived Job into the cluster to confirm every item on your checklist is actually reachable.