Plan & Provision Infrastructure
This guide covers everything you need to plan and provision before kindo-cli can install Kindo on top. Work through the five sections in order: pick a size, confirm each component meets spec, plan DNS and firewall, provision via the AWS Terraform helper (or bring your own infrastructure on other providers), then tick every box on the team-scoped checklist before handing off to the CLI.
1. Pick a deployment size
Sizing drives everything downstream: node count, database class, RAM and vCPU targets for managed services, and the service-quota requests you need to file with your cloud provider. Pick the smallest tier that matches your expected active users — you can scale up later, but it is cheaper to start right.
Node counts are shown as the recommended max per node group. Two groups (primary + agents) run at this sizing, so totals are 2× per-group.
| Size | Use case | Kubernetes nodes (per group) | PostgreSQL (main / auxiliary) | Redis | RabbitMQ | Total cluster capacity |
|---|---|---|---|---|---|---|
| Medium | 50 to 200 active users | 6 nodes × 8 vCPU, 16 GB RAM (c6i.2xlarge) | main: 2 vCPU, 8 GB (db.m6i.large) / aux: 2 vCPU, 8 GB (db.t4g.large) | 3.09 GB (cache.t3.medium) | 2 vCPU, 8 GB RAM (mq.m5.large) | 96 vCPU, 192 GB RAM |
| Large | 200 to 1,000 active users | 10 nodes × 16 vCPU, 32 GB RAM (c6i.4xlarge) | main: 4 vCPU, 16 GB (db.m6i.xlarge) / aux: 2 vCPU, 8 GB (db.m6i.large) | 6.38 GB (cache.m6g.large) | 4 vCPU, 16 GB RAM (mq.m5.xlarge) | 320 vCPU, 640 GB RAM |
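The "Total cluster capacity" column follows directly from the per-group numbers; a quick sanity check with shell arithmetic:

```shell
# Two node groups (primary + agents) run at the per-group sizing,
# so total capacity is 2 × nodes-per-group × per-node resources.
groups=2
medium_vcpu=$(( groups * 6 * 8 ))    # 6 nodes × 8 vCPU each
medium_ram=$((  groups * 6 * 16 ))   # 6 nodes × 16 GB each
large_vcpu=$((  groups * 10 * 16 ))  # 10 nodes × 16 vCPU each
large_ram=$((   groups * 10 * 32 ))  # 10 nodes × 32 GB each
echo "Medium: ${medium_vcpu} vCPU, ${medium_ram} GB RAM"  # Medium: 96 vCPU, 192 GB RAM
echo "Large:  ${large_vcpu} vCPU, ${large_ram} GB RAM"    # Large:  320 vCPU, 640 GB RAM
```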
2. Component requirements
Kindo is a Helm-installed application. As long as the components below meet spec and are reachable from the cluster, the CLI install is cloud-agnostic. Every row is a hard requirement unless otherwise noted — if one is missing or misconfigured, the install will fail.
Kubernetes cluster
| Attribute | Requirement |
|---|---|
| Version | 1.32 or higher |
| Networking | CNI with NetworkPolicy support, e.g. Calico or Cilium (Flannel alone does not enforce NetworkPolicy) |
| Storage | Dynamic volume provisioning, default StorageClass, ReadWriteOnce volumes |
| Ingress | An ingress controller installed (NGINX, Traefik, ALB Controller, or equivalent). Gateway API is also supported. |
| TLS | cert-manager installed, or certificates provisioned out-of-band |
| RBAC | Enabled (required) |
| Metrics | metrics-server installed (required for HPA and kubectl top) |
| Sizing | Medium: 6 nodes × 8 vCPU, 16 GB RAM (c6i.2xlarge), Large: 10 nodes × 16 vCPU, 32 GB RAM (c6i.4xlarge). Node counts are the recommended max per group; two groups (primary + agents) run at this sizing. |
GPU nodes are only required if you plan to self-host models. Labels nvidia.com/gpu=true and accelerator=<model> let Helm schedule model workloads onto them.
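For illustration, labeling a GPU node could look like the following; the node name and accelerator value are placeholders for your own:

```shell
# Hypothetical node name — substitute the output of `kubectl get nodes`.
kubectl label node ip-10-0-1-23.us-west-2.compute.internal \
  nvidia.com/gpu=true \
  accelerator=a100
```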
PostgreSQL
| Attribute | Requirement |
|---|---|
| Version | 17.0 or higher (17.4 recommended) |
| Storage | 100 GB SSD minimum; scale with data volume |
| Concurrent connections | 200+ |
| Encryption | At-rest and in-transit required for production |
| HA (production) | Streaming replication or managed service with automatic failover |
| Backups | Daily automated backups, 7+ day retention |
| Sizing (main) | Medium: 2 vCPU, 8 GB RAM (db.m6i.large), Large: 4 vCPU, 16 GB RAM (db.m6i.xlarge) |
| Sizing (auxiliary) | Medium: 2 vCPU, 8 GB RAM (db.t4g.large), Large: 2 vCPU, 8 GB RAM (db.m6i.large) |
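Before install day it is worth confirming the endpoint actually meets the version and connection-limit rows above. A sketch, assuming `psql` is installed and `PGURL` is a placeholder for your real connection string:

```shell
# PGURL is a placeholder — substitute your actual connection string.
PGURL="postgres://admin:secret@db.internal.example:5432/postgres"
psql "$PGURL" -tAc "SHOW server_version;"   # expect 17.x
psql "$PGURL" -tAc "SHOW max_connections;"  # expect 200 or more
```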
Redis — standalone only
| Attribute | Requirement |
|---|---|
| Version | 7.0 or higher (7.2 recommended) |
| Concurrent connections | 100+ |
| Mode | Standalone (single primary, no shards) |
| Sizing | Medium: 3.09 GB (cache.t3.medium), Large: 6.38 GB (cache.m6g.large) |
Cloud provider compatibility:
- AWS ElastiCache — Cluster Mode Disabled with `num_node_groups = 1`, `replicas_per_node_group = 0`. Engine version `7.0+`. Cluster Mode Enabled with 2+ shards is not supported.
- Azure Cache for Redis — Basic or Standard tier (single-node, non-clustered). Premium/Enterprise clustered tiers with 2+ shards are not supported.
- Google Cloud Memorystore — Basic tier (standalone instance). Redis Cluster mode is not supported.
Verify your Redis mode before proceeding:
```shell
redis-cli INFO server | grep redis_mode
# redis_mode:standalone -> compatible
# redis_mode:sentinel   -> not supported
# redis_mode:cluster    -> not supported
```
RabbitMQ
| Attribute | Requirement |
|---|---|
| Version | 3.13 or higher |
| Disk | 20 GB minimum across all sizes |
| Management plugin | Enabled |
| HA (production) | 3+ node cluster with quorum queues, or managed service (e.g., Amazon MQ) |
| Sizing | Medium: 2 vCPU, 8 GB RAM (mq.m5.large), Large: 4 vCPU, 16 GB RAM (mq.m5.xlarge) |
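A quick way to confirm the management plugin is enabled and check the broker version is the management HTTP API on port 15672; the host and credentials below are placeholders:

```shell
# /api/overview is part of the RabbitMQ management API (plugin must be enabled).
curl -fsS -u admin:changeme \
  http://rabbitmq.internal.example:15672/api/overview \
  | jq -r '.rabbitmq_version'   # expect 3.13 or higher
```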
S3-compatible object storage
Any S3 API-compatible store: AWS S3, Google Cloud Storage (S3 compatibility), Azure Blob (S3 compatibility), MinIO, or Ceph.
| Bucket | Purpose | Access |
|---|---|---|
| kindo-uploads | User file uploads | Private |
Provision the buckets (or equivalent naming) with server-side encryption and block-public-access enabled.
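On AWS, for example, that hardening maps to three `aws s3api` calls; the bucket name and region are placeholders:

```shell
bucket="mycompany-kindo-uploads"   # must be globally unique
# Create the bucket, enable server-side encryption, and block public access.
aws s3api create-bucket --bucket "$bucket" --region us-west-2 \
  --create-bucket-configuration LocationConstraint=us-west-2
aws s3api put-bucket-encryption --bucket "$bucket" \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms"}}]}'
aws s3api put-public-access-block --bucket "$bucket" \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```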
Vector database
Pick one:
- Qdrant (self-hosted or managed) — deploy into Kubernetes, or point at a managed Qdrant endpoint. Kindo creates the collection automatically on first call with the correct vector size for your embedding model — no manual setup required.
- Pinecone (managed) — pod-based index, cosine metric. Serverless Pinecone is not supported. The index dimension must be pre-created and must match the embedding model you configure in LiteLLM (e.g., 1536 for `text-embedding-3-small`, 3072 for `text-embedding-3-large`). Kindo does not create or resize the index for you.
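If you go the Pinecone route, the dimension-to-model constraint is easy to get wrong; a tiny helper makes it explicit (values taken from the list above):

```shell
# Map an embedding model name to its required Pinecone index dimension.
dimension_for() {
  case "$1" in
    text-embedding-3-small) echo 1536 ;;
    text-embedding-3-large) echo 3072 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}
dimension_for text-embedding-3-small   # -> 1536
```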
TLS / certificates
Either (a) cert-manager configured against an ACME issuer (Let’s Encrypt) or a private CA, or (b) a pre-issued wildcard certificate for *.kindo.company.com loaded as a Kubernetes Secret. kindo-cli references the certificate by name — it does not issue or rotate it for you.
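For option (b), loading a pre-issued wildcard certificate is a single kubectl call; the secret name, namespace, and file paths below are placeholders:

```shell
# Creates a TLS-type Secret that kindo-cli can reference by name.
kubectl create secret tls kindo-wildcard-tls \
  --cert=fullchain.pem \
  --key=privkey.pem \
  --namespace kindo
```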
Ingress
A running ingress controller in the cluster with:
- Class name known to you (e.g., `nginx`, `alb`, `traefik`) — you’ll pass it to `kindo-cli`.
- Inbound 443 (HTTPS) open; 80 allowed for HTTP→HTTPS redirect.
- Outbound 443 to the public internet (or your private AI endpoints), plus access to PostgreSQL (5432), Redis (6379), RabbitMQ (5672), and syslog (514) as applicable.
Audit log sink
Syslog server supporting RFC3164 on TCP/UDP 514, reachable from the cluster. 1+ year retention recommended. Required for production compliance postures.
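Reachability can be smoke-tested from any host on the cluster network with util-linux `logger`; the hostname below is a placeholder:

```shell
# -n sets the server, -P the port, -T forces TCP (drop -T to test UDP 514).
logger -n syslog.internal.example -P 514 -T "kindo provisioning connectivity test"
```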
Email (SMTP)
Any email service that supports the SMTP protocol — a self-hosted SMTP server, or managed providers like Amazon SES, SendGrid, or Mailgun that expose SMTP credentials. Used for transactional email (invites, password reset, alerts).
3. Plan DNS and firewall
DNS subdomains
Kindo provisions 12 ingress hostnames behind a single load balancer. Pick a parent domain you control (examples use kindo.company.com) and create A or CNAME records for each subdomain, pointing at your ingress controller’s load balancer. A wildcard TLS certificate covering *.kindo.company.com is the simplest option; with AWS Route 53 + external-dns, records are created automatically from ingress annotations.
| Subdomain | Component | Example |
|---|---|---|
| app. | Next.js frontend | app.kindo.company.com |
| api. | Backend API (also receives webhooks via path rules) | api.kindo.company.com |
| integrations-api. | Nango integration API | integrations-api.kindo.company.com |
| integrations-connect. | Nango OAuth / Connect UI | integrations-connect.kindo.company.com |
| sso. | SSOReady admin | sso.kindo.company.com |
| sso-auth. | SSOReady auth endpoint | sso-auth.kindo.company.com |
| sso-api. | SSOReady API | sso-api.kindo.company.com |
| sso-app. | SSOReady user-facing app | sso-app.kindo.company.com |
| hatchet. | Hatchet API (/api) and dashboard (/) on one host | hatchet.kindo.company.com |
| litellm. | LiteLLM model proxy | litellm.kindo.company.com |
| unleash. | Unleash feature flag service | unleash.kindo.company.com |
| unleash-edge. | Unleash Edge (client-facing read cache) | unleash-edge.kindo.company.com |
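Given a parent domain, the full record list can be generated mechanically; a sketch using the example domain from the table:

```shell
parent="kindo.company.com"   # replace with your parent domain
subdomains="app api integrations-api integrations-connect \
  sso sso-auth sso-api sso-app hatchet litellm unleash unleash-edge"
# Print the FQDN for every record you need to create.
for s in $subdomains; do
  echo "${s}.${parent}"
done
```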
Inbound
Open only the ingress entry points to the public internet (or to your corporate network, if Kindo is internal-only).
| Port | Protocol | Purpose |
|---|---|---|
| 443 | TCP | HTTPS — all user traffic |
| 80 | TCP | HTTP redirect to 443 (optional, recommended) |
Outbound
The cluster needs egress to reach managed infrastructure and external AI providers. If any of these paths are blocked, the install or runtime behaviour will break — check with your network team before you start provisioning.
| Port | Protocol | Destination | Purpose |
|---|---|---|---|
| 443 | TCP | AI providers (OpenAI, Anthropic, Azure OpenAI, Groq), Pinecone, container registries | Model inference, embeddings, image pulls |
| 5432 | TCP | PostgreSQL endpoint | Database traffic |
| 6379 | TCP | Redis endpoint | Cache and streaming |
| 5672 | TCP | RabbitMQ endpoint | Message queue |
| 25 / 587 / 465 | TCP | Email provider | SMTP |
| 514 | TCP + UDP | Syslog server | Audit log forwarding |
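Before filing firewall tickets, each TCP path can be probed from inside the network with bash's built-in `/dev/tcp`; the endpoint hostnames below are placeholders:

```shell
# Succeeds only if a TCP connection to host:port completes within 3 seconds.
check_port() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
check_port db.internal.example    5432 && echo "5432 open" || echo "5432 blocked"
check_port cache.internal.example 6379 && echo "6379 open" || echo "6379 blocked"
check_port mq.internal.example    5672 && echo "5672 open" || echo "5672 blocked"
```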
4. Provider paths
Kindo ships a Terraform module, kindo-infra, that provisions the entire cloud-agnostic spec above on AWS. For other providers (GCP, Azure, on-prem, OpenShift, Rancher), follow the component requirements and wire the outputs into kindo-cli at install time — there is no pre-built Terraform module for those targets today.
AWS
Minimum configuration
Create a root Terraform module that consumes kindo-infra:
```hcl
module "kindo_infra" {
  source = "./kindo-infra"

  # --- Required inputs ---
  project                   = "mycompany"
  environment               = "production"
  region                    = "us-west-2"
  availability_zone_names   = ["us-west-2a", "us-west-2b", "us-west-2c"]
  vpc_cidr_base             = "10.0.0.0/16"
  cluster_name              = "mycompany-kindo"
  s3_uploads_bucket_name    = "mycompany-kindo-uploads"    # globally unique
  s3_audit_logs_bucket_name = "mycompany-kindo-audit-logs" # globally unique

  # --- Optional: t-shirt sizing profile ---
  # Valid values: "custom" (default), "medium", "large".
  tshirt_size = "medium"

  # --- Optional: DNS / TLS ---
  create_public_zone          = true
  base_domain                 = "kindo.mycompany.com"
  create_wildcard_certificate = true

  # --- Optional: production safety ---
  termination_protection_enabled = true
  postgres_deletion_protection   = true
  postgres_multi_az              = true

  # --- Database bootstrap ---
  # Leave disabled (default); `kindo-cli` creates the per-service
  # databases (`kindo`, `unleash`, `litellm`, `ssoready`, `hatchet`,
  # `nango`) during install.
  manage_postgres_dbs = false
}
```
T-shirt sizing
`tshirt_size` sets the EKS node group sizes and PostgreSQL instance class. `custom` gives you full control via the `kindo_workers_*` and `postgres_*` variables.
| tshirt_size | Use case |
|---|---|
| custom | You specify every node group and DB override manually |
| medium | Standard production (up to ~200 users) |
| large | High-traffic production (200+ users) |
What gets created
- VPC with public, private, and database subnets across 3 AZs, NAT gateways, VPC endpoints for AWS services.
- EKS cluster (`cluster_version = 1.32+`) with managed node groups (SPOT by default), optional sandbox node group, cluster autoscaler IAM, cert-manager and metrics-server EKS addons.
- RDS PostgreSQL main instance (17.x) plus an auxiliary instance. Admin credentials written to AWS Secrets Manager.
- ElastiCache Redis in standalone single-AZ mode (module defaults — Kindo requires no replicas, no Multi-AZ, no Cluster Mode).
- Amazon MQ (RabbitMQ 3.13) single-instance broker by default; set `rabbitmq_deployment_mode = "CLUSTER_MULTI_AZ"` for HA.
- S3 buckets for uploads and audit logs, with SSE and strict access policies.
- Route 53 public hosted zone and ACM wildcard certificate for `*.base_domain`.
- KMS keys for encryption at rest.
- Optional: Client VPN, syslog forwarder (EC2 ASG → CloudWatch/S3/Kinesis Firehose), SES domain, LiteLLM Bedrock IRSA role.
Provision:
```shell
export AWS_PROFILE=your-aws-profile
terraform init
terraform plan
terraform apply
```
Outputs you will need later
kindo-cli reads infrastructure outputs from the terraform output JSON during install. Capture them:
```shell
terraform output -json > infra-outputs.json
```
Key outputs include postgres_connection_string, postgres_auxiliary_connection_string, redis_connection_string, rabbitmq_connection_string, S3 bucket names, EKS cluster name, VPC ID, and subnet IDs. You do not consume these directly — the CLI does.
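The outputs file uses the ordinary `terraform output -json` shape (each key wraps its value in a `value` field), so it is easy to inspect with `jq`. A sketch with a fabricated two-key sample:

```shell
# Fabricated sample matching the `terraform output -json` structure.
cat > infra-outputs.json <<'EOF'
{
  "postgres_connection_string": {"sensitive": true, "type": "string",
    "value": "postgres://admin:secret@db.internal.example:5432/postgres"},
  "redis_connection_string": {"sensitive": false, "type": "string",
    "value": "redis://cache.internal.example:6379"}
}
EOF
# Pull a single value out for inspection.
pg_url=$(jq -r '.postgres_connection_string.value' infra-outputs.json)
echo "$pg_url"
```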
Verify before handing off to the CLI
```shell
# Cluster reachable
aws eks update-kubeconfig --name mycompany-kindo --region us-west-2
kubectl get nodes

# Data services provisioned
aws rds describe-db-instances --query 'DBInstances[*].DBInstanceIdentifier'
aws elasticache describe-replication-groups --query 'ReplicationGroups[*].ReplicationGroupId'
aws mq list-brokers --query 'BrokerSummaries[*].BrokerName'
aws s3 ls | grep mycompany-kindo
```
At this point you have infrastructure ready for kindo-cli — no Kindo application pods are running yet.
5. Consolidated prerequisites checklist
Work through the tables below and tick items off as you gather them. Every row is tagged with the team that typically owns it so the list can be split and distributed across departments — ownership is a suggestion, not a mandate. You are ready to proceed to the next section only once every box is checked.
Infrastructure
| | Owner | Item |
|---|---|---|
| ☐ | Platform | Deployment size chosen (Medium / Large) |
| ☐ | Cloud Ops | Kubernetes cluster target identified (1.32+, 3+ nodes, RBAC enabled, default StorageClass, LoadBalancer support) |
| ☐ | DBA / Cloud Ops | PostgreSQL plan in place (17.0+, managed service recommended, capacity for six databases: kindo, unleash, litellm, ssoready, hatchet, nango) |
| ☐ | DBA / Cloud Ops | Redis plan in place (7.0+, standalone mode only, sized per deployment tier) |
| ☐ | DBA / Cloud Ops | RabbitMQ plan in place (3.13+, management plugin enabled) |
| ☐ | Cloud Ops | S3-compatible object storage selected (AWS S3, GCS, Azure Blob, MinIO, or Ceph) with the kindo-uploads bucket planned |
| ☐ | Platform | Vector database selected — Qdrant (self-hosted or managed) or Pinecone (pod-based). If Pinecone, index dimension is pre-created to match your embedding model |
| ☐ | Networking | Ingress controller chosen (NGINX, Traefik, or cloud provider equivalent) |
| ☐ | Security / Networking | TLS strategy decided (cert-manager, wildcard cert, or per-subdomain certs) |
| ☐ | Security | (Optional) Syslog server endpoint identified (TCP/UDP 514, RFC 3164) |
| ☐ | Platform | Email provider selected — must be SMTP-protocol-compatible (self-hosted SMTP, SES, SendGrid, Mailgun, or any other provider exposing SMTP credentials) |
| ☐ | Cloud Ops | GPU node plan written down if self-hosting AI models (see Prepare AI Models) |
Tools to download (to the operator workstation — no cluster interaction yet)
| | Owner | Item |
|---|---|---|
| ☐ | Platform | python 3.11+ (kindo-cli runtime) |
| ☐ | Platform | kindo-cli — latest release, provided by Kindo |
| ☐ | Platform | helm 3.8.0+ |
| ☐ | Platform | helmfile 0.162.0+ (CLI wraps it) |
| ☐ | Platform | kubectl 1.32+ |
| ☐ | Platform | yq — latest |
| ☐ | Platform | jq — latest |
| ☐ | Platform | psql — latest (optional, needed for kindo db tunnel / prompt / reset after install) |
Credentials and accounts
| | Owner | Item |
|---|---|---|
| ☐ | Security | Kindo container registry credentials received from Kindo |
| ☐ | Security / AppDev | At least one AI provider API key obtained (OpenAI, Anthropic, Azure OpenAI, Groq, or self-hosted vLLM) |
| ☐ | Security | Vector DB credentials obtained — Qdrant endpoint + API key, or Pinecone API key |
| ☐ | Security | Email provider credentials obtained |
| ☐ | Networking | (Optional) Syslog server credentials or allowlist entry confirmed |
| ☐ | Security | (Optional) Secrets management strategy decided (External Secrets Operator with AWS Secrets Manager, HashiCorp Vault, Google Secret Manager, or Azure Key Vault) |
Network and DNS
| | Owner | Item |
|---|---|---|
| ☐ | Networking | Parent domain confirmed (e.g. kindo.company.com) |
| ☐ | Networking | A or CNAME records planned for all 12 subdomains listed in section 3: app., api., integrations-api., integrations-connect., sso., sso-auth., sso-api., sso-app., hatchet., litellm., unleash., and unleash-edge. |
| ☐ | Networking / Security | Inbound firewall rule for 443/TCP (and 80/TCP redirect) approved |
| ☐ | Networking / Security | Outbound firewall rules approved for 443 (external APIs), 5432 (PostgreSQL), 6379 (Redis), 5672 (RabbitMQ), SMTP ports (25/587/465), and 514 (syslog) |
| ☐ | Networking | CNI type confirmed — flag to Kindo before the install call if it is not AWS VPC CNI (Calico, Cilium, custom — see the caution in Component requirements) |
| ☐ | Security | Network posture decided (open / private / restricted) |
Identity & access
| | Owner | Item |
|---|---|---|
| ☐ | Cloud Ops | Operator IAM / service account identified — permissions to create Kubernetes Secrets, Deployments, Services, Ingresses, and ConfigMaps across all Kindo namespaces |
| ☐ | Security | Organization-level guardrails reviewed — SCPs (AWS), org policies (GCP), management group policies (Azure) that might block IAM role creation, resource provisioning, or required tagging |
| ☐ | Cloud Ops | Workload identity pattern chosen — IRSA (AWS), Workload Identity (GCP), or Azure Pod Identity — with trust relationships wired for services that need cloud access (e.g. LiteLLM → AWS Bedrock) |
| ☐ | Platform | Cluster RBAC verified — the operator has permission to create service accounts, role bindings, and cluster-scoped resources needed by the install |
Security & compliance
| | Owner | Item |
|---|---|---|
| ☐ | Security | Registry allowlist updated to include registry.kindo.ai (or the internal mirror, if air-gapped) — verify image pulls are not blocked by admission controllers |
| ☐ | Security | Image scanning requirements clarified — if your environment blocks unscanned images, coordinate with Kindo for SBOMs or a mirror-and-scan workflow |
| ☐ | Security | Encryption at rest confirmed on PostgreSQL, Redis, S3, and Kubernetes Secrets storage |
| ☐ | Security | Secrets handling policy agreed (rotation cadence, who has read access to kindo-secrets-config) |
Install host
| | Owner | Item |
|---|---|---|
| ☐ | Platform | Install host identified — bastion, jump box, developer laptop, or CI runner — with network reachability to the cluster’s API server |
| ☐ | Platform | Install host has python 3.11+, kindo-cli, helm 3.8+, helmfile 0.162+, kubectl 1.32+, yq, and jq installed and on PATH (plus optional psql for kindo db) |
| ☐ | Platform | kubectl context for the target cluster is configured and tested (kubectl cluster-info, kubectl auth can-i create namespace --all-namespaces) |
| ☐ | Security | Session recording / audit requirements on the install host confirmed — many regulated envs require this for any privileged shell session |
Approvals
| | Owner | Item |
|---|---|---|
| ☐ | Platform owner | Capacity plan approved (sizing, GPU nodes, autoscaling limits) |
| ☐ | Networking owner | DNS, TLS, firewall, and CNI/ingress plan approved |
| ☐ | Security owner | Secrets storage, image source, IAM, and compliance plan approved |
| ☐ | Compliance owner | Evidence collection and audit plan approved (if a compliance regime applies) |
What comes next
You’ve finished the infrastructure layer. From here on, no more Terraform — kindo-cli takes over. The CLI reads your Terraform outputs (or hand-gathered connection strings), generates the centralized secrets, installs peripheries (External Secrets Operator, Unleash, OTel), and deploys the Kindo application stack (API, workers, LiteLLM, frontends) via Helm.
Two steps remain before you run kindo install:
- Prepare your AI models. Decide whether you’re using cloud providers (OpenAI, Anthropic, Bedrock, Azure, Gemini), self-hosted vLLM, or a mix. The model endpoints must be reachable from the cluster before the CLI’s `post-install` step registers them. This can happen in parallel with the CLI setup.
- Install the CLI and generate the install contract. `kindo config init` walks you through a 10-section wizard that writes `install-contract.yaml` and `environment-bindings.yaml`. `kindo config validate --preflight` then deploys a short-lived Job into the cluster to confirm every item on your checklist is actually reachable.