Plan & Provision Infrastructure
This guide covers everything you need to plan and provision before kindo-cli can install Kindo on top. Work through the five sections in order: pick a size, confirm each component meets spec, plan DNS and firewall, provision via the AWS Terraform helper (or bring your own infrastructure on other providers), then tick every box on the team-scoped checklist before handing off to the CLI.
1. Pick a deployment size
Sizing drives everything downstream: node count, database class, RAM and vCPU targets for managed services, and the service-quota requests you need to file with your cloud provider. Pick the smallest tier that matches your expected active users — you can scale up later, but it is cheaper to start right.
Node counts are shown as the recommended max per node group. Two groups (primary + agents) run at this sizing, so totals are 2× per-group.
| Size | Use case | Kubernetes nodes (per group) | PostgreSQL (main / auxiliary) | Redis | RabbitMQ | Total cluster capacity |
|---|---|---|---|---|---|---|
| Medium | 50 to 200 active users | 6 nodes × 8 vCPU, 16 GB RAM (c6i.2xlarge) | main: 2 vCPU, 8 GB (db.m6i.large) / aux: 2 vCPU, 8 GB (db.t4g.large) | 3.09 GB (cache.t3.medium) | 2 vCPU, 8 GB RAM (mq.m5.large) | 96 vCPU, 192 GB RAM |
| Large | 200 to 1,000 active users | 10 nodes × 16 vCPU, 32 GB RAM (c6i.4xlarge) | main: 4 vCPU, 16 GB (db.m6i.xlarge) / aux: 2 vCPU, 8 GB (db.m6i.large) | 6.38 GB (cache.m6g.large) | 4 vCPU, 16 GB RAM (mq.m5.xlarge) | 320 vCPU, 640 GB RAM |
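The "Total cluster capacity" column follows directly from the per-group numbers; a quick sanity check with shell arithmetic:

```shell
# Two node groups (primary + agents) run at the per-group sizing,
# so total capacity is 2 × nodes-per-group × per-node resources.
groups=2
medium_vcpu=$(( groups * 6 * 8 ))    # 6 nodes × 8 vCPU each
medium_ram=$((  groups * 6 * 16 ))   # 6 nodes × 16 GB each
large_vcpu=$((  groups * 10 * 16 ))  # 10 nodes × 16 vCPU each
large_ram=$((   groups * 10 * 32 ))  # 10 nodes × 32 GB each
echo "Medium: ${medium_vcpu} vCPU, ${medium_ram} GB RAM"  # Medium: 96 vCPU, 192 GB RAM
echo "Large:  ${large_vcpu} vCPU, ${large_ram} GB RAM"    # Large:  320 vCPU, 640 GB RAM
```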
2. Component requirements
Kindo is a Helm-installed application. As long as the components below meet spec and are reachable from the cluster, the CLI install is cloud-agnostic. Every row is a hard requirement unless otherwise noted — if one is missing or misconfigured, the install will fail.
Kubernetes cluster
| Attribute | Requirement |
|---|---|
| Version | 1.32 or higher |
| Networking | CNI with NetworkPolicy support, e.g. Calico or Cilium (Flannel alone does not enforce NetworkPolicy) |
| Storage | Dynamic volume provisioning, default StorageClass, ReadWriteOnce volumes |
| Ingress | An ingress controller installed (NGINX, Traefik, ALB Controller, or equivalent). Gateway API is also supported. |
| TLS | cert-manager installed, or certificates provisioned out-of-band |
| RBAC | Enabled (required) |
| Metrics | metrics-server installed (required for HPA and kubectl top) |
| Sizing | Medium: 6 nodes × 8 vCPU, 16 GB RAM (c6i.2xlarge), Large: 10 nodes × 16 vCPU, 32 GB RAM (c6i.4xlarge). Node counts are the recommended max per group; two groups (primary + agents) run at this sizing. |
GPU nodes are only required if you plan to self-host models. Labels nvidia.com/gpu=true and accelerator=<model> let Helm schedule model workloads onto them.
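For illustration, labeling a GPU node could look like the following; the node name and accelerator value are placeholders for your own:

```shell
# Hypothetical node name — substitute the output of `kubectl get nodes`.
kubectl label node ip-10-0-1-23.us-west-2.compute.internal \
  nvidia.com/gpu=true \
  accelerator=a100
```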
PostgreSQL
| Attribute | Requirement |
|---|---|
| Version | 17.0 or higher (17.4 recommended) |
| Storage | 100 GB SSD minimum; scale with data volume |
| Concurrent connections | 200+ |
| Encryption | At-rest and in-transit required for production |
| HA (production) | Streaming replication or managed service with automatic failover |
| Backups | Daily automated backups, 7+ day retention |
| Sizing (main) | Medium: 2 vCPU, 8 GB RAM (db.m6i.large), Large: 4 vCPU, 16 GB RAM (db.m6i.xlarge) |
| Sizing (auxiliary) | Medium: 2 vCPU, 8 GB RAM (db.t4g.large), Large: 2 vCPU, 8 GB RAM (db.m6i.large) |
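Before install day it is worth confirming the endpoint actually meets the version and connection-limit rows above. A sketch, assuming `psql` is installed and `PGURL` is a placeholder for your real connection string:

```shell
# PGURL is a placeholder — substitute your actual connection string.
PGURL="postgres://admin:secret@db.internal.example:5432/postgres"
psql "$PGURL" -tAc "SHOW server_version;"   # expect 17.x
psql "$PGURL" -tAc "SHOW max_connections;"  # expect 200 or more
```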
Redis — standalone only
| Attribute | Requirement |
|---|---|
| Version | 7.0 or higher (7.2 recommended) |
| Concurrent connections | 100+ |
| Mode | Standalone (single primary, no shards) |
| Sizing | Medium: 3.09 GB (cache.t3.medium), Large: 6.38 GB (cache.m6g.large) |
Cloud provider compatibility:
- AWS ElastiCache — Cluster Mode Disabled with `num_node_groups = 1`, `replicas_per_node_group = 0`. Engine version `7.0+`. Cluster Mode Enabled with 2+ shards is not supported.
- Azure Cache for Redis — Basic or Standard tier (single-node, non-clustered). Premium/Enterprise clustered tiers with 2+ shards are not supported.
- Google Cloud Memorystore — Basic tier (standalone instance). Redis Cluster mode is not supported.
Verify your Redis mode before proceeding:
```shell
redis-cli INFO server | grep redis_mode
# redis_mode:standalone -> compatible
# redis_mode:sentinel   -> not supported
# redis_mode:cluster    -> not supported
```
RabbitMQ
| Attribute | Requirement |
|---|---|
| Version | 3.13 or higher |
| Disk | 20 GB minimum across all sizes |
| Management plugin | Enabled |
| HA (production) | 3+ node cluster with quorum queues, or managed service (e.g., Amazon MQ) |
| Sizing | Medium: 2 vCPU, 8 GB RAM (mq.m5.large), Large: 4 vCPU, 16 GB RAM (mq.m5.xlarge) |
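A quick way to confirm the management plugin is enabled and check the broker version is the management HTTP API on port 15672; the host and credentials below are placeholders:

```shell
# /api/overview is part of the RabbitMQ management API (plugin must be enabled).
curl -fsS -u admin:changeme \
  http://rabbitmq.internal.example:15672/api/overview \
  | jq -r '.rabbitmq_version'   # expect 3.13 or higher
```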
S3-compatible object storage
Any S3 API-compatible store: AWS S3, Google Cloud Storage (S3 compatibility), Azure Blob (S3 compatibility), MinIO, or Ceph.
| Bucket | Purpose | Access |
|---|---|---|
| kindo-uploads | User file uploads | Private |
Provision the buckets (or equivalent naming) with server-side encryption and block-public-access enabled.
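On AWS, for example, that hardening maps to three `aws s3api` calls; the bucket name and region are placeholders:

```shell
bucket="mycompany-kindo-uploads"   # must be globally unique
# Create the bucket, enable server-side encryption, and block public access.
aws s3api create-bucket --bucket "$bucket" --region us-west-2 \
  --create-bucket-configuration LocationConstraint=us-west-2
aws s3api put-bucket-encryption --bucket "$bucket" \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms"}}]}'
aws s3api put-public-access-block --bucket "$bucket" \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```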
Vector database
Pick one:
- Qdrant (self-hosted or managed) — deploy into Kubernetes, or point at a managed Qdrant endpoint. Kindo creates the collection automatically on first call with the correct vector size for your embedding model — no manual setup required.
- Pinecone (managed) — pod-based index, cosine metric. Serverless Pinecone is not supported. The index dimension must be pre-created and must match the embedding model you configure in LiteLLM (e.g., 1536 for `text-embedding-3-small`, 3072 for `text-embedding-3-large`). Kindo does not create or resize the index for you.
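If you go the Pinecone route, the dimension-to-model constraint is easy to get wrong; a tiny helper makes it explicit (values taken from the list above):

```shell
# Map an embedding model name to its required Pinecone index dimension.
dimension_for() {
  case "$1" in
    text-embedding-3-small) echo 1536 ;;
    text-embedding-3-large) echo 3072 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}
dimension_for text-embedding-3-small   # -> 1536
```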
TLS / certificates
Either (a) cert-manager configured against an ACME issuer (Let’s Encrypt) or a private CA, or (b) a pre-issued wildcard certificate for *.kindo.company.com loaded as a Kubernetes Secret. kindo-cli references the certificate by name — it does not issue or rotate it for you.
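For option (b), loading a pre-issued wildcard certificate is a single kubectl call; the secret name, namespace, and file paths below are placeholders:

```shell
# Creates a TLS-type Secret that kindo-cli can reference by name.
kubectl create secret tls kindo-wildcard-tls \
  --cert=fullchain.pem \
  --key=privkey.pem \
  --namespace kindo
```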
Ingress
A running ingress controller in the cluster with:
- Class name known to you (e.g., `nginx`, `alb`, `traefik`) — you’ll pass it to `kindo-cli`.
- Inbound 443 (HTTPS) open; 80 allowed for HTTP→HTTPS redirect.
- Outbound 443 to the public internet (or your private AI endpoints), plus access to PostgreSQL (5432), Redis (6379), RabbitMQ (5672), and syslog (514) as applicable.
Audit log sink
Syslog server supporting RFC3164 on TCP/UDP 514, reachable from the cluster. 1+ year retention recommended. Required for production compliance postures.
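Reachability can be smoke-tested from any host on the cluster network with util-linux `logger`; the hostname below is a placeholder:

```shell
# -n sets the server, -P the port, -T forces TCP (drop -T to test UDP 514).
logger -n syslog.internal.example -P 514 -T "kindo provisioning connectivity test"
```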
Email (SMTP)
Any email service that supports the SMTP protocol — a self-hosted SMTP server, or managed providers like Amazon SES, SendGrid, or Mailgun that expose SMTP credentials. Used for transactional email (invites, password reset, alerts).
3. Plan DNS and firewall
DNS subdomains
Kindo provisions 12 ingress hostnames behind a single load balancer. Pick a parent domain you control (examples use kindo.company.com) and create A or CNAME records for each subdomain, pointing at your ingress controller’s load balancer. A wildcard TLS certificate covering *.kindo.company.com is the simplest option; with AWS Route 53 + external-dns, records are created automatically from ingress annotations.
| Subdomain | Component | Example |
|---|---|---|
| app. | Next.js frontend | app.kindo.company.com |
| api. | Backend API (also receives webhooks via path rules) | api.kindo.company.com |
| integrations-api. | Nango integration API | integrations-api.kindo.company.com |
| integrations-connect. | Nango OAuth / Connect UI | integrations-connect.kindo.company.com |
| sso. | SSOReady admin | sso.kindo.company.com |
| sso-auth. | SSOReady auth endpoint | sso-auth.kindo.company.com |
| sso-api. | SSOReady API | sso-api.kindo.company.com |
| sso-app. | SSOReady user-facing app | sso-app.kindo.company.com |
| hatchet. | Hatchet API (/api) and dashboard (/) on one host | hatchet.kindo.company.com |
| litellm. | LiteLLM model proxy | litellm.kindo.company.com |
| unleash. | Unleash feature flag service | unleash.kindo.company.com |
| unleash-edge. | Unleash Edge (client-facing read cache) | unleash-edge.kindo.company.com |
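Given a parent domain, the full record list can be generated mechanically; a sketch using the example domain from the table:

```shell
parent="kindo.company.com"   # replace with your parent domain
subdomains="app api integrations-api integrations-connect \
  sso sso-auth sso-api sso-app hatchet litellm unleash unleash-edge"
# Print the FQDN for every record you need to create.
for s in $subdomains; do
  echo "${s}.${parent}"
done
```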
Inbound
Open only the ingress entry points to the public internet (or to your corporate network, if Kindo is internal-only).
| Port | Protocol | Purpose |
|---|---|---|
| 443 | TCP | HTTPS — all user traffic |
| 80 | TCP | HTTP redirect to 443 (optional, recommended) |
Outbound
The cluster needs egress to reach managed infrastructure and external AI providers. If any of these paths are blocked, the install or runtime behaviour will break — check with your network team before you start provisioning.
| Port | Protocol | Destination | Purpose |
|---|---|---|---|
| 443 | TCP | AI providers (OpenAI, Anthropic, Azure OpenAI, Groq), Pinecone, container registries | Model inference, embeddings, image pulls |
| 5432 | TCP | PostgreSQL endpoint | Database traffic |
| 6379 | TCP | Redis endpoint | Cache and streaming |
| 5672 | TCP | RabbitMQ endpoint | Message queue |
| 25 / 587 / 465 | TCP | Email provider | SMTP |
| 514 | TCP + UDP | Syslog server | Audit log forwarding |
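Before filing firewall tickets, each TCP path can be probed from inside the network with bash's built-in `/dev/tcp`; the endpoint hostnames below are placeholders:

```shell
# Succeeds only if a TCP connection to host:port completes within 3 seconds.
check_port() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
check_port db.internal.example    5432 && echo "5432 open" || echo "5432 blocked"
check_port cache.internal.example 6379 && echo "6379 open" || echo "6379 blocked"
check_port mq.internal.example    5672 && echo "5672 open" || echo "5672 blocked"
```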
4. Provider paths
Kindo ships a Terraform module, kindo-infra, that provisions the entire cloud-agnostic spec above on AWS. For other providers (GCP, Azure, on-prem, OpenShift, Rancher), follow the component requirements and wire the outputs into kindo-cli at install time — there is no pre-built Terraform module for those targets today.
AWS
Minimum configuration
Create a root Terraform module that consumes kindo-infra:
```hcl
module "kindo_infra" {
  source = "./kindo-infra"

  # --- Required inputs ---
  project                   = "mycompany"
  environment               = "production"
  region                    = "us-west-2"
  availability_zone_names   = ["us-west-2a", "us-west-2b", "us-west-2c"]
  vpc_cidr_base             = "10.0.0.0/16"
  cluster_name              = "mycompany-kindo"
  s3_uploads_bucket_name    = "mycompany-kindo-uploads"    # globally unique
  s3_audit_logs_bucket_name = "mycompany-kindo-audit-logs" # globally unique

  # --- Optional: t-shirt sizing profile ---
  # Valid values: "custom" (default), "medium", "large".
  tshirt_size = "medium"

  # --- Optional: DNS / TLS ---
  create_public_zone          = true
  base_domain                 = "kindo.mycompany.com"
  create_wildcard_certificate = true

  # --- Optional: production safety ---
  termination_protection_enabled = true
  postgres_deletion_protection   = true
  postgres_multi_az              = true

  # --- Database bootstrap ---
  # Leave disabled (default); `kindo-cli` creates the per-service
  # databases (`kindo`, `unleash`, `litellm`, `ssoready`, `hatchet`,
  # `nango`) during install.
  manage_postgres_dbs = false
}
```
T-shirt sizing
`tshirt_size` sets the EKS node group sizes and PostgreSQL instance class. `custom` gives you full control via the `kindo_workers_*` and `postgres_*` variables.
| tshirt_size | Use case |
|---|---|
| custom | You specify every node group and DB override manually |
| medium | Standard production (up to ~200 users) |
| large | High-traffic production (200+ users) |
What gets created
- VPC with public, private, and database subnets across 3 AZs, NAT gateways, VPC endpoints for AWS services.
- EKS cluster (`cluster_version = 1.32+`) with managed node groups (SPOT by default), optional sandbox node group, cluster autoscaler IAM, cert-manager and metrics-server EKS addons.
- RDS PostgreSQL main instance (17.x) plus an auxiliary instance. Admin credentials written to AWS Secrets Manager.
- ElastiCache Redis in standalone single-AZ mode (module defaults — Kindo requires no replicas, no Multi-AZ, no Cluster Mode).
- Amazon MQ (RabbitMQ 3.13) single-instance broker by default; set `rabbitmq_deployment_mode = "CLUSTER_MULTI_AZ"` for HA.
- S3 buckets for uploads and audit logs, with SSE and strict access policies.
- Route 53 public hosted zone and ACM wildcard certificate for `*.base_domain`.
- KMS keys for encryption at rest.
- Optional: Client VPN, syslog forwarder (EC2 ASG → CloudWatch/S3/Kinesis Firehose), SES domain, LiteLLM Bedrock IRSA role.
Provision:
```shell
export AWS_PROFILE=your-aws-profile
terraform init
terraform plan
terraform apply
```
Outputs you will need later
kindo-cli reads infrastructure outputs from the terraform output JSON during install. Capture them:
```shell
terraform output -json > infra-outputs.json
```
Key outputs include postgres_connection_string, postgres_auxiliary_connection_string, redis_connection_string, rabbitmq_connection_string, S3 bucket names, EKS cluster name, VPC ID, and subnet IDs. You do not consume these directly — the CLI does.
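The outputs file uses the ordinary `terraform output -json` shape (each key wraps its value in a `value` field), so it is easy to inspect with `jq`. A sketch with a fabricated two-key sample:

```shell
# Fabricated sample matching the `terraform output -json` structure.
cat > infra-outputs.json <<'EOF'
{
  "postgres_connection_string": {"sensitive": true, "type": "string",
    "value": "postgres://admin:secret@db.internal.example:5432/postgres"},
  "redis_connection_string": {"sensitive": false, "type": "string",
    "value": "redis://cache.internal.example:6379"}
}
EOF
# Pull a single value out for inspection.
pg_url=$(jq -r '.postgres_connection_string.value' infra-outputs.json)
echo "$pg_url"
```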
Verify before handing off to the CLI
```shell
# Cluster reachable
aws eks update-kubeconfig --name mycompany-kindo --region us-west-2
kubectl get nodes

# Data services provisioned
aws rds describe-db-instances --query 'DBInstances[*].DBInstanceIdentifier'
aws elasticache describe-replication-groups --query 'ReplicationGroups[*].ReplicationGroupId'
aws mq list-brokers --query 'BrokerSummaries[*].BrokerName'
aws s3 ls | grep mycompany-kindo
```
At this point you have infrastructure ready for kindo-cli — no Kindo application pods are running yet.
5. Consolidated prerequisites checklist
Work through the tables below and tick items off as you gather them. Every row is tagged with the team that typically owns it so the list can be split and distributed across departments — ownership is a suggestion, not a mandate. You are ready to proceed to the next section only once every box is checked.
Infrastructure
| | Owner | Item |
|---|---|---|
| ☐ | Platform | Deployment size chosen (Medium / Large) |
| ☐ | Cloud Ops | Kubernetes cluster target identified (1.32+, 3+ nodes, RBAC enabled, default StorageClass, LoadBalancer support) |
| ☐ | DBA / Cloud Ops | PostgreSQL plan in place (17.0+, managed service recommended, capacity for six databases: kindo, unleash, litellm, ssoready, hatchet, nango) |
| ☐ | DBA / Cloud Ops | Redis plan in place (7.0+, standalone mode only, sized per deployment tier) |
| ☐ | DBA / Cloud Ops | RabbitMQ plan in place (3.13+, management plugin enabled) |
| ☐ | Cloud Ops | S3-compatible object storage selected (AWS S3, GCS, Azure Blob, MinIO, or Ceph) with the kindo-uploads bucket planned |
| ☐ | Platform | Vector database selected — Qdrant (self-hosted or managed) or Pinecone (pod-based). If Pinecone, index dimension is pre-created to match your embedding model |
| ☐ | Networking | Ingress controller chosen (NGINX, Traefik, or cloud provider equivalent) |
| ☐ | Security / Networking | TLS strategy decided (cert-manager, wildcard cert, or per-subdomain certs) |
| ☐ | Security | (Optional) Syslog server endpoint identified (TCP/UDP 514, RFC 3164) |
| ☐ | Platform | Email provider selected — must be SMTP-protocol-compatible (self-hosted SMTP, SES, SendGrid, Mailgun, or any other provider exposing SMTP credentials) |
| ☐ | Cloud Ops | GPU node plan written down if self-hosting AI models (see Prepare AI Models) |
Tools to download (to the operator workstation — no cluster interaction yet)
| | Owner | Item |
|---|---|---|
| ☐ | Platform | python 3.11+ (kindo-cli runtime) |
| ☐ | Platform | kindo-cli — latest release, provided by Kindo |
| ☐ | Platform | helm 3.8.0+ |
| ☐ | Platform | helmfile 0.162.0+ (CLI wraps it) |
| ☐ | Platform | kubectl 1.32+ |
| ☐ | Platform | yq — latest |
| ☐ | Platform | jq — latest |
| ☐ | Platform | psql — latest (optional, needed for kindo db tunnel / prompt / reset after install) |
Credentials and accounts
| | Owner | Item |
|---|---|---|
| ☐ | Security | Kindo container registry credentials received from Kindo |
| ☐ | Security / AppDev | At least one AI provider API key obtained (OpenAI, Anthropic, Azure OpenAI, Groq, or self-hosted vLLM) |
| ☐ | Security | Vector DB credentials obtained — Qdrant endpoint + API key, or Pinecone API key |
| ☐ | Security | Email provider credentials obtained |
| ☐ | Networking | (Optional) Syslog server credentials or allowlist entry confirmed |
| ☐ | Security | (Optional) Secrets management strategy decided (External Secrets Operator with AWS Secrets Manager, HashiCorp Vault, Google Secret Manager, or Azure Key Vault) |
Network and DNS
| | Owner | Item |
|---|---|---|
| ☐ | Networking | Parent domain confirmed (e.g. kindo.company.com) |
| ☐ | Networking | A or CNAME records planned for all 12 subdomains listed in section 3: app., api., integrations-api., integrations-connect., sso., sso-auth., sso-api., sso-app., hatchet., litellm., unleash., and unleash-edge. |
| ☐ | Networking / Security | Inbound firewall rule for 443/TCP (and 80/TCP redirect) approved |
| ☐ | Networking / Security | Outbound firewall rules approved for 443 (external APIs), 5432 (PostgreSQL), 6379 (Redis), 5672 (RabbitMQ), SMTP ports (25/587/465), and 514 (syslog) |
| ☐ | Networking | CNI type confirmed — flag to Kindo before the install call if it is not AWS VPC CNI (Calico, Cilium, custom — see the caution in Component requirements) |
| ☐ | Security | Network posture decided (open / private / restricted) |
Identity & access
| | Owner | Item |
|---|---|---|
| ☐ | Cloud Ops | Operator IAM / service account identified — permissions to create Kubernetes Secrets, Deployments, Services, Ingresses, and ConfigMaps across all Kindo namespaces |
| ☐ | Security | Organization-level guardrails reviewed — SCPs (AWS), org policies (GCP), management group policies (Azure) that might block IAM role creation, resource provisioning, or required tagging |
| ☐ | Cloud Ops | Workload identity pattern chosen — IRSA (AWS), Workload Identity (GCP), or Azure Pod Identity — with trust relationships wired for services that need cloud access (e.g. LiteLLM → AWS Bedrock) |
| ☐ | Platform | Cluster RBAC verified — the operator has permission to create service accounts, role bindings, and cluster-scoped resources needed by the install |
Security & compliance
| | Owner | Item |
|---|---|---|
| ☐ | Security | Registry allowlist updated to include registry.kindo.ai (or the internal mirror, if air-gapped) — verify image pulls are not blocked by admission controllers |
| ☐ | Security | Image scanning requirements clarified — if your environment blocks unscanned images, coordinate with Kindo for SBOMs or a mirror-and-scan workflow |
| ☐ | Security | Encryption at rest confirmed on PostgreSQL, Redis, S3, and Kubernetes Secrets storage |
| ☐ | Security | Secrets handling policy agreed (rotation cadence, who has read access to kindo-secrets-config) |
Install host
| | Owner | Item |
|---|---|---|
| ☐ | Platform | Install host identified — bastion, jump box, developer laptop, or CI runner — with network reachability to the cluster’s API server |
| ☐ | Platform | Install host has python 3.11+, kindo-cli, helm 3.8+, helmfile 0.162+, kubectl 1.32+, yq, and jq installed and on PATH (plus optional psql for kindo db) |
| ☐ | Platform | kubectl context for the target cluster is configured and tested (kubectl cluster-info, kubectl auth can-i create namespace --all-namespaces) |
| ☐ | Security | Session recording / audit requirements on the install host confirmed — many regulated envs require this for any privileged shell session |
Approvals
| | Owner | Item |
|---|---|---|
| ☐ | Platform owner | Capacity plan approved (sizing, GPU nodes, autoscaling limits) |
| ☐ | Networking owner | DNS, TLS, firewall, and CNI/ingress plan approved |
| ☐ | Security owner | Secrets storage, image source, IAM, and compliance plan approved |
| ☐ | Compliance owner | Evidence collection and audit plan approved (if a compliance regime applies) |
What comes next
You’ve finished the infrastructure layer. From here on, no more Terraform — kindo-cli takes over. The CLI reads your Terraform outputs (or hand-gathered connection strings), generates the centralized secrets, installs peripheries (External Secrets Operator, Unleash, OTel), and deploys the Kindo application stack (API, workers, LiteLLM, frontends) via Helm.
Two steps remain before you run kindo install:
- Prepare your AI models. Decide whether you’re using cloud providers (OpenAI, Anthropic, Bedrock, Azure, Gemini), self-hosted vLLM, or a mix. The model endpoints must be reachable from the cluster before the CLI’s `post-install` step registers them. This can happen in parallel with the CLI setup.
- Install the CLI and generate the install contract. `kindo config init` walks you through a 10-section wizard that writes `install-contract.yaml` and `environment-bindings.yaml`. `kindo config validate --preflight` then deploys a short-lived Job into the cluster to confirm every item on your checklist is actually reachable.