Deploying DeepHat

DeepHat is Kindo’s cybersecurity model, built for offensive-security reasoning, long-context analysis, and secure execution. This guide covers setting up and deploying DeepHat V2 with vLLM.

Hardware Requirements

DeepHat V2 is verified to run and serve its full 250k-token context length on:

  • 1x B200 GPU
  • 2x H100 GPUs

1x H100 is limited to a max context length of 90,000 tokens due to KV-cache memory requirements.
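The KV cache grows linearly with context length, which is what drives these limits. A back-of-the-envelope sizing sketch is below; every architecture number in it is an assumption for illustration — substitute the real values from the model’s config.json:

```shell
# KV-cache sizing sketch. All architecture numbers are assumed placeholders --
# read the real ones from the model's config.json.
LAYERS=64        # num_hidden_layers (assumed)
KV_HEADS=8       # num_key_value_heads (assumed)
HEAD_DIM=128     # head dimension (assumed)
BYTES=2          # bytes per element for bfloat16
TOKENS=250000    # target max context length
# x2 because both keys and values are cached for every layer
KV_GB=$(awk "BEGIN { printf \"%.1f\", 2 * $LAYERS * $KV_HEADS * $HEAD_DIM * $BYTES * $TOKENS / 1024^3 }")
echo "KV cache for one ${TOKENS}-token sequence: ${KV_GB} GiB"
```

With numbers in this ballpark, model weights plus a full 250k-token cache do not fit in a single 80 GB H100, which is why its usable context is capped well below the model’s maximum.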

Dependencies

  • vllm>=0.12.0 (the only package that must be installed explicitly; installing vLLM pulls in all of its transitive dependencies)

Environment Setup

A fine-grained access token provided by Kindo is required to pull the model weights from Hugging Face:

Terminal window
export HF_TOKEN=<your_access_token>
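A download that fails midway through multi-gigabyte weights is slow to diagnose, so it can be worth confirming the token is actually exported in the current shell first. A minimal sanity check (the function name is ours, not part of any tooling):

```shell
# Sanity-check that HF_TOKEN is exported before vLLM tries to download weights.
check_hf_token() {
  if [ -n "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is set"
  else
    echo "HF_TOKEN is not set" >&2
    return 1
  fi
}
check_hf_token || true
```

If the huggingface_hub CLI is installed, `huggingface-cli whoami` additionally verifies the token against the Hub itself, not just the shell environment.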

Deployment Options

Option 1: Standalone vLLM

Run vLLM directly without Kubernetes.

2x H100:

Terminal window
vllm serve DeepHat/DeepHat-V2-ext \
--served-model-name deephat-v2 \
--port <port> \
--max-model-len 250000 \
--dtype bfloat16 \
--tensor-parallel-size 2 \
--tool-call-parser qwen3_coder \
--enable-prefix-caching \
--enable-chunked-prefill \
--enable-auto-tool-choice

1x B200:

Terminal window
vllm serve DeepHat/DeepHat-V2-ext \
--served-model-name deephat-v2 \
--port <port> \
--max-model-len 250000 \
--dtype bfloat16 \
--tensor-parallel-size 1 \
--tool-call-parser qwen3_coder \
--enable-prefix-caching \
--enable-chunked-prefill \
--enable-auto-tool-choice
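Either command takes a while to load weights before it can answer requests. vLLM exposes a /health endpoint, so a small readiness loop can gate traffic until the server is actually up (the port value below is an assumption — substitute whatever you passed to --port):

```shell
# Poll vLLM's /health endpoint until the server answers, giving up after N tries.
wait_for_vllm() {
  base_url=$1
  retries=${2:-30}
  for _ in $(seq 1 "$retries"); do
    if curl -sf "$base_url/health" > /dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

PORT=8000   # assumed -- substitute your --port value
wait_for_vllm "http://localhost:${PORT}" 5 \
  && echo "vLLM is ready" \
  || echo "vLLM not reachable yet"
```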

Option 2: Kindo Helm Chart

This Helm chart is optional — if you already have LLM serving infrastructure, use that instead.

Create a values-deephat.yaml:

For 2x H100:

model: DeepHat/DeepHat-V2-ext
servedModelName: deephat-v2
maxModelLen: 250000
dtype: bfloat16
tensorParallelSize: "2"
enableChunkedPrefill: "true"
enablePrefixCaching: "true"
enableAutoToolChoice: "true"
toolCallParser: "qwen3_coder"
resources:
  limits:
    nvidia.com/gpu: 2
  requests:
    nvidia.com/gpu: 2
hfToken: "<your_huggingface_token>"
vllmApiKey: "<your_api_key>"

For 1x B200: Set tensorParallelSize: "1" and GPU resources to 1.

Deploy:

Terminal window
helm install deephat-v2 . -f values-deephat.yaml

Verifying the Deployment

Terminal window
curl -X POST "<server_base_url>:<port>/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deephat-v2",
    "stream": false,
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who are you?"}
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'

If you deployed with an API key (vllmApiKey in the Helm values, or --api-key on the vLLM command line), also pass -H "Authorization: Bearer <your_api_key>".
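The response follows the OpenAI chat-completions schema, so the reply text lives at choices[0].message.content. One way to pull it out using only the Python interpreter vLLM already requires (the RESPONSE value below is a stand-in for the body the curl command returns):

```shell
# Stand-in response body; in practice, capture the curl output into RESPONSE instead.
RESPONSE='{"choices":[{"message":{"role":"assistant","content":"I am DeepHat."}}]}'

# Extract just the assistant message text from the JSON.
echo "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```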

Next Steps

  • AI Model Deployment Guide — general guidance on deploying and verifying models for Kindo, including vLLM best practices and common pitfalls
  • Model Configuration — add DeepHat to Kindo’s global model set and configure Unleash feature flags