Deploying DeepHat
DeepHat is Kindo’s cybersecurity model built for offensive reasoning, long-context analysis, and secure execution. This guide covers setup and deployment of DeepHat V2 using vLLM.
Hardware Requirements
DeepHat V2 is verified to run and serve the full context length (250k tokens) on:
- 1x B200 GPU
- 2x H100 GPUs
A single H100 can serve a maximum context length of 90,000 tokens due to KV-cache memory requirements.
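As a quick pre-flight check, you can confirm that the visible GPUs match one of the supported configurations above (a minimal sketch; assumes the NVIDIA driver and `nvidia-smi` are installed on the host):

```shell
# List visible GPUs with their total memory; falls back gracefully if nvidia-smi is absent
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader
else
  echo "nvidia-smi not found; cannot verify GPU inventory"
fi
```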
Dependencies
vllm>=0.12.0 (the only explicitly required package; installing vLLM pulls in all of its dependencies)
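You can confirm the installed version meets this floor before proceeding (a sketch; assumes a Python 3 environment where vLLM would be importable):

```shell
# Print the installed vLLM version, or a notice if it is not installed
python3 -c "import vllm; print(vllm.__version__)" 2>/dev/null || echo "vllm not installed"
```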
Environment Setup
A fine-grained access token provided by Kindo is necessary to pull model weights from HuggingFace:
export HF_TOKEN=<your_access_token>

Deployment Options
Option 1: Standalone vLLM
Run vLLM directly without Kubernetes.
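Before launching, it can help to fail fast on a missing token rather than partway through the weight download (a minimal sketch):

```shell
# Abort early if the HuggingFace token is not exported in this shell
if [ -z "${HF_TOKEN:-}" ]; then
  echo "HF_TOKEN is not set; export it before running vllm serve" >&2
else
  echo "HF_TOKEN is set"
fi
```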
2x H100:
vllm serve DeepHat/DeepHat-V2-ext \
  --served-model-name deephat-v2 \
  --port <port> \
  --max_model_len 250000 \
  --dtype bfloat16 \
  --tensor_parallel_size 2 \
  --tool_call_parser qwen3_coder \
  --enable-prefix-caching \
  --enable-chunked-prefill \
  --enable-auto-tool-choice

1x B200:

vllm serve DeepHat/DeepHat-V2-ext \
  --served-model-name deephat-v2 \
  --port <port> \
  --max_model_len 250000 \
  --dtype bfloat16 \
  --tensor_parallel_size 1 \
  --tool_call_parser qwen3_coder \
  --enable-prefix-caching \
  --enable-chunked-prefill \
  --enable-auto-tool-choice

Option 2: Kindo Helm Chart
This Helm chart is optional — if you already have LLM serving infrastructure, use that instead.
Create a values-deephat.yaml:
For 2x H100:
model: DeepHat/DeepHat-V2-ext
servedModelName: deephat-v2
maxModelLen: 250000
dtype: bfloat16
tensorParallelSize: "2"
enableChunkedPrefill: "true"
enablePrefixCaching: "true"
enableAutoToolChoice: "true"
toolCallParser: "qwen3_coder"

resources:
  limits:
    nvidia.com/gpu: 2
  requests:
    nvidia.com/gpu: 2

hfToken: "<your_huggingface_token>"
vllmApiKey: "<your_api_key>"

For 1x B200: Set tensorParallelSize: "1" and GPU resources to 1.
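Before installing, the chart and values can be validated locally (a sketch; assumes you run it from the chart directory with helm on your PATH):

```shell
# Lint the chart against the DeepHat values; never hard-fails, so it is safe in a CI smoke check
if command -v helm >/dev/null 2>&1; then
  helm lint . -f values-deephat.yaml || echo "lint reported issues (check you are in the chart directory)"
else
  echo "helm not installed; skipping lint"
fi
```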
Deploy:
helm install deephat-v2 . -f values-deephat.yaml

Verifying the Deployment
curl -X POST "<server_base_url>:<port>/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deephat-v2",
    "stream": false,
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who are you?"}
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'

Next Steps
- AI Model Deployment Guide — general guidance on deploying and verifying models for Kindo, including vLLM best practices and common pitfalls
- Model Configuration — add DeepHat to Kindo’s global model set and configure Unleash feature flags