Deploying DeepHat

DeepHat is Kindo’s cybersecurity model, built for offensive-security reasoning, long-context analysis, and secure execution. This guide covers setting up and deploying DeepHat V2 with vLLM.

Hardware Requirements

DeepHat V2 is verified to run and serve its full 250k-token context length on:

  • 1x B200 GPU
  • 2x H100 GPUs

1x H100 is limited to a max context length of 90,000 tokens due to KV-cache memory requirements.
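The KV cache grows linearly with context length, which is what drives these limits. A back-of-the-envelope sizing sketch is below; every architecture number in it is an assumption for illustration — substitute the real values from the model’s config.json:

```shell
# KV-cache sizing sketch. All architecture numbers are assumed placeholders --
# read the real ones from the model's config.json.
LAYERS=64        # num_hidden_layers (assumed)
KV_HEADS=8       # num_key_value_heads (assumed)
HEAD_DIM=128     # head dimension (assumed)
BYTES=2          # bytes per element for bfloat16
TOKENS=250000    # target max context length
# x2 because both keys and values are cached for every layer
KV_GB=$(awk "BEGIN { printf \"%.1f\", 2 * $LAYERS * $KV_HEADS * $HEAD_DIM * $BYTES * $TOKENS / 1024^3 }")
echo "KV cache for one ${TOKENS}-token sequence: ${KV_GB} GiB"
```

With numbers in this ballpark, model weights plus a full 250k-token cache do not fit in a single 80 GB H100, which is why its usable context is capped well below the model’s maximum.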

Dependencies

  • vllm>=0.12.0 (the only package that must be installed explicitly; installing vLLM pulls in all of its transitive dependencies)

Environment Setup

A fine-grained access token provided by Kindo is required to pull the model weights from Hugging Face:

Terminal window
export HF_TOKEN=<your_access_token>
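A download that fails midway through multi-gigabyte weights is slow to diagnose, so it can be worth confirming the token is actually exported in the current shell first. A minimal sanity check (the function name is ours, not part of any tooling):

```shell
# Sanity-check that HF_TOKEN is exported before vLLM tries to download weights.
check_hf_token() {
  if [ -n "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is set"
  else
    echo "HF_TOKEN is not set" >&2
    return 1
  fi
}
check_hf_token || true
```

If the huggingface_hub CLI is installed, `huggingface-cli whoami` additionally verifies the token against the Hub itself, not just the shell environment.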

Deployment Options

Option 1: Standalone vLLM

Run vLLM directly without Kubernetes.

2x H100:

Terminal window
vllm serve DeepHat/DeepHat-V2-ext \
--served-model-name deephat-v2 \
--port <port> \
--max-model-len 250000 \
--dtype bfloat16 \
--tensor-parallel-size 2 \
--tool-call-parser qwen3_coder \
--enable-prefix-caching \
--enable-chunked-prefill \
--enable-auto-tool-choice

1x B200:

Terminal window
vllm serve DeepHat/DeepHat-V2-ext \
--served-model-name deephat-v2 \
--port <port> \
--max-model-len 250000 \
--dtype bfloat16 \
--tensor-parallel-size 1 \
--tool-call-parser qwen3_coder \
--enable-prefix-caching \
--enable-chunked-prefill \
--enable-auto-tool-choice
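Either command takes a while to load weights before it can answer requests. vLLM exposes a /health endpoint, so a small readiness loop can gate traffic until the server is actually up (the port value below is an assumption — substitute whatever you passed to --port):

```shell
# Poll vLLM's /health endpoint until the server answers, giving up after N tries.
wait_for_vllm() {
  base_url=$1
  retries=${2:-30}
  for _ in $(seq 1 "$retries"); do
    if curl -sf "$base_url/health" > /dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

PORT=8000   # assumed -- substitute your --port value
wait_for_vllm "http://localhost:${PORT}" 5 \
  && echo "vLLM is ready" \
  || echo "vLLM not reachable yet"
```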

Option 2: Kindo Helm Chart

This Helm chart is optional — if you already have LLM serving infrastructure, use that instead.

Create a values-deephat.yaml:

For 2x H100:

model: DeepHat/DeepHat-V2-ext
servedModelName: deephat-v2
maxModelLen: 250000
dtype: bfloat16
tensorParallelSize: "2"
enableChunkedPrefill: "true"
enablePrefixCaching: "true"
enableAutoToolChoice: "true"
toolCallParser: "qwen3_coder"
resources:
  limits:
    nvidia.com/gpu: 2
  requests:
    nvidia.com/gpu: 2
hfToken: "<your_huggingface_token>"
vllmApiKey: "<your_api_key>"

For 1x B200: Set tensorParallelSize: "1" and GPU resources to 1.

Deploy:

Terminal window
helm install deephat-v2 . -f values-deephat.yaml

Verifying the Deployment

Terminal window
curl -X POST "<server_base_url>:<port>/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deephat-v2",
    "stream": false,
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who are you?"}
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'

If you deployed with an API key (vllmApiKey in the Helm values, or --api-key on the vLLM command line), also pass -H "Authorization: Bearer <your_api_key>".
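The response follows the OpenAI chat-completions schema, so the reply text lives at choices[0].message.content. One way to pull it out using only the Python interpreter vLLM already requires (the RESPONSE value below is a stand-in for the body the curl command returns):

```shell
# Stand-in response body; in practice, capture the curl output into RESPONSE instead.
RESPONSE='{"choices":[{"message":{"role":"assistant","content":"I am DeepHat."}}]}'

# Extract just the assistant message text from the JSON.
echo "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```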

Next Steps

  • AI Model Deployment Guide — general guidance on deploying and verifying models for Kindo, including vLLM best practices and common pitfalls
  • Model Configuration — add DeepHat to Kindo’s global model set and configure Unleash feature flags