KUBEINTELLECT
AI DEVOPS ENGINEER FOR KUBERNETES
Diagnose CrashLoopBackOff, pending pods, and RBAC issues in plain English. Parallel specialist agents investigate your cluster — a human-approval gate waits before any write operation runs.
What it does¶
-
:material-kubernetes: Kubernetes Intelligence
Runs
kubectlacross get, describe, logs, top, events, apply, scale, and delete. Routes complex diagnostics to four parallel specialist subagents (pod, metrics, logs, events) and synthesises findings into a single root-cause report. -
:material-chart-line: Metrics + Logs
Native Prometheus PromQL and Loki LogQL integration. The coordinator automatically delegates to the right data source — you ask in plain English, it picks the tool.
-
:material-shield-check: Safety Gates
Every destructive operation pauses for human approval before kubectl is called. Four role tiers (superadmin / admin / operator / readonly) limit what each API key can request. Shell injection, secret/serviceaccount access, and writes to infrastructure namespaces are blocked before any subprocess runs.
-
:material-brain: Stateful Conversations
Sessions are checkpointed in PostgreSQL or SQLite. Ask follow-up questions, approve a pending action hours later, or replay a session post-mortem — all in the same thread.
See it in action¶
How it works¶
You (kq CLI or any OpenAI-compatible client)
│ POST /v1/chat/completions (SSE streaming)
▼
┌──────────────────────────────────────────────────────────────────┐
│ LangGraph workflow │
│ │
│ memory_loader → context_fetcher → coordinator │
│ (DB context) (live snapshot + (LLM + tools) │
│ playbook match) │
│ │ │
│ ┌──────────────────────────────────┴──────────┐ │
│ ▼ TARGETED ▼ RCA_REQUIRED ▼ direct │
│ targeted_investigator subagent_executor × 4 answer →END │
│ (3 parallel reads) (pod | metrics | │
│ │ logs | events, │
│ │ parallel fan-out) │
│ ▼ │ │
│ coordinator ← findings — coordinator (synthesis) → END │
│ │
│ Tools: run_kubectl │ query_prometheus (PromQL) │ query_loki │
│ HITL interrupt fires on every destructive kubectl verb. │
└──────────────────────────────────────────────────────────────────┘
│
▼
LangGraph checkpoint store (Postgres / SQLite)
+ rca_outcomes / failure_patterns (reflexion subsystem)
Each turn passes through five additive agent behaviors — error interpretation, snapshot bias, parallel discipline, playbook injection, and a visible investigation plan. Verified outcomes feed the reflexion subsystem, which promotes recurring fixes back into future prompts (cluster-scoped, with cooldown and decay).
Responses stream back as Server-Sent Events. The API is OpenAI-compatible — point any
SSE client or your own tooling at /v1/chat/completions.
Pick your path¶
-
:material-lightning-bolt: Quickest — no cluster
pip install kubeintellect+ SQLite. Explore the API and CLI in minutes, no Kubernetes needed. -
:material-server-network: Existing cluster
Connect KubeIntellect to any cluster you already have — AKS, EKS, GKE, or any kubeconfig.
-
:material-docker: Docker Compose
Full local stack with PostgreSQL, optional Prometheus + Grafana + Loki, and optional Langfuse LLM tracing.
-
:material-cloud-upload: Production (Helm)
Helm chart for AKS / EKS / GKE. Includes RBAC, secrets management, ingress, and resource limits.
Quick install¶
Supported LLM providers¶
-
:material-brain: OpenAI
GPT-4o (coordinator) + GPT-4o-mini (subagents). Set
LLM_PROVIDER=openai. -
:material-microsoft-azure: Azure OpenAI
Any Azure-hosted deployment. Default
AZURE_OPENAI_API_VERSION=2024-10-01-previewenables automatic prefix caching. SetLLM_PROVIDER=azure.
See Configuration → LLM provider for the full list of model and deployment variables.