
KubeIntellect — Architecture

KubeIntellect is an AI-powered Kubernetes management platform built on a LangGraph multi-agent system. Users interact via natural language; a supervisor LLM routes requests to specialized worker agents that execute Kubernetes operations via the official Python client, with mandatory human-in-the-loop approval for all write operations.


System Layers

| Layer | Role |
|---|---|
| User Interaction | Receives natural language queries; presents human-readable results (LibreChat UI, CLI, MCP server) |
| Query Processing | LLM interprets user intent; detects out-of-scope queries and handles them inline |
| Task Orchestration | LangGraph StateGraph routes tasks to the appropriate specialized agent |
| Agent Execution | 14 domain-specific agents execute Kubernetes operations as ReAct loops |
| Kubernetes Interaction | Kubernetes Python client — read and write operations against the cluster API |
| Persistence | PostgreSQL (checkpoints, context, tool registry, audit log) + MongoDB (chat history) + PVC (generated tool files) |
| Security & Governance | Kubernetes RBAC via Helm, AST sandbox for generated code, SHA-256 tool integrity, audit log |
| Observability | Langfuse (LLM traces), Prometheus + Grafana (metrics), Loki + Promtail (logs) |

Request Flow

User query
  → LibreChat UI (POST /v1/chat/completions)
  → Memory Orchestrator (load reflections + failure hints + user prefs + registered tools)
  → Supervisor LLM (LangGraph StateGraph routing)
  → Specialized agent(s) (ReAct loops → Kubernetes API)
  → [HITL gate if write operation]
  → Streaming SSE response
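The final hop streams tokens back over SSE. Assuming the endpoint mirrors the OpenAI chat-completions chunk format (the doc exposes it as `POST /v1/chat/completions`), a client-side parsing sketch looks like this; the field names are the OpenAI conventions, not verified against KubeIntellect's actual payloads:

```python
import json

def parse_sse_chunks(raw_lines):
    """Extract content deltas from OpenAI-style SSE lines.

    Assumes KubeIntellect's /v1/chat/completions stream follows the
    OpenAI chat-completions chunk format; field names are illustrative.
    """
    for line in raw_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments / keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Example with a canned stream (no live server needed):
stream = [
    'data: {"choices":[{"delta":{"content":"pod "}}]}',
    'data: {"choices":[{"delta":{"content":"listed"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(stream)))  # → pod listed
```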

Full Architecture Diagram

---
config:
  flowchart:
    curve: basis
---
graph TD

User(["👤 User"]):::ext
K8S(["⎈ Kubernetes Cluster"]):::k8s

subgraph CORE["Core System"]
  direction TB
  UIL["🖥️ User Interaction Layer\nLibreChat · REST /chat/completions"]:::layer
  QPM["🔍 Query Processing\nLLM scope filter · OOS rejection · clarification"]:::layer

  subgraph ORCH["Task Orchestration Layer"]
    direction TB
    MO["🧠 Memory Orchestrator  ≤550 tokens pinned\n① Reflections  ② Failure Hints  ③ User Prefs  ④ Registered Tools"]:::mem
    SUP{{"🎛️ Supervisor LLM\nLangGraph StateGraph routing"}}:::sup
    HITL["🔒 HITL Gates  interrupt_before\nCodeGenerator · Apply"]:::hitl
  end

  subgraph AGENTS["Agent & Tool Execution Layer  (ReAct loops)"]
    direction LR
    subgraph READ["Read / Inspect"]
      A1["Logs"]:::agent
      A2["ConfigMapsSecrets"]:::agent
      A3["RBAC"]:::agent
      A4["Metrics"]:::agent
      A5["Security"]:::agent
    end
    subgraph WRITE["Write / Exec"]
      A6["Lifecycle"]:::agent
      A7["Execution"]:::agent
      A8["Deletion"]:::agent
      A9["Infrastructure"]:::agent
      A10["Apply"]:::agent
    end
    subgraph DYN["Dynamic Tools"]
      A11["DynamicToolsExecutor"]:::dyn
      A12["CodeGenerator\ngenerate→test→register"]:::codegen
    end
    subgraph DIAG["DiagnosticsOrchestrator  LangGraph Send API"]
      DO["Dispatch"]:::diag
      DL["DiagnosticsLogs\n15s timeout"]:::diag
      DM["DiagnosticsMetrics\n15s timeout"]:::diag
      DE["DiagnosticsEvents\n15s timeout"]:::diag
      DC["DiagnosticsCollect\nbarrier sync"]:::diag
      DO -->|Send| DL & DM & DE
      DL & DM & DE --> DC
    end
  end

  KIL["⎈ Kubernetes Interaction Layer\nK8s Python Client"]:::layer
end

subgraph SUPP["Supporting Infrastructure"]
  direction TB
  LLMGW["🔁 LLM Gateway\nAzure · OpenAI · Anthropic · Google · Bedrock · Ollama · LiteLLM"]:::sup

  subgraph PG["PostgreSQL"]
    PG1["LangGraph Checkpoints\nHITL resume"]:::pgbox
    PG2["Conversation Context\nsticky namespace + resource"]:::pgbox
    PG3["Failure Patterns ×30\nkeyword match → hint injection"]:::pgbox
    PG4["User Preferences\nverbosity · format · namespace"]:::pgbox
    PG5["Tool Registry\n+ PVC /mnt/runtime-tools"]:::pgbox
    PG6["Audit Log\nuser · query · agents · latency"]:::pgbox
  end

  subgraph OBS["Observability"]
    LF["🔭 Langfuse\nLLM traces · token · cost · latency"]:::obs
    PR["📊 Prometheus + Grafana"]:::obs
    LK["📜 Loki + Promtail\nJSON logs"]:::obs
  end

  SG["🔐 Security & Governance\nK8s RBAC · AST sandbox · SHA-256 · Audit log"]:::sec
end

User -->|"NL query"| UIL
UIL -->|"response"| User
UIL --> QPM --> MO --> SUP
SUP -->|"route"| READ & WRITE & DYN
SUP -->|"HITL"| HITL
HITL -->|"approve"| A12 & A10
HITL -->|"deny"| UIL
SUP -->|"diagnose"| DO
DC -->|"evidence"| SUP
READ & WRITE & DYN --> SUP
SUP -->|"FINISH"| UIL
AGENTS --> KIL --> K8S
SUP -.->|"LLM calls"| LLMGW
LLMGW -.-> LF
SUP -.-> PG
HITL -.-> PG1
A12 -.-> PG5
SUP -.-> PR

classDef ext fill:#dfe6e9,stroke:#636e72,color:#2d3436,font-weight:bold
classDef k8s fill:#d5f5e3,stroke:#27ae60,color:#1a5e32,font-weight:bold
classDef layer fill:#ebf5fb,stroke:#2e86c1,color:#1a5276,font-weight:bold
classDef sup fill:#e8daef,stroke:#8e44ad,color:#4a235a,font-weight:bold
classDef mem fill:#e8d5f5,stroke:#7d3c98,color:#4a235a
classDef hitl fill:#fde8d8,stroke:#d35400,color:#6e2c00,font-weight:bold
classDef agent fill:#d6eaf8,stroke:#2e86c1,color:#1a5276
classDef dyn fill:#d5f5e3,stroke:#1e8449,color:#1a5e32
classDef codegen fill:#fae5d3,stroke:#ca6f1e,color:#6e2c00,font-weight:bold
classDef diag fill:#d1f2eb,stroke:#148f77,color:#0e6655
classDef obs fill:#fef9e7,stroke:#d4ac0d,color:#7d6608
classDef pgbox fill:#f4ecf7,stroke:#7d3c98,color:#4a235a
classDef sec fill:#fdedec,stroke:#c0392b,color:#78281f
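The DiagnosticsOrchestrator branch above fans out the Logs, Metrics, and Events probes in parallel and barrier-syncs at Collect. The real implementation uses LangGraph's Send API; as a minimal stand-alone sketch of the same pattern, here is an `asyncio` fan-out with the 15 s per-probe timeout from the diagram (probe functions are hypothetical):

```python
import asyncio

PROBE_TIMEOUT = 15  # seconds, mirroring the per-probe timeout in the diagram

async def run_probe(name, coro):
    """Run one diagnostics probe; a timeout yields a partial-evidence
    marker instead of failing the whole fan-out."""
    try:
        return name, await asyncio.wait_for(coro, timeout=PROBE_TIMEOUT)
    except asyncio.TimeoutError:
        return name, "<timed out>"

async def diagnose(probes):
    """Dispatch all probes concurrently and barrier-sync on completion,
    analogous to Dispatch -> {Logs, Metrics, Events} -> Collect."""
    results = await asyncio.gather(
        *(run_probe(name, coro()) for name, coro in probes.items())
    )
    return dict(results)  # the "Collect" node's merged evidence

# Hypothetical stand-in probes:
async def fetch_logs():
    return "3 CrashLoopBackOff events in kube-system"

async def fetch_metrics():
    return "node memory at 91%"

evidence = asyncio.run(diagnose({"logs": fetch_logs, "metrics": fetch_metrics}))
print(evidence["logs"])
```

The barrier matters: the Supervisor receives one merged evidence dict, never a partial trickle of probe results.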

CodeGenerator Pipeline

When no existing tool covers a request, CodeGenerator synthesizes one:

graph TD
  __start__(["start"]) --> generate_code
  generate_code --> test_code
  test_code --> evaluate_test_results
  evaluate_test_results -.->|pass| generate_metadata
  evaluate_test_results -.->|fail| generate_code
  evaluate_test_results -.->|max retries| handle_failure
  generate_metadata -.->|ok| register_tool
  generate_metadata -.->|error| handle_failure
  register_tool --> finish
  handle_failure --> finish
  finish --> __end__(["end"])
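The generate → test → evaluate loop above can be sketched in plain Python. This is not the actual LangGraph implementation — the node callables and the retry limit are illustrative — but it shows the control flow the graph encodes:

```python
MAX_RETRIES = 3  # illustrative; the real limit lives in the graph config

def run_codegen_pipeline(generate_code, test_code, generate_metadata, register_tool):
    """Drive the generate -> test -> evaluate loop until the generated
    code passes or the retry budget is exhausted, mirroring the graph."""
    feedback = None
    for attempt in range(MAX_RETRIES):
        code = generate_code(feedback)           # generate_code node
        passed, feedback = test_code(code)       # test_code node
        if passed:                               # evaluate_test_results: pass
            metadata = generate_metadata(code)
            if metadata is None:                 # generate_metadata: error
                return {"status": "failed", "reason": "metadata"}
            register_tool(code, metadata)        # register_tool node
            return {"status": "registered", "attempts": attempt + 1}
    return {"status": "failed", "reason": "max retries"}  # handle_failure

# Toy stand-ins: the second attempt produces passing code.
attempts = []
result = run_codegen_pipeline(
    generate_code=lambda fb: attempts.append(fb) or f"code-v{len(attempts)}",
    test_code=lambda code: (code == "code-v2", "assert failed"),
    generate_metadata=lambda code: {"name": "demo"},
    register_tool=lambda code, meta: None,
)
print(result)  # → {'status': 'registered', 'attempts': 2}
```

Note that test feedback flows back into the next generation attempt, which is what distinguishes this loop from blind retries.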

Supervisor Routing Logic

The Supervisor LLM handles some query types inline (no agent delegation):

| Query type | Detection | Supervisor action |
|---|---|---|
| Capability question | "what can you do?", "are you able to…" | FINISH with feature overview |
| Out-of-scope | Non-Kubernetes subject | FINISH with polite decline |
| Worker clarification | Worker asks "Which namespace?" | FINISH → user responds |
| Next step / planning | "what is the next step?", "any suggestions?" | FINISH with 3–5 context-aware suggestions |
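In KubeIntellect this classification is done by the Supervisor LLM itself; a deterministic keyword-matching sketch conveys the shape of the decision (the patterns and category names are illustrative, not the real prompt logic):

```python
# Illustrative patterns; the real Supervisor uses LLM judgement, not regexes.
INLINE_PATTERNS = {
    "capability": ("what can you do", "are you able to"),
    "planning": ("next step", "any suggestions"),
}

def classify_inline(query):
    """Return an inline-handling category, or None to delegate the
    query to a specialized worker agent."""
    q = query.lower()
    for category, phrases in INLINE_PATTERNS.items():
        if any(p in q for p in phrases):
            return category
    return None

print(classify_inline("What can you do?"))        # → capability
print(classify_inline("restart the nginx pod"))   # → None
```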

Memory System

The Memory Orchestrator assembles a pinned context (≤ 550 tokens) before each request via a single `asyncio.gather`:

| Tier | Source | Service |
|---|---|---|
| Short-term | Last SHORT_TERM_MEMORY_WINDOW (default 3) conversation turns | In-memory |
| Working context | Sticky namespace + resource name per conversation | conversation_context table (PostgreSQL) |
| Failure patterns | 30 seeded Kubernetes failure patterns, keyword-matched pre-query | failure_patterns table (PostgreSQL) |
| Registered tools | Enabled tools from the tool registry — prevents unnecessary CodeGenerator invocations | tool_registry table (PostgreSQL) |
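A minimal sketch of that assembly step: all tiers are fetched concurrently with one `asyncio.gather`, then appended in priority order until the 550-token budget is spent. The fetcher names and the 4-chars-per-token estimate are assumptions for illustration; the real services live under `app/services/`:

```python
import asyncio

TOKEN_BUDGET = 550  # pinned-context ceiling from the architecture doc

def rough_tokens(text):
    # Crude estimate: ~4 characters per token.
    return len(text) // 4 + 1

async def assemble_pinned_context(fetchers):
    """Fetch all memory tiers in one asyncio.gather, then append sections
    in priority order until the token budget is spent."""
    sections = await asyncio.gather(*(f() for f in fetchers))
    pinned, used = [], 0
    for section in sections:
        cost = rough_tokens(section)
        if used + cost > TOKEN_BUDGET:
            break  # lower-priority tiers are dropped first
        pinned.append(section)
        used += cost
    return "\n".join(pinned)

# Hypothetical tier fetchers, in priority order:
async def reflections():
    return "Reflection: user prefers table output."

async def failure_hints():
    return "Hint: ImagePullBackOff -> check registry creds."

async def user_prefs():
    return "Prefs: namespace=staging, verbosity=low."

async def registered_tools():
    return "Tools: gen_7f3a (list orphaned PVCs)."

context = asyncio.run(assemble_pinned_context(
    [reflections, failure_hints, user_prefs, registered_tools]
))
print(rough_tokens(context) <= TOKEN_BUDGET)  # → True
```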

Storage

| Store | Purpose | Deployed as |
|---|---|---|
| MongoDB | LibreChat chat history | Deployment + PVC |
| PostgreSQL | LangGraph checkpoints · tool registry · conversation context · reflections · audit log · failure patterns | Deployment + PVC |
| PVC (`kubeintellect-runtime-tools-pvc`) | Dynamic tool code files (`gen_<id>.py`) | PVC mounted into core pod |
| Prometheus | Time-series metrics (cluster + app) | kube-prometheus-stack |
| Loki | Log aggregation (app + workloads + events) | loki-stack |

Dynamic Tool Storage: Three-Service Split

Runtime-generated tools (from CodeGenerator) flow through three separate services:

CodeGenerator
  ▼
tool_storage_service.py          ← PVC file I/O
  Writes gen_<tool_id>.py to /mnt/runtime-tools/tools/
  Computes SHA-256 checksum
  ▼
tool_registry_service.py         ← PostgreSQL metadata
  Inserts: tool_id, name, description, file_path,
           checksum, input_schema, output_schema, status
  ▼  (optional, GITHUB_PR_ENABLED=true)
github_pr_service.py             ← Promotion to codebase
  Creates branch, commits code, opens PR
  Writes pr_url + pr_number back to registry
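The storage step reduces to a file write plus a SHA-256 digest that later integrity checks re-compute. A minimal sketch, with the `/mnt/runtime-tools` mount replaced by a caller-supplied root and function names that are illustrative rather than the actual service API:

```python
import hashlib
import tempfile
from pathlib import Path

def store_generated_tool(tool_id, source, root):
    """Write gen_<tool_id>.py under the runtime-tools directory and return
    the metadata the registry insert would need."""
    tools_dir = Path(root) / "tools"
    tools_dir.mkdir(parents=True, exist_ok=True)
    file_path = tools_dir / f"gen_{tool_id}.py"
    file_path.write_text(source)
    checksum = hashlib.sha256(source.encode()).hexdigest()
    return {"tool_id": tool_id, "file_path": str(file_path), "checksum": checksum}

def verify_tool(meta):
    """Re-hash the stored file; a mismatch means the file on the PVC no
    longer matches the checksum recorded in the registry."""
    data = Path(meta["file_path"]).read_bytes()
    return hashlib.sha256(data).hexdigest() == meta["checksum"]

with tempfile.TemporaryDirectory() as tmp:
    meta = store_generated_tool("7f3a", "def run():\n    return 'ok'\n", tmp)
    print(verify_tool(meta))  # → True
```

Keeping the checksum in PostgreSQL while the bytes live on the PVC is what lets the registry detect tampering with the mounted files.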

Client Interfaces

| Client | Entry point | Use case |
|---|---|---|
| LibreChat UI | http://localhost:3080 (port-forward) or Kind/AKS ingress | Chat UI; production default |
| CLI (`kube-q` · PyPI) | `pip install kube-q`, then `kq --url <api-url>` | Terminal REPL or single-query mode |
| MCP Server | `uv run python -m app.mcp.server` (stdio) | Claude Desktop, VS Code, MCP clients |

MCP Server

app/mcp/server.py exposes KubeIntellect as an MCP server (stdio transport).

Tools (37): kubeintellect_query (full AI workflow), kubeintellect_approve (HITL), plus direct Kubernetes tools — pods, deployments, services, namespaces, nodes, RBAC, metrics, and runtime tool management.

Resources: k8s://pods/{namespace}, k8s://deployments/{namespace}, k8s://services/{namespace}, k8s://namespaces, k8s://nodes, kubeintellect://tools, kubeintellect://health

Prompts: debug_pod, investigate_namespace, cluster_health_check, scale_workload, audit_rbac

Claude Desktop config:

{
  "mcpServers": {
    "kubeintellect": {
      "command": "uv",
      "args": ["run", "python", "-m", "app.mcp.server"],
      "cwd": "/path/to/kubeintellect",
      "env": { "KUBEINTELLECT_API_URL": "http://localhost:8000" }
    }
  }
}


Observability

KubeIntellect app:

| Signal | Tool |
|---|---|
| LLM traces (tokens, cost, latency, prompts) | Langfuse (self-hosted) |
| HTTP metrics + custom agent counters | Prometheus via /metrics |
| Structured JSON logs | Loki via Promtail |
| HITL decisions, workflow duration | Prometheus custom counters |

Custom counters (`app/utils/metrics.py`):

- `kubeintellect_agent_invocations_total{agent}`
- `kubeintellect_workflow_duration_seconds`
- `kubeintellect_hitl_decisions_total{decision}`
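Those metrics can be declared with `prometheus_client` roughly as follows. This is a sketch matching the metric names above, not the contents of `app/utils/metrics.py`; a private `CollectorRegistry` keeps the example self-contained:

```python
from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

registry = CollectorRegistry()  # isolated registry so the example runs standalone

agent_invocations = Counter(
    "kubeintellect_agent_invocations_total",
    "Agent invocations by agent name", ["agent"], registry=registry,
)
workflow_duration = Histogram(
    "kubeintellect_workflow_duration_seconds",
    "End-to-end workflow duration", registry=registry,
)
hitl_decisions = Counter(
    "kubeintellect_hitl_decisions_total",
    "HITL approve/deny decisions", ["decision"], registry=registry,
)

# Simulated workflow: one Logs-agent call, one approval, 2.4 s end to end.
agent_invocations.labels(agent="Logs").inc()
hitl_decisions.labels(decision="approve").inc()
workflow_duration.observe(2.4)

# generate_latest() produces the text exposition served at /metrics.
exposition = generate_latest(registry).decode()
print('kubeintellect_agent_invocations_total{agent="Logs"} 1.0' in exposition)  # → True
```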

Managed cluster: kube-prometheus-stack, Loki + Promtail, kubernetes-event-exporter, MongoDB + PostgreSQL exporters. See docs/observability.md.


Project Structure

app/
├── main.py                          # FastAPI app entry point
├── core/
│   ├── config.py                    # All settings (Pydantic BaseSettings)
│   └── llm_gateway.py               # LLM factory (Azure, OpenAI, Anthropic, Google, Bedrock, Ollama)
├── api/v1/
│   └── endpoints/
│       ├── chat_completions.py      # Main chat endpoint, HITL handling, streaming
│       └── tools.py                 # Dynamic tool management API
├── orchestration/
│   ├── workflow.py                  # Graph construction, run_kubeintellect_workflow()
│   ├── agents.py                    # Agent definitions (tools + system prompts)
│   ├── routing.py                   # Supervisor chain and router node
│   ├── state.py                     # AGENT_MEMBERS, KubeIntellectState
│   └── diagnostics.py               # DiagnosticsOrchestrator fan-out nodes
├── agents/tools/
│   ├── kubernetes_tools.py          # Aggregates all static tool categories
│   └── tools_lib/                   # One file per K8s resource type
│       ├── pod_tools.py
│       ├── deployment_tools.py
│       ├── log_store_tools.py        # Loki LogQL queries
│       ├── prometheus_query_tools.py # PromQL queries
│       └── ...
├── services/
│   ├── kubernetes_service.py
│   ├── tool_registry_service.py
│   ├── tool_storage_service.py
│   ├── conversation_context_service.py
│   ├── memory_orchestrator.py
│   ├── failure_pattern_service.py
│   └── user_preference_service.py
├── mcp/
│   └── server.py                    # MCP server — 37 tools, 7 resources, 5 prompts
└── utils/
    ├── ast_validator.py             # K8s API whitelist — hallucination detection
    ├── code_security.py             # AST static analysis + SHA-256 checksum
    ├── postgres_checkpointer.py     # LangGraph HITL state checkpointing
    └── metrics.py                   # Prometheus custom counters
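The `ast_validator.py` / `code_security.py` pair vets generated code with AST static analysis before registration. As a hedged sketch of that kind of check — the whitelist and rule set here are illustrative, not the project's actual policy:

```python
import ast

ALLOWED_IMPORTS = {"kubernetes", "json", "datetime"}   # illustrative whitelist
FORBIDDEN_CALLS = {"eval", "exec", "compile", "__import__"}

def validate_generated_tool(source):
    """Static AST pass over generated tool code: flag imports outside the
    whitelist and calls to dynamic-execution builtins. Returns a list of
    problems; an empty list means the code may proceed to registration."""
    problems = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    problems.append(f"disallowed import: {alias.name}")
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"disallowed import: {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                problems.append(f"forbidden call: {node.func.id}")
    return problems

print(validate_generated_tool("import os\neval('1+1')\n"))
# → ['disallowed import: os', 'forbidden call: eval']
```

Because the check inspects the syntax tree rather than executing anything, it can also catch hallucinated or disallowed API usage before the code ever runs against the cluster.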