KubeIntellect — Observability¶

KubeIntellect has two independent observability concerns that should not be conflated:

Concern	Scope
KubeIntellect app	Is the app healthy? What are LLMs costing? Are agents routing correctly?
Managed cluster	Prometheus/Grafana for the target Kubernetes cluster's workloads — deployed as a prerequisite; KubeIntellect queries it

This document covers observability of the KubeIntellect application itself. For managed-cluster observability setup, see docs/development.md.

Recommended Stack (priority order)¶

1. Langfuse          — LLM call traces, token cost, per-agent latency
2. Prometheus        — API health metrics, custom agent/tool counters
3. Grafana           — unified dashboard for all signals
4. Loki              — structured log aggregation (requires structured logging first)
5. DB exporters      — mongodb_exporter + postgres_exporter → same Prometheus

Grafana is the single pane of glass for metrics and logs: Prometheus, Loki, and the DB exporters all feed it. Langfuse traces are the exception; they are viewed in Langfuse's own UI.

Langfuse: LLM Observability¶

Why this is the most critical gap¶

KubeIntellect has 11 agents (Supervisor + 10 workers), each making multiple LLM calls per workflow. A single RCA flow can chain 6–8 LLM calls. Without LLM tracing you are completely blind to:

Which agent made which call and what it cost
Where latency is coming from (supervisor routing, a worker agent, or a tool call)
Why the supervisor routed to the wrong agent (bad prompt? ambiguous query?)
Total token spend per query and per day

Langfuse vs LangSmith¶

Both integrate via the same LangChain callback mechanism. The config in app/core/config.py already has LANGCHAIN_API_KEY / LANGCHAIN_PROJECT for LangSmith, but LangSmith is a hosted SaaS product. Langfuse is the self-hostable open-source alternative: same integration path, deployable in your own cluster.

Integration points¶

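# app/core/llm_gateway.py — add to both get_supervisor_llm() and get_worker_llm()
from langfuse import Langfuse
from langfuse.langchain import CallbackHandler   # v4 import path

# Initialize the Langfuse client once with explicit credentials
Langfuse(
    public_key=settings.LANGFUSE_PUBLIC_KEY,
    secret_key=settings.LANGFUSE_SECRET_KEY,
    host=settings.LANGFUSE_HOST,
)

# In v4 the handler takes no arguments; it uses the client initialized above
langfuse_handler = CallbackHandler()

# Attach the handler so every llm.invoke() / llm.stream() call is traced
llm = llm.with_config(callbacks=[langfuse_handler])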

New env vars to add to app/core/config.py and .env.example:

LANGFUSE_PUBLIC_KEY=
LANGFUSE_SECRET_KEY=
LANGFUSE_HOST=http://langfuse.kubeintellect.svc.cluster.local
LANGFUSE_ENABLED=true

Key metrics to watch in Langfuse¶

Signal	What it tells you
Token usage by agent	Which agents are prompt-heavy; where to optimize
Cost per query	Real cost of a single user interaction
LLM latency p95 by agent	Bottleneck identification
Supervisor routing distribution	How often each worker agent is invoked
Failed completions	Timeout, rate limit, or context-length errors

Deployment (self-hosted)¶

Langfuse is bundled directly in the KubeIntellect Helm chart. langfuse.enabled: true is already set in charts/kubeintellect/values-kind.yaml — it deploys automatically as part of the standard Kind cluster setup.

make kind-kubeintellect-clean-deploy   # Langfuse deploys as part of this — no extra step
# OR, to deploy/update Langfuse only into an existing cluster:
make kind-langfuse-deploy

# Access via ingress: http://langfuse.local  (~2 min to become healthy on first start)
# Fallback (if ingress isn't working):
make port-forward-langfuse   # → http://localhost:3000

After a fresh cluster wipe, Langfuse seeds itself automatically — no manual registration needed.

On first startup, the LANGFUSE_INIT_* env vars (injected via langfuse-secret) auto-create:
- Admin user: admin@kubeintellect.local / langfuse-admin
- Org: KubeIntellect
- Project: KubeIntellect
- API keys: fixed, matching the values already in kubeintellect-core-secret

KubeIntellect connects automatically — no key-copying or restart required.

The seeded keys and user credentials are configured in charts/kubeintellect/values-kind.yaml under langfuse.initUser, langfuse.initOrg, and langfuse.initProject.

The app env vars (LANGFUSE_ENABLED=true, LANGFUSE_HOST, keys) are already wired in the ConfigMap and Secret in values-kind.yaml — no .env changes needed for Kind.

The integration is wired in app/core/llm_gateway.py — all four LLM factory functions (get_supervisor_llm, get_worker_llm, get_code_gen_llm, get_llm_with_params) attach the Langfuse CallbackHandler when LANGFUSE_ENABLED=true. When disabled, behaviour is unchanged.
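The on/off behaviour described above can be sketched as follows. This is a minimal illustration, not the code in app/core/llm_gateway.py: the helper names langfuse_enabled and build_callbacks are hypothetical, and the set of accepted truthy strings is an assumption.

```python
import os
from typing import Any, Callable, List, Mapping, Optional


def langfuse_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when LANGFUSE_ENABLED holds a truthy string."""
    source = os.environ if env is None else env
    value = source.get("LANGFUSE_ENABLED", "false")
    return value.strip().lower() in {"1", "true", "yes"}


def build_callbacks(
    handler_factory: Callable[[], Any],
    env: Optional[Mapping[str, str]] = None,
) -> List[Any]:
    """Return [handler] when tracing is enabled, otherwise an empty list.

    handler_factory is only called when tracing is on, so the langfuse
    import and client setup can live inside it and cost nothing when
    tracing is disabled.
    """
    if not langfuse_enabled(env):
        return []
    return [handler_factory()]


# Hypothetical usage inside an LLM factory function:
#   callbacks = build_callbacks(lambda: CallbackHandler())
#   llm = llm.with_config(callbacks=callbacks) if callbacks else llm
```

Keeping the handler construction behind a factory means a disabled deployment never touches the Langfuse client, which matches the "behaviour is unchanged" guarantee.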

langfuse Python package¶

The langfuse package (v4+) is in pyproject.toml and uv.lock. It is installed automatically during docker build. The integration uses the v4 API:

# app/core/llm_gateway.py
from langfuse import Langfuse                    # initialize OTel tracer with explicit credentials
from langfuse.langchain import CallbackHandler   # attach to LangChain LLMs (v4 import path)

Note: In earlier SDK versions (langfuse v2) the import was from langfuse.callback import CallbackHandler and the constructor accepted public_key/secret_key/host directly. Both have since changed: the Langfuse(...) client is initialized once with credentials, then CallbackHandler() is called with no arguments.

API Metrics (Prometheus)¶

FastAPI instrumentation¶

# app/main.py — add at startup
from prometheus_fastapi_instrumentator import Instrumentator
Instrumentator().instrument(app).expose(app)

This exposes /metrics with standard HTTP metrics (request count, latency histograms, error rate) automatically.

Custom counters to add¶

Define in a new app/utils/metrics.py:

from prometheus_client import Counter, Histogram

agent_invocations = Counter(
    "kubeintellect_agent_invocations_total",
    "Total agent invocations",
    ["agent"]
)
tool_calls = Counter(
    "kubeintellect_tool_calls_total",
    "Total tool calls",
    ["tool", "status"]  # status: success | error
)
workflow_duration = Histogram(
    "kubeintellect_workflow_duration_seconds",
    "End-to-end workflow duration",
    buckets=[1, 2, 5, 10, 30, 60]
)
hitl_decisions = Counter(
    "kubeintellect_hitl_decisions_total",
    "HITL approval/denial decisions",
    ["decision"]  # approved | denied
)
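One way to record the tool_calls counter without touching each tool body is a decorator. The sketch below is an assumption about how the counter might be wired in, not existing KubeIntellect code: track_tool_call is a hypothetical helper, and it covers synchronous tools only (an async tool would need an async wrapper). It accepts any object that exposes labels(...).inc(), which is the interface of the prometheus_client Counter defined above.

```python
import functools
from typing import Any, Callable


def track_tool_call(counter: Any, tool_name: str) -> Callable:
    """Decorator that records one success or error sample per call.

    `counter` is expected to behave like the `tool_calls` Counter:
    counter.labels(tool=..., status=...).inc()
    """

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Count the failure, then let the caller see the original error.
                counter.labels(tool=tool_name, status="error").inc()
                raise
            counter.labels(tool=tool_name, status="success").inc()
            return result

        return wrapper

    return decorator
```

A tool would then be declared as @track_tool_call(tool_calls, "get_pod_logs"), where the tool name is purely illustrative.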

Grafana dashboard panels¶

Request rate + error rate (from prometheus_fastapi_instrumentator)
Workflow duration p50/p95/p99
Agent invocation heatmap (which agents are most used)
HITL approval rate over time
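The panels that rely on the custom metrics could be backed by PromQL along these lines. The queries are a sketch that assumes the metric and label names from app/utils/metrics.py; the rate windows (5m, 1h) and the panel titles are arbitrary choices, and the helper workflow_quantile is hypothetical.

```python
WORKFLOW_HISTOGRAM = "kubeintellect_workflow_duration_seconds"


def workflow_quantile(q: float, window: str = "5m") -> str:
    """PromQL for a workflow-duration quantile computed from histogram buckets."""
    return (
        f"histogram_quantile({q}, "
        f"sum by (le) (rate({WORKFLOW_HISTOGRAM}_bucket[{window}])))"
    )


# Panel title -> PromQL expression
PANEL_QUERIES = {
    "Workflow duration p50": workflow_quantile(0.5),
    "Workflow duration p95": workflow_quantile(0.95),
    "Workflow duration p99": workflow_quantile(0.99),
    "Agent invocations by agent": (
        "sum by (agent) (rate(kubeintellect_agent_invocations_total[5m]))"
    ),
    "HITL approval rate": (
        'sum(rate(kubeintellect_hitl_decisions_total{decision="approved"}[1h])) '
        "/ sum(rate(kubeintellect_hitl_decisions_total[1h]))"
    ),
}

if __name__ == "__main__":
    for title, query in PANEL_QUERIES.items():
        print(f"{title}: {query}")
```

Because the largest finite bucket is 60 seconds, histogram_quantile cannot report a value above 60; if workflows regularly run longer, larger buckets are needed for p99 to be meaningful.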

Structured Logging¶

This is a prerequisite for Loki. Currently KubeIntellect emits unstructured text logs.

Target log format (JSON)¶

{
  "timestamp": "2026-03-26T14:23:01.123Z",
  "level": "INFO",
  "logger": "app.orchestration.routing",
  "message": "Routing to logs_agent",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "user_id": "user_abc",
  "agent": "logs_agent",
  "duration_ms": 142
}

Implementation¶

uv add python-json-logger

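# app/utils/logger_config.py
import logging
from pythonjsonlogger import jsonlogger

def setup_logging():
    handler = logging.StreamHandler()
    # Use the standard LogRecord attributes and rename them to the target keys.
    # %(timestamp)s and %(level)s are not LogRecord attributes, so they would
    # not be populated.
    handler.setFormatter(jsonlogger.JsonFormatter(
        "%(asctime)s %(levelname)s %(name)s %(message)s",
        rename_fields={"asctime": "timestamp", "levelname": "level", "name": "logger"},
    ))
    logging.root.setLevel(logging.INFO)
    logging.root.addHandler(handler)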

Inject request_id at the chat completions entry point and propagate through workflow state so all log lines for a single user query are correlated.
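A common way to get request_id onto every log line without passing it through each function is a context variable plus a logging filter. The sketch below uses only the standard library; request_id_var, RequestIdFilter, and start_request are hypothetical names, and how the ID is carried in the workflow state itself is left open.

```python
import contextvars
import logging
import uuid
from typing import Optional

# Holds the current request's ID; "-" when no request is in flight.
request_id_var: contextvars.ContextVar = contextvars.ContextVar(
    "request_id", default="-"
)


class RequestIdFilter(logging.Filter):
    """Copy the current request_id onto every record so formatters can emit it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def start_request(request_id: Optional[str] = None) -> str:
    """Set (or generate) the request_id for the current context and return it."""
    rid = request_id or str(uuid.uuid4())
    request_id_var.set(rid)
    return rid
```

To use it, attach the filter to the handler with handler.addFilter(RequestIdFilter()), add %(request_id)s to the formatter's format string, and call start_request() at the chat completions entry point. asyncio tasks copy the current context when they are created, so tasks spawned while handling a request see the same ID.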

Loki: Log Aggregation¶

Prerequisite: Structured logging must be in place first. Unstructured text can still be shipped to Loki, but it cannot be filtered by field, so there is little value in doing so.

Loki is lightweight and Grafana-native (unlike ELK). It covers two distinct log streams:

Stream	Source	Purpose
KubeIntellect app logs	Structured JSON from `kubeintellect-core` pods	Debug routing, agent errors, HITL events
Managed cluster workload logs	All pods in managed namespaces	Historical log queries via `logs_agent`

Deployment¶

helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack -n observability \
  --set fluent-bit.enabled=true \
  --set grafana.enabled=false  # use existing Grafana instance

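Fluent Bit is deployed as a DaemonSet and ships logs to Loki automatically. The loki-stack chart also enables Promtail by default, so add --set promtail.enabled=false if Fluent Bit should be the only shipper; otherwise the same log lines can be ingested twice.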

KubeIntellect tool integration¶

Once Loki is deployed, add a query_loki_logs tool in app/agents/tools/tools_lib/log_store_tools.py so the logs_agent can query historical logs rather than only live K8s API logs.
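A first version of such a tool could call Loki's HTTP API directly. The sketch below is standard library only and is not existing KubeIntellect code: the function names, the default one-hour lookback, and the example label selector are assumptions. It uses the /loki/api/v1/query_range endpoint, which takes start and end as nanosecond Unix timestamps.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone


def _to_ns(ts: datetime) -> int:
    """Convert a timezone-aware datetime to integer nanoseconds since the epoch."""
    return int(ts.timestamp()) * 1_000_000_000 + ts.microsecond * 1000


def build_query_range_url(base_url, logql, start, end, limit=100):
    """Build a URL for Loki's /loki/api/v1/query_range endpoint."""
    params = {
        "query": logql,
        "start": str(_to_ns(start)),
        "end": str(_to_ns(end)),
        "limit": str(limit),
        "direction": "backward",  # newest entries first when the limit applies
    }
    query = urllib.parse.urlencode(params)
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{query}"


def parse_streams(payload):
    """Flatten a query_range response into (timestamp_ns, labels, line) tuples."""
    entries = []
    for stream in payload.get("data", {}).get("result", []):
        labels = stream.get("stream", {})
        for ts_ns, line in stream.get("values", []):
            entries.append((int(ts_ns), labels, line))
    return sorted(entries, key=lambda entry: entry[0])


def query_loki_logs(base_url, logql, lookback=timedelta(hours=1), limit=100, timeout=10.0):
    """Fetch log lines matching `logql` from the last `lookback` period."""
    end = datetime.now(timezone.utc)
    url = build_query_range_url(base_url, logql, end - lookback, end, limit)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_streams(json.load(resp))
```

A call might look like query_loki_logs("http://loki.observability.svc.cluster.local:3100", '{app="kubeintellect-core"}'), where the app label is illustrative and depends on how the log shipper labels streams.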

Database Monitoring¶

MongoDB (LibreChat chat history)¶

Deploy mongodb_exporter and point Prometheus at it. Key signals:

Metric	Alert threshold
`mongodb_connections_current`	> 80% of `maxIncomingConnections`
`mongodb_op_counters_total{type="query"}`	Sudden spike = LibreChat query issue
Storage size growth	> 80% of PVC capacity
Slow queries (>100ms)	Any pattern repeating

Grafana community dashboard ID: 7353

PostgreSQL (HITL checkpoints)¶

Key signals:

Metric	Alert threshold
`pg_stat_activity_count`	Approaching `POSTGRES_POOL_MAX_CONN` → HITL hangs
`pg_stat_user_tables_n_dead_tup{relname="workflow_checkpoints"}`	High dead tuples → needs VACUUM
`pg_stat_user_tables_n_live_tup{relname="workflow_checkpoints"}`	Row count growth → unbounded checkpoints
Query latency	p99 > 500ms

Grafana community dashboard ID: 9628

Grafana: Unified Dashboard¶

Prometheus (including the mongodb_exporter and postgres_exporter metrics it scrapes) and Loki are Grafana data sources. Langfuse is not: its traces stay in the native Langfuse UI. Use a single Grafana deployment in the observability namespace.

helm install grafana grafana/grafana -n observability \
  --set persistence.enabled=true \
  --set adminPassword=<from-secrets>

Configure data sources:
1. Prometheus → http://prometheus-server.observability.svc.cluster.local
2. Loki → http://loki.observability.svc.cluster.local:3100
3. Langfuse → native Langfuse UI (separate, not a Grafana data source)
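If the data sources are registered through Grafana's HTTP API rather than by hand, the request bodies could be built as below. This is a sketch under assumptions: the display names and the choice of Prometheus as the default are illustrative, and the payloads would still need to be sent to POST /api/datasources with admin credentials.

```python
import json

PROMETHEUS_URL = "http://prometheus-server.observability.svc.cluster.local"
LOKI_URL = "http://loki.observability.svc.cluster.local:3100"


def datasource_payload(name, ds_type, url, is_default=False):
    """Body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": name,
        "type": ds_type,      # "prometheus" or "loki"
        "url": url,
        "access": "proxy",    # Grafana's backend issues the queries
        "isDefault": is_default,
    }


DATASOURCES = [
    datasource_payload("Prometheus", "prometheus", PROMETHEUS_URL, is_default=True),
    datasource_payload("Loki", "loki", LOKI_URL),
]

if __name__ == "__main__":
    print(json.dumps(DATASOURCES, indent=2))
```

Langfuse is deliberately absent from the list, since it is reached through its own UI.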

What NOT to do¶

Do not embed Prometheus/Loki inside KubeIntellect's Helm chart — these are shared cluster infrastructure; deploy them independently in an observability namespace
Do not skip structured logging before deploying Loki — unstructured logs in Loki are queryable but nearly useless
Do not use ELK instead of Loki — ELK is 5–10× heavier; Loki is sufficient for this workload and integrates with Grafana natively
Do not rely on LangSmith for production — it is SaaS and sends your prompts/completions to LangChain's hosted infrastructure; use self-hosted Langfuse

Data Retention Policy (1-Year Minimum)¶

All observability components are configured for at least 365 days of persistent storage.

Component	PVC Size	Retention Setting	Storage Rate
Prometheus TSDB	50Gi (`retentionSize=45GB`)	`retention=365d`	~80MB/day compressed
Alertmanager	2Gi	N/A (silence history)	Negligible
Grafana	2Gi	N/A (dashboards/settings)	Negligible
Loki	100Gi	`retention_period=8760h` (365d)	~200MB/day compressed
Langfuse ClickHouse	30Gi	`LANGFUSE_RETENTION_DAYS=365`	~50MB/day
Langfuse MinIO	50Gi	`LANGFUSE_RETENTION_DAYS=365`	~100MB/day trace payloads
Langfuse PostgreSQL	10Gi	N/A (metadata only)	~5MB/day
Langfuse Redis	2Gi	N/A (ephemeral queue)	No long-term data
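The PVC sizes can be sanity-checked against the quoted storage rates with a little arithmetic. The sketch below copies the figures from the table and assumes MB and GB are decimal units while Gi is binary; the rates are estimates, so the result is a rough headroom check rather than a guarantee.

```python
GIB_IN_GB = 1024**3 / 1000**3  # 1 GiB is about 1.074 GB

# (PVC size in Gi, approximate ingest in MB/day), copied from the table above
COMPONENTS = {
    "Prometheus TSDB": (50, 80),
    "Loki": (100, 200),
    "Langfuse ClickHouse": (30, 50),
    "Langfuse MinIO": (50, 100),
    "Langfuse PostgreSQL": (10, 5),
}


def projected_gb(mb_per_day: float, days: int = 365) -> float:
    """Storage needed to keep `days` of data at the given daily rate, in GB."""
    return mb_per_day * days / 1000


def headroom_report(days: int = 365) -> dict:
    """Map each component to (needed GB, capacity GB, fits?)."""
    report = {}
    for name, (pvc_gi, mb_per_day) in COMPONENTS.items():
        needed = projected_gb(mb_per_day, days)
        capacity = pvc_gi * GIB_IN_GB
        report[name] = (round(needed, 1), round(capacity, 1), needed < capacity)
    return report


if __name__ == "__main__":
    for name, (needed, capacity, fits) in headroom_report().items():
        print(f"{name}: needs {needed} GB of {capacity} GB ({'ok' if fits else 'too small'})")
```

At the quoted rates, a year of Loki data comes to about 73 GB against a 100Gi volume, and a year of Prometheus data to about 29 GB, which sits below the 45GB retentionSize limit.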

Implementation notes:
- Prometheus: retentionSize=45GB is set 10% below the 50Gi PVC to prevent TSDB corruption from a full disk. Data is deleted oldest-first when the size limit is hit.
- Loki: compactor.retention_enabled=true is required for the compactor to enforce retention_period. Without it, chunks accumulate regardless of the period setting.
- Langfuse: LANGFUSE_RETENTION_DAYS=365 is injected into both the langfuse-web and langfuse-worker containers. Langfuse uses this to TTL-expire old traces in ClickHouse via its built-in housekeeping jobs.
- All PVCs use the Retain reclaim policy on Azure (managed-csi-retain) so data survives pod and even release deletion.

Resizing existing PVCs (if already deployed):
- Azure (managed-csi): online resize is supported. Patch the PVC spec.resources.requests.storage and restart the pod.
- Kind (hostPath): requires delete + recreate (data is lost; Kind clusters are ephemeral by nature).

Deployment¶

# Full observability stack (run after kind-kubeintellect-clean-deploy):
make install-observability-kind

# Individual components:
make install-prometheus-kind    # Prometheus + Grafana + Alertmanager
make install-loki-kind          # Loki + Promtail
make install-event-exporter-kind  # kubernetes-event-exporter

# Access UIs:
# Prometheus: http://prometheus.local
# Grafana:    http://grafana.local  (admin / admin)
# Loki:       query via Grafana → Explore → Loki

# Port-forward fallbacks (if ingress not working):
make port-forward-prometheus    # → localhost:9090
make port-forward-grafana       # → localhost:3001

After deploying Grafana — import dashboards¶

MongoDB exporter: Import dashboard ID 7353 from grafana.com
PostgreSQL exporter: Import dashboard ID 9628 from grafana.com