# UbiOps Deployments Dashboard

Grafana dashboard (`dashboard.json`) for monitoring UbiOps deployment pods on Kubernetes: health, resource usage, restarts, and limits. Data comes from Prometheus (`kube-state-metrics` and cAdvisor `container_*` metrics).

## Variables

| Variable | Source | Purpose |
|----------|--------|---------|
| `datasource` | Prometheus datasource picker | Select the Prometheus instance |
| `namespace` | `label_values(kube_pod_info, namespace)` | Namespace to scope to |
| `deployment` | `label_values(kube_deployment_metadata_generation{namespace="$namespace"}, deployment)` | Deployment to inspect (defaults to all, `.*`) |

Pods are matched with `pod=~"$deployment.*"`, so selecting a deployment covers all of its pods.

## Rows & panels

**Overview**: at-a-glance stat tiles for Running / Pending / Failed pods, Restarts (1h), OOMKilled (1h), and Waiting containers.

**Resource Usage**: CPU usage and memory working set per pod over time.

**Deployment Status**: desired vs. available replicas, and container restart rate.

**Resource Limits**: CPU and memory usage vs. limits (aggregate and per pod), plus per-pod limits and **% of limit** (green below 70%, yellow from 70%, red from 90%) to spot pods approaching OOM.

**Pod Details**: a table of every pod with its restart count and memory % of limit, sorted by restarts.

## Usage

The default time range is the last 1h, with 30s auto-refresh. Import `dashboard.json` into Grafana (schema `dashboard.grafana.app/v2`, built on Grafana v13), then pick a datasource, namespace, and deployment.

## Key things to watch

- **OOMKilled (1h)** and **Memory % of Limit**: memory pressure or under-provisioned limits.
- **Restarts** and **Container Restart Rate**: crash loops.
- **Pending / Failed pods**: scheduling or startup problems.
- **Replicas** (desired vs. available): incomplete rollouts.
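Prometheus anchors regex label matchers at both ends, so the `pod=~"$deployment.*"` matcher from the Variables section selects every pod whose name *starts with* the chosen deployment name. The sketch below mimics that with Python's `re.fullmatch`. The pod names are hypothetical, and the sketch assumes the deployment name contains no regex metacharacters. It also shows a side effect worth knowing: a deployment whose name is a prefix of another (here the hypothetical `api` and `api-gateway`) pulls in the other deployment's pods too.

```python
import re
from typing import List


def pods_matched(deployment: str, pods: List[str]) -> List[str]:
    """Return the pods selected by the matcher pod=~"<deployment>.*".

    Prometheus regex matchers are fully anchored, which re.fullmatch
    reproduces. Assumes `deployment` contains no regex metacharacters.
    """
    pattern = re.compile(deployment + ".*")
    return [p for p in pods if pattern.fullmatch(p)]


# Hypothetical pod names in the <deployment>-<hash>-<suffix> form
# that Kubernetes Deployments use.
pods = [
    "api-7d9f8c6b5d-x2kq4",
    "api-7d9f8c6b5d-m8zlp",
    "api-gateway-5c4b7f9d8-qw7rt",
    "worker-6f5d4c9b2x-j9h2n",
]

print(pods_matched("api", pods))     # includes the api-gateway pod as well
print(pods_matched("worker", pods))  # only the worker pod
print(pods_matched(".*", pods))      # the "all" default (.*.*) selects every pod
```

If that overlap matters, a tighter pattern such as `$deployment-[a-z0-9]+-[a-z0-9]+` would exclude `api-gateway`'s pods. That pattern assumes the default Deployment pod naming and is not what `dashboard.json` uses.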
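The **% of limit** panels under Resource Limits divide usage by the container limit and turn yellow at 70% and red at 90%. The following is a minimal sketch of that arithmetic. It assumes Grafana's usual threshold rule, where a value takes the color of the highest step it meets or exceeds. The function names and sample numbers are hypothetical, and the exact PromQL in `dashboard.json` is not reproduced here.

```python
from typing import Optional

# Threshold steps as this README describes them: green by default,
# yellow from 70%, red from 90%.
THRESHOLDS = [(0.0, "green"), (70.0, "yellow"), (90.0, "red")]


def percent_of_limit(usage: float, limit: float) -> Optional[float]:
    """Return usage as a percentage of the limit, or None if no limit is set."""
    if limit <= 0:
        return None  # treat a zero/missing limit as "no limit"
    return 100.0 * usage / limit


def threshold_color(percent: float) -> str:
    """Return the color of the highest step that `percent` meets or exceeds."""
    color = THRESHOLDS[0][1]
    for step, name in THRESHOLDS:
        if percent >= step:
            color = name
    return color


MiB = 1024 * 1024
# Hypothetical pod: 460 MiB working set against a 512 MiB memory limit.
pct = percent_of_limit(460 * MiB, 512 * MiB)
print(f"{pct:.1f}% -> {threshold_color(pct)}")  # prints "89.8% -> yellow"
```

The example pod is still yellow but sits a fraction of a percentage point below red. That is the kind of pod the memory % of limit panels are meant to surface before it is OOMKilled.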
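The **Pod Details** table lists each pod with its restart count and memory % of limit, sorted by restarts. Below is a minimal sketch of that ordering on hypothetical rows. The `PodRow` type and its field names are illustrative only. The descending order (most restarts first) is an assumption, since the README does not state the sort direction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PodRow:
    """One row of a hypothetical Pod Details table."""
    pod: str
    restarts: int
    memory_pct_of_limit: Optional[float]  # None when the pod has no memory limit


def sort_by_restarts(rows: List[PodRow]) -> List[PodRow]:
    """Most-restarted pods first; sorted() is stable, so ties keep input order."""
    return sorted(rows, key=lambda r: r.restarts, reverse=True)


rows = [
    PodRow("worker-6f5d4c9b2x-j9h2n", 0, 41.2),
    PodRow("api-7d9f8c6b5d-x2kq4", 7, 97.5),
    PodRow("api-7d9f8c6b5d-m8zlp", 2, 88.0),
    PodRow("batch-5b6c7d8f9g-p4r7t", 2, None),
]

for row in sort_by_restarts(rows):
    pct = "n/a" if row.memory_pct_of_limit is None else f"{row.memory_pct_of_limit:.1f}%"
    print(f"{row.pod:<26} {row.restarts:>3}  {pct}")
```

Here the top row combines 7 restarts with memory at 97.5% of its limit. That combination is worth checking against the **OOMKilled (1h)** tile.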