kvanbezouw 00e2c83beb Import part 1		2026-06-02 11:46:19 +02:00
dashboard.json	Import part 1	2026-06-02 11:46:19 +02:00
image.png	Import part 1	2026-06-02 11:46:19 +02:00
README.md	Import part 1	2026-06-02 11:46:19 +02:00

UbiOps Deployments Dashboard

A Grafana dashboard (`dashboard.json`) for monitoring UbiOps deployment pods on Kubernetes: health, resource usage, restarts, and limits. Data comes from Prometheus, using kube-state-metrics and the cAdvisor `container_*` metrics.

Variables

| Variable | Source | Purpose |
| --- | --- | --- |
| `datasource` | Prometheus datasource picker | Select the Prometheus instance |
| `namespace` | `label_values(kube_pod_info, namespace)` | Namespace to scope to |
| `deployment` | `label_values(kube_deployment_metadata_generation{namespace="$namespace"}, deployment)` | Deployment to inspect (defaults to all, `.*`) |

Pods are matched by `pod=~"$deployment.*"`, so selecting a deployment covers all of its pods.
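To make the matching concrete, here is a minimal Python sketch of how the pod selector behaves once the variable has been substituted. It assumes Prometheus's fully anchored regex semantics, emulated here with `re.fullmatch`; the `pod_matches` helper and all deployment and pod names are hypothetical. One consequence worth knowing: the match is by prefix, so a deployment whose name is a prefix of another deployment's name also picks up that deployment's pods.

```python
import re


def pod_matches(deployment: str, pod: str) -> bool:
    """Emulate the PromQL matcher pod=~"<deployment>.*".

    Prometheus anchors regex matchers at both ends, which
    re.fullmatch reproduces. The deployment value is treated as a
    regex fragment, without any escaping.
    """
    return re.fullmatch(f"{deployment}.*", pod) is not None


# Hypothetical pod names in the usual <deployment>-<replicaset>-<id> shape.
pods = [
    "api-7d9f8c6b5-x2k4q",
    "api-gateway-5c8d7f9b4-abcde",
    "worker-6b7c8d9f5-zzzzz",
]

# Selecting "api" also matches the "api-gateway" pod (shared prefix).
matched = [p for p in pods if pod_matches("api", p)]

# The default value ".*" matches every pod in the namespace.
matched_all = [p for p in pods if pod_matches(".*", p)]
```

If two deployments share a prefix, selecting the shorter name shows both sets of pods in every panel.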

Rows & panels

Overview — at-a-glance stat tiles: Running / Pending / Failed pods, Restarts (1h), OOMKilled (1h), Waiting containers.
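The exact expressions behind these tiles live in `dashboard.json`. As a rough illustration only, the sketch below builds PromQL of the kind such tiles typically use. The metric names are standard kube-state-metrics series, but the expressions, the `overview_queries` helper, and the example values are assumptions, not copied from the dashboard. The OOMKilled tile is left out because its query cannot be inferred from this README.

```python
def overview_queries(namespace: str, deployment: str = ".*") -> dict:
    """Build illustrative PromQL strings for a few overview tiles.

    These are assumed expressions, not the dashboard's actual queries.
    """
    # Shared label selector, mirroring the dashboard's variables.
    sel = f'namespace="{namespace}", pod=~"{deployment}.*"'

    # One tile per pod phase, counting pods currently in that phase.
    queries = {
        f"{phase} pods": f'sum(kube_pod_status_phase{{{sel}, phase="{phase}"}})'
        for phase in ("Running", "Pending", "Failed")
    }
    # Container restarts accumulated over the last hour.
    queries["Restarts (1h)"] = (
        f"sum(increase(kube_pod_container_status_restarts_total{{{sel}}}[1h]))"
    )
    # Containers currently in the Waiting state.
    queries["Waiting containers"] = (
        f"sum(kube_pod_container_status_waiting{{{sel}}})"
    )
    return queries


# Hypothetical namespace and deployment names.
for title, expr in overview_queries("prod", "api").items():
    print(f"{title}: {expr}")
```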

Resource Usage — CPU and memory working-set usage per pod over time.

Deployment Status — desired vs. available replicas, and container restart rate.

Resource Limits — usage vs. limits for CPU and memory (aggregate and per-pod), plus per-pod limits and % of limit (green/yellow/red at 70%/90%) to spot pods approaching OOM.
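The percentage and color bands can be sketched in a few lines of Python. This assumes the thresholds are inclusive lower bounds (yellow from 70%, red from 90%), which is how Grafana threshold steps normally behave; the function names and the sample figures are hypothetical.

```python
from typing import Optional


def percent_of_limit(usage: float, limit: float) -> Optional[float]:
    """Return usage as a percentage of the limit, or None if no limit is set."""
    if limit <= 0:
        return None  # a pod without a limit has no meaningful percentage
    return 100.0 * usage / limit


def threshold_color(percent: float) -> str:
    """Map a percentage to the green/yellow/red bands at 70% and 90%."""
    if percent >= 90:
        return "red"
    if percent >= 70:
        return "yellow"
    return "green"


# Hypothetical pod: 768 MiB working set against a 1024 MiB limit.
pct = percent_of_limit(768, 1024)
color = threshold_color(pct)
```

Here `pct` is 75.0 and `color` is `"yellow"`: the pod still has headroom but is worth watching.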

Pod Details — table of every pod with restart count and memory % of limit, sorted by restarts.

Usage

The default time range is the last hour, with a 30s auto-refresh. Import `dashboard.json` into Grafana (schema `dashboard.grafana.app/v2`, built on Grafana v13), then pick a datasource, namespace, and deployment.

Key things to watch

OOMKilled (1h) and Memory % of Limit — memory pressure / under-provisioned limits.
Restarts and Container Restart Rate — crash loops.
Pending / Failed pods — scheduling or startup problems.
Replicas (desired vs. available) — incomplete rollouts.
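The list above amounts to a small set of triage rules, which can be sketched as follows. The `Snapshot` fields, the `triage` function, and the cut-offs (any restart in the window, memory at or above 90% of the limit) are illustrative assumptions rather than anything the dashboard defines; in practice the values would be read off the panels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """Hypothetical readings taken from the dashboard's panels."""
    oom_killed_1h: int
    max_memory_pct_of_limit: float
    restarts_1h: int
    pending_pods: int
    failed_pods: int
    desired_replicas: int
    available_replicas: int


def triage(s: Snapshot) -> List[str]:
    """Map the signals to the likely problem each one points to."""
    findings = []
    if s.oom_killed_1h > 0 or s.max_memory_pct_of_limit >= 90:
        findings.append("memory pressure or under-provisioned limits")
    if s.restarts_1h > 0:
        findings.append("possible crash loop")
    if s.pending_pods > 0 or s.failed_pods > 0:
        findings.append("scheduling or startup problem")
    if s.available_replicas < s.desired_replicas:
        findings.append("incomplete rollout")
    return findings


# Hypothetical deployment: one OOM kill, restarts, and a replica short.
example = Snapshot(
    oom_killed_1h=1,
    max_memory_pct_of_limit=96.0,
    restarts_1h=3,
    pending_pods=0,
    failed_pods=0,
    desired_replicas=3,
    available_replicas=2,
)
findings = triage(example)
```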