mindef-overdracht/grafana/ubiops-sre/README.md
2026-06-02 11:46:19 +02:00

UbiOps Deployments Dashboard

Grafana dashboard (dashboard.json) for monitoring UbiOps deployment pods on Kubernetes — health, resource usage, restarts, and limits. Data comes from Prometheus (kube-state-metrics + cAdvisor container_* metrics).

Variables

| Variable | Source | Purpose |
| --- | --- | --- |
| datasource | Prometheus datasource picker | Select the Prometheus instance |
| namespace | label_values(kube_pod_info, namespace) | Namespace to scope to |
| deployment | label_values(kube_deployment_metadata_generation{namespace="$namespace"}, deployment) | Deployment to inspect (defaults to all, .*) |

Pods are matched by pod=~"$deployment.*", so a deployment selection covers all of its pods.
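
A minimal sketch of that matcher inside a panel query, assuming kube-state-metrics exposes kube_pod_info with namespace and pod labels. The query is illustrative and is not copied from dashboard.json:

```promql
# Count the pods that the current namespace/deployment selection matches.
count(
  kube_pod_info{namespace="$namespace", pod=~"$deployment.*"}
)
```

Prometheus anchors regex matchers at both ends, so $deployment.* behaves as a prefix match. Selecting a deployment named api would therefore also match the pods of a deployment named api-gateway in the same namespace.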

Rows & panels

Overview — at-a-glance stat tiles: Running / Pending / Failed pods, Restarts (1h), OOMKilled (1h), Waiting containers.
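
The stat tiles could be backed by queries along these lines. This is a hedged sketch using standard kube-state-metrics metric names; the actual expressions in dashboard.json may differ, and each expression below would be a separate panel query:

```promql
# Pods in a given phase (swap "Running" for "Pending" or "Failed").
sum(kube_pod_status_phase{namespace="$namespace", pod=~"$deployment.*", phase="Running"})

# Container restarts accumulated over the last hour.
sum(increase(kube_pod_container_status_restarts_total{namespace="$namespace", pod=~"$deployment.*"}[1h]))

# Containers that reported OOMKilled as their last termination reason
# at some point during the last hour (an approximation: the termination
# itself may have happened earlier).
sum(max_over_time(kube_pod_container_status_last_terminated_reason{namespace="$namespace", pod=~"$deployment.*", reason="OOMKilled"}[1h]))

# Containers currently in a waiting state.
sum(kube_pod_container_status_waiting{namespace="$namespace", pod=~"$deployment.*"})
```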

Resource Usage — CPU and memory working-set usage per pod over time.
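
As an illustration, per-pod usage can be derived from the cAdvisor metrics roughly as follows. The 5m rate window and the container!="" filter (which drops the pod-level cgroup series) are assumptions, not values taken from dashboard.json:

```promql
# CPU cores used per pod, averaged over a 5m window.
sum by (pod) (
  rate(container_cpu_usage_seconds_total{namespace="$namespace", pod=~"$deployment.*", container!=""}[5m])
)

# Memory working set per pod, in bytes.
sum by (pod) (
  container_memory_working_set_bytes{namespace="$namespace", pod=~"$deployment.*", container!=""}
)
```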

Deployment Status — desired vs. available replicas, and container restart rate.
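
A sketch of queries that would produce these series, assuming the standard kube-state-metrics deployment metrics are scraped. The expressions are illustrative rather than the dashboard's own:

```promql
# Desired replicas per deployment.
kube_deployment_spec_replicas{namespace="$namespace", deployment=~"$deployment"}

# Replicas currently available.
kube_deployment_status_replicas_available{namespace="$namespace", deployment=~"$deployment"}

# Container restarts per second per pod, averaged over 5m.
sum by (pod) (
  rate(kube_pod_container_status_restarts_total{namespace="$namespace", pod=~"$deployment.*"}[5m])
)
```

A persistent gap between the first two series indicates a rollout that has not completed.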

Resource Limits — usage vs. limits for CPU and memory (aggregate and per-pod), plus per-pod limits and % of limit (green/yellow/red at 70%/90%) to spot pods approaching OOM.
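
One way to compute memory % of limit per pod is sketched below. It assumes kube-state-metrics v2 or later, where limits are exposed as kube_pod_container_resource_limits with a resource label; the dashboard's actual expression may differ:

```promql
# Memory working set as a percentage of the summed container memory limits.
# Pods without a memory limit have no denominator and drop out of the result.
100 *
  sum by (pod) (
    container_memory_working_set_bytes{namespace="$namespace", pod=~"$deployment.*", container!=""}
  )
/
  sum by (pod) (
    kube_pod_container_resource_limits{namespace="$namespace", pod=~"$deployment.*", resource="memory"}
  )
```

With the thresholds described above, a result below 70 renders green, 70 to 90 yellow, and 90 or more red. The CPU equivalent would divide a rate over container_cpu_usage_seconds_total by the limits with resource="cpu".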

Pod Details — table of every pod with restart count and memory % of limit, sorted by restarts.

Usage

Import dashboard.json into Grafana (the file uses the dashboard.grafana.app/v2 schema and was built on Grafana v13), then pick a datasource, namespace, and deployment. The default time range is the last hour, with a 30s auto-refresh.

Key things to watch

  • OOMKilled (1h) and Memory % of Limit — memory pressure / under-provisioned limits.
  • Restarts and Container Restart Rate — crash loops.
  • Pending / Failed pods — scheduling or startup problems.
  • Replicas (desired vs. available) — incomplete rollouts.