Import part 1

2026-06-02 11:46:19 +02:00 · 2026-06-02 11:46:19 +02:00 · 00e2c83beb
commit 00e2c83beb
6 changed files with 5522 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,3 @@
 # Large benchmark output logs — reproducible, not versioned
 llm-throughput-tests-mindef-metadateren/results/**/benchmark_io.log
 *.log
--- a/grafana/ubiops-sre/README.md
+++ b/grafana/ubiops-sre/README.md
@@ -0,0 +1,36 @@
 # UbiOps Deployments Dashboard
 Grafana dashboard (`dashboard.json`) for monitoring UbiOps deployment pods on Kubernetes — health, resource usage, restarts, and limits. Data comes from Prometheus (`kube-state-metrics` + cAdvisor `container_*` metrics).
 ## Variables
 | Variable | Source | Purpose |
 |----------|--------|---------|
 | `datasource` | Prometheus datasource picker | Select the Prometheus instance |
 | `namespace` | `label_values(kube_pod_info, namespace)` | Namespace to scope to |
 | `deployment` | `label_values(kube_deployment_metadata_generation{namespace="$namespace"}, deployment)` | Deployment to inspect (defaults to all, `.*`) |
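 Pods are matched by `pod=~"$deployment.*"`, a prefix match on the pod name, so selecting a deployment covers all of its pods. The same pattern also matches pods of any other deployment whose name starts with the selected name.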
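A minimal sketch of how this matcher behaves, assuming the selected deployment name is interpolated into the regex unescaped. Prometheus anchors regex matchers at both ends, which `re.fullmatch` mirrors here. The pod names and the `matching_pods` helper are hypothetical and not part of the dashboard.

```python
import re


def matching_pods(deployment: str, pods: list[str]) -> list[str]:
    """Return the pods selected by pod=~"<deployment>.*".

    The deployment value is treated as a regex fragment (an assumption
    about how the variable is interpolated), and the whole pod name must
    match, as with a fully anchored Prometheus regex matcher.
    """
    pattern = re.compile(f"{deployment}.*")
    return [p for p in pods if pattern.fullmatch(p)]


# Hypothetical pod names for three deployments in one namespace.
pods = [
    "api-7d9f8c6b5-x2k4q",         # deployment "api"
    "api-worker-5c7d9b8f4-lm9zt",  # deployment "api-worker"
    "frontend-6b8c7d5f9-qw3rt",    # deployment "frontend"
]

print(matching_pods("api", pods))         # also picks up the api-worker pod
print(matching_pods("api-worker", pods))  # only the api-worker pod
print(matching_pods(".*", pods))          # the "all" default selects everything
```

Selecting `api` returns both the `api` and `api-worker` pods, so deployments with a shared name prefix are worth checking separately.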
 ## Rows & panels
 **Overview** — at-a-glance stat tiles: Running / Pending / Failed pods, Restarts (1h), OOMKilled (1h), Waiting containers.
 **Resource Usage** — CPU and memory working-set usage per pod over time.
 **Deployment Status** — desired vs. available replicas, and container restart rate.
 **Resource Limits** — usage vs. limits for CPU and memory (aggregate and per-pod), plus per-pod limits and **% of limit** (green below 70%, yellow from 70%, red from 90%) to spot pods approaching an OOM kill.
 **Pod Details** — table of every pod with restart count and memory % of limit, sorted by restarts.
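The **% of limit** value and its colour bands can be sketched as follows. This is an illustration only, assuming the thresholds are inclusive lower bounds (yellow at 70% and above, red at 90% and above). The function names are hypothetical and the real calculation lives in the dashboard queries.

```python
from typing import Optional


def percent_of_limit(usage: float, limit: float) -> Optional[float]:
    """Usage as a percentage of the configured limit.

    Returns None when no positive limit is set, since a pod without a
    limit has no meaningful percentage.
    """
    if limit <= 0:
        return None
    return 100.0 * usage / limit


def threshold_color(percent: Optional[float]) -> str:
    """Map a percentage to the green/yellow/red bands (70% / 90%)."""
    if percent is None:
        return "none"
    if percent >= 90:
        return "red"
    if percent >= 70:
        return "yellow"
    return "green"


MIB = 2 ** 20
# Hypothetical pod: 768 MiB working set against a 1024 MiB memory limit.
pct = percent_of_limit(768 * MIB, 1024 * MIB)
print(pct, threshold_color(pct))
```

A pod using 768 MiB of a 1024 MiB limit sits at 75% and lands in the yellow band.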
 ## Usage
 Default time range is the last 1h with 30s auto-refresh. Import into Grafana (schema `dashboard.grafana.app/v2`, built on Grafana v13), then pick a datasource, namespace, and deployment.
 ## Key things to watch
 - **OOMKilled (1h)** and **Memory % of Limit** — memory pressure / under-provisioned limits.
 - **Restarts** and **Container Restart Rate** — crash loops.
 - **Pending / Failed pods** — scheduling or startup problems.
 - **Replicas** (desired vs. available) — incomplete rollouts.
--- a/grafana/ubiops-sre/dashboard.json
+++ b/grafana/ubiops-sre/dashboard.json
--- a/grafana/ubiops-sre/image.png
+++ b/grafana/ubiops-sre/image.png
--- a/grafana/vllm-metrics/README.md
+++ b/grafana/vllm-metrics/README.md
@@ -0,0 +1,37 @@
 # vLLM Performance Dashboard
 Grafana dashboard for monitoring [vLLM](https://github.com/vllm-project/vllm) inference servers running as UbiOps deployments — request throughput, queue depth, KV cache pressure, and token rates. Fed by the `vllm:*` Prometheus metrics that vLLM exposes.
 > **Note:** `dashboard.json` is currently empty (0 bytes) — the export did not save. These docs are reconstructed from `image.png`; re-export the dashboard to capture the panel/query definitions.
 ## Variables
 - **Data Source** — Prometheus instance.
 - **Namespace** — Kubernetes namespace (e.g. `default`).
 - **Deployment** — the vLLM deployment / served model (e.g. `gpt-oss-120b`).
 ## Rows & panels
 **Request Stats**
 - *Requests Running* — requests currently being processed by the engine (prefill or decode).
 - *Requests Waiting* — requests queued for a slot.
 - *KV Cache Usage* — % of the GPU KV cache block pool in use (saturation → queuing).
 - *Request Rate* — incoming requests over time.
 - *Tokens Generated/sec* — output token throughput.
 - *Request States Over Time* — running vs. waiting (and swapped) requests as a timeseries.
 - *KV Cache Usage Over Time* — KV cache utilization trend.
 **Per-Minute Metrics (RPM / ITPM / OTPM)**
 - *Requests Per Minute (RPM)*.
 - *Input Tokens Per Minute (ITPM)* — prompt token volume.
 - *Output Tokens Per Minute (OTPM)* — generated token volume.
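Because `dashboard.json` is empty, the actual panel queries are unknown. As a hedged sketch, per-minute figures like these are typically derived from monotonically increasing counters (for vLLM, plausibly `vllm:request_success_total`, `vllm:prompt_tokens_total`, and `vllm:generation_tokens_total`, which is an assumption about this dashboard). The hypothetical helper below shows the arithmetic, including the handling of a counter reset after a pod restart.

```python
def per_minute_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-minute increase of a counter.

    samples: (timestamp_seconds, counter_value) pairs, oldest first.
    A drop in the counter is treated as a reset to zero, so the new
    value counts as the increase since the reset.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return 60.0 * increase / elapsed if elapsed > 0 else 0.0


# Hypothetical output-token counter scraped every 30 seconds.
otpm = per_minute_rate([(0, 100), (30, 160), (60, 220)])
print(otpm)

# Same window with a restart between the first two scrapes.
otpm_after_reset = per_minute_rate([(0, 100), (30, 10), (60, 40)])
print(otpm_after_reset)
```

This is a simplified version of what a rate over a time window computes; it does not extrapolate to the window boundaries the way Prometheus does.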
 ## Key things to watch
 - **KV Cache Usage** near 100% with rising **Requests Waiting** — the server is capacity-bound; scale up or shorten contexts.
 - **Tokens Generated/sec** / **OTPM** dropping while RPM holds — degraded decode throughput.
 - Sustained **Requests Waiting** — queue backlog and latency.
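The first signal above can be expressed as a simple check. This is a sketch under stated assumptions: the 95% cutoff for "near 100%" and the `is_capacity_bound` helper are hypothetical, and "rising" is reduced to comparing the newest and oldest sample in the window.

```python
def is_capacity_bound(
    kv_cache_pct: float,
    waiting: list[float],
    kv_threshold: float = 95.0,  # hypothetical cutoff for "near 100%"
) -> bool:
    """Flag a server whose KV cache is nearly full while its queue grows.

    kv_cache_pct: latest KV cache usage in percent.
    waiting: recent Requests Waiting samples, oldest first.
    """
    rising = len(waiting) >= 2 and waiting[-1] > waiting[0]
    return kv_cache_pct >= kv_threshold and rising


# Hypothetical readings taken from the panels.
print(is_capacity_bound(98.5, [2, 5, 9]))   # cache nearly full, queue growing
print(is_capacity_bound(98.5, [4, 4, 3]))   # cache nearly full, queue draining
print(is_capacity_bound(60.0, [2, 5, 9]))   # queue growing, cache has headroom
```

Only the first case calls for scaling up or shortening contexts. A growing queue with cache headroom points to a different bottleneck.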
 ## Usage
 Default range in the screenshot is the last 2 days with auto-refresh. Import into Grafana, then select datasource, namespace, and deployment.
--- a/grafana/vllm-metrics/dashboard.json
+++ b/grafana/vllm-metrics/dashboard.json