Import part 1

2026-06-02 11:46:19 +02:00 · 2026-06-02 11:46:19 +02:00 · 00e2c83beb
commit 00e2c83beb
6 changed files with 5522 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,3 @@
+# Large benchmark output logs — reproducible, not versioned
+llm-throughput-tests-mindef-metadateren/results/**/benchmark_io.log
+*.log
--- a/grafana/ubiops-sre/README.md
+++ b/grafana/ubiops-sre/README.md
@@ -0,0 +1,36 @@
+# UbiOps Deployments Dashboard
+
+Grafana dashboard (`dashboard.json`) for monitoring UbiOps deployment pods on Kubernetes — health, resource usage, restarts, and limits. Data comes from Prometheus (`kube-state-metrics` + cAdvisor `container_*` metrics).
+
+## Variables
+
+| Variable | Source | Purpose |
+|----------|--------|---------|
+| `datasource` | Prometheus datasource picker | Select the Prometheus instance |
+| `namespace` | `label_values(kube_pod_info, namespace)` | Namespace to scope to |
+| `deployment` | `label_values(kube_deployment_metadata_generation{namespace="$namespace"}, deployment)` | Deployment to inspect (defaults to all, `.*`) |
+
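+Pods are matched by `pod=~"$deployment.*"`, a prefix regex, so selecting a deployment covers all of its pods, and also the pods of any other deployment whose name starts with the same string (e.g. selecting `vllm` also matches `vllm-embed` pods).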
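
Prometheus regex matchers are fully anchored, so the selector behaves like a full match of `<deployment>.*` against the pod name. A minimal Python sketch of that behaviour, assuming Grafana substitutes the variable verbatim and the deployment name contains no regex metacharacters; the function name and pod names are hypothetical:

```python
import re


def pod_matches(deployment: str, pod: str) -> bool:
    """Mimic the PromQL matcher pod=~"$deployment.*".

    Prometheus anchors regex matchers at both ends, which re.fullmatch reproduces.
    Assumes `deployment` is substituted verbatim (no regex escaping).
    """
    return re.fullmatch(deployment + ".*", pod) is not None


# Hypothetical pod names: Deployment pods are named <deployment>-<replicaset hash>-<suffix>.
pods = ["vllm-7d9f8-abcde", "vllm-embed-5c6b4-xyz12", "api-6f8d2-qwert"]
print([p for p in pods if pod_matches("vllm", p)])  # prefix match also catches vllm-embed
print([p for p in pods if pod_matches(".*", p)])    # the "All" value (.*) matches every pod
```

Because the pattern is anchored only at the start of the name in practice, `my-vllm-...` pods are not matched, while any deployment whose name extends the selected one is.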
+
+## Rows & panels
+
+**Overview** — at-a-glance stat tiles: Running / Pending / Failed pods, Restarts (1h), OOMKilled (1h), Waiting containers.
+
+**Resource Usage** — CPU and memory working-set usage per pod over time.
+
+**Deployment Status** — desired vs. available replicas, and container restart rate.
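
Restart panels like these are commonly built on the kube-state-metrics counter `kube_pod_container_status_restarts_total` with `increase()` or `rate()`; the actual queries are not reproduced here, so treat that as an assumption. The core idea fits in a short Python sketch: sum the positive steps of the counter and treat any drop as a counter reset. It is deliberately simplified, since Prometheus' `increase()` also extrapolates to the window edges, so real panel values can differ slightly. Function names are hypothetical.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase of a monotonic counter over time-ordered samples.

    A decrease is treated as a counter reset, so the post-reset value counts
    as increase from zero. No extrapolation to the window edges (unlike PromQL).
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


def restarts_per_hour(samples: Sequence[float], window_seconds: float) -> float:
    """Average restart rate over the window, scaled to restarts per hour."""
    return counter_increase(samples) * 3600.0 / window_seconds


# Hypothetical restart-counter samples for one container over 30 minutes,
# including a drop (6 -> 0) that is treated as a reset.
restarts = [3, 3, 4, 6, 0, 1]
print(counter_increase(restarts), restarts_per_hour(restarts, 1800))
```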
+
+**Resource Limits** — usage vs. limits for CPU and memory (aggregate and per-pod), plus per-pod limits and **% of limit** (green below 70%, yellow from 70%, red from 90%) to spot pods approaching OOM.
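
Assuming the % of limit panels divide current usage (for memory, the container working set) by the configured limit and colour the result with step thresholds at 70% and 90%, the logic reduces to the following sketch. The function names and the example pod are hypothetical; the real queries are defined in the dashboard.

```python
from typing import Optional


def percent_of_limit(usage: float, limit: Optional[float]) -> Optional[float]:
    """Usage as a percentage of the limit; None when no limit is set."""
    if not limit:
        return None  # containers without a limit have no meaningful % of limit
    return 100.0 * usage / limit


def threshold_color(pct: Optional[float]) -> str:
    """Step thresholds as described above: base green, 70% -> yellow, 90% -> red."""
    if pct is None:
        return "none"
    if pct >= 90.0:
        return "red"
    if pct >= 70.0:
        return "yellow"
    return "green"


# Hypothetical pod: 1.5 GB working set against a 2 GB memory limit.
pct = percent_of_limit(1_500_000_000, 2_000_000_000)
print(pct, threshold_color(pct))
```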
+
+**Pod Details** — table of every pod with restart count and memory % of limit, sorted by restarts.
+
+## Usage
+
+Default time range is the last 1h with 30s auto-refresh. Import into Grafana (schema `dashboard.grafana.app/v2`, built on Grafana v13), then pick a datasource, namespace, and deployment.
+
+## Key things to watch
+
+- **OOMKilled (1h)** and **Memory % of Limit** — memory pressure / under-provisioned limits.
+- **Restarts** and **Container Restart Rate** — crash loops.
+- **Pending / Failed pods** — scheduling or startup problems.
+- **Replicas** (desired vs. available) — incomplete rollouts.
--- a/grafana/ubiops-sre/dashboard.json
+++ b/grafana/ubiops-sre/dashboard.json
--- a/grafana/ubiops-sre/image.png
+++ b/grafana/ubiops-sre/image.png
--- a/grafana/vllm-metrics/README.md
+++ b/grafana/vllm-metrics/README.md
@@ -0,0 +1,37 @@
+# vLLM Performance Dashboard
+
+Grafana dashboard for monitoring [vLLM](https://github.com/vllm-project/vllm) inference servers running as UbiOps deployments — request throughput, queue depth, KV cache pressure, and token rates. Fed by the `vllm:*` Prometheus metrics that vLLM exposes.
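
vLLM's OpenAI-compatible server publishes these metrics in the Prometheus text format on its `/metrics` endpoint. To check by hand what the dashboard will see, a small stdlib-only parser can sum a metric across its label sets. This is a simplified sketch, not a full exposition-format parser; the function name is hypothetical and the sample payload is made up for illustration rather than copied from a real server.

```python
from collections import defaultdict
from typing import Dict


def sum_metrics(text: str, prefix: str = "vllm:") -> Dict[str, float]:
    """Sum every sample whose metric name starts with `prefix`, across all label sets.

    Simplified Prometheus text-format parsing: comment lines (# HELP / # TYPE)
    are skipped and any trailing timestamp is ignored.
    """
    totals: Dict[str, float] = defaultdict(float)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            value_part = rest.rsplit("}", 1)[1]  # text after the closing label brace
        else:
            name, value_part = line.split(None, 1)
        if name.startswith(prefix):
            totals[name] += float(value_part.split()[0])
    return dict(totals)


# Made-up payload in the shape vLLM exposes (values are illustrative only).
SAMPLE = """\
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="gpt-oss-120b"} 3.0
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="gpt-oss-120b"} 5.0
process_cpu_seconds_total 12.5
"""
print(sum_metrics(SAMPLE))
```

In practice the text would come from the server's `/metrics` URL instead of a hard-coded string.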
+
+> **Note:** `dashboard.json` is currently empty (0 bytes) — the export did not save. These docs are reconstructed from `image.png`; re-export the dashboard to capture the panel/query definitions.
+
+## Variables
+
+- **Data Source** — Prometheus instance.
+- **Namespace** — Kubernetes namespace (e.g. `default`).
+- **Deployment** — the vLLM deployment / served model (e.g. `gpt-oss-120b`).
+
+## Rows & panels
+
+**Request Stats**
+- *Requests Running* — requests currently scheduled in the running batch (prefill or decode).
+- *Requests Waiting* — requests queued for a slot.
+- *KV Cache Usage* — % of the GPU KV cache block pool in use (saturation → queuing).
+- *Request Rate* — incoming requests over time.
+- *Tokens Generated/sec* — output token throughput.
+- *Request States Over Time* — running vs. waiting (and swapped) requests as a timeseries.
+- *KV Cache Usage Over Time* — KV cache utilization trend.
+
+**Per-Minute Metrics (RPM / ITPM / OTPM)**
+- *Requests Per Minute (RPM)*.
+- *Input Tokens Per Minute (ITPM)* — prompt token volume.
+- *Output Tokens Per Minute (OTPM)* — generated token volume.
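
Since `dashboard.json` is empty, the queries behind these panels are unknown; a common approach is to take the per-second rate of vLLM's cumulative counters and multiply by 60. The sketch below assumes RPM, ITPM and OTPM come from `vllm:request_success_total` (finished requests), `vllm:prompt_tokens_total` and `vllm:generation_tokens_total`, and works from two scrapes taken `elapsed_s` seconds apart. The mapping, function names and numbers are hypothetical.

```python
from typing import Dict

# Assumed mapping from panel to vLLM counter (not taken from the dashboard itself).
COUNTERS = {
    "RPM": "vllm:request_success_total",
    "ITPM": "vllm:prompt_tokens_total",
    "OTPM": "vllm:generation_tokens_total",
}


def per_minute(prev: float, cur: float, elapsed_s: float) -> float:
    """Counter delta scaled to a per-minute rate; a drop is treated as a reset."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta = cur - prev if cur >= prev else cur
    return delta * 60.0 / elapsed_s


def per_minute_rates(prev: Dict[str, float], cur: Dict[str, float],
                     elapsed_s: float) -> Dict[str, float]:
    """RPM / ITPM / OTPM from two snapshots of summed counter values."""
    return {panel: per_minute(prev[name], cur[name], elapsed_s)
            for panel, name in COUNTERS.items()}


# Hypothetical snapshots taken 30 s apart.
before = {"vllm:request_success_total": 100,
          "vllm:prompt_tokens_total": 120_000,
          "vllm:generation_tokens_total": 40_000}
after = {"vllm:request_success_total": 112,
         "vllm:prompt_tokens_total": 150_000,
         "vllm:generation_tokens_total": 46_000}
print(per_minute_rates(before, after, 30))
```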
+
+## Key things to watch
+
+- **KV Cache Usage** near 100% with rising **Requests Waiting** — the server is capacity-bound; scale up or shorten contexts.
+- **Tokens Generated/sec** / **OTPM** dropping while RPM holds — degraded decode throughput.
+- Sustained **Requests Waiting** — queue backlog and latency.
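
The first and last bullets can be turned into simple checks over a window of samples. The sketch assumes KV cache usage arrives as a fraction from 0 to 1 (vLLM reports its cache-usage gauge that way, while the dashboard displays a percentage); the 0.95 threshold and the function names are hypothetical choices, not values from the dashboard.

```python
from typing import Sequence


def is_capacity_bound(kv_usage: Sequence[float], waiting: Sequence[float],
                      kv_threshold: float = 0.95) -> bool:
    """KV cache near full while the waiting queue grows (samples oldest first).

    kv_usage is a 0..1 fraction; kv_threshold = 0.95 is a hypothetical cut-off.
    """
    if not kv_usage or len(waiting) < 2:
        return False
    return kv_usage[-1] >= kv_threshold and waiting[-1] > waiting[0]


def has_sustained_backlog(waiting: Sequence[float]) -> bool:
    """Requests were waiting in every sample of the window."""
    return len(waiting) > 0 and all(w > 0 for w in waiting)


# Hypothetical one-sample-per-minute windows.
print(is_capacity_bound([0.90, 0.97, 0.99], [0, 2, 6]))  # near-full cache, growing queue
print(has_sustained_backlog([3, 4, 2, 5]))
```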
+
+## Usage
+
+The screenshot shows a default range of the last 2 days with auto-refresh enabled. Once `dashboard.json` has been re-exported, import it into Grafana, then select a datasource, namespace, and deployment.
--- a/grafana/vllm-metrics/dashboard.json
+++ b/grafana/vllm-metrics/dashboard.json