Import part 1
commit
00e2c83beb
3
.gitignore
vendored
Normal file
@@ -0,0 +1,3 @@
# Large benchmark output logs (reproducible, not versioned)
llm-throughput-tests-mindef-metadateren/results/**/benchmark_io.log
*.log
36
grafana/ubiops-sre/README.md
Normal file
@@ -0,0 +1,36 @@
# UbiOps Deployments Dashboard

Grafana dashboard (`dashboard.json`) for monitoring UbiOps deployment pods on Kubernetes: health, resource usage, restarts, and limits. Data comes from Prometheus (`kube-state-metrics` + cAdvisor `container_*` metrics).

## Variables

| Variable | Source | Purpose |
|----------|--------|---------|
| `datasource` | Prometheus datasource picker | Select the Prometheus instance |
| `namespace` | `label_values(kube_pod_info, namespace)` | Namespace to scope to |
| `deployment` | `label_values(kube_deployment_metadata_generation{namespace=$namespace}, deployment)` | Deployment to inspect (defaults to all, `.*`) |

Pods are matched by `pod=~"$deployment.*"`, so a deployment selection covers all of its pods.
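
Prometheus anchors regex matchers at both ends, so `pod=~"$deployment.*"` behaves as a prefix match on the pod name. The sketch below imitates that with Python's `re.fullmatch`. The pod and deployment names are invented for illustration, and the helper `pods_matched` is hypothetical, not part of the dashboard. It shows one side effect worth knowing: a deployment whose name is a prefix of another deployment's name also selects the other deployment's pods.

```python
import re


def pods_matched(deployment: str, pods: list[str]) -> list[str]:
    """Return the pods that pod=~"<deployment>.*" would select.

    Prometheus regex matchers are fully anchored, so re.fullmatch is
    the closest Python equivalent. The deployment value is used as a
    regex, which is also how the default `.*` ("all") works.
    """
    pattern = re.compile(f"{deployment}.*")
    return [p for p in pods if pattern.fullmatch(p)]


# Hypothetical pod names, not taken from a real cluster.
PODS = [
    "api-7c9d8f6b5-x2k4p",
    "api-v2-5d6f7c8b9-q8w3z",
    "worker-6b7c8d9f4-m1n2b",
]

print(pods_matched("api", PODS))     # also picks up the api-v2 pod
print(pods_matched("worker", PODS))
print(pods_matched(".*", PODS))      # the "all" default selects every pod
```

If two deployments share a name prefix, selecting the shorter one shows pods from both, so the panels may need a stricter pattern in that situation.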
## Rows & panels

**Overview**: at-a-glance stat tiles for Running / Pending / Failed pods, Restarts (1h), OOMKilled (1h), and Waiting containers.

**Resource Usage**: CPU and memory working-set usage per pod over time.

**Deployment Status**: desired vs. available replicas, and container restart rate.

**Resource Limits**: usage vs. limits for CPU and memory (aggregate and per-pod), plus per-pod limits and **% of limit** (green/yellow/red at 70%/90%) to spot pods approaching OOM.

**Pod Details**: table of every pod with restart count and memory % of limit, sorted by restarts.
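
The **% of limit** coloring in the Resource Limits row can be read as a simple threshold rule. The following is a minimal sketch of that rule, not the dashboard's actual query (the PromQL lives in `dashboard.json`, which is not shown here). The function name `limit_status` is hypothetical, and treating the 70% and 90% boundaries as inclusive is an assumption.

```python
def limit_status(usage: float, limit: float,
                 warn_pct: float = 70.0,
                 crit_pct: float = 90.0) -> tuple[float, str]:
    """Return (percent of limit, color) for one pod.

    usage and limit must be in the same unit (bytes, or CPU cores).
    Assumption: a value exactly on a threshold takes the higher color.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    pct = 100.0 * usage / limit
    if pct >= crit_pct:
        color = "red"
    elif pct >= warn_pct:
        color = "yellow"
    else:
        color = "green"
    return pct, color


# Illustrative numbers only: memory working set vs. limit, in MiB.
for used, limit in [(100, 500), (375, 500), (450, 500)]:
    pct, color = limit_status(used, limit)
    print(f"{used}/{limit} MiB -> {pct:.0f}% ({color})")
```

A pod that stays in the red band is the one to check first when **OOMKilled (1h)** is non-zero.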
## Usage

Default time range is the last 1h with 30s auto-refresh. Import into Grafana (schema `dashboard.grafana.app/v2`, built on Grafana v13), then pick a datasource, namespace, and deployment.

## Key things to watch

- **OOMKilled (1h)** and **Memory % of Limit**: memory pressure or under-provisioned limits.
- **Restarts** and **Container Restart Rate**: crash loops.
- **Pending / Failed pods**: scheduling or startup problems.
- **Replicas** (desired vs. available): incomplete rollouts.
2828
grafana/ubiops-sre/dashboard.json
Normal file
File diff suppressed because it is too large
BIN
grafana/ubiops-sre/image.png
Normal file
Binary file not shown.
Size: 155 KiB
37
grafana/vllm-metrics/README.md
Normal file
@@ -0,0 +1,37 @@
# vLLM Performance Dashboard

Grafana dashboard for monitoring [vLLM](https://github.com/vllm-project/vllm) inference servers running as UbiOps deployments: request throughput, queue depth, KV cache pressure, and token rates. Fed by the `vllm:*` Prometheus metrics that vLLM exposes.

> **Note:** `dashboard.json` is currently empty (0 bytes); the export did not save. These docs are reconstructed from `image.png`; re-export the dashboard to capture the panel/query definitions.

## Variables

- **Data Source**: Prometheus instance.
- **Namespace**: Kubernetes namespace (e.g. `default`).
- **Deployment**: the vLLM deployment / served model (e.g. `gpt-oss-120b`).
## Rows & panels

**Request Stats**
- *Requests Running*: requests currently being decoded.
- *Requests Waiting*: requests queued for a slot.
- *KV Cache Usage*: % of the GPU KV cache block pool in use (saturation leads to queuing).
- *Request Rate*: incoming requests over time.
- *Tokens Generated/sec*: output token throughput.
- *Request States Over Time*: running vs. waiting (and swapped) requests as a time series.
- *KV Cache Usage Over Time*: KV cache utilization trend.

**Per-Minute Metrics (RPM / ITPM / OTPM)**
- *Requests Per Minute (RPM)*.
- *Input Tokens Per Minute (ITPM)*: prompt token volume.
- *Output Tokens Per Minute (OTPM)*: generated token volume.
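
Per-minute figures like RPM, ITPM, and OTPM are typically derived from monotonically increasing counters by taking the increase over a window and scaling it to 60 seconds. The sketch below shows that arithmetic on two samples of a counter. It assumes the panels are backed by counters (for example vLLM's `vllm:prompt_tokens_total` and `vllm:generation_tokens_total`); the real panel queries are not available in this README, and the helper name and sample values are illustrative only.

```python
def per_minute_rate(first: tuple[float, float],
                    last: tuple[float, float]) -> float:
    """Average per-minute increase of a counter between two samples.

    Each sample is (unix_timestamp_seconds, counter_value).
    A decreasing value is treated as a counter reset and rejected,
    which is simpler than the reset handling Prometheus performs.
    """
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if v1 < v0:
        raise ValueError("counter went backwards (reset?)")
    return (v1 - v0) / (t1 - t0) * 60.0


# Made-up samples taken 300 seconds (5 minutes) apart.
rpm = per_minute_rate((0, 1_200), (300, 1_500))           # requests
itpm = per_minute_rate((0, 1_000_000), (300, 1_450_000))  # prompt tokens
otpm = per_minute_rate((0, 400_000), (300, 490_000))      # generated tokens

print(f"RPM={rpm:.0f} ITPM={itpm:.0f} OTPM={otpm:.0f}")
```

In PromQL the same idea is usually written as a `rate(...)` over a range, multiplied by 60.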
## Key things to watch

- **KV Cache Usage** near 100% with rising **Requests Waiting**: the server is capacity-bound; scale up or shorten contexts.
- **Tokens Generated/sec** / **OTPM** dropping while RPM holds: degraded decode throughput.
- Sustained **Requests Waiting**: queue backlog and latency.
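
The first signal above combines two panels, so it can help to spell the condition out. This is a rough sketch of that check over a short window of samples, not an alerting rule shipped with the dashboard. The `Sample` type, the 0.95 threshold, and the "last value higher than first" definition of a rising queue are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    kv_cache_usage: float   # fraction of the KV cache in use, 0.0 to 1.0
    requests_waiting: int   # queued requests at the same instant


def is_capacity_bound(samples: list[Sample],
                      kv_threshold: float = 0.95) -> bool:
    """True when the cache stays saturated while the queue grows.

    Assumptions: samples are in time order, "near 100%" means at or
    above kv_threshold for every sample, and "rising" means the last
    queue length exceeds the first.
    """
    if len(samples) < 2:
        return False
    cache_saturated = all(s.kv_cache_usage >= kv_threshold for s in samples)
    queue_rising = samples[-1].requests_waiting > samples[0].requests_waiting
    return cache_saturated and queue_rising


# Invented windows of three samples each.
busy = [Sample(0.97, 2), Sample(0.99, 6), Sample(1.00, 11)]
healthy = [Sample(0.40, 0), Sample(0.55, 1), Sample(0.50, 0)]

print(is_capacity_bound(busy))
print(is_capacity_bound(healthy))
```

A saturated cache with a flat or empty queue is not flagged by this check, which matches the wording above: it is the combination that indicates a capacity problem.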
## Usage

Default range in the screenshot is the last 2 days with auto-refresh. Import into Grafana, then select datasource, namespace, and deployment.
2618
grafana/vllm-metrics/dashboard.json
Normal file
File diff suppressed because it is too large