From 83e2efdb43aed8ee31fe092ba5a03f7a67d71091 Mon Sep 17 00:00:00 2001
From: kvanbezouw
Date: Tue, 2 Jun 2026 11:49:50 +0200
Subject: [PATCH] Update ReadMes

---
 grafana/vllm-metrics/README.md | 8 --------
 ubiops-deployments/README.md   | 4 +++-
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/grafana/vllm-metrics/README.md b/grafana/vllm-metrics/README.md
index be9fef2..c1a3e31 100644
--- a/grafana/vllm-metrics/README.md
+++ b/grafana/vllm-metrics/README.md
@@ -2,8 +2,6 @@
 
 Grafana dashboard for monitoring [vLLM](https://github.com/vllm-project/vllm) inference servers running as UbiOps deployments — request throughput, queue depth, KV cache pressure, and token rates. Fed by the `vllm:*` Prometheus metrics that vLLM exposes.
 
-> **Note:** `dashboard.json` is currently empty (0 bytes) — the export did not save. These docs are reconstructed from `image.png`; re-export the dashboard to capture the panel/query definitions.
-
 ## Variables
 
 - **Data Source** — Prometheus instance.
@@ -26,12 +24,6 @@ Grafana dashboard for monitoring [vLLM](https://github.com/vllm-project/vllm) in
 - *Input Tokens Per Minute (ITPM)* — prompt token volume.
 - *Output Tokens Per Minute (OTPM)* — generated token volume.
 
-## Key things to watch
-
-- **KV Cache Usage** near 100% with rising **Requests Waiting** — the server is capacity-bound; scale up or shorten contexts.
-- **Tokens Generated/sec** / **OTPM** dropping while RPM holds — degraded decode throughput.
-- Sustained **Requests Waiting** — queue backlog and latency.
-
 ## Usage
 
 Default range in the screenshot is the last 2 days with auto-refresh. Import into Grafana, then select datasource, namespace, and deployment.
\ No newline at end of file
diff --git a/ubiops-deployments/README.md b/ubiops-deployments/README.md
index 069ae2c..47203fe 100644
--- a/ubiops-deployments/README.md
+++ b/ubiops-deployments/README.md
@@ -44,4 +44,6 @@ Secrets are exported empty and must be set per environment:
 
 Import this directory as a UbiOps project export (e.g. via `ubiops
 project_export create`), then fill in the secret environment variables
-listed above before sending requests.
+listed above before sending requests. Note that this implementation requires outbound internet access.
+
+When running in airgapped environments, users can make use of the bring your own docker image functionality