Update ReadMes
parent 901a8c8407
commit 83e2efdb43
@ -2,8 +2,6 @@
Grafana dashboard for monitoring [vLLM](https://github.com/vllm-project/vllm) inference servers running as UbiOps deployments — request throughput, queue depth, KV cache pressure, and token rates. Fed by the `vllm:*` Prometheus metrics that vLLM exposes.
> **Note:** `dashboard.json` is currently empty (0 bytes) — the export did not save. These docs are reconstructed from `image.png`; re-export the dashboard to capture the panel/query definitions.
## Variables
- **Data Source** — Prometheus instance.
@ -26,12 +24,6 @@ Grafana dashboard for monitoring [vLLM](https://github.com/vllm-project/vllm) in
- *Input Tokens Per Minute (ITPM)* — prompt token volume.
- *Output Tokens Per Minute (OTPM)* — generated token volume.
## Key things to watch
- **KV Cache Usage** near 100% with rising **Requests Waiting** — the server is capacity-bound; scale up or shorten contexts.
- **Tokens Generated/sec** / **OTPM** dropping while RPM holds — degraded decode throughput.
- Sustained **Requests Waiting** — queue backlog and latency.
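
The three signals above can be read as simple checks over two consecutive readings of the dashboard's headline numbers. The following is a minimal Python sketch, not part of the dashboard itself: the `Snapshot` type, the `diagnose` function, and both thresholds (`kv_threshold`, `drop_ratio`) are hypothetical names and values chosen for illustration, and would need tuning for a real deployment.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One reading of the dashboard's headline numbers (hypothetical shape)."""
    kv_cache_usage: float   # fraction of KV cache in use, 0.0 to 1.0
    requests_waiting: int   # requests currently queued
    rpm: float              # requests per minute
    otpm: float             # output tokens per minute


def diagnose(prev: Snapshot, cur: Snapshot,
             kv_threshold: float = 0.95,   # assumed meaning of "near 100%"
             drop_ratio: float = 0.8) -> list[str]:
    """Return the warning labels that apply when moving from prev to cur."""
    findings = []

    # KV cache nearly full while the queue grows: capacity-bound.
    if (cur.kv_cache_usage >= kv_threshold
            and cur.requests_waiting > prev.requests_waiting):
        findings.append("capacity-bound")

    # Request rate roughly holds but output token rate falls: degraded decode.
    rpm_holds = prev.rpm > 0 and cur.rpm >= drop_ratio * prev.rpm
    if rpm_holds and prev.otpm > 0 and cur.otpm < drop_ratio * prev.otpm:
        findings.append("degraded-decode")

    # Queue is non-empty in both readings: sustained backlog.
    if prev.requests_waiting > 0 and cur.requests_waiting > 0:
        findings.append("queue-backlog")

    return findings
```

For example, `diagnose(Snapshot(0.90, 2, 100, 5000), Snapshot(0.98, 6, 100, 3000))` flags all three conditions, while two readings with an empty queue and stable token rates return an empty list.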
## Usage
Default range in the screenshot is the last 2 days with auto-refresh. Import into Grafana, then select datasource, namespace, and deployment.
@ -44,4 +44,6 @@ Secrets are exported empty and must be set per environment:
Import this directory as a UbiOps project export (e.g. via
`ubiops project_export create`), then fill in the secret environment variables
listed above before sending requests.
listed above before sending requests. Note that this implementation requires outbound internet access.

When running in air-gapped environments, users can use the bring-your-own Docker image functionality.