mindef-overdracht/ubiops-deployments/README.md
2026-06-02 11:46:29 +02:00

48 lines
1.8 KiB
Markdown

# ubiops-deployments
UbiOps export (format spec `v8.0`, exported 2026-06-02) bundling the
deployments behind the MinDef metadata/throughput setup. All deployments are
OpenAI-compatible and run in request format (`supports_request_format: true`)
with `plain` input/output.
## Layout
```
deployments/
├── deployment-gpt-oss-chat/ # the LLM serving deployment
│ └── deployment_gpt-oss-120b.yaml
├── deployments-embedder/ # embedding model
│ └── deployment_bge-m3/
└── deployments-proxies/ # OpenAI-compatible proxy deployments
├── deployment_llm-proxy/
└── deployment_proxy-gpt-oss-batch-3x/
```
Each deployment folder holds its `deployment_*.yaml` (deployment config) and a
`versions/` folder with one `*.yaml` + `*.zip` per version (the YAML is the
version config, the ZIP is the packaged code).
## Deployments
| Deployment | Default version | Purpose |
|---|---|---|
| `gpt-oss-120b` | `v-gpt-120b-tool-calling` | Serves `openai/gpt-oss-120b` via vLLM on a `16gb_8vcpu_rtxpro` GPU instance. |
| `bge-m3` | `v3` | BGE-M3 embedding model. |
| `llm-proxy` | `v11` | OpenAI-compatible proxy routing requests to UbiOps deployments. |
| `proxy-gpt-oss-batch-3x` | `v1` | Proxy fanning batch requests across GPT-OSS instances. |
## Configuration
Secrets are exported empty and must be set per environment:
- `gpt-oss-120b``HF_TOKEN` (secret), `MODEL_NAME` (`openai/gpt-oss-120b`).
The serving version also sets `VLLM_USE_V1=1`, `GPU_MEMORY_UTILIZATION=0.90`,
and `MAX_MODEL_LEN=125000`, with a `/health` check on port 8000.
- `llm-proxy``UBIOPS_API_TOKEN` (secret).
## Importing
Import this directory as a UbiOps project export (e.g. via
`ubiops project_export create`), then fill in the secret environment variables
listed above before sending requests.