mindef-overdracht/ubiops-deployments/README.md

# ubiops-deployments

UbiOps export (format spec `v8.0`, exported 2026-06-02) bundling the
deployments behind the MinDef metadata/throughput setup. All deployments are
OpenAI-compatible and run in request format (`supports_request_format: true`)
with `plain` input/output.

## Layout

```
deployments/
├── deployment-gpt-oss-chat/        # the LLM serving deployment
│   └── deployment_gpt-oss-120b.yaml
├── deployments-embedder/           # embedding model
│   └── deployment_bge-m3/
└── deployments-proxies/            # OpenAI-compatible proxy deployments
    ├── deployment_llm-proxy/
    └── deployment_proxy-gpt-oss-batch-3x/
```

Each deployment folder holds its `deployment_*.yaml` (deployment config) and a
`versions/` folder with one `*.yaml` + `*.zip` per version (the YAML is the
version config, the ZIP is the packaged code).

## Deployments

| Deployment | Default version | Purpose |
|---|---|---|
| `gpt-oss-120b` | `v-gpt-120b-tool-calling` | Serves `openai/gpt-oss-120b` via vLLM on a `16gb_8vcpu_rtxpro` GPU instance. |
| `bge-m3` | `v3` | BGE-M3 embedding model. |
| `llm-proxy` | `v11` | OpenAI-compatible proxy routing requests to UbiOps deployments. |
| `proxy-gpt-oss-batch-3x` | `v1` | Proxy fanning batch requests across GPT-OSS instances. |

## Configuration

Secrets are exported empty and must be set per environment:

- `gpt-oss-120b` — `HF_TOKEN` (secret), `MODEL_NAME` (`openai/gpt-oss-120b`).
  The serving version also sets `VLLM_USE_V1=1`, `GPU_MEMORY_UTILIZATION=0.90`,
  and `MAX_MODEL_LEN=125000`, with a `/health` check on port 8000.
- `llm-proxy` — `UBIOPS_API_TOKEN` (secret).

## Importing

Import this directory as a UbiOps project export (e.g. via
`ubiops project_export create`), then fill in the secret environment variables
listed above before sending requests.