mindef-overdracht/ubiops-deployments/README.md

# ubiops-deployments

UbiOps export (format spec `v8.0`, exported 2026-06-02) bundling the
deployments behind the MinDef metadata/throughput setup. All deployments are
OpenAI-compatible and run in request format (`supports_request_format: true`)
with `plain` input/output.

## Layout

```
deployments/
├── deployment-gpt-oss-chat/        # the LLM serving deployment
│   └── deployment_gpt-oss-120b.yaml
├── deployments-embedder/           # embedding model
│   └── deployment_bge-m3/
└── deployments-proxies/            # OpenAI-compatible proxy deployments
    ├── deployment_llm-proxy/
    └── deployment_proxy-gpt-oss-batch-3x/
```

Each deployment folder holds its `deployment_*.yaml` (deployment config) and a
`versions/` folder with one `*.yaml` + `*.zip` per version (the YAML is the
version config, the ZIP is the packaged code).

## Deployments

| Deployment | Default version | Purpose |
|---|---|---|
| `gpt-oss-120b` | `v-gpt-120b-tool-calling` | Serves `openai/gpt-oss-120b` via vLLM on a `16gb_8vcpu_rtxpro` GPU instance. |
| `bge-m3` | `v3` | BGE-M3 embedding model. |
| `llm-proxy` | `v11` | OpenAI-compatible proxy routing requests to UbiOps deployments. |
| `proxy-gpt-oss-batch-3x` | `v1` | Proxy fanning batch requests across GPT-OSS instances. |

## Configuration

Secrets are exported empty and must be set per environment:

- `gpt-oss-120b` — `HF_TOKEN` (secret), `MODEL_NAME` (`openai/gpt-oss-120b`).
  The serving version also sets `VLLM_USE_V1=1`, `GPU_MEMORY_UTILIZATION=0.90`,
  and `MAX_MODEL_LEN=125000`, with a `/health` check on port 8000.
- `llm-proxy` — `UBIOPS_API_TOKEN` (secret).

## Importing

Import this directory as a UbiOps project export (e.g. via
`ubiops project_export create`), then fill in the secret environment variables
listed above before sending requests. Note that this implementation requires outbound internet acces.

When running in airgapped environments, users can make use of the bring your own docker image functionality