mindef-overdracht/ubiops-deployments/README.md
2026-06-02 11:46:29 +02:00

1.8 KiB

ubiops-deployments

UbiOps export (format spec v8.0, exported 2026-06-02) bundling the deployments behind the MinDef metadata/throughput setup. All deployments are OpenAI-compatible and run in request format (supports_request_format: true) with plain input/output.

Layout

deployments/
├── deployment-gpt-oss-chat/        # the LLM serving deployment
│   └── deployment_gpt-oss-120b.yaml
├── deployments-embedder/           # embedding model
│   └── deployment_bge-m3/
└── deployments-proxies/            # OpenAI-compatible proxy deployments
    ├── deployment_llm-proxy/
    └── deployment_proxy-gpt-oss-batch-3x/

Each deployment folder holds its deployment_*.yaml (deployment config) and a versions/ folder with one *.yaml + *.zip per version (the YAML is the version config, the ZIP is the packaged code).

Deployments

Deployment Default version Purpose
gpt-oss-120b v-gpt-120b-tool-calling Serves openai/gpt-oss-120b via vLLM on a 16gb_8vcpu_rtxpro GPU instance.
bge-m3 v3 BGE-M3 embedding model.
llm-proxy v11 OpenAI-compatible proxy routing requests to UbiOps deployments.
proxy-gpt-oss-batch-3x v1 Proxy fanning batch requests across GPT-OSS instances.

Configuration

Secrets are exported empty and must be set per environment:

  • gpt-oss-120bHF_TOKEN (secret), MODEL_NAME (openai/gpt-oss-120b). The serving version also sets VLLM_USE_V1=1, GPU_MEMORY_UTILIZATION=0.90, and MAX_MODEL_LEN=125000, with a /health check on port 8000.
  • llm-proxyUBIOPS_API_TOKEN (secret).

Importing

Import this directory as a UbiOps project export (e.g. via ubiops project_export create), then fill in the secret environment variables listed above before sending requests.