| .. | ||
| deployments/deployment-gpt-oss-chat | ||
| README.md | ||
ubiops-deployments
UbiOps export (format spec v8.0, exported 2026-06-02) bundling the
deployments behind the MinDef metadata/throughput setup. All deployments are
OpenAI-compatible and run in request format (supports_request_format: true)
with plain input/output.
Layout
deployments/
├── deployment-gpt-oss-chat/ # the LLM serving deployment
│ └── deployment_gpt-oss-120b.yaml
├── deployments-embedder/ # embedding model
│ └── deployment_bge-m3/
└── deployments-proxies/ # OpenAI-compatible proxy deployments
├── deployment_llm-proxy/
└── deployment_proxy-gpt-oss-batch-3x/
Each deployment folder holds its deployment_*.yaml (deployment config) and a
versions/ folder with one *.yaml + *.zip per version (the YAML is the
version config, the ZIP is the packaged code).
Deployments
| Deployment | Default version | Purpose |
|---|---|---|
gpt-oss-120b |
v-gpt-120b-tool-calling |
Serves openai/gpt-oss-120b via vLLM on a 16gb_8vcpu_rtxpro GPU instance. |
bge-m3 |
v3 |
BGE-M3 embedding model. |
llm-proxy |
v11 |
OpenAI-compatible proxy routing requests to UbiOps deployments. |
proxy-gpt-oss-batch-3x |
v1 |
Proxy fanning batch requests across GPT-OSS instances. |
Configuration
Secrets are exported empty and must be set per environment:
gpt-oss-120b—HF_TOKEN(secret),MODEL_NAME(openai/gpt-oss-120b). The serving version also setsVLLM_USE_V1=1,GPU_MEMORY_UTILIZATION=0.90, andMAX_MODEL_LEN=125000, with a/healthcheck on port 8000.llm-proxy—UBIOPS_API_TOKEN(secret).
Importing
Import this directory as a UbiOps project export (e.g. via
ubiops project_export create), then fill in the secret environment variables
listed above before sending requests.