2.0 KiB
ubiops-deployments
UbiOps export (format spec v8.0, exported 2026-06-02) bundling the
deployments behind the MinDef metadata/throughput setup. All deployments are
OpenAI-compatible and run in request format (supports_request_format: true)
with plain input/output.
Layout
deployments/
├── deployment-gpt-oss-chat/ # the LLM serving deployment
│ └── deployment_gpt-oss-120b.yaml
├── deployments-embedder/ # embedding model
│ └── deployment_bge-m3/
└── deployments-proxies/ # OpenAI-compatible proxy deployments
├── deployment_llm-proxy/
└── deployment_proxy-gpt-oss-batch-3x/
Each deployment folder holds its deployment_*.yaml (deployment config) and a
versions/ folder with one *.yaml + *.zip per version (the YAML is the
version config, the ZIP is the packaged code).
Deployments
| Deployment | Default version | Purpose |
|---|---|---|
gpt-oss-120b |
v-gpt-120b-tool-calling |
Serves openai/gpt-oss-120b via vLLM on a 16gb_8vcpu_rtxpro GPU instance. |
bge-m3 |
v3 |
BGE-M3 embedding model. |
llm-proxy |
v11 |
OpenAI-compatible proxy routing requests to UbiOps deployments. |
proxy-gpt-oss-batch-3x |
v1 |
Proxy fanning batch requests across GPT-OSS instances. |
Configuration
Secrets are exported empty and must be set per environment:
gpt-oss-120b—HF_TOKEN(secret),MODEL_NAME(openai/gpt-oss-120b). The serving version also setsVLLM_USE_V1=1,GPU_MEMORY_UTILIZATION=0.90, andMAX_MODEL_LEN=125000, with a/healthcheck on port 8000.llm-proxy—UBIOPS_API_TOKEN(secret).
Importing
Import this directory as a UbiOps project export (e.g. via
ubiops project_export create), then fill in the secret environment variables
listed above before sending requests. Note that this implementation requires outbound internet acces.
When running in airgapped environments, users can make use of the bring your own docker image functionality