ubiops-deployments

UbiOps export (format spec v8.0, exported 2026-06-02) bundling the deployments behind the MinDef metadata/throughput setup. All deployments are OpenAI-compatible and run in request format (supports_request_format: true) with plain input/output.

Layout

deployments/
├── deployment-gpt-oss-chat/        # the LLM serving deployment
│   └── deployment_gpt-oss-120b.yaml
├── deployments-embedder/           # embedding model
│   └── deployment_bge-m3/
└── deployments-proxies/            # OpenAI-compatible proxy deployments
    ├── deployment_llm-proxy/
    └── deployment_proxy-gpt-oss-batch-3x/

Each deployment folder holds its deployment_*.yaml (deployment config) and a versions/ folder with one *.yaml + *.zip per version (the YAML is the version config, the ZIP is the packaged code).

Deployments

Deployment	Default version	Purpose
`gpt-oss-120b`	`v-gpt-120b-tool-calling`	Serves `openai/gpt-oss-120b` via vLLM on a `16gb_8vcpu_rtxpro` GPU instance.
`bge-m3`	`v3`	BGE-M3 embedding model.
`llm-proxy`	`v11`	OpenAI-compatible proxy routing requests to UbiOps deployments.
`proxy-gpt-oss-batch-3x`	`v1`	Proxy fanning batch requests across GPT-OSS instances.

Configuration

Secrets are exported empty and must be set per environment:

gpt-oss-120b — HF_TOKEN (secret), MODEL_NAME (openai/gpt-oss-120b). The serving version also sets VLLM_USE_V1=1, GPU_MEMORY_UTILIZATION=0.90, and MAX_MODEL_LEN=125000, with a /health check on port 8000.
llm-proxy — UBIOPS_API_TOKEN (secret).

Importing

Import this directory as a UbiOps project export (e.g. via ubiops project_export create), then fill in the secret environment variables listed above before sending requests. Note that this implementation requires outbound internet acces.

When running in airgapped environments, users can make use of the bring your own docker image functionality

2.0 KiB Raw Blame History

ubiops-deployments

Layout

Deployments

Configuration

Importing

2.0 KiB

Raw Blame History