# ubiops-deployments

UbiOps export (format spec `v8.0`, exported 2026-06-02) bundling the deployments behind the MinDef metadata/throughput setup. All deployments are OpenAI-compatible and run in request format (`supports_request_format: true`) with `plain` input/output.

## Layout

```
deployments/
├── deployment-gpt-oss-chat/          # the LLM serving deployment
│   └── deployment_gpt-oss-120b.yaml
├── deployments-embedder/             # embedding model
│   └── deployment_bge-m3/
└── deployments-proxies/              # OpenAI-compatible proxy deployments
    ├── deployment_llm-proxy/
    └── deployment_proxy-gpt-oss-batch-3x/
```

Each deployment folder holds its `deployment_*.yaml` (deployment config) and a `versions/` folder with one `*.yaml` + `*.zip` per version (the YAML is the version config, the ZIP is the packaged code).

## Deployments

| Deployment | Default version | Purpose |
|---|---|---|
| `gpt-oss-120b` | `v-gpt-120b-tool-calling` | Serves `openai/gpt-oss-120b` via vLLM on a `16gb_8vcpu_rtxpro` GPU instance. |
| `bge-m3` | `v3` | BGE-M3 embedding model. |
| `llm-proxy` | `v11` | OpenAI-compatible proxy routing requests to UbiOps deployments. |
| `proxy-gpt-oss-batch-3x` | `v1` | Proxy fanning batch requests across GPT-OSS instances. |

## Configuration

Secrets are exported empty and must be set per environment:

- `gpt-oss-120b`: `HF_TOKEN` (secret), `MODEL_NAME` (`openai/gpt-oss-120b`). The serving version also sets `VLLM_USE_V1=1`, `GPU_MEMORY_UTILIZATION=0.90`, and `MAX_MODEL_LEN=125000`, with a `/health` check on port 8000.
- `llm-proxy`: `UBIOPS_API_TOKEN` (secret).

## Importing

Import this directory as a UbiOps project export (e.g. via `ubiops project_export create`), then fill in the secret environment variables listed above before sending requests.
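
Before importing, it can be useful to confirm that every `versions/` folder really contains a YAML/ZIP pair per version, as described under Layout. The following is a minimal sketch, not part of the export itself. It assumes that the two files of a version share the same file stem (for example a hypothetical `version_v1.yaml` next to `version_v1.zip`); the helper name `find_unpaired_versions` is made up for illustration.

```python
from pathlib import Path


def find_unpaired_versions(export_root):
    """Return the paths of version files whose counterpart is missing.

    Assumption: inside each `versions/` folder, the YAML config and the
    ZIP package of one version share the same file stem.
    """
    problems = []
    for versions_dir in sorted(Path(export_root).rglob("versions")):
        if not versions_dir.is_dir():
            continue
        yaml_stems = {p.stem for p in versions_dir.glob("*.yaml")}
        zip_stems = {p.stem for p in versions_dir.glob("*.zip")}
        # Symmetric difference: stems that exist as only one of the two types.
        for stem in sorted(yaml_stems ^ zip_stems):
            missing_ext = ".zip" if stem in yaml_stems else ".yaml"
            problems.append(str(versions_dir / (stem + missing_ext)))
    return problems
```

Calling `find_unpaired_versions("deployments")` from the directory that contains the export returns an empty list when every version is complete, and otherwise the paths of the files that appear to be missing.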