# ubiops-deployments

UbiOps export (format spec `v8.0`, exported 2026-06-02) bundling the deployments behind the MinDef metadata/throughput setup. All deployments are OpenAI-compatible and run in request format (`supports_request_format: true`) with `plain` input/output.

## Layout

```
deployments/
├── deployment-gpt-oss-chat/        # the LLM serving deployment
│   └── deployment_gpt-oss-120b.yaml
├── deployments-embedder/           # embedding model
│   └── deployment_bge-m3/
└── deployments-proxies/            # OpenAI-compatible proxy deployments
    ├── deployment_llm-proxy/
    └── deployment_proxy-gpt-oss-batch-3x/
```

Each deployment folder holds its `deployment_*.yaml` (deployment config) and a `versions/` folder with one `*.yaml` + `*.zip` per version (the YAML is the version config, the ZIP is the packaged code).

## Deployments

| Deployment | Default version | Purpose |
|---|---|---|
| `gpt-oss-120b` | `v-gpt-120b-tool-calling` | Serves `openai/gpt-oss-120b` via vLLM on a `16gb_8vcpu_rtxpro` GPU instance. |
| `bge-m3` | `v3` | BGE-M3 embedding model. |
| `llm-proxy` | `v11` | OpenAI-compatible proxy routing requests to UbiOps deployments. |
| `proxy-gpt-oss-batch-3x` | `v1` | Proxy fanning batch requests across GPT-OSS instances. |

## Configuration

Secrets are exported empty and must be set per environment:

- `gpt-oss-120b`: `HF_TOKEN` (secret), `MODEL_NAME` (`openai/gpt-oss-120b`). The serving version also sets `VLLM_USE_V1=1`, `GPU_MEMORY_UTILIZATION=0.90`, and `MAX_MODEL_LEN=125000`, with a `/health` check on port 8000.
- `llm-proxy`: `UBIOPS_API_TOKEN` (secret).

## Importing

Import this directory as a UbiOps project export (e.g. via `ubiops project_export create`), then fill in the secret environment variables listed above before sending requests.

Note that this implementation requires outbound internet access. When running in air-gapped environments, you can use the bring-your-own Docker image functionality.
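
Before importing, it can help to confirm that the export is complete. The layout described above expects one `*.yaml` and one `*.zip` per version inside each `versions/` folder. The following is a minimal sketch of such a check, not part of the export itself. The function name `find_unpaired_versions` is hypothetical, and the sketch assumes that a version's YAML and ZIP share the same base name (for example `v1.yaml` and `v1.zip`), which the export does not state explicitly.

```python
from pathlib import Path


def find_unpaired_versions(export_root):
    """List version files in any `versions/` folder that lack a partner.

    Assumption (not confirmed by the export): the YAML and ZIP of one
    version share the same base name, e.g. `v1.yaml` and `v1.zip`.
    """
    problems = []
    for versions_dir in sorted(Path(export_root).rglob("versions")):
        if not versions_dir.is_dir():
            continue
        yaml_stems = {p.stem for p in versions_dir.glob("*.yaml")}
        zip_stems = {p.stem for p in versions_dir.glob("*.zip")}
        # Symmetric difference: base names that have only one of the two files.
        for stem in sorted(yaml_stems ^ zip_stems):
            missing = "zip" if stem in yaml_stems else "yaml"
            problems.append(f"{versions_dir / stem}: missing .{missing}")
    return problems
```

Calling `find_unpaired_versions("deployments")` from the directory that contains the export returns an empty list when every version has both files. Any entry in the list names the version and the file type that appears to be missing.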