50 lines
2.0 KiB
Markdown
50 lines
2.0 KiB
Markdown
# ubiops-deployments
|
|
|
|
UbiOps export (format spec `v8.0`, exported 2026-06-02) bundling the
|
|
deployments behind the MinDef metadata/throughput setup. All deployments are
|
|
OpenAI-compatible and run in request format (`supports_request_format: true`)
|
|
with `plain` input/output.
|
|
|
|
## Layout
|
|
|
|
```
|
|
deployments/
|
|
├── deployment-gpt-oss-chat/ # the LLM serving deployment
|
|
│ └── deployment_gpt-oss-120b.yaml
|
|
├── deployments-embedder/ # embedding model
|
|
│ └── deployment_bge-m3/
|
|
└── deployments-proxies/ # OpenAI-compatible proxy deployments
|
|
├── deployment_llm-proxy/
|
|
└── deployment_proxy-gpt-oss-batch-3x/
|
|
```
|
|
|
|
Each deployment folder holds its `deployment_*.yaml` (deployment config) and a
|
|
`versions/` folder with one `*.yaml` + `*.zip` per version (the YAML is the
|
|
version config, the ZIP is the packaged code).
|
|
|
|
## Deployments
|
|
|
|
| Deployment | Default version | Purpose |
|
|
|---|---|---|
|
|
| `gpt-oss-120b` | `v-gpt-120b-tool-calling` | Serves `openai/gpt-oss-120b` via vLLM on a `16gb_8vcpu_rtxpro` GPU instance. |
|
|
| `bge-m3` | `v3` | BGE-M3 embedding model. |
|
|
| `llm-proxy` | `v11` | OpenAI-compatible proxy routing requests to UbiOps deployments. |
|
|
| `proxy-gpt-oss-batch-3x` | `v1` | Proxy fanning batch requests across GPT-OSS instances. |
|
|
|
|
## Configuration
|
|
|
|
Secrets are exported empty and must be set per environment:
|
|
|
|
- `gpt-oss-120b` — `HF_TOKEN` (secret), `MODEL_NAME` (`openai/gpt-oss-120b`).
|
|
The serving version also sets `VLLM_USE_V1=1`, `GPU_MEMORY_UTILIZATION=0.90`,
|
|
and `MAX_MODEL_LEN=125000`, with a `/health` check on port 8000.
|
|
- `llm-proxy` — `UBIOPS_API_TOKEN` (secret).
|
|
|
|
## Importing
|
|
|
|
Import this directory as a UbiOps project export (e.g. via
|
|
`ubiops project_export create`), then fill in the secret environment variables
|
|
listed above before sending requests. Note that this implementation requires outbound internet acces.
|
|
|
|
When running in airgapped environments, users can make use of the bring your own docker image functionality
|