Advanced Usage
sml advanced bypasses the model catalog and the interactive menu. You specify every launch parameter on the command line. Use it when:
- The model you want isn’t in the curated catalog.
- You need to pass framework-specific flags (custom --tp-size, attention backend, quant config, …).
- You’re scripting from CI and want a fully declarative invocation.
For the guided flow with a curated catalog, use sml.
Arguments
| Argument | Environment Variable | Description |
|---|---|---|
| --firecrest-system | SML_FIRECREST_SYSTEM | Target HPC system |
| --partition | SML_PARTITION | SLURM partition |
| --slurm-reservation | SML_RESERVATION | SLURM reservation (optional) |
| --serving-framework | | Inference framework (sglang, vllm); required |
| --slurm-environment | | Local path to the environment .toml file; required |
| --framework-args | | Arguments forwarded to the inference framework |
| --slurm-nodes | | Total nodes (default: replicas × nodes-per-replica) |
| --slurm-replicas | | Number of replicas (default: 1) |
| --slurm-nodes-per-replica | | Nodes per replica (default: 1) |
| --slurm-time | | Job time limit HH:MM:SS (default: 00:05:00) |
| --served-model-name | | Name under which the model is served (auto-generated if omitted) |
| --replica-port | | Port used by replicas (default: 5000) |
| --use-router | | Enable router to load-balance across replicas |
| --router-args | | Arguments forwarded to the router |
| --disable-ocf | | Disable OCF wrapper |
| --pre-launch-cmds | | Shell commands to run before the framework starts |
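As a rough sketch of how the replica flags combine, the script below derives the total node count with the default rule from the table (replicas × nodes-per-replica) and prints a two-replica invocation with the router enabled. The `run` helper and the `DRY_RUN` variable are illustrative and not part of SML. The one-hour time limit is arbitrary, `--use-router` is assumed to be a bare switch that takes no value, and the framework arguments are cut down to the model path only.

```shell
# Hypothetical dry-run helper (not part of SML):
# prints the command unless DRY_RUN=0 is set, in which case it executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

REPLICAS=2
NODES_PER_REPLICA=1
# Same rule as the documented default for --slurm-nodes.
TOTAL_NODES=$((REPLICAS * NODES_PER_REPLICA))

# Framework flags travel as one string; only the model path is set in this sketch.
FRAMEWORK_ARGS="--model-path /capstor/store/cscs/swissai/infra01/hf_models/models/swiss-ai/Apertus-8B-Instruct-2509"

run sml advanced \
  --firecrest-system clariden \
  --partition normal \
  --serving-framework sglang \
  --slurm-environment src/swiss_ai_model_launch/assets/envs/sglang.toml \
  --slurm-replicas "$REPLICAS" \
  --slurm-nodes-per-replica "$NODES_PER_REPLICA" \
  --slurm-nodes "$TOTAL_NODES" \
  --slurm-time 01:00:00 \
  --use-router \
  --framework-args "$FRAMEWORK_ARGS"
```

Passing `--slurm-nodes` explicitly is redundant here because it matches the default; it is spelled out only to make the layout visible in a script.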
Example: Apertus 8B on Clariden with sglang
sml advanced \
--firecrest-system clariden \
--partition normal \
--slurm-replicas 1 \
--slurm-nodes-per-replica 1 \
--serving-framework sglang \
--slurm-environment src/swiss_ai_model_launch/assets/envs/sglang.toml \
--framework-args "--model-path /capstor/store/cscs/swissai/infra01/hf_models/models/swiss-ai/Apertus-8B-Instruct-2509 \
--served-model-name swiss-ai/Apertus-8B-Instruct-2509-$(whoami) \
--host 0.0.0.0 \
--port 8080"
Note: A model named swiss-ai/Apertus-8B-Instruct-2509 is usually already running. The --served-model-name suffix avoids name collisions with shared deployments.
For more ready-to-run scripts per cluster and vendor, see examples/.
When to disable OCF
“OCF” and “OpenTela” refer to the same thing: OCF is the on-disk binary name from the OpenTela project. The flag is --disable-ocf for historical reasons.
By default, every replica joins the OpenTela p2p mesh at startup. That registration is what makes the model resolvable through the public gateway at serving.swissai.svc.cscs.ch. See Architecture for the longer story.
Pass --disable-ocf when:
- You’re benchmarking max throughput. OpenTela adds a hop on the request path; disabling it gives you the framework’s raw numbers. See Benchmarking.
- You want the model kept private. With OpenTela disabled, the replica never registers with the mesh — so serving-api can’t find it and it isn’t reachable from outside the cluster. Useful for private fine-tunes or in-flight experiments.
- You’re running at scale and the mesh is in the way. If you’ve stood up your own routing in front of N replicas (or you’re driving load directly from another cluster job), OpenTela registration is just overhead.
If you disable it, you’re responsible for reaching the model yourself — usually directly via its host:port from another job on the same cluster.
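A minimal sketch of that workflow: launch with `--disable-ocf`, then query the replica from another job on the same cluster. Several things here are assumptions. `NODE` is a placeholder for whatever hostname the SLURM job lands on, `PORT` assumes the framework listens on the `--port` given in `--framework-args`, `--disable-ocf` is assumed to be a bare switch, and the `/v1/models` route assumes the OpenAI-compatible HTTP API that sglang and vllm expose. The `run` helper is illustrative and only prints commands by default.

```shell
# Hypothetical dry-run helper (not part of SML):
# prints the command unless DRY_RUN=0 is set, in which case it executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

NODE="nid000000"   # placeholder: replace with the compute node running the replica
PORT="8080"        # assumed to match --port inside --framework-args
BASE_URL="http://${NODE}:${PORT}"

# Launch without registering with the OpenTela mesh.
run sml advanced \
  --firecrest-system clariden \
  --partition normal \
  --serving-framework sglang \
  --slurm-environment src/swiss_ai_model_launch/assets/envs/sglang.toml \
  --disable-ocf \
  --framework-args "--model-path /capstor/store/cscs/swissai/infra01/hf_models/models/swiss-ai/Apertus-8B-Instruct-2509 --host 0.0.0.0 --port ${PORT}"

# From another job on the same cluster: list the models the replica serves.
run curl -s "${BASE_URL}/v1/models"
```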
Notes on flag style
- sml advanced takes system and partition as arguments, not env vars. This keeps each script reproducible without depending on shell state. (The interactive sml flow is different; see the env-var tip there.)
- --framework-args is a single quoted string forwarded verbatim to the framework. Keep it explicit; SML doesn’t massage it.
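The quoting matters because the shell, not SML, decides how many arguments the command receives. The sketch below illustrates this in bash or POSIX sh with a stand-in function; `count_args` and the `/models/example` path are hypothetical and exist only for the demonstration.

```shell
# Helper for illustration only: reports how many arguments it received.
count_args() { echo "$#"; }

FRAMEWORK_ARGS="--model-path /models/example --host 0.0.0.0 --port 8080"

# Quoted: the whole string arrives as a single argument after the flag.
count_args --framework-args "$FRAMEWORK_ARGS"   # prints 2

# Unquoted: the shell splits the string into words before the command sees it.
count_args --framework-args $FRAMEWORK_ARGS     # prints 7
```

Keeping the string in one quoted variable also makes it easy to reuse the same framework flags across several scripted invocations.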
Next
- How to size a model: picking the right replica/node layout
- Benchmarking: throughput and latency measurement
- Architecture: how sml advanced fits with the serving stack