Advanced Usage
sml advanced bypasses the model catalog and the interactive menu. You specify every launch parameter on the command line. Use it when:
- The model you want isn’t in the curated catalog.
- You need to pass framework-specific flags (custom --tp-size, attention backend, quant config, …).
- You’re scripting from CI and want a fully declarative invocation.
For the guided flow with a curated catalog, use sml.
Arguments
| Argument | Environment Variable | Description |
|---|---|---|
| --firecrest-system | SML_FIRECREST_SYSTEM | Target HPC system |
| --partition | SML_PARTITION | SLURM partition |
| --slurm-account | SML_ACCOUNT | SLURM account used for job submission |
| --slurm-reservation | SML_RESERVATION | SLURM reservation (optional) |
| --serving-framework | | Inference framework (sglang, vllm); required |
| --slurm-environment | | Local path to the environment .toml file; required |
| --framework-args | | Arguments forwarded to the inference framework |
| --slurm-replicas | | Number of replicas (default: 1) |
| --slurm-nodes-per-replica | | Nodes per replica (default: 1) |
| --slurm-time | | Job time limit HH:MM:SS (default: 02:00:00) |
| --served-model-name | | Name under which the model is served (auto-generated if omitted) |
| --use-router | | Load-balance across replicas (needs replicas > 1) |
| --router-args | | Arguments forwarded to the router |
| --disable-ocf | | Disable OCF wrapper |
| --otela-bootstrap-addr | | Override the OCF/OpenTela bootstrap peer (full multiaddr) |
| --dev | | Shorthand for the dev OCF/OpenTela bootstrap peer |
| --disable-metrics | | Disable vmagent metrics push |
| --disable-dcgm-exporter | | Disable DCGM GPU metrics exporter |
| --pre-launch-cmds | | Shell commands to run before the framework starts |
| --output-script DIR | | Render master.sh + rank scripts into DIR and exit (no submit) |
The total node count is --slurm-replicas × --slurm-nodes-per-replica. The framework HTTP port is 8080.
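As a worked example of the node arithmetic, the sketch below assumes a hypothetical layout of 2 replicas spanning 2 nodes each. The numbers are illustrative only, not a sizing recommendation, and the variable names are made up for this example.

```shell
# Hypothetical layout: 2 replicas, each spanning 2 nodes (illustrative values).
REPLICAS=2
NODES_PER_REPLICA=2

# Total nodes requested = replicas x nodes per replica.
TOTAL_NODES=$(( REPLICAS * NODES_PER_REPLICA ))

# Flags that would be appended to an `sml advanced` invocation for this layout.
# --use-router is included because it needs more than one replica.
LAYOUT_FLAGS="--slurm-replicas ${REPLICAS} --slurm-nodes-per-replica ${NODES_PER_REPLICA} --use-router"

echo "layout flags: ${LAYOUT_FLAGS}"
echo "total nodes:  ${TOTAL_NODES}"
```

With these values the job asks for 4 nodes in total.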
Example: Apertus 8B on Clariden with sglang
sml advanced \
--firecrest-system clariden \
--partition normal \
--serving-framework sglang \
--slurm-environment src/swiss_ai_model_launch/assets/envs/sglang.toml \
--framework-args "--model-path /capstor/store/cscs/swissai/infra01/hf_models/models/swiss-ai/Apertus-8B-Instruct-2509 \
--served-model-name swiss-ai/Apertus-8B-Instruct-2509-$(whoami) \
--host 0.0.0.0 \
--enable-metrics"
Note: A model named swiss-ai/Apertus-8B-Instruct-2509 is usually already running. The --served-model-name suffix avoids name collisions with shared deployments.
For more ready-to-run scripts per cluster and vendor, see examples/.
Inspecting what would be submitted (--output-script DIR)
--output-script DIR writes the rendered submission scripts into the given directory and exits without submitting:
sml advanced \
--firecrest-system clariden \
--partition normal \
--serving-framework sglang \
--slurm-environment src/swiss_ai_model_launch/assets/envs/sglang.toml \
--framework-args "--model-path /capstor/.../Apertus-8B-Instruct-2509 \
--served-model-name swiss-ai/Apertus-8B-Instruct-2509-$(whoami) \
--host 0.0.0.0 --enable-metrics" \
--output-script /tmp/debug
For this single-node invocation, that produces master.sh plus the head-rank script head.sh in /tmp/debug.
For a multi-node / router config the directory also gets follower.sh and/or router.sh. The layout is byte-identical to what a live submission writes to ~/.sml/job-${SLURM_JOB_ID}/ at job start — so each rank shape is its own bash file:
shellcheck /tmp/debug/*.sh # lint each independently
cat /tmp/debug/head.sh # inspect just the head-rank logic
sbatch /tmp/debug/master.sh # submit manually if you want
diff /tmp/debug/head.sh /tmp/older-debug/head.sh # compare runs
Useful for:
- Debugging a launch failure: see exactly what --framework-args your invocation translated to (the --port 8080 auto-injection, etc.), which srun calls would run, and what each rank shape does.
- Reviewing changes during SML development: render against a known invocation before and after a code change, then diff the rank scripts.
- Starting point for a hand-tuned job: edit the rank scripts (the heredoc bodies embedded in master.sh, not the sibling files), then sbatch master.sh directly.
After a real submission (one without --output-script), the same rank scripts also land on disk at ~/.sml/job-${SLURM_JOB_ID}/ for post-mortem inspection.
master.sh is self-contained. Rank scripts are embedded as cat-heredocs and extracted at job start to $HOME/.sml/job-${SLURM_JOB_ID}/, which is on a shared filesystem, so every compute node that srun reaches can read them. The sibling head.sh/follower.sh/router.sh files from --output-script are inspection-only and never read at runtime; to hand-tune, edit the heredoc bodies inside master.sh.
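The cat-heredoc pattern described above can be sketched as follows. This is a minimal illustration of the general technique, not the actual contents of master.sh: JOB_DIR is a temporary directory standing in for $HOME/.sml/job-${SLURM_JOB_ID}/, the rank script body is invented, and the script is run directly rather than through srun.

```shell
# Stand-in for $HOME/.sml/job-${SLURM_JOB_ID}/ (assumption: a temp dir for this demo).
JOB_DIR="$(mktemp -d)"

# Extract an embedded rank script. The quoted delimiter ('EOF') keeps the body
# literal, so $(uname -n) is NOT expanded at extraction time, only when head.sh runs.
cat > "${JOB_DIR}/head.sh" <<'EOF'
#!/bin/sh
echo "head rank on $(uname -n)"
EOF
chmod +x "${JOB_DIR}/head.sh"

# A real job would launch this via srun; here it is executed locally.
HEAD_OUTPUT="$(sh "${JOB_DIR}/head.sh")"
echo "${HEAD_OUTPUT}"
```

Because the script body lives inside the parent file, editing a separately rendered copy of head.sh has no effect on what this pattern extracts and runs.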
When to disable OCF
“OCF” and “OpenTela” refer to the same thing: OCF is the on-disk binary name from the OpenTela project. The flag is --disable-ocf for historical reasons.
By default, every replica joins the OpenTela p2p mesh at startup. That registration is what makes the model resolvable through the public gateway at serving.swissai.svc.cscs.ch. See Architecture for the longer story.
Pass --disable-ocf when:
- You’re benchmarking max throughput. OpenTela adds a hop on the request path; disabling it gives you the framework’s raw numbers. See Benchmarking.
- You want the model kept private. With OpenTela disabled, the replica never registers with the mesh — so serving-api can’t find it and it isn’t reachable from outside the cluster. Useful for private fine-tunes or in-flight experiments.
- You’re running at scale and the mesh is in the way. If you’ve stood up your own routing in front of N replicas (or you’re driving load directly from another cluster job), OpenTela registration is just overhead.
If you disable it, you’re responsible for reaching the model yourself — usually directly via its host:port from another job on the same cluster.
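A minimal sketch of reaching a replica directly, assuming a hypothetical compute node name (nid001234) and relying on the OpenAI-compatible HTTP API that both sglang and vLLM expose. The curl call is left commented out because it only works from a job on the same cluster while the replica is running.

```shell
# Hypothetical compute node name; look up the real one for your job (e.g. with squeue).
MODEL_HOST="nid001234"
# The framework HTTP port used by SML.
MODEL_PORT=8080

BASE_URL="http://${MODEL_HOST}:${MODEL_PORT}"
echo "model list endpoint: ${BASE_URL}/v1/models"

# From another job on the same cluster (requires curl and a running replica):
# curl -s "${BASE_URL}/v1/models"
```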
Pointing at a different OCF bootstrap peer
The bootstrap multiaddr the replica uses to join the mesh is baked in — it’s the prod peer by default. Two flags override it:
- --dev: switch to the dev-datacenter peer. Shorthand for the most common alternate environment.
- --otela-bootstrap-addr <multiaddr>: point at an arbitrary peer, e.g. an OCF instance running in another datacenter or on a custom IP. Takes precedence over --dev if both are passed (with a warning).
Example:
sml advanced \
--firecrest-system clariden \
--partition normal \
--serving-framework sglang \
--slurm-environment src/swiss_ai_model_launch/assets/envs/sglang.toml \
--framework-args "..." \
--dev
Or, for a custom peer, pass --otela-bootstrap-addr <multiaddr> in place of --dev.
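A sketch of a custom-peer invocation, under stated assumptions: the multiaddr below follows the common libp2p layout, but its IP (taken from the RFC 5737 documentation range), port, and peer ID are placeholders rather than a real OCF peer, and launch_with_custom_peer is a hypothetical helper name. Wrapping the command in a function means that defining it submits nothing.

```shell
# Placeholder multiaddr (assumption): substitute the real peer's address.
BOOTSTRAP_ADDR="/ip4/192.0.2.10/tcp/4001/p2p/PEER_ID_PLACEHOLDER"

# Hypothetical helper; $1 is the bootstrap multiaddr to join through.
launch_with_custom_peer() {
  sml advanced \
    --firecrest-system clariden \
    --partition normal \
    --serving-framework sglang \
    --slurm-environment src/swiss_ai_model_launch/assets/envs/sglang.toml \
    --framework-args "..." \
    --otela-bootstrap-addr "$1"
}

# Call it once the address is real:
#   launch_with_custom_peer "${BOOTSTRAP_ADDR}"
```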
The chosen multiaddr is recorded under ocf_bootstrap_addr in the telemetry payload, so launches against different environments are distinguishable downstream.
Notes on flag style
- sml advanced takes system and partition as arguments, not env vars. This keeps each script reproducible without depending on shell state. (The interactive sml flow is different; see the env-var tip there.)
- --framework-args is a single quoted string forwarded verbatim to the framework. Keep it explicit; SML doesn’t massage it.
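Why the quoting around --framework-args matters can be shown with plain shell. In this sketch, count_args is a hypothetical helper that only reports how many arguments it received, and /models/example is a placeholder path. It assumes sh or bash with the default IFS.

```shell
# Hypothetical helper: prints the number of arguments it was called with.
count_args() { echo "$#"; }

# Illustrative value; /models/example is a placeholder path.
FRAMEWORK_ARGS="--model-path /models/example --host 0.0.0.0"

# Quoted: the flag plus ONE string, which is the form --framework-args takes.
QUOTED_COUNT="$(count_args --framework-args "${FRAMEWORK_ARGS}")"

# Unquoted: the shell word-splits the value into separate arguments.
UNQUOTED_COUNT="$(count_args --framework-args ${FRAMEWORK_ARGS})"

echo "quoted:   ${QUOTED_COUNT} arguments"
echo "unquoted: ${UNQUOTED_COUNT} arguments"
```

The quoted form yields 2 arguments and the unquoted form 5. Note also that inside double quotes the local shell still expands substitutions such as $(whoami) before sml advanced sees the string.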
Next
- How to size a model: picking the right replica/node layout
- Benchmarking: throughput and latency measurement
- Architecture: how sml advanced fits with the serving stack