Advanced Usage

`sml advanced` bypasses the model catalog and the interactive menu: you specify every launch parameter on the command line. Use it when:

- The model you want isn’t in the curated catalog.
- You need to pass framework-specific flags (a custom `--tp-size`, attention backend, quant config, …).
- You’re scripting from CI and want a fully declarative invocation.

For the guided flow with the curated catalog, use `sml`.

Arguments

| Argument | Environment variable | Description |
|---|---|---|
| `--firecrest-system` | `SML_FIRECREST_SYSTEM` | Target HPC system |
| `--partition` | `SML_PARTITION` | SLURM partition |
| `--slurm-reservation` | `SML_RESERVATION` | SLURM reservation (optional) |
| `--serving-framework` | | Inference framework (`sglang` or `vllm`). Required. |
| `--slurm-environment` | | Local path to the environment `.toml` file. Required. |
| `--framework-args` | | Arguments forwarded to the inference framework |
| `--slurm-nodes` | | Total nodes (default: `replicas × nodes-per-replica`) |
| `--slurm-replicas` | | Number of replicas (default: `1`) |
| `--slurm-nodes-per-replica` | | Nodes per replica (default: `1`) |
| `--slurm-time` | | Job time limit as `HH:MM:SS` (default: `00:05:00`) |
| `--served-model-name` | | Name under which the model is served (auto-generated if omitted) |
| `--replica-port` | | Port used by replicas (default: `5000`) |
| `--use-router` | | Enable a router to load-balance across replicas |
| `--router-args` | | Arguments forwarded to the router |
| `--disable-ocf` | | Disable the OCF wrapper |
| `--pre-launch-cmds` | | Shell commands to run before the framework starts |
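The default for `--slurm-nodes` is simply the product of the two replica flags. A minimal bash sketch of that arithmetic, handy as a pre-flight sanity check in a launch script; `default_slurm_nodes` is a hypothetical helper written for illustration, not part of SML:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of SML): mirrors the documented default
#   --slurm-nodes = --slurm-replicas x --slurm-nodes-per-replica
default_slurm_nodes() {
  local replicas="${1:-1}"           # documented default for --slurm-replicas: 1
  local nodes_per_replica="${2:-1}"  # documented default for --slurm-nodes-per-replica: 1
  echo $(( replicas * nodes_per_replica ))
}

# Example: 4 replicas of a model that needs 2 nodes each.
default_slurm_nodes 4 2
```

With no arguments it falls back to the documented defaults and reports a single node.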

Example: Apertus 8B on Clariden with sglang

```bash
sml advanced \
  --firecrest-system clariden \
  --partition normal \
  --slurm-replicas 1 \
  --slurm-nodes-per-replica 1 \
  --serving-framework sglang \
  --slurm-environment src/swiss_ai_model_launch/assets/envs/sglang.toml \
  --framework-args "--model-path /capstor/store/cscs/swissai/infra01/hf_models/models/swiss-ai/Apertus-8B-Instruct-2509 \
    --served-model-name swiss-ai/Apertus-8B-Instruct-2509-$(whoami) \
    --host 0.0.0.0 \
    --port 8080"
```

Note: A shared deployment named `swiss-ai/Apertus-8B-Instruct-2509` is usually already running. The `-$(whoami)` suffix on `--served-model-name` keeps your instance’s name from colliding with it.

For more ready-to-run scripts per cluster and vendor, see `examples/`.

When to disable OCF

“OCF” and “OpenTela” refer to the same thing: OCF is the on-disk binary name from the OpenTela project. The flag is named `--disable-ocf` for historical reasons.

By default, every replica joins the OpenTela p2p mesh at startup. That registration is what makes the model resolvable through the public gateway at serving.swissai.svc.cscs.ch. See Architecture for the longer story.

Pass `--disable-ocf` when:

- You’re benchmarking max throughput. OpenTela adds a hop on the request path; disabling it gives you the framework’s raw numbers. See Benchmarking.
- You want the model kept private. With OpenTela disabled, the replica never registers with the mesh, so serving-api can’t find it and it isn’t reachable from outside the cluster. Useful for private fine-tunes or in-flight experiments.
- You’re running at scale and the mesh is in the way. If you’ve stood up your own routing in front of N replicas (or you’re driving load directly from another cluster job), OpenTela registration is just overhead.
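A minimal sketch of a fully declarative, CI-style launch for a private benchmark run, reusing the model, paths, and framework flags from the Clariden example. Only the `sml advanced` flags come from the argument table; the `cmd` array and the `DRY_RUN` switch are illustrative conventions of this sketch, not SML features:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed values, copied from the Clariden example; adjust for your model.
MODEL_PATH="/capstor/store/cscs/swissai/infra01/hf_models/models/swiss-ai/Apertus-8B-Instruct-2509"
SERVED_NAME="swiss-ai/Apertus-8B-Instruct-2509-${USER:-$(whoami)}"

# Every launch parameter is spelled out, so the job does not depend on shell state.
# --disable-ocf (last) skips OpenTela registration: no extra hop, not publicly reachable.
cmd=(
  sml advanced
  --firecrest-system clariden
  --partition normal
  --slurm-replicas 1
  --slurm-nodes-per-replica 1
  --serving-framework sglang
  --slurm-environment src/swiss_ai_model_launch/assets/envs/sglang.toml
  --framework-args "--model-path ${MODEL_PATH} --served-model-name ${SERVED_NAME} --host 0.0.0.0 --port 8080"
  --disable-ocf
)

# Hypothetical switch: DRY_RUN=1 (the default here) only prints the command.
if [[ "${DRY_RUN:-1}" == "1" ]]; then
  printf '%q ' "${cmd[@]}"
  printf '\n'
else
  "${cmd[@]}"
fi
```

Keeping the command in an array avoids re-quoting the `--framework-args` string and makes the dry-run output an exact, copy-pasteable record of what CI would launch.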

If you disable it, you’re responsible for reaching the model yourself, usually directly via its `host:port` from another job on the same cluster.
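Both sglang and vLLM serve an OpenAI-compatible HTTP API, so a `GET /v1/models` against the replica is a cheap way to confirm it is up. This sketch assumes you have looked up the compute node running the replica (for example from the NODELIST column of `squeue`) and that the port is the one you passed via `--port` in `--framework-args`; `REPLICA_HOST`, `REPLICA_PORT`, `PROBE`, `base_url`, and `probe_replica` are hypothetical names:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: the replica's node hostname and the framework
# port set with --port inside --framework-args.
REPLICA_HOST="${REPLICA_HOST:-localhost}"
REPLICA_PORT="${REPLICA_PORT:-8080}"

# Base URL of the framework's OpenAI-compatible API.
base_url() {
  printf 'http://%s:%s/v1\n' "$1" "$2"
}

# List the served models; curl exits non-zero if the server is unreachable
# or answers with an HTTP error.
probe_replica() {
  curl --silent --show-error --fail --max-time 10 "$(base_url "$1" "$2")/models"
}

# Only probe when explicitly requested (PROBE=1), so the file is safe to source.
if [[ "${PROBE:-0}" == "1" ]]; then
  probe_replica "$REPLICA_HOST" "$REPLICA_PORT"
fi
```

Run it from another job on the same cluster, since a replica launched with `--disable-ocf` is not reachable from outside.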

Notes on flag style

- `sml advanced` takes system and partition as arguments, not env vars. This keeps each script reproducible without depending on shell state. (The interactive `sml` flow is different; see the env-var tip there.)
- `--framework-args` is a single quoted string forwarded verbatim to the framework. SML does not modify it, so keep it explicit.
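Because `--framework-args` is a single string, long flag lists written with line continuations are easy to mangle. One way to keep it explicit is to list the framework flags one per line in a bash array and join them. This is a sketch under one stated assumption: no individual value contains whitespace, since how the joined string is split downstream is not specified here, so the script rejects such values rather than guessing:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed model path, copied from the Clariden example.
MODEL_PATH="/capstor/store/cscs/swissai/infra01/hf_models/models/swiss-ai/Apertus-8B-Instruct-2509"

# One framework flag (or flag/value pair) per line: easy to review and diff.
fw_args=(
  --model-path "$MODEL_PATH"
  --served-model-name "swiss-ai/Apertus-8B-Instruct-2509-${USER:-$(whoami)}"
  --host 0.0.0.0
  --port 8080
)

# Reject any value with embedded whitespace, which the join below would split.
for a in "${fw_args[@]}"; do
  if [[ "$a" =~ [[:space:]] ]]; then
    echo "framework arg contains whitespace: $a" >&2
    exit 1
  fi
done

# "${arr[*]}" joins elements with the first character of IFS (a space by default).
framework_args="${fw_args[*]}"
printf '%s\n' "$framework_args"
```

The result is then passed as `--framework-args "$framework_args"`, quoted so it reaches `sml advanced` as one argument.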

Next

- How to size a model: picking the right replica/node layout
- Benchmarking: throughput and latency measurement
- Architecture: how `sml advanced` fits with the serving stack