
Advanced Usage

sml advanced bypasses the model catalog and the interactive menu. You specify every launch parameter on the command line. Use it when:

  • The model you want isn’t in the curated catalog.
  • You need to pass framework-specific flags (custom --tp-size, attention backend, quant config, …).
  • You’re scripting from CI and want a fully declarative invocation.

For the guided flow with a curated catalog, use sml.

Arguments

Argument                   Environment Variable  Description
--firecrest-system         SML_FIRECREST_SYSTEM  Target HPC system
--partition                SML_PARTITION         SLURM partition
--slurm-account            SML_ACCOUNT           SLURM account used for job submission
--slurm-reservation        SML_RESERVATION       SLURM reservation (optional)
--serving-framework        -                     Inference framework (sglang, vllm); required
--slurm-environment        -                     Local path to the environment .toml file; required
--framework-args           -                     Arguments forwarded to the inference framework
--slurm-replicas           -                     Number of replicas (default: 1)
--slurm-nodes-per-replica  -                     Nodes per replica (default: 1)
--slurm-time               -                     Job time limit HH:MM:SS (default: 02:00:00)
--served-model-name        -                     Name under which the model is served (auto-generated if omitted)
--use-router               -                     Load-balance across replicas (needs replicas > 1)
--router-args              -                     Arguments forwarded to the router
--disable-ocf              -                     Disable OCF wrapper
--otela-bootstrap-addr     -                     Override the OCF/OpenTela bootstrap peer (full multiaddr)
--dev                      -                     Shorthand for the dev OCF/OpenTela bootstrap peer
--disable-metrics          -                     Disable vmagent metrics push
--disable-dcgm-exporter    -                     Disable DCGM GPU metrics exporter
--pre-launch-cmds          -                     Shell commands to run before the framework starts
--output-script DIR        -                     Render master.sh + rank scripts into DIR and exit (no submit)

Total nodes is --slurm-replicas × --slurm-nodes-per-replica. The framework HTTP port is 8080.
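
As a quick sanity check before submitting, the node count can be worked out from the two flags. This is a minimal sketch in plain bash; the variable names and values are illustrative and not part of SML:

```shell
# Illustrative values mirroring --slurm-replicas and --slurm-nodes-per-replica.
replicas=3
nodes_per_replica=2

# Total nodes requested = replicas x nodes per replica.
total_nodes=$(( replicas * nodes_per_replica ))

echo "replicas=${replicas} nodes_per_replica=${nodes_per_replica} total_nodes=${total_nodes}"
```

With these values the job would ask for six nodes, which is worth comparing against the partition's limits before launching.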

Example: Apertus 8B on Clariden with sglang

sml advanced \
  --firecrest-system clariden \
  --partition normal \
  --serving-framework sglang \
  --slurm-environment src/swiss_ai_model_launch/assets/envs/sglang.toml \
  --framework-args "--model-path /capstor/store/cscs/swissai/infra01/hf_models/models/swiss-ai/Apertus-8B-Instruct-2509 \
    --served-model-name swiss-ai/Apertus-8B-Instruct-2509-$(whoami) \
    --host 0.0.0.0 \
    --enable-metrics"

Note: A model named swiss-ai/Apertus-8B-Instruct-2509 is usually already running. The -$(whoami) suffix on --served-model-name avoids name collisions with shared deployments.

For more ready-to-run scripts per cluster and vendor, see examples/.

Inspecting what would be submitted (--output-script DIR)

--output-script DIR writes the rendered submission scripts into the given directory and exits without submitting:

sml advanced \
  --firecrest-system clariden \
  --partition normal \
  --serving-framework sglang \
  --slurm-environment src/swiss_ai_model_launch/assets/envs/sglang.toml \
  --framework-args "--model-path /capstor/.../Apertus-8B-Instruct-2509 \
    --served-model-name swiss-ai/Apertus-8B-Instruct-2509-$(whoami) \
    --host 0.0.0.0 --enable-metrics" \
  --output-script /tmp/debug

Produces something like:

Wrote 2 file(s) to /tmp/debug:
  master.sh
  head.sh

For a multi-node or router configuration, the directory also contains follower.sh and/or router.sh. The layout is byte-identical to what a live submission writes to ~/.sml/job-${SLURM_JOB_ID}/ at job start. Each rank shape is its own bash file, so you can lint, inspect, and diff them independently:

shellcheck /tmp/debug/*.sh        # lint each independently
cat /tmp/debug/head.sh            # inspect just the head-rank logic
sbatch /tmp/debug/master.sh       # submit manually if you want
diff /tmp/debug/head.sh /tmp/older-debug/head.sh   # compare runs

Useful for:

  • Debugging a launch failure: see exactly what --framework-args your invocation translated to (the --port 8080 auto-injection, etc.), which srun calls would run, and what each rank shape does.
  • Reviewing changes during SML development: render against a known invocation before and after a code change, diff the rank scripts.
  • Starting point for a hand-tuned job: edit the rank-script heredoc bodies inside master.sh (the sibling rank scripts are not read at runtime), then sbatch master.sh directly.

After a real submission (one without --output-script), the same rank scripts also land on disk at ~/.sml/job-${SLURM_JOB_ID}/ for post-mortem inspection.

master.sh is self-contained. Rank scripts are embedded as cat-heredocs and extracted at job start to $HOME/.sml/job-${SLURM_JOB_ID}/, which sits on a shared filesystem, so every compute node that srun reaches can read them. The sibling head.sh / follower.sh / router.sh from --output-script are inspection-only and never read at runtime; to hand-tune, edit the heredoc bodies inside master.sh.
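
The embed-and-extract pattern described above can be illustrated with a small stand-in. This is a sketch of the general cat-heredoc technique only, not the script SML renders: the script body, the RANK_EOF delimiter, and the temporary directory (used here in place of $HOME/.sml/job-${SLURM_JOB_ID}/) are all assumptions made for the example.

```shell
# Stand-in for the per-job directory; a real job would use a shared path.
job_dir=$(mktemp -d)

# The quoted delimiter ('RANK_EOF') keeps the body literal, so $$ is written
# into head.sh unexpanded and is only evaluated when head.sh itself runs.
cat > "${job_dir}/head.sh" <<'RANK_EOF'
#!/bin/bash
echo "head rank, pid $$"
RANK_EOF

# Once extracted, the file is an ordinary script that can be run or inspected.
head_output=$(bash "${job_dir}/head.sh")
echo "${head_output}"
```

Quoting the heredoc delimiter is the detail that matters when hand-editing such bodies: without the quotes, variables and command substitutions would be expanded at extraction time rather than when the rank script runs.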

When to disable OCF

“OCF” and “OpenTela” refer to the same thing — OCF is the on-disk binary name from the OpenTela project. The flag is --disable-ocf for historical reasons.

By default, every replica joins the OpenTela p2p mesh at startup. That registration is what makes the model resolvable through the public gateway at serving.swissai.svc.cscs.ch. See Architecture for the longer story.

Pass --disable-ocf when:

  • You’re benchmarking max throughput. OpenTela adds a hop on the request path; disabling it gives you the framework’s raw numbers. See Benchmarking.
  • You want the model kept private. With OpenTela disabled, the replica never registers with the mesh — so serving-api can’t find it and it isn’t reachable from outside the cluster. Useful for private fine-tunes or in-flight experiments.
  • You’re running at scale and the mesh is in the way. If you’ve stood up your own routing in front of N replicas (or you’re driving load directly from another cluster job), OpenTela registration is just overhead.

If you disable it, you’re responsible for reaching the model yourself — usually directly via its host:port from another job on the same cluster.
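
A minimal sketch of addressing a replica directly from another job on the same cluster. The node name is a hypothetical placeholder, port 8080 is the framework HTTP port stated on this page, and the /v1/models path assumes the framework's OpenAI-compatible server is enabled, which is worth confirming for your framework and version:

```shell
# Hypothetical node name; substitute the node your replica actually runs on.
replica_host="nid001234"
# Framework HTTP port as documented on this page.
replica_port=8080
base_url="http://${replica_host}:${replica_port}"

# Defined but not called here: would list the served models, assuming an
# OpenAI-compatible /v1/models endpoint is exposed by the framework.
list_models() {
  curl --silent --fail "${base_url}/v1/models"
}

echo "replica base URL: ${base_url}"
```

Calling list_models from a job on the same cluster is one way to confirm the replica is up before pointing a benchmark or client at it.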

Pointing at a different OCF bootstrap peer

The bootstrap multiaddr the replica uses to join the mesh is baked in — it’s the prod peer by default. Two flags override it:

  • --dev — switch to the dev-datacenter peer. Shorthand for the most common alternate environment.
  • --otela-bootstrap-addr <multiaddr> — point at an arbitrary peer, e.g. an OCF instance running in another datacenter or on a custom IP. Takes precedence over --dev if both are passed (with a warning).
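
The precedence between the two flags can be sketched as a small bash function. This is an illustration of the rule stated above, not SML's implementation; the function name, its arguments, the warning text, and the placeholder peer values are all hypothetical.

```shell
# Hypothetical placeholders; the real prod and dev multiaddrs are baked into SML.
PROD_PEER="<prod-multiaddr>"
DEV_PEER="<dev-multiaddr>"

# resolve_bootstrap_addr USE_DEV CUSTOM_ADDR
#   USE_DEV:     "1" if --dev was passed, otherwise "0"
#   CUSTOM_ADDR: value of --otela-bootstrap-addr, or empty if not passed
resolve_bootstrap_addr() {
  local use_dev="$1" custom_addr="$2"
  if [ -n "${custom_addr}" ]; then
    # An explicit address wins; warn on stderr if --dev was also given.
    if [ "${use_dev}" = "1" ]; then
      echo "warning: --otela-bootstrap-addr overrides --dev" >&2
    fi
    echo "${custom_addr}"
  elif [ "${use_dev}" = "1" ]; then
    echo "${DEV_PEER}"
  else
    echo "${PROD_PEER}"
  fi
}

resolve_bootstrap_addr 0 ""                                        # prod peer
resolve_bootstrap_addr 1 ""                                        # dev peer
resolve_bootstrap_addr 1 "/ip4/10.0.0.42/tcp/43905/p2p/QmExample"  # custom peer, with warning
```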

Example:

sml advanced \
  --firecrest-system clariden \
  --partition normal \
  --serving-framework sglang \
  --slurm-environment src/swiss_ai_model_launch/assets/envs/sglang.toml \
  --framework-args "..." \
  --dev

Or for a custom peer:

sml advanced \
  ... \
  --otela-bootstrap-addr /ip4/10.0.0.42/tcp/43905/p2p/QmYourPeerId...

The chosen multiaddr is recorded under ocf_bootstrap_addr in the telemetry payload, so launches against different environments are distinguishable downstream.

Notes on flag style

  • sml advanced takes system and partition as arguments, not env vars. This keeps each script reproducible without depending on shell state. (The interactive sml flow is different — see the env-var tip there.)
  • --framework-args is a single quoted string forwarded to the framework as written. Keep it explicit; apart from additions such as the --port 8080 auto-injection, SML doesn’t rewrite it.
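
The single-string requirement is easy to get wrong when building the value in a script. The sketch below, assuming bash, shows why the quotes matter; count_args is a hypothetical helper and /path/to/model is a placeholder path.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Inside double quotes, backslash-newline is a line continuation, so this is
# one string spread across several source lines.
framework_args="--model-path /path/to/model \
  --host 0.0.0.0 \
  --enable-metrics"

# Quoted expansion: passed as a single argument, as --framework-args expects.
n_quoted=$(count_args "$framework_args")
# Unquoted expansion: bash word-splits the value into separate arguments.
n_unquoted=$(count_args $framework_args)

echo "quoted=${n_quoted} unquoted=${n_unquoted}"
```

If the value is kept in a variable, expand it as "$framework_args" when passing it to --framework-args so the whole string arrives as one argument.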
