Batch Job Scripts¶
This page provides ready-to-use batch script templates for running MC/DC on HPC systems with the three most common job schedulers: Slurm, Flux, and LSF.
Each template follows the same workflow:
Load required modules.
Activate the Python environment.
Launch MC/DC with the appropriate MPI wrapper.
Adapt the resource requests (nodes, tasks, GPUs, wall-time, queue) to your allocation and problem size.
Slurm¶
Slurm is widely used on LLNL’s Quartz and Dane, as well as many university and national lab clusters.
CPU-only (Numba mode):
#!/bin/bash
#SBATCH --job-name=mcdc_run
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=36
#SBATCH --time=01:00:00
#SBATCH --partition=pbatch
module load python/3.11
source /path/to/your/venv/bin/activate
srun python input.py --mode=numba --caching
GPU (Nvidia, e.g., Lassen-like systems with Slurm):
#!/bin/bash
#SBATCH --job-name=mcdc_gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH --partition=gpu
module load python/3.11 cuda/11.8
source /path/to/your/venv/bin/activate
srun python input.py --mode=numba --target=gpu --gpu_strategy=event
Flux¶
Flux is the scheduler on LLNL’s Tioga and El Capitan systems (AMD MI250X / MI300A GPUs).
CPU-only:
#!/bin/bash
module load cray-mpich python/3.11
source /path/to/your/venv/bin/activate
flux run -N 2 -n 72 python input.py --mode=numba --caching
GPU (AMD MI300A, El Capitan):
#!/bin/bash
module load cray-mpich rocm/6.0.0 python/3.11
source /path/to/your/venv/bin/activate
flux run -N 2 -n 8 -g 1 --queue=mi300a \
python input.py --mode=numba --target=gpu \
--gpu_arena_size=100000000 --gpu_strategy=event
This launches MC/DC on 2 nodes with 8 GPUs total (4 per node) on the MI300A partition.
LSF¶
LSF is used on LLNL’s Lassen (IBM POWER9 + Nvidia V100).
CPU-only:
#!/bin/bash
#BSUB -J mcdc_run
#BSUB -nnodes 2
#BSUB -W 60
#BSUB -q pbatch
module load gcc/8 cuda/11.8
conda activate mcdc-env
jsrun -n 8 -r 4 -a 1 -c 10 python input.py --mode=numba --caching
GPU (Nvidia V100, Lassen):
#!/bin/bash
#BSUB -J mcdc_gpu
#BSUB -nnodes 1
#BSUB -W 30
#BSUB -q pbatch
module load gcc/8 cuda/11.8
conda activate mcdc-env
jsrun -n 4 -r 4 -a 1 -g 1 \
python input.py --mode=numba --target=gpu --gpu_strategy=async
This runs MC/DC on 1 node with 4 GPUs using the asynchronous scheduler.
Tips¶
Start small: Test with a short wall-time and few particles before submitting large production runs.
Use caching: Adding
--cachingsaves Numba-compiled binaries so that subsequent runs skip the JIT compilation step.Clear cache when updating MC/DC: If you update the code, run once with
--clear_cache --cachingto regenerate binaries.Check module order: On some systems the order of
module loadcommands matters. Load the compiler/MPI module before CUDA/ROCm.Interactive debugging: Request an interactive node first (
salloc,flux alloc, orlalloc) to test your command before committing to a batch job.