Batch Job Scripts

This page provides ready-to-use batch script templates for running MC/DC on HPC systems with three widely used job schedulers: Slurm, Flux, and LSF.

Each template follows the same workflow:

  1. Load required modules.

  2. Activate the Python environment.

  3. Launch MC/DC with the appropriate MPI wrapper.

Adapt the resource requests (nodes, tasks, GPUs, wall-time, queue) to your allocation and problem size.
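Only step 3 changes between schedulers: the launcher is srun, flux run, or jsrun. If you keep one wrapper script for several machines, a small helper can pick the launcher from the job environment. This is a minimal sketch, assuming that Slurm sets SLURM_JOB_ID, LSF sets LSB_JOBID, and Flux sets FLUX_URI inside a job; pick_launcher is a hypothetical helper name, and the resource flags still come from the templates below.

```shell
# Hypothetical helper: print the MPI launcher for the scheduler running this job.
# Assumption: SLURM_JOB_ID (Slurm), LSB_JOBID (LSF), and FLUX_URI (Flux) are set
# inside a job; verify these variable names on your system.
pick_launcher() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "srun"
    elif [ -n "${LSB_JOBID:-}" ]; then
        echo "jsrun"
    elif [ -n "${FLUX_URI:-}" ]; then
        echo "flux run"
    else
        echo "error: no supported scheduler detected" >&2
        return 1
    fi
}

# Example use inside a job script (add the resource flags your scheduler needs):
#   launcher=$(pick_launcher) || exit 1
#   $launcher python input.py --mode=numba --caching
```

The $launcher variable is deliberately left unquoted so that "flux run" splits into two words.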

Slurm

Slurm is the scheduler on LLNL’s Quartz and Dane, as well as on many university and national-lab clusters.

CPU-only (Numba mode):

#!/bin/bash
#SBATCH --job-name=mcdc_run
#SBATCH --nodes=2
# 36 = one MPI rank per core on Quartz; adjust for other systems (e.g., 112 on Dane)
#SBATCH --ntasks-per-node=36
#SBATCH --time=01:00:00
#SBATCH --partition=pbatch

module load python/3.11
source /path/to/your/venv/bin/activate

srun python input.py --mode=numba --caching
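If the source .../activate line fails (for example, because the path is wrong), bash continues, srun starts the system Python, and the job likely fails later with an import error. A minimal guard stops the job before any ranks launch. It assumes a standard venv, whose activate script exports VIRTUAL_ENV; require_venv is a hypothetical helper name.

```shell
# Hypothetical helper: stop early if no virtual environment is active.
# Assumes a standard venv, whose bin/activate script exports VIRTUAL_ENV.
require_venv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "error: no virtual environment active; check the 'source .../activate' line" >&2
        return 1
    fi
    echo "using Python environment: $VIRTUAL_ENV"
}

# In the Slurm script, between 'source ...' and 'srun ...':
#   require_venv || exit 1
```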

GPU (Nvidia GPUs on a Slurm-managed cluster):

#!/bin/bash
#SBATCH --job-name=mcdc_gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH --partition=gpu

module load python/3.11 cuda/11.8
source /path/to/your/venv/bin/activate

srun python input.py --mode=numba --target=gpu --gpu_strategy=event
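To confirm that each MPI task sees exactly one GPU, you can print the device list per task before a production run. With --gpus-per-task=1, Slurm usually restricts CUDA_VISIBLE_DEVICES for each task, but the exact behavior depends on the Slurm version and the site's GPU-binding settings, so treat this as a sketch. count_gpus is a hypothetical helper that counts the comma-separated entries of such a list.

```shell
# Hypothetical helper: count entries in a device list such as "0,1,2,3".
# An empty list counts as zero GPUs.
count_gpus() {
    if [ -z "$1" ]; then
        echo 0
        return
    fi
    printf '%s\n' "$1" | tr ',' '\n' | wc -l | tr -d ' '
}

# Per-task check inside the Slurm GPU job (bash-specific 'export -f';
# SLURM_PROCID is the task's rank):
#   export -f count_gpus
#   srun bash -c 'echo "task $SLURM_PROCID sees $(count_gpus "${CUDA_VISIBLE_DEVICES:-}") GPU(s)"'
```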

Flux

Flux is the scheduler on LLNL’s Tioga and El Capitan systems (AMD MI250X / MI300A GPUs).

CPU-only:

#!/bin/bash
#flux: --job-name=mcdc_run
#flux: -N 2
#flux: -t 60m

module load cray-mpich python/3.11
source /path/to/your/venv/bin/activate

flux run -N 2 -n 72 python input.py --mode=numba --caching
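The flux run line above spreads 72 tasks over 2 nodes, i.e. 36 per node. If you want one MPI rank per core on nodes with a different core count, recompute -n as nodes × cores per node. A minimal sketch; flux_cpu_cmd is a hypothetical helper, and the core count is an input you take from your system's documentation.

```shell
# Hypothetical helper: build a 'flux run' prefix with one MPI task per core.
# Usage: flux_cpu_cmd <nodes> <cores_per_node>
flux_cpu_cmd() {
    nodes=$1
    cores_per_node=$2
    echo "flux run -N $nodes -n $(( nodes * cores_per_node ))"
}

# Example (placeholder values; the unquoted substitution splits into words):
#   $(flux_cpu_cmd 2 36) python input.py --mode=numba --caching
```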

GPU (AMD MI300A, El Capitan):

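#!/bin/bash
#flux: --job-name=mcdc_gpu
#flux: -N 2
#flux: -t 30m
#flux: -q mi300a

module load cray-mpich rocm/6.0.0 python/3.11
source /path/to/your/venv/bin/activate

# The queue is selected by the #flux: -q directive; the nested flux run
# executes inside the batch allocation and needs no --queue option.
flux run -N 2 -n 8 -g 1 \
    python input.py --mode=numba --target=gpu \
    --gpu_arena_size=100000000 --gpu_strategy=event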

This launches 8 MPI tasks on 2 nodes, each task bound to one GPU (-g 1), for 8 GPUs total (4 per node) in the mi300a queue.
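The layout arithmetic is 8 tasks × 1 GPU per task over 2 nodes = 4 GPUs per node, which matches the four MI300A APUs on an El Capitan node. A small pre-flight check can catch layouts that do not fit before the job waits in the queue. This sketch assumes tasks are spread evenly across nodes; gpus_per_node is a hypothetical helper, and its default of 4 GPUs per node is an assumption to adjust for your system.

```shell
# Hypothetical helper: GPUs used per node for
#   flux run -N <nodes> -n <tasks> -g <gpus_per_task>
# Assumes an even spread of tasks over nodes; node_gpus defaults to 4.
gpus_per_node() {
    nodes=$1
    tasks=$2
    gpus_per_task=$3
    node_gpus=${4:-4}
    if [ $(( tasks % nodes )) -ne 0 ]; then
        echo "error: $tasks tasks do not divide evenly over $nodes nodes" >&2
        return 1
    fi
    used=$(( tasks / nodes * gpus_per_task ))
    if [ "$used" -gt "$node_gpus" ]; then
        echo "error: $used GPUs per node requested, but only $node_gpus available" >&2
        return 1
    fi
    echo "$used"
}
```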

LSF

LSF is used on LLNL’s Lassen (IBM POWER9 + Nvidia V100).

CPU-only:

#!/bin/bash
#BSUB -J mcdc_run
#BSUB -nnodes 2
#BSUB -W 60
#BSUB -q pbatch

module load gcc/8 cuda/11.8
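# Assumes conda is on PATH; the hook makes 'conda activate' work in a non-interactive batch shell
eval "$(conda shell.bash hook)"
conda activate mcdc-env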

jsrun -n 8 -r 4 -a 1 -c 10 python input.py --mode=numba --caching
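jsrun describes work in resource sets: -n is the total number of resource sets, -r the number per node, -a the MPI tasks per set, and -c the cores per set. For the CPU line above that gives 8 / 4 = 2 nodes (matching #BSUB -nnodes 2), 8 × 1 = 8 MPI ranks, and 4 × 10 = 40 cores per node. The helper below redoes that arithmetic and rejects layouts that need more cores than a node offers. It is a hypothetical sketch, and its default of 40 usable cores per Lassen node is an assumption to check against your site's documentation.

```shell
# Hypothetical helper: summarize a 'jsrun -n <n> -r <r> -a <a> -c <c>' layout.
# node_cores defaults to 40, an assumed count of usable cores per Lassen node.
jsrun_layout() {
    n=$1
    r=$2
    a=$3
    c=$4
    node_cores=${5:-40}
    if [ $(( n % r )) -ne 0 ]; then
        echo "error: -n $n is not a multiple of -r $r" >&2
        return 1
    fi
    if [ $(( r * c )) -gt "$node_cores" ]; then
        echo "error: $(( r * c )) cores per node requested, only $node_cores available" >&2
        return 1
    fi
    echo "nodes=$(( n / r )) ranks=$(( n * a )) cores_per_node=$(( r * c ))"
}
```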

GPU (Nvidia V100, Lassen):

#!/bin/bash
#BSUB -J mcdc_gpu
#BSUB -nnodes 1
#BSUB -W 30
#BSUB -q pbatch

module load gcc/8 cuda/11.8
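# Assumes conda is on PATH; the hook makes 'conda activate' work in a non-interactive batch shell
eval "$(conda shell.bash hook)"
conda activate mcdc-env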

jsrun -n 4 -r 4 -a 1 -g 1 \
    python input.py --mode=numba --target=gpu --gpu_strategy=async

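This runs MC/DC on 1 node with 4 GPUs, one per MPI task, using the asynchronous GPU strategy (--gpu_strategy=async) instead of the event-based strategy used in the other GPU templates.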
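For the GPU line, -g 1 gives each resource set one GPU, so -r 4 places 4 × 1 = 4 GPUs on each node, and -n 4 -r 4 needs 4 / 4 = 1 node, matching #BSUB -nnodes 1. A sketch of that check follows; jsrun_gpu_layout is a hypothetical helper name, and its default of 4 GPUs per node (Lassen's four V100s) is an assumption to adjust for other systems.

```shell
# Hypothetical helper: check a 'jsrun -n <n> -r <r> -g <g>' GPU layout.
# node_gpus defaults to 4 (four V100s per Lassen node); adjust for other systems.
jsrun_gpu_layout() {
    n=$1
    r=$2
    g=$3
    node_gpus=${4:-4}
    if [ $(( n % r )) -ne 0 ]; then
        echo "error: -n $n is not a multiple of -r $r" >&2
        return 1
    fi
    if [ $(( r * g )) -gt "$node_gpus" ]; then
        echo "error: $(( r * g )) GPUs per node requested, only $node_gpus available" >&2
        return 1
    fi
    echo "nodes=$(( n / r )) gpus_per_node=$(( r * g ))"
}
```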

Tips

  • Start small: Test with a short wall-time and few particles before submitting large production runs.

  • Use caching: Adding --caching saves Numba-compiled binaries so that subsequent runs skip the JIT compilation step.

  • Clear the cache after updating MC/DC: After you update MC/DC, run once with --clear_cache --caching so that stale binaries are discarded and regenerated.

  • Check module order: On some systems the order of module load commands matters. Load the compiler/MPI module before CUDA/ROCm.

  • Interactive debugging: Request an interactive node first (salloc, flux alloc, or lalloc) to test your command before committing to a batch job.
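The two caching tips can be combined by remembering which MC/DC version produced the current cache and adding --clear_cache only when that version changes. This is a minimal sketch: cache_flags is a hypothetical helper, the version string is whatever you choose to track (for example, a Git commit of your MC/DC checkout), and the stamp-file path is a placeholder. It records the new version before the run, so if the clearing run fails, delete the stamp file by hand.

```shell
# Hypothetical helper: print MC/DC caching flags, adding --clear_cache when the
# tracked version differs from the one recorded in the stamp file.
# Usage: cache_flags <version-string> <stamp-file>
cache_flags() {
    version=$1
    stamp=$2
    if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$version" ]; then
        echo "--caching"
    else
        # Record the new version; later runs reuse the regenerated cache.
        printf '%s\n' "$version" > "$stamp"
        echo "--clear_cache --caching"
    fi
}

# Example (placeholder paths):
#   flags=$(cache_flags "$(git -C /path/to/MCDC rev-parse HEAD)" "$HOME/.mcdc_cache_stamp")
#   srun python input.py --mode=numba $flags
```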