.. _batch_scripts:

=================
Batch Job Scripts
=================

This page provides ready-to-use batch script templates for running MC/DC on HPC
systems with the three most common job schedulers: Slurm, Flux, and LSF.

Each template follows the same workflow:

#. Load required modules.
#. Activate the Python environment.
#. Launch MC/DC with the appropriate MPI wrapper.

Adapt the resource requests (nodes, tasks, GPUs, wall-time, queue) to your
allocation and problem size.

Slurm
-----

Slurm is widely used on LLNL's Quartz and Dane, as well as many university and
national lab clusters.

**CPU-only (Numba mode):**

.. code-block:: bash

    #!/bin/bash
    #SBATCH --job-name=mcdc_run
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=36
    #SBATCH --time=01:00:00
    #SBATCH --partition=pbatch

    module load python/3.11
    source /path/to/your/venv/bin/activate

    srun python input.py --mode=numba --caching

**GPU (Nvidia, e.g., Lassen-like systems with Slurm):**

.. code-block:: bash

    #!/bin/bash
    #SBATCH --job-name=mcdc_gpu
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=4
    #SBATCH --gpus-per-task=1
    #SBATCH --time=00:30:00
    #SBATCH --partition=gpu

    module load python/3.11 cuda/11.8
    source /path/to/your/venv/bin/activate

    srun python input.py --mode=numba --target=gpu --gpu_strategy=event

Flux
----

Flux is the scheduler on LLNL's Tioga and El Capitan systems
(AMD MI250X / MI300A GPUs).

**CPU-only:**

.. code-block:: bash

    #!/bin/bash

    module load cray-mpich python/3.11
    source /path/to/your/venv/bin/activate

    flux run -N 2 -n 72 python input.py --mode=numba --caching

**GPU (AMD MI300A, El Capitan):**

.. code-block:: bash

    #!/bin/bash

    module load cray-mpich rocm/6.0.0 python/3.11
    source /path/to/your/venv/bin/activate

    flux run -N 2 -n 8 -g 1 --queue=mi300a \
        python input.py --mode=numba --target=gpu \
        --gpu_arena_size=100000000 --gpu_strategy=event

This launches MC/DC on 2 nodes with 8 GPUs total (4 per node) on the MI300A
partition.

LSF
---

LSF is used on LLNL's Lassen (IBM POWER9 + Nvidia V100).


**CPU-only:**

.. code-block:: bash

    #!/bin/bash
    #BSUB -J mcdc_run
    #BSUB -nnodes 2
    #BSUB -W 60
    #BSUB -q pbatch

    module load gcc/8 cuda/11.8
    conda activate mcdc-env

    jsrun -n 8 -r 4 -a 1 -c 10 python input.py --mode=numba --caching

**GPU (Nvidia V100, Lassen):**

.. code-block:: bash

    #!/bin/bash
    #BSUB -J mcdc_gpu
    #BSUB -nnodes 1
    #BSUB -W 30
    #BSUB -q pbatch

    module load gcc/8 cuda/11.8
    conda activate mcdc-env

    jsrun -n 4 -r 4 -a 1 -g 1 \
        python input.py --mode=numba --target=gpu --gpu_strategy=async

This runs MC/DC on 1 node with 4 GPUs using the asynchronous scheduler.

Tips
----

* **Start small:** Test with a short wall-time and few particles before
  submitting large production runs.
* **Use caching:** Adding ``--caching`` saves Numba-compiled binaries so that
  subsequent runs skip the JIT compilation step.
* **Clear cache when updating MC/DC:** If you update the code, run once with
  ``--clear_cache --caching`` to regenerate binaries.
* **Check module order:** On some systems the order of ``module load`` commands
  matters. Load the compiler/MPI module before CUDA/ROCm.
* **Interactive debugging:** Request an interactive node first (``salloc``,
  ``flux alloc``, or ``lalloc``) to test your command before committing to a
  batch job.
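
When adapting the templates above, the total task count passed to a launcher has to match the node count times the tasks per node (for example, 2 nodes with 36 tasks each gives the ``-n 72`` used in the Flux CPU template). The following is a minimal sketch of how that arithmetic could be kept in one place. The helper name ``total_tasks`` and the variables ``NODES`` and ``TASKS_PER_NODE`` are hypothetical and not part of MC/DC or any scheduler; the script only prints the launch line so it can be inspected before use.

```shell
#!/bin/bash
# Hypothetical helper (not part of MC/DC): derive the total task count
# from a node count and a per-node task count.
total_tasks() {
    local nodes=$1 per_node=$2
    echo $(( nodes * per_node ))
}

# Values taken from the CPU-only templates: 2 nodes, 36 tasks per node.
NODES=2
TASKS_PER_NODE=36
NTASKS=$(total_tasks "$NODES" "$TASKS_PER_NODE")

# Print the launch line instead of running it, so it can be checked first.
echo "flux run -N $NODES -n $NTASKS python input.py --mode=numba --caching"
```

The same helper covers the GPU case: 2 nodes with 4 tasks per node gives the 8 tasks used in the MI300A template.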