.. _batch_scripts:

=================
Batch Job Scripts
=================

This page provides ready-to-use batch script templates for running MC/DC on
HPC systems with the three most common job schedulers: Slurm, Flux, and LSF.
Each template follows the same workflow:

#. Load required modules.
#. Activate the Python environment.
#. Launch MC/DC with the appropriate MPI wrapper.

Adapt the resource requests (nodes, tasks, GPUs, wall-time, queue) to your
allocation and problem size.


Slurm
-----

Slurm is widely used on LLNL's Quartz and Dane,
as well as on many university and national lab clusters.

**CPU-only (Numba mode):**

.. code-block:: bash

    #!/bin/bash
    #SBATCH --job-name=mcdc_run
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=36
    #SBATCH --time=01:00:00
    #SBATCH --partition=pbatch

    module load python/3.11
    source /path/to/your/venv/bin/activate

    srun python input.py --mode=numba --caching


**GPU (Nvidia, e.g., Lassen-like systems with Slurm):**

.. code-block:: bash

    #!/bin/bash
    #SBATCH --job-name=mcdc_gpu
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=4
    #SBATCH --gpus-per-task=1
    #SBATCH --time=00:30:00
    #SBATCH --partition=gpu

    module load python/3.11 cuda/11.8
    source /path/to/your/venv/bin/activate

    srun python input.py --mode=numba --target=gpu --gpu_strategy=event


Flux
----

Flux is the scheduler on LLNL's Tioga and
El Capitan systems (AMD MI250X / MI300A GPUs).

**CPU-only:**

.. code-block:: bash

    #!/bin/bash

    module load cray-mpich python/3.11
    source /path/to/your/venv/bin/activate

    flux run -N 2 -n 72 python input.py --mode=numba --caching


**GPU (AMD MI300A, El Capitan):**

.. code-block:: bash

    #!/bin/bash

    module load cray-mpich rocm/6.0.0 python/3.11
    source /path/to/your/venv/bin/activate

    flux run -N 2 -n 8 -g 1 --queue=mi300a \
        python input.py --mode=numba --target=gpu \
        --gpu_arena_size=100000000 --gpu_strategy=event

This launches MC/DC on 2 nodes with 8 GPUs total (4 per node) on the MI300A partition.

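
The task count passed to ``-n`` in the GPU example is the number of nodes
multiplied by the GPUs per node, with one task per GPU. The following is a
minimal sketch of that arithmetic, not part of MC/DC itself; the variable names
``NODES``, ``GPUS_PER_NODE``, and ``NTASKS`` are hypothetical, and the script
only prints the launch line so that it can be reviewed before use.

.. code-block:: bash

    #!/bin/bash
    # Hypothetical helper variables; adjust them to your allocation.
    NODES=2
    GPUS_PER_NODE=4   # assumed node layout, matching the example above

    # One task per GPU, so the total task count is nodes x GPUs per node.
    NTASKS=$((NODES * GPUS_PER_NODE))

    # Print the launch line for review instead of running it.
    echo "flux run -N ${NODES} -n ${NTASKS} -g 1 python input.py --mode=numba --target=gpu"

With the values shown, the printed command requests 8 tasks, which matches the
template above.
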

LSF
---

LSF is used on LLNL's
Lassen (IBM POWER9 + Nvidia V100).

**CPU-only:**

.. code-block:: bash

    #!/bin/bash
    #BSUB -J mcdc_run
    #BSUB -nnodes 2
    #BSUB -W 60
    #BSUB -q pbatch

    module load gcc/8 cuda/11.8
    conda activate mcdc-env

    # 8 resource sets (-n), 4 per node (-r), each with 1 task (-a) and 10 cores (-c)
    jsrun -n 8 -r 4 -a 1 -c 10 python input.py --mode=numba --caching


**GPU (Nvidia V100, Lassen):**

.. code-block:: bash

    #!/bin/bash
    #BSUB -J mcdc_gpu
    #BSUB -nnodes 1
    #BSUB -W 30
    #BSUB -q pbatch

    module load gcc/8 cuda/11.8
    conda activate mcdc-env

    # 4 resource sets (-n), 4 per node (-r), each with 1 task (-a) and 1 GPU (-g)
    jsrun -n 4 -r 4 -a 1 -g 1 \
        python input.py --mode=numba --target=gpu --gpu_strategy=async

This runs MC/DC on 1 node with 4 GPUs using the asynchronous scheduler.

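
The LSF templates give the wall-time in minutes (``-W 60``), whereas the Slurm
templates use ``HH:MM:SS`` (``--time=01:00:00``). When porting a script between
the two, a small conversion helper can avoid mistakes. The sketch below is
illustrative only; the function name ``minutes_to_hms`` is hypothetical.

.. code-block:: bash

    #!/bin/bash
    # Convert a wall-time in whole minutes to the HH:MM:SS form used by Slurm.
    minutes_to_hms() {
        local minutes=$1
        printf '%02d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
    }

    minutes_to_hms 60   # prints 01:00:00
    minutes_to_hms 30   # prints 00:30:00
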

Tips
----

* **Start small:** Test with a short wall-time and few particles before
  submitting large production runs.
* **Use caching:** Adding ``--caching`` saves Numba-compiled binaries so that
  subsequent runs skip the JIT compilation step.
* **Clear cache when updating MC/DC:** If you update the code, run once with
  ``--clear_cache --caching`` to regenerate binaries.
* **Check module order:** On some systems the order of ``module load`` commands
  matters. Load the compiler/MPI module before CUDA/ROCm.
* **Interactive debugging:** Request an interactive node first
  (``salloc``, ``flux alloc``, or ``lalloc``) to test your command before
  committing to a batch job.
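
The two caching tips can be combined in a single script that picks the cache
flags depending on whether MC/DC was updated since the last run. This is a
minimal sketch under stated assumptions: the ``MCDC_UPDATED`` variable and the
``cache_flags`` function are hypothetical names introduced here, and only the
``--caching`` and ``--clear_cache`` flags come from the tips above. The script
prints the command rather than running it; prefix it with the launcher for your
scheduler (``srun``, ``flux run``, or ``jsrun``).

.. code-block:: bash

    #!/bin/bash
    # Hypothetical switch: set MCDC_UPDATED=1 for the first run after updating MC/DC.
    MCDC_UPDATED=${MCDC_UPDATED:-0}

    # Return the cache-related flags for this run.
    cache_flags() {
        if [ "$1" = "1" ]; then
            # Regenerate the compiled binaries once after an update.
            echo "--clear_cache --caching"
        else
            # Reuse the binaries saved by a previous run.
            echo "--caching"
        fi
    }

    CMD="python input.py --mode=numba $(cache_flags "$MCDC_UPDATED")"
    echo "$CMD"
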